https://0x41.cf/reversing/2021/07/21/reversing-x86-and-c-code-for-beginners.html
Before I got into reverse engineering, executables always seemed like black magic to me. I always wondered how stuff worked under the hood, and how binary code is represented inside .exe files, and how hard it is to modify this ‘compiled code’ without access to the original source code.
But one of the main intimidating hurdles always seemed to be the assembly language, it’s the thing that scares most people away from trying to learn about this field.
That’s the main reason why I thought of writing this straight-to-the-point article that only contains the essential stuff that you encounter the most when reversing, albeit missing crucial details for the sake of brevity, and assumes the reader has a reflex of finding answers online, looking up definitions, and more importantly, coming up with examples/ideas/projects to practice on.
The goal is to hopefully guide an aspiring reverse engineer and arouse motivation towards learning more about this seemingly elusive passion.
Note: This article assumes the reader has elementary knowledge regarding the hexadecimal numeral system, as well as the C programming language, and is based on a 32-bit Windows executable case study - results might differ across different OSes/architectures.
After writing code using a compiled language, a compilation takes place (duh), in order to generate the output binary file (an example of such is an .exe file).
Compilers are sophisticated programs which do this task. They make sure the syntax of your ugly code is correct, before compiling and optimizing the resulting machine code by minimizing its size and improving its performance, whenever applicable.
As we were saying, the resulting output file contains binary code, which can only be ‘understood’ by a CPU, it’s essentially a succession of varying-length instructions to be executed in order - here’s what some of them look like:
These instructions are predominantly arithmetical, and they manipulate CPU registers/flags as well as volatile memory, as they’re executed.
A CPU register is almost like a temporary integer variable - there’s a small fixed number of them, and they exist because they’re quick to access, unlike memory-based variables, and they help the CPU keep track of its data (results, operands, counts, etc.) during execution.
It’s important to note the presence of a special register called the FLAGS
register (EFLAGS
on 32-bit), which houses a bunch of flags (boolean indicators), which hold information about the state of the CPU, which include details about the last arithmetic operation (zero: ZF
, overflow: OF
, parity: PF
, sign: SF
, etc.).
CPU registers visualized while debugging a 32-bit process on x64dbg, a debugging tool.