
AoW1 Mod Packs
Moderated by Enginerd, Ziggurat Mason

Topic Subject: The Basics of Assembly Reverse Engineering: A Crash Course
posted 04-05-21 03:39 PM EDT (US)   
I was supposed to write this months ago, but better late than never, right?

Disclaimer: It is recommended that you have some experience with programming or scripting and a basic understanding of "computer logic" in general. In fact, for this primer I will assume that you are somewhat familiar with an object-oriented programming language such as C++, Python, Delphi (which AoW was written in), Java etc. because otherwise I'd need to start with Ada and Steve.

Furthermore, I'm assuming you're using PE Explorer or a similarly capable disassembler and have figured out how to disassemble AoWEPACK.dpl, which is really the main engine library of AoW. You should also have a means of modifying AoWEPACK.dpl, either manually with a hex editor or by using a script.

(Also it might be a good idea to make backups before messing around with any files.)

The following tools and resources will likely come in handy over the course of this crash course:
  • A hexadecimal calculator: calc.exe (set to "programmer")
  • A quick assembler/disassembler: (make sure to set to 32 bit)
  • A command line hex editor: (feel free to use a different one)
  • A visual explanation of x86-32 register bytes:
  • A list of jump instructions and flags:

    With that out of the way...


    Assembly (I will call it ASM from here on) is essentially a very low-level programming language that describes exactly what a program does. Unlike higher-level (more abstract) programming languages such as the ones mentioned earlier, ASM directly corresponds to the actual machine code which a program executes at runtime. Because of this, ASM can essentially be considered a more readable representation of machine code.

    Because ASM is a very low-level language consisting of only a few basic building blocks, ASM code can get very confusing for a complex program. Furthermore, machine code that has been generated by a compiler is generally optimised for runtime rather than human readability. Luckily, the Borland Delphi compiler used for AoW produces machine code that is much closer to the original code than most other compilers, and hence more readable, but understanding ASM is still a challenge.

    The first thing you need to understand about ASM is that everything is a memory operation. In a typical programming language you have abstract storage locations called variables where you can store values for later use. Such variables don't exist in ASM; instead you have only registers and the stack. I'll talk more about the stack (insert spooky noises here) another time; right now we'll just focus on registers.


    There are different types of registers, but the most important registers are general purpose registers or GPRs. These GPRs are essentially what you use in lieu of variables. There are 4 GPRs called A, B, C, and D. Each of these registers can hold up to 32 bits of information. You may be familiar with terms such as "32-bit architecture" or "x86-32". Simply put, this refers to the size of the registers, which in our case is 32 bits.

    32 bits is 4 bytes, which is important because bytes, not bits, are the building blocks of machine code. You can think of bytes as the underlying grid structure of a program, and in fact this is how hex editors typically display machine code. The visual representation of each byte is a hexadecimal two-digit number, with the first digit representing the upper 4 bits and the second digit representing the lower 4 bits. For example, if we store the decimal number 10 in a single byte, its representation is 0A. Accordingly, if we store the same value in a 32-bit register, its representation is 0000000A. To distinguish such hexadecimal representations from decimal values, they are usually written 0x0000000A or 0000000Ah, which not only gives information about the value itself, but also about the memory used to store it. As will be explained later, 0x0A and 0x0000000A are not necessarily identical.
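    You can reproduce these representations in a few lines of Python (assuming you have Python or any language with hex formatting handy):

```python
# Decimal 10 as a single byte and as a 32-bit (dword) value.
value = 10

byte_repr = format(value, "02X")    # two hex digits = one byte
dword_repr = format(value, "08X")   # eight hex digits = four bytes

print("0x" + byte_repr)    # 0x0A
print("0x" + dword_repr)   # 0x0000000A
```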

    I should point out here that bytes do not actually have concrete values, as different instructions can interpret bytes in different ways. For example, the byte 0xFF can be interpreted as the decimal value 255, the decimal value -1, the UTF-8 character , or something else entirely. However, when it comes to reverse-engineering AoW, you can usually assume that the disassembler automatically chooses the correct representation for each case.

    This is especially important when it comes to integer values that consist of several bytes. As humans we are accustomed to reading numbers in big-endian format, i.e. with the most significant digit to the left and the least significant digit to the right. When looking at the AoW machine code, however, you will often find that integers are stored in little-endian format, with the least significant byte first and the most significant byte last. So our decimal number 10 will be stored in 4 bytes as 0x0A, 0x00, 0x00, and 0x00, which the decompiler automatically interprets as 0x0000000A.
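    You can verify the little-endian byte order with Python's struct module ("&lt;I" means little-endian unsigned dword):

```python
import struct

# Pack decimal 10 as a little-endian unsigned 32-bit integer.
raw = struct.pack("<I", 10)
print(raw.hex(" ").upper())  # 0A 00 00 00

# Reading it back recovers the value, just like the disassembler does.
(value,) = struct.unpack("<I", raw)
print(hex(value))  # 0xa
```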

    Of course, we don't always need 4 bytes of storage, as most smaller numbers can be stored in a single byte. With GPRs it's actually possible to access certain bytes directly, and in fact many instructions chosen by the compiler do so, so it's important to recognise and understand them.

    Back in the 8-bit days, each GPR had exactly one byte and all was well, but with the advent of 16-bit there came a need to reference each byte of a register independently, and so the 16-bit GPRs were split into lower and higher bytes. For example, if we store the decimal value 10 in the 16-bit register A, the lower byte AL will have the value 0x0A, while the higher byte AH will have the value 0x00. The combination of both AL and AH is called AX. In 32-bit architecture, we can also use the entire 32-bit register A, which is then called EAX (the E stands for extended). The same of course also goes for the B, C, and D registers; in practice however, you will pretty much only need and encounter the lowest byte and the full 32-bit register, i.e. AL, EAX, BL, EBX, CL, ECX, DL, and EDX.
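    The sub-registers are just views of the same 32 bits, which we can model with bit masks (a sketch; the variable names mirror the register parts):

```python
# Model EAX as a plain 32-bit integer and derive AL, AH and AX from it.
eax = 0x12345678

al = eax & 0xFF          # lowest byte
ah = (eax >> 8) & 0xFF   # second-lowest byte
ax = eax & 0xFFFF        # lowest two bytes (word)

print(hex(al), hex(ah), hex(ax))  # 0x78 0x56 0x5678
```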

    Two bytes are called a word, and four bytes (e.g. EAX) are called a dword (double word). You can find the term dword occasionally in ASM code and documentation, where it simply means 32-bit, e.g. a DWORD PTR is a 32-bit pointer (more on pointers in the future). In the AoW code, and in fact in most 32-bit programs, the vast majority of values you deal with are either dwords or single bytes.


    When a program is executed, the operating system loads not only the executable file itself into memory, but also all its dependencies (libraries), and essentially creates one large virtual file that contains all the machine code the program ever needs. Each byte of a file then exists in memory, where it has a virtual address. These virtual addresses are the numbers displayed to the left in the main window of your disassembler.

    In a 32-bit program, addresses are always dwords. For example, if you look at AoWEPACK.dpl in PE Explorer and scroll to the top, you'll see that the address of the very first byte is 0x55701000. For reverse engineering, it's important to understand how compiled binary files (i.e. executables and libraries) map onto the virtual address space. To this end, a disassembler will display information about the virtual address space at the beginning of each file segment.

    Right above the first address 0x55701000 is a block of meta information labelled Code Section, of which two values are of particular interest to us: Virtual Address, which is 0x55701000, and Pointer to RawData, which is 0x00000400. We subtract the second from the first and get 0x55700C00, which is the difference between the virtual address and the file offset for all bytes in the code segment. This means that if you want to modify e.g. the byte at 0x55764D0D with a hex editor, you would actually need to look for the byte at 0x0006410D.

    If you scroll down to the virtual address 0x558E8000, you'll find another block of meta information, this one labelled Data Section. This is where the data segment starts, and using the same method we can calculate the difference between virtual address and file offset for the data segment, which is 0x55701200.
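    This address arithmetic is easy to get wrong by hand, so it may be worth scripting. A small Python sketch using the two section values above:

```python
# Differences between virtual address and file offset, taken from the
# section headers of AoWEPACK.dpl as shown in the disassembler.
CODE_DELTA = 0x55701000 - 0x00000400   # = 0x55700C00 (code section)
DATA_DELTA = 0x55701200                # computed the same way for the data section

def va_to_file_offset(va, delta):
    """Translate a virtual address into a file offset for the hex editor."""
    return va - delta

print(hex(va_to_file_offset(0x55764D0D, CODE_DELTA)))  # 0x6410d
```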

    For the moment, this is all we need to know about addresses. We'll look at addresses again in more detail once we've learned about pointers and the stack.

    [This message has been edited by And G (edited 06-19-2021 @ 10:35 PM).]

  • Replies:
    posted 04-06-21 06:15 PM EDT (US)     1 / 8  
    Great stuff, thanks! I still don't understand the assembly code much, would be nice to develop a list of how to input functions like 'mov' 'retn' 'inc' etc, and what they do.

    Anyhow, just posting to break up the doublepost restriction (hopefully), carry on!
    posted 04-07-21 10:03 AM EDT (US)     2 / 8  

    Now that we know what registers are, let's take a look at instructions, which, as the name suggests, are what we use to tell the processor what to do. Each ASM instruction consists of a short mnemonic and often one or more operands. If there are multiple operands, the first operand is generally the most important one and where any result is stored. For a simple yet important example, MOV AL, 0Ah sets the value of AL (the lowest byte of the A register) to 0x0A. Here, the mnemonic MOV stands for "move" as in "move the value 10 to AL". Switching the operands around is not possible here as that would mean to "move the value of AL to 10" which makes no sense. What's valid is e.g. MOV AL, BL which copies the value of BL to AL.

    Another important instruction is RETN, which requires no operands. It is used to "return" to a previous point (address) in the code and is mainly found at the end of functions (more on that in the future).

    The MOV instruction is especially useful for reverse engineering as it is frequently found where in the source code a variable is assigned an integer value. For example, when you look at the function called AoWE.TDurationAbility.GetTotalDuration, you'll see this:
    MOV EAX, 00000003h

    Clearly, EAX is where the duration of an ability is (at least temporarily) stored, and by changing the numeric value we can change the duration of certain abilities in the game.

    With a disassembler, we can also see how the instructions correspond to the actual machine code. The MOV instruction is B8 03 00 00 00, of which B8 is the so-called opcode, i.e. the non-data part of the instruction, while 03 00 00 00 is the data part, corresponding to 00000003h in ASM. In this case, the opcode already contains information about one operand; if instead of EAX we would have set ECX to 3 then the opcode would have been B9 instead of B8, with the data bytes unchanged.

    B8 03 00 00 00 takes up 5 bytes, while the RETN instruction is C3, which is 1 byte. So that's 6 bytes in total. Also, 32-bit compilers like to pad functions to 32-bit blocks, which is why the function AoWE.TDurationAbility.GetTotalDuration is followed by 2 "nonsense" bytes that contain a pointless instruction that does nothing (in this case MOV EAX, EAX) and is never even read by the processor because of the previous RETN instruction. What this all means is that if you wanted to rewrite the function in a slightly more complex way, you wouldn't have to worry about fitting it into 6 bytes, as you really have 8 bytes of memory space to work with.

    As we've already noted, 4 of the 5 bytes of the MOV instruction are used for storing the decimal value 3 as 0x00000003. This seems rather inefficient. We can't just change the ASM to MOV EAX, 03h because you can't store a byte value in EAX, a dword register. But we can absolutely store a byte value in AL, the lowest byte of EAX. So we can change the ASM to MOV AL, 03h, which is the machine code B0 03, and save 3 bytes!

    The issue with this is that we've now only overwritten the lowest byte of EAX. If the higher bytes are already set to zero then everything will work out fine. If they are not, however, then we might get a bug that could, for example, manifest itself as abilities having a duration of 54669824 turns instead of 3 turns. Luckily, there's an instruction for zeroing a register: The XOR instruction.

    The XOR instruction is actually part of a set of logic gate instructions that also includes AND, NOT, and OR. XOR stands for "exclusive or" and is used for bitwise comparison of two operands, e.g. XOR EAX, EBX where identical bits are set to 0 and differing bits are set to 1. This is hardly ever used for anything except zeroing registers, though. When used with identical operands, e.g. XOR EAX, EAX, all bits of both operands are of course identical and therefore all bits of EAX are set to zero. In fact this use of the XOR instruction is so common that nowadays most processors are optimised to interpret XOR with identical operands as "just zero everything" rather than doing any actual bitwise comparison. And XOR EAX, EAX only requires 2 bytes, which means that zeroing EAX and then setting AL to 3 requires 4 bytes in total, instead of the 5 bytes for setting EAX to 3 directly. A byte saved is a byte earned.
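    We can model the difference between the two approaches with plain integer arithmetic (the stale value 0x03423200 is made up for illustration):

```python
MASK32 = 0xFFFFFFFF

# Leftover garbage in the high bytes of EAX from some earlier computation.
eax = 0x03423200

# MOV AL, 03h only replaces the lowest byte -- the garbage survives.
buggy = (eax & ~0xFF & MASK32) | 0x03
print(hex(buggy))  # 0x3423203, not 3!

# XOR EAX, EAX first, then MOV AL, 03h -- now the result is correct.
eax ^= eax                        # all bits identical, so EAX becomes 0
fixed = (eax & ~0xFF & MASK32) | 0x03
print(hex(fixed))  # 0x3
```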

    A very useful single-byte instruction is INC, which increases an operand by 1. This means that instead of MOV EAX, 00000001h (again requiring 5 bytes) we can write XOR EAX, EAX and INC EAX for only 3 bytes in total with the same result. Similarly, DEC decreases an operand by 1, ADD adds operand 2 to operand 1, and SUB subtracts operand 2 from operand 1. Then there's MUL and IMUL for multiplication; note that MUL only takes a single operand, which it multiplies with EAX, whereas IMUL also has two- and three-operand forms. For example, IMUL EAX, EAX, 02h has the same effect as ADD EAX, EAX. (The difference between MUL and IMUL is in whether the operands are signed, which we'll get to later.)

    Then there's a special instruction called LEA (load effective address) which is technically an instruction for calculating memory addresses, but can also be used to multiply registers by values that are larger than a power of two by one. For the time being I'll spare you the details, and you shouldn't ever use it in this fashion, but basically if you see LEA EAX, [EAX + EAX * 2] that means to multiply EAX by 3, and if you see LEA EAX, [EAX + EAX * 4] that means to multiply EAX by 5. So if you see something like this:
    XOR EAX, EAX
    INC EAX
    INC EAX
    LEA EAX, [EAX + EAX * 4]

    Then now you know that this is equivalent to MOV EAX, 0000000Ah. While this particular example is overly convoluted and made up, similarly confusing applications of arithmetic instructions can frequently be found in compiled binaries due to runtime optimisation.
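    The LEA arithmetic itself is simple enough to check with a one-liner:

```python
MASK32 = 0xFFFFFFFF

def lea_scale(reg, scale):
    """LEA reg, [reg + reg * scale] multiplies reg by (scale + 1)."""
    return (reg + reg * scale) & MASK32

print(lea_scale(7, 2))  # 21, i.e. multiplied by 3
print(lea_scale(7, 4))  # 35, i.e. multiplied by 5
```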

    [This message has been edited by And G (edited 04-09-2021 @ 11:12 AM).]

    posted 04-07-21 11:21 AM EDT (US)     3 / 8  
    It's nice to have the right jargon to describe this stuff. So if I have it right:

    Registers: eax, ebx, ecx, edx, (and al, bl, cl ,dl) where data is stored.
    Instructions: mov, imul, inc, dec, retn and the like.
    Opcodes: the hexadecimal form of the instructions, which you need to know to write them using the hex editor. Would be nice to have a list of these.

    Changing B8 to B0 to access al instead of eax, because al can store values using 1 byte instead of 4, thus saving space for more code, sounds very useful! Even with XOR EAX, EAX taking up 2 of the 3 saved bytes.

    (just posting in case doublepost is still prohibited after 1st thread post)

    [This message has been edited by Ziggurat Mason (edited 04-07-2021 @ 06:36 PM).]

    posted 04-08-21 12:17 PM EDT (US)     4 / 8  
    So far, this seems to be the most human-readable asm course I have ever seen. Thanks a lot!

    AoW was written in Delphi? Wow, and they said Pascal was dead long before 1999 :-).
    posted 04-08-21 06:01 PM EDT (US)     5 / 8  

    Integer values can of course not only be positive, but also negative. As I've already pointed out, bytes don't actually have concrete numeric values, but rather can be interpreted in various ways depending on the instruction. In x86-32, signed integers are typically stored in a fashion known as two's complement which is easier to show than to explain:
    0x00 = 0
    0x01 = 1
    0x02 = 2
    0x7E = 126
    0x7F = 127
    0x80 = -128
    0x81 = -127
    0x82 = -126
    0xFE = -2
    0xFF = -1
    0x00 = 0

    The same concept, but with a dword:
    0x00000000 = 0
    0x00000001 = 1
    0x00000002 = 2
    0x7FFFFFFE = 2147483646
    0x7FFFFFFF = 2147483647
    0x80000000 = -2147483648
    0x80000001 = -2147483647
    0x80000002 = -2147483646
    0xFFFFFFFE = -2
    0xFFFFFFFF = -1
    0x00000000 = 0

    Easy, right? The beauty of this system is that for many arithmetic operations it doesn't matter whether integers are signed or not. E.g. if you increase 0xC0 by 1 you'll always get 0xC1, regardless of whether 0xC0 is interpreted as 192 or -64.
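    Python's int.from_bytes can interpret the same byte pattern both ways, which makes the tables above easy to verify:

```python
# The same byte pattern, interpreted as unsigned and as signed (two's complement).
for pattern in (0x00, 0x7F, 0x80, 0xFF):
    raw = pattern.to_bytes(1, "little")
    unsigned = int.from_bytes(raw, "little", signed=False)
    signed = int.from_bytes(raw, "little", signed=True)
    print(f"0x{pattern:02X} = {unsigned} unsigned, {signed} signed")
```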


    You can't talk about signed integers without talking about overflow, so let's talk about overflow. Simply put, signed integer overflow occurs whenever the result of an arithmetic operation exceeds the bounds of a signed integer. For example, 0x7F increased by 1 is 0x80, but when interpreted as a signed integer, this becomes -128.

    There's also unsigned integer overflow, which is essentially the same thing but for integers interpreted as unsigned. For example, 0xFF increased by 1 is 0x00, because a single byte can't store a larger unsigned integer than 255.

    Signed and unsigned integer overflow are not mutually exclusive. Consider the following ASM:
    MOV EAX, 00000040h
    ADD AL, E0h

    Here, the added value is large enough for the result to exceed both the bounds of a single-byte signed integer (-128 to 127) and a single-byte unsigned integer (0 to 255). Since we're adding this value to AL rather than to EAX, it is of course not possible to make use of any of the higher bits of EAX as an arithmetic carry. So after these two instructions, the value of EAX is 0x00000020. If we replace the second instruction with ADD EAX, 000000E0h then the value of EAX becomes 0x00000120 instead, and no overflow occurs.
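    The arithmetic is easy to check with masking (0xFF keeps a single byte, 0xFFFFFFFF keeps a dword):

```python
# ADD AL, E0h: the addition wraps around within a single byte.
al = (0x40 + 0xE0) & 0xFF
print(hex(al))  # 0x20

# ADD EAX, 000000E0h: the dword has room, so no overflow occurs.
eax = (0x00000040 + 0x000000E0) & 0xFFFFFFFF
print(hex(eax))  # 0x120
```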

    An integer overflow may indicate unintended behaviour, but this is not generally the case. In unsigned arithmetic, signed integer overflow is irrelevant, while in signed arithmetic, unsigned integer overflow is irrelevant. However, information about integer overflow can be useful for conditional instructions. This is where flags come in.


    In addition to the general purpose registers, an x86-32 processor also has a status register that contains information bits known as flags. A subset of these flags, known as status flags, are updated by certain operations (primarily arithmetic operations such as INC, ADD, etc.) and of these status flags, four are of particular importance:
    The zero flag (Z flag or ZF) is set depending on whether the result of the last operation was zero. For example, if 0x02 is decreased by 1, the ZF is set to 0 (false) because 0x01 is not zero; if 0x01 is decreased by 1, the ZF is set to 1 (true) because 0x00 is zero.

    The sign flag (S flag or SF) is set depending on whether the result of the last operation was negative when interpreted as a signed integer. For example, if 0x7E is increased by 1, the SF is set to 0 (false) because 0x7F is not negative; if 0x7F is increased by 1, the SF is set to 1 (true) because 0x80 is negative.

    The overflow flag (O flag or OF) is set depending on whether the last operation resulted in a signed integer overflow. For example, if 0x81 is decreased by 1, the OF is set to 0 (false) because the signed overflow boundary between 0x7F and 0x80 wasn't crossed; if 0x80 is decreased by 1, the OF is set to 1 (true) because the signed overflow boundary between 0x7F and 0x80 was crossed.

    The carry flag (C flag or CF) is set depending on whether the last operation resulted in an unsigned integer overflow. For example, if 0x01 is decreased by 1, the CF is set to 0 (false) because the unsigned overflow boundary between 0xFF and 0x00 wasn't crossed; if 0x00 is decreased by 1, the CF is set to 1 (true) because the unsigned overflow boundary between 0xFF and 0x00 was crossed.
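    As a sketch, here's how the four status flags could be computed for a single-byte subtraction (a simplified model for illustration, not how a real CPU is wired):

```python
def sub_flags(a, b):
    """Compute (ZF, SF, OF, CF) for the single-byte subtraction a - b."""
    result = (a - b) & 0xFF
    zf = result == 0
    sf = result >= 0x80                       # high bit set = negative when signed
    # signed overflow: operands had different signs and the result's sign
    # differs from the first operand's sign
    of = ((a ^ b) & (a ^ result) & 0x80) != 0
    cf = a < b                                # unsigned borrow
    return zf, sf, of, cf

print(sub_flags(0x01, 0x01))  # ZF set: the result is zero
print(sub_flags(0x80, 0x01))  # OF set: crossed the 0x7F/0x80 boundary
print(sub_flags(0x00, 0x01))  # CF set: crossed the 0xFF/0x00 boundary
```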

    So now that we know about flags, what can we actually do with them?

    [This message has been edited by And G (edited 04-12-2021 @ 06:19 PM).]

    posted 04-12-21 06:17 PM EDT (US)     6 / 8  

    In higher-level programming languages you have if-then-else statements but these don't exist in ASM; instead, there are conditional jumps. The basic principle is very simple: First, an operation is performed on a register, updating the status flags. Then, depending on whether certain flags are set, the processor either jumps to a different address, or not. Typically, such a jump simply skips a few lines of code.

    There are two important instructions that are used specifically to set flags:
    TEST performs a bitwise AND operation of two operands while only setting the flags and discarding the actual result, so its operands are unaffected. You don't need to understand bitwise operations at this point; what's important is that TEST EAX, EAX will set the zero flag iff the value of EAX is 0. In compiled binaries, this is the primary use of the TEST instruction.

    CMP performs a subtraction (equivalent to the SUB instruction) and discards the result while keeping the flags. This is useful because the status flags tell you everything you need to know about the relation between the two operands.

    Let's look at an example:
    TEST EAX, EAX
    JZ 01h

    What this does is check whether EAX is 0, and if so it skips the next byte of code. The JZ (jump if zero) instruction has as its operand the number of bytes to jump over, and this is where it gets slightly tricky: in the machine code, jump instructions really do use relative offsets like this, but assemblers and disassemblers work with concrete addresses instead. While this does make ASM more readable, it also creates a disparity between the ASM and the machine code.

    To illustrate what I mean by this, let's look at the same example with machine code and correct ASM code side by side. To the very left is the virtual address, which in this case I have chosen to start at 0x00000100, then the machine code, and then the ASM:
    0x00000100 85 C0 TEST EAX, EAX
    0x00000102 74 01 JZ L00000105
    0x00000104 48 DEC EAX
    0x00000105 L00000105:
    0x00000105 40 INC EAX

    As you can see, there's an unnamed label at 0x00000105, which is the address the JZ instruction jumps to. You can also see in the machine code that one byte is skipped, which I previously represented as JZ 01h, as there weren't any addresses.

    Let's look at another example, this time using the CMP instruction:
    0x00000200 39 D8 CMP EAX, EBX
    0x00000202 7F 02 JG L00000206
    0x00000204 89 D8 MOV EAX, EBX
    0x00000206 L00000206:
    0x00000206 40 INC EAX

    Here, the JG (jump if greater) instruction jumps if EAX is strictly greater than EBX. The jump is farther than in the previous example because the skipped instruction is a two-byte instruction, but otherwise everything is pretty much the same.
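    The rule for resolving a relative jump is: target = address of the next instruction + signed offset. A quick sketch (the addresses are the ones from the examples above):

```python
def jump_target(instr_addr, instr_len, offset_byte):
    """Resolve a short jump: the offset is a signed byte, counted from the
    end of the jump instruction."""
    offset = offset_byte - 0x100 if offset_byte >= 0x80 else offset_byte
    return instr_addr + instr_len + offset

print(hex(jump_target(0x00000102, 2, 0x01)))  # 0x105, the JZ example
print(hex(jump_target(0x00000202, 2, 0x02)))  # 0x206, the JG example
```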

    It goes without saying that the program will most likely crash if you jump into the middle of a multi-byte instruction. In this case a disassembler will also not be able to correctly disassemble the machine code, and will instead display a bunch of nonsense ASM. If this happens to your code, check your jump offsets!

    You can also jump without specifying a condition. The simple JMP instruction works exactly like e.g. JZ but without checking any flags first.

    Here's an overview of the jump instructions you need to know:
    JMP = "jump"
    JZ = "jump if zero"
    JNZ = "jump if not zero"
    JG = "jump if greater" (assumes signed integers)
    JL = "jump if less" (assumes signed integers)
    JGE = "jump if greater or equal" (assumes signed integers)
    JLE = "jump if less or equal" (assumes signed integers)
    JA = "jump if above" (assumes unsigned integers)
    JB = "jump if below" (assumes unsigned integers)
    JAE = "jump if above or equal" (assumes unsigned integers)
    JBE = "jump if below or equal" (assumes unsigned integers)

    Since conditional jumps operate by checking flags (or combinations of flags) and flags can have different implications depending on context, conditional jumps actually have multiple mnemonics to better describe each use case. For example, the JZ (jump if zero) instruction is also known as the JE (jump if equal) instruction, and the JAE (jump if above or equal) instruction is also known as the JNC (jump if not carry) instruction. I've linked a list of jump instructions in the first post where you can look up any unknown conditional jumps.


    Both conditional and unconditional jumps can jump backwards, but the number of bytes to jump is always calculated from the end of the jump instruction, even for backwards jumps. The main purpose of backwards jumps is to create loops, i.e. segments of code that are executed repeatedly until a certain exit condition is met. Here's a very simple example loop that illustrates the typical use of backwards jumps:
    0x00000300 31 C0 XOR EAX, EAX
    0x00000302 BB 0A 00 00 00 MOV EBX, 0000000Ah
    0x00000307 L00000307:
    0x00000307 39 D8 CMP EAX, EBX
    0x00000309 7D 03 JGE L0000030E
    0x0000030B 40 INC EAX
    0x0000030C EB F9 JMP L00000307
    0x0000030E L0000030E:
    0x0000030E 31 DB XOR EBX, EBX

    What this does is set EAX to 0 and EBX to 10, and then increase EAX by 1 until it is at least as great as EBX (the exit condition). The last instruction that zeroes EBX is not part of the loop; I just put it there to have another address outside the actual loop. If you run this code, at the end EAX will be 0x0000000A.
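    Translated into Python, the loop does the equivalent of:

```python
# Equivalent of the ASM loop above: count EAX up to EBX.
eax = 0           # XOR EAX, EAX
ebx = 0x0A        # MOV EBX, 0000000Ah

while eax < ebx:  # CMP EAX, EBX / JGE exits once EAX >= EBX
    eax += 1      # INC EAX

ebx = 0           # XOR EBX, EBX (not part of the loop)
print(hex(eax))   # 0xa
```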

    If you have a loop with an exit condition that is never met, the loop will run forever and the program will hang.

    [This message has been edited by And G (edited 04-12-2021 @ 06:24 PM).]

    posted 04-15-21 10:01 PM EDT (US)     7 / 8  

    Let's talk about jump distance. Short jumps take a (signed) single byte as the operand and are therefore limited to jumps as far as 127 bytes forward or 128 bytes (but really just 126 bytes) backwards. By contrast, long jumps take a dword as the operand and can effectively jump anywhere. Both conditional and unconditional jump instructions have long and short versions. For example, the machine code to skip one byte with a JMP instruction can be either EB 01 or E9 01 00 00 00. As you can see, the opcodes are different, and the long jump is a 5 byte instruction.

    Conditional long jumps are rarely necessary unless you have some really convoluted loops, but unconditional long jumps are very useful because they allow you to move parts of your code elsewhere if you don't have enough space available at the original location. If you've looked closely at AoWEPACK.dpl you'll have noticed that there are a lot of zeroed bytes that aren't used for anything. Remember the function AoWE.TDurationAbility.GetTotalDuration from earlier? Let's pretend we wanted to expand this function with some complicated calculations which don't fit into the 8 available bytes. The solution is to use a long jump to jump to some unused section of code, and simply do our calculations there. This is called hooking. So how do we actually do that?


    The first step is to figure out where to put the new code. The available memory space actually begins right after the entry point at 0x5580BDE8. There's a jump instruction there that tells the processor to go jump somewhere else, and after that it's free code real estate. But let's go slightly further down so we can pick a nice round number, 0x5580BE00. This will be the start of our code, i.e. the address we will jump to.

    Next, we'll need to set up the start of our jump. The function AoWE.TDurationAbility.GetTotalDuration starts at 0x55764D0C, so that's where we'll put the jump instruction. To figure out the distance of the jump, we take the destination address and subtract the origin address and the length of the jump instruction. So 0x5580BE00 - 0x55764D0C - 0x00000005 = 0x000A70EF which is our jump distance. The machine code at address 0x55764D0C is therefore E9 EF 70 0A 00. These 5 bytes happen to replace the 5 bytes of the original MOV EAX, 00000003h instruction, but really that doesn't matter. There's also a trailing RETN instruction that's no longer used; I'd overwrite it with a single-byte NOP (no operation) instruction (opcode 90) but that isn't really needed either, as this part of the code will never be executed.
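    The offset calculation (and its little-endian encoding) can be scripted too. A sketch using the addresses above:

```python
import struct

def long_jmp(origin, destination, opcode=0xE9):
    """Encode a 5-byte long jump (or a call, with opcode 0xE8) from
    origin to destination. The distance is counted from the end of
    the 5-byte instruction, and masking handles backwards jumps."""
    distance = (destination - origin - 5) & 0xFFFFFFFF
    return bytes([opcode]) + struct.pack("<I", distance)

code = long_jmp(0x55764D0C, 0x5580BE00)
print(code.hex(" ").upper())  # E9 EF 70 0A 00
```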

    Alright. So far, we've replaced the original code with a jump, but we haven't actually written the new code yet. In fact, there really isn't anything to do other than setting EAX to a set value, so let's do that. So at address 0x5580BE00, we simply write MOV EAX, 0000000Ah and then RETN, which is the same as the original function, just with a longer duration. If you've actually made these changes with a hex editor, you should now be able to run your game and have some abilities (or rather status effects) last for 10 turns instead of only 3.


    However, there's a better way of doing this, namely by replacing the JMP instruction with a CALL instruction. The CALL instruction is very useful because it does the same as the JMP instruction (it jumps to a different point in the code) but in a way that retains information about the original position. This is what the RETN instruction is about: After using CALL to jump somewhere, RETN returns (hence the name) to the instruction directly after the original CALL. This is great when you just want to add code in the middle of an existing function rather than rewrite the function completely, since you just need one CALL and a RETN rather than two JMP instructions.

    Let's rewrite our new code for the CALL instruction. The original JMP instruction can remain largely unchanged; we just need to replace the first byte E9, which is the (long) JMP opcode, with E8, which is the CALL opcode. The code at 0x5580BE00 can remain entirely unchanged; we already have a RETN at the end, but of course this RETN now serves a new purpose, as it returns us to where we jumped from. This means that following our CALL instruction, i.e. at 0x55764D11, there has to be a RETN instruction that corresponds to the RETN in the original code.

    When modifying AoWEPACK.dpl in this way, it's important to keep track of any added code so that you don't accidentally overwrite it later.


    So far, we've only done a few surgical edits to AoWEPACK.dpl, and it's possible to do these with a simple hex editor. Once you start working on a larger mod, however, it makes sense to employ a batch script that essentially serves as a compiler. The idea is to always start with an unmodified AoWEPACK.dpl, and use the script to generate a new AoWEPACK.dpl that contains your changes. This way, whenever something goes wrong, you can simply fix your script and run it again.

    In fact I use two scripts for this; one that handles all the file operations, and one that handles the actual hex editing. The first script (let's call it Mod.cmd) only involves some very basic batch scripting; here's a streamlined version that will suffice for the time being and will be expanded upon later on. It requires that you have a file called AoWEPACK.bac that is an unmodified AoWEPACK.dpl.
    @echo off

    REM initialise
    del AoWEPACK.dpl
    del AoWEPACK.tmp
    copy AoWEPACK.bac AoWEPACK.tmp

    REM patch
    call Patch.cmd

    REM finalise
    move /y AoWEPACK.tmp AoWEPACK.dpl
    start AoW.exe beatrix

    The actual patch script consists only of a list of bytes that will be modified. For this we need a command line utility that does the actual modifications; I use a program called hex.exe which I found who knows where years ago. I have linked it in my first post, but feel free to use any other utility with the same functionality.

    A typical line of Patch.cmd might look like this:
    hex AoWEPACK.dpl 0x00097013 0x31 > nul

    The first part is the name of the utility, in my case hex.exe. The rest are the parameters, in the syntax hex.exe expects: the name of the file, the offset of the byte to modify, and the value to set it to. The > nul stops hex.exe from printing a message for each byte edit.

    Furthermore, it's a good idea to add comments to your script so that at some point in the future you can look at it again and get a vague idea of what the hell you were trying to do there. In a batch script, there are two ways of commenting:
    REM this is a normal comment
    hex AoWEPACK.dpl 0x00097013 0x31 > nul &:: this is an inline comment

    Using this format, we can now write the script for our patch:
    REM set default ability duration to 10 turns

    REM AoWE.TDurationAbility.GetTotalDuration
    hex AoWEPACK.dpl 0x0006410C 0xE8 > nul &:: hook
    hex AoWEPACK.dpl 0x0006410D 0xEF > nul &::
    hex AoWEPACK.dpl 0x0006410E 0x70 > nul &::
    hex AoWEPACK.dpl 0x0006410F 0x0A > nul &::
    hex AoWEPACK.dpl 0x00064110 0x00 > nul &::
    hex AoWEPACK.dpl 0x00064111 0xC3 > nul &:: retn

    REM hooked
    hex AoWEPACK.dpl 0x0010B200 0x31 > nul &:: zero eax
    hex AoWEPACK.dpl 0x0010B201 0xC0 > nul &::
    hex AoWEPACK.dpl 0x0010B202 0xB0 > nul &:: set al to 10
    hex AoWEPACK.dpl 0x0010B203 0x0A > nul &::
    hex AoWEPACK.dpl 0x0010B204 0xC3 > nul &:: retn

    If you did all this, you now have a batch script that will modify AoWEPACK.dpl to increase the duration of certain abilities. I haven't actually bothered to test it, but it should work.

    One of the most likely errors to make is messing up the distances of jumps/calls. If you disassemble the modified file and navigate to your edit, you can check whether the jump destination is correct.
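    For the E8/E9 family you can also verify the distance by hand: the stored offset is the target address minus the address of the next instruction, i.e. the instruction's own address plus 5. For the hook in the patch script above:

```
; offset = target - (hook address + 5)
; 0x0010B200 - (0x0006410C + 5) = 0x000A70EF
; stored little-endian in the file: EF 70 0A 00
```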

    Another likely error is to mess up the counting of addresses for byte edits by mixing up decimal and hexadecimal or forgetting that certain numbers exist. There are hex editing utilities that allow modifying an entire byte stream from a single starting address, but I found that this method is actually more error-prone, since you still need to be aware of the actual length of your edits. This is why I prefer defining each byte edit separately, even though it involves more typing. Your mileage may vary.

    It's also a good idea to copy your machine code into a quick disassembler (see initial post) to verify that it is correctly assembled. In Notepad++ you can make rectangular selections by holding Alt while selecting, so just copy the column of byte values and paste it into the disassembler. If it produces garbage, then you need to fix your machine code.

    Whatever you do, always thoroughly test whether your edits produce the desired effect in the game, and make sure there aren't any unintended side effects.

    This concludes the basics of how to set up scripts for modifying binaries. We'll revisit this topic in the future once we've learned more about how libraries work.
    posted 06-19-21 10:26 PM EDT (US)     8 / 8  

    Since we have a limited number of GPRs to work with, we run into problems when there are lots of values we'd like to store. This is where the stack comes in.

    Simply put, the stack is where data used by the program is stored, but for the moment we'll only concern ourselves with storing values from GPRs. There are two instructions for this purpose: The PUSH instruction "pushes" a dword value onto the stack, and the POP instruction "pops" the value at the top of the stack (actually the bottom, but never mind that right now) from the stack and puts it into a register. For a very basic example, consider this code:
    PUSH EAX
    POP EBX

    The first instruction puts the value of EAX onto the stack, and the second instruction takes that value from the stack and puts it into EBX. Since the PUSH instruction doesn't modify the register itself, at the end of the code, EAX and EBX have the same value, and the code is functionally identical to MOV EBX, EAX. Simple, right?

    With each PUSH and POP operation, the size of the stack is increased or decreased, respectively. As long as each PUSH operation has its corresponding POP operation, this isn't an issue. However, if your code pushes more data onto the stack than it pops, the RETN at the end of the function will no longer find the correct return address, and your program will crash. So when you write your own code, always make sure the number of added PUSH operations is exactly identical to the number of added POP operations.

    A common use of the stack is to temporarily free up GPRs by storing their values for later retrieval. You may have noticed that many longer functions in AoWEPACK.dpl have roughly this structure:
    PUSH EBX
    PUSH ECX
    PUSH EDX
    actual function stuff here
    POP EDX
    POP ECX
    POP EBX

    This simply temporarily frees up the registers B, C, and D for the function to use. Note that the order in which registers are popped is of course inverse to the order in which they were pushed, so that each value ends up in the same register it was originally stored in. When writing your own code, whenever you need to use a register that you're not 100% certain isn't being used by the program in some other way, always push it onto the stack and pop it once you no longer need it.

    Occasionally, you can find similar push/pop blocks even in the middle of functions if registers need to be freed up again. In this case the pop instruction is often only a few lines down from the corresponding push instruction. Note that as per the initial example, just because you pushed a certain register doesn't mean you have to pop that same register, since the stack can also be used to move values between registers.

    While pushing registers is the most common and important push operation, you can in fact push pretty much anything onto the stack, with the limitation that everything you push (and pop) has to be a dword. For example, you can PUSH 0000000Ah and POP EAX to set EAX to 10. What you cannot do is PUSH AL because AL is a byte, not a dword.

    One thing that's very important to understand is that PUSH and POP are not the only instructions that modify the size of the stack. Remember how I said earlier that the CALL instruction retains information about the original position of a jump, which the RETN instruction can then use to return to that very position? This works because CALL pushes the return address onto the stack and RETN pops it. So if you push a value onto the stack and then call a function, you cannot pop that value inside the called function before its RETN. If you need to temporarily free up registers, do both the pushing and the popping within the same scope, either inside a function or outside of it.
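    As a sketch of why the order matters (SomeCode is a hypothetical label standing in for any function you might call):

```
PUSH EAX        ; save EAX; our value is now on top of the stack
CALL SomeCode   ; pushes the return address on top of our saved value
                ; (inside SomeCode, popping would yield the return address, not EAX)
POP EAX         ; fine here: SomeCode's RETN has removed the return address again
```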


    So far, we have only concerned ourselves with general purpose registers, but there are a number of other registers that are used frequently, one of them being the stack pointer. In x86-32, this stack pointer is called ESP, and you can (although you generally shouldn't) use it in pretty much the same way as any GPR such as EAX. For example, MOV EAX, ESP copies the value of ESP to EAX just as you would expect.

    What makes ESP special is that it is at all times used to store the address of the top (bottom) of the stack, and with each operation that modifies the size of the stack, ESP is updated accordingly. That's how the processor knows where to push and pop values in the first place. So unless you know exactly what you're doing, you should never modify the value of ESP. Realistically, there's only one case where modifying ESP makes sense anyway, and that's the following operation:
    ADD ESP, 00000004h

    I said earlier that the top of the stack is actually the bottom, and that's because the stack expands downwards: if the most recent PUSH operation added a value at the address 0x0087C208, then the next one will add a value at 0x0087C204, the next at 0x0087C200, then at 0x0087C1FC, and so on. And of course "the address of the most recently pushed value" is nothing other than ESP. So by adding 4 to ESP, we decrease the size of the stack by one dword. In other words, this is the same as popping, except that instead of storing the popped value in a GPR, we simply discard it.
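    Using the example addresses from above (so assuming ESP starts out at 0x0087C20C), the whole mechanism looks like this:

```
PUSH EAX            ; ESP becomes 0x0087C208, EAX is stored there
PUSH EBX            ; ESP becomes 0x0087C204, EBX is stored there
ADD ESP, 00000004h  ; ESP back to 0x0087C208: EBX is discarded, not restored
POP EAX             ; EAX gets its old value back; ESP back to 0x0087C20C
```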

    No, you will never actually need this. But it illustrates how the stack pointer works, and understanding the stack pointer is critical for many advanced modifications.

    So what can we actually use ESP for?


    It's no coincidence that in x86-32, both addresses and registers are dwords. After all, one of the uses of registers is to store addresses. When first explaining addresses I mentioned the term pointer; simply put, a pointer is just anything that contains an address, such as the stack pointer ESP. General purpose registers can of course also be pointers, e.g. if you move the value of ESP to EAX, then EAX effectively becomes a pointer.

    Of course registers always hold binary values, and the difference between an address and data lies not in the value itself, but in what we do with it. And the obvious use of a memory address is to access the value that is stored there. In programming lingo this is called dereferencing.

    Dereferencing a pointer is such a basic operation that we don't even need any new instructions, only new syntax, because dereferencing is indicated by square brackets:
    MOV EAX, [ESP]

    Rather than copying the value stored in ESP to EAX, which is of course what would happen without square brackets, this code looks up the address stored in ESP and copies the dword stored at that address to EAX. This is similar to the POP instruction, but doesn't modify the stack in any way.

    We can do more than accessing the top (bottom) dword of the stack, though:
    MOV EAX, [ESP + 4]

    This instruction dereferences not ESP itself, but rather ESP + 4, which is the address not of the last value that was pushed onto the stack, but of the value that was pushed onto the stack before that. Provided you keep track of what you push and in which order, this allows you to easily access the original value of any GPR you have temporarily freed up.
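    Combined with the push blocks from earlier, a sketch (assuming EAX and EDX are free to use at this point):

```
PUSH EBX            ; free up EBX and ECX
PUSH ECX
MOV EAX, [ESP]      ; EAX now holds the original ECX (pushed last)
MOV EDX, [ESP + 4]  ; EDX now holds the original EBX
POP ECX             ; restore both registers as usual
POP EBX
```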

    Instead of doing the arithmetic inline, you can also compute the address in a GPR first; assuming EAX already holds an address, this is equivalent to MOV EAX, [EAX + 4]:
    ADD EAX, 00000004h
    MOV EAX, [EAX]

    This approach is generally advisable for more sophisticated calculations.

    It is also possible to do the inverse, i.e. instead of looking up a memory address and copying the value stored there to a GPR, you can also copy the value stored in a GPR to a memory address:
    MOV [ESP + 4], EAX

    This of course overwrites any previous value stored there. Also, you don't necessarily have to use a GPR, as you can also write data directly to memory:
    MOV [ESP + 4], 0000000Ah

    When dereferencing pointers, some disassemblers like to display additional information. Instead of the code above, PE Explorer would display this:
    MOV DWORD PTR [ESP + 4], 0000000Ah

    This extra information can usually be ignored when reading code. It merely spells out the operand size: when writing an immediate directly to memory, neither the brackets nor the value itself make the size explicit, so PE Explorer notes that a full dword is being written.

    (Note: There's one exception to this, namely the already mentioned instruction LEA, which uses square brackets even though it never actually dereferences anything. Whenever you see LEA, just pretend the brackets aren't there.)

    Using what we have learned so far, we can write some code that does nothing whatsoever:
    PUSH EAX
    MOV EAX, ESP
    PUSH EBX
    ADD EBX, 00000004h
    MOV EAX, [EAX]
    MOV EBX, [ESP]
    ADD ESP, 00000008h

    If you don't understand why after executing this code all registers and the stack are in exactly the same state as before, please review this post until you do.

    [This message has been edited by And G (edited 06-19-2021 @ 11:01 PM).]
