fsg_unpacker

A nearly functional tool for statically unpacking FSG 1.0 packed binaries -- this is a personal project I decided to take on after starting the Practical Malware Analysis book. Yes, the book is a bit outdated, but the content is still relevant and good. The first chapter introduces a challenge that involves analyzing a binary packed with FSG 1.0. After doing the challenges for a bit, I wasn't satisfied with the fact that the only way to unpack an FSG packed binary seemed to be doing it dynamically. Surely a tool out there exists to unpack FSG packed binaries statically, right? If there is, I didn't find one. Thus, I decided to make my own.

This project is nearly complete. I have entirely reversed the functionality of the depacking stub and have figured out how to pull the decompressed/depacked data out of the original packed binary and place the unpacked contents into a new binary that is free of the FSG depacking stub -- all of this done statically without the need to run the FSG packed binary! Any roadblocks are described in my write-up, but the gist of it is that I need to correct some errors I have made with creating the PE header from scratch and then the tool should be good to go!

VERY IMPORTANT NOTE I WANT YOU TO READ

Windows Defender will pick up just about any FSG packed binary as malware. The samples I provide with this ARE NOT MALWARE. I provide the source code so you can build and pack them yourself as well if you want. Windows will likely see them, toss them out, and screech at you about the binaries being "malware". They are NOT.

Building

You can build the project in two ways:

Open the Visual Studio solution and build.
Run the build.bat script with no arguments. Running the script with "clean" as the argument will delete the build folder. Running with "samples" as the argument will build all the sample code so you can have original executables to compare against the packed binaries and the binaries unpacked by the tool.

Usage

To use the tool:

fsg_unpacker.exe <input_filename> [output_filename]

Input filename is required, output is optional. If no output file name is given, it defaults to unpacked.exe.

Write-Up: How I Reversed FSG 1.0

What you are going to read here is almost entirely the step-by-step process I took while working through reverse engineering the FSG 1.0 depacking stub functionality and developing the tool to unpack it statically. I rarely went back and corrected any of my notes, and this is on purpose. When reversing, sometimes I'm on point, sometimes I backtrack, sometimes I'm just plain wrong, but I wanted to keep detailed notes of each step that I took so that future me and others can see what exactly I did and what path I took to get there. Hopefully others can learn from my mistakes, my successes, and maybe even the methodologies I used for this project. With that being said, know that not everything I write is 100% accurate -- sometimes I make assumptions, write those assumptions down as if they are truth (such as saying something will "always" or "never" happen), and then I learn more information later on and realize what I thought/assumed/said was not quite right. Live and learn, yeah? Have fun parsing through my notes :D

Initial Recon

I found at https://defacto2.net/f/b224796 that FSG features aPLib compression. Perhaps I can use the aPLib library to decompress what was compressed. I still need to know more about exactly how it works. Let's throw it in Ghidra.

Upon opening the binary in Ghidra, entry looks like the following:

entry    
00405000 bb d0 01        MOV        EBX,DAT_004001d0               
            40 00
00405005 bf 00 10        MOV        EDI,DAT_00401000               
            40 00
0040500a be 00 40        MOV        ESI,DAT_00404000               
            40 00

LAB_0040500f                                  
0040500f 53              PUSH       EBX=>DAT_004001d0              
00405010 e8 0a 00        CALL       FUN_0040501f                   
            00 00
00405015 02 d2           ADD        DL,DL
00405017 75 05           JNZ        LAB_0040501e
00405019 8a 16           MOV        DL,byte ptr [ESI]=>DAT_00404000
0040501b 46              INC        ESI
0040501c 12 d2           ADC        DL,DL

LAB_0040501e                                  
0040501e c3              RET

There are a few things that immediately stand out:

There are three registers that have addresses moved into them -- EBX, EDI, ESI. I'll keep my eye out for these to make sure I don't miss something important.
Only one value is being pushed onto the stack prior to the call -- EBX. That suggests that EBX is the only parameter to the function. I don't know of any calling conventions that use EDI or ESI so I'm assuming they are not parameters to the function.
Since I know this binary is packed using FSG 1.0 (at least based on what PEiD told me), it seems to me that FUN_0040501f is likely what contains or sets off the logic responsible for unpacking the binary. This also means that the data in EBX is likely important to the function.

With that in mind, let's see what data resides at the address 0x004001d0:

DAT_004001d0                        XREF[2]:     entry:00405000(*), 
                                                 entry:0040500f(*)  
004001d0 04              ??         04h
004001d1 04              ??         04h
004001d2 05              ??         05h
004001d3 04              ??         04h
004001d4 01              ??         01h
004001d5 00              ??         00h
004001d6 62              ??         62h    b                 ?  ->  00402162
004001d7 21              ??         21h    !
004001d8 40              ??         40h    @
004001d9 00              ??         00h
004001da 02              ??         02h
004001db 00              ??         00h
...

I can't seem to glean much information from what is here just by looking at it, but it does seem like there is some data here. I am going to jump into looking at FUN_0040501f -- perhaps it can give me some more insight into what DAT_004001d0 is used for.

The start of `FUN_0040501f`

Let's look at the start of the function:

FUN_0040501f                        XREF[1]:     entry:00405010(c)  
0040501f fc              CLD
00405020 b2 80           MOV        DL,0x80

LAB_00405022                        XREF[1]:     00405029(j)  
00405022 a4              MOVSB      ES:EDI,ESI
00405023 6a 02           PUSH       0x2
00405025 5b              POP        EBX

LAB_00405026                        XREF[2]:     00405048(j), 0040508e(j)  
00405026 ff 14 24        CALL       dword ptr [ESP]
00405029 73 f7           JNC        LAB_00405022
...

What's happening here?

CLD -- Clearing the direction flag DF (https://www.felixcloutier.com/x86/cld). Essentially this means that string operations will INCREMENT ESI and EDI rather than decrement them.
MOV DL,0x80 -- This is putting 0x80 into DL which is the lower 8-bits of EDX. Since we don't know what EDX will be at this point, we just need to keep this in mind.
MOVSB ES:EDI,ESI -- This instruction copies one byte from the address in ESI to the address in EDI, then (because DF is clear) increments both registers. ESI and EDI were set in the entry, so this is copying bytes from 0x00404000 and placing them at 0x00401000. This is likely paired with some kind of loop, and looking a bit further down (or just looking at the XREF) we can see that there is in fact a jump to this instruction.
PUSH 0x2 -- Pushing 0x2 onto the stack
POP EBX -- Removing 0x2 from the stack and placing it into EBX

CALL dword ptr [ESP] -- An address stored on the stack is being called. This one is interesting because there has been very little stack interaction so we can likely figure out what it is calling just by walking through what we have seen.

Let's look at every time we have interacted with the stack:

0040500f PUSH EBX
00405010 CALL FUN_0040501f
00405023 PUSH 0x2
00405025 POP EBX

With this, we can build what the stack looks like:

// PUSH EBX
            ┌──────────────────────┐
ESP ──►     │ 0x004001d0           │
            └──────────────────────┘

// CALL FUN_0040501F
            ┌──────────────────────┐
ESP ──►     │ 0x00405015 (ret addr)│
            ├──────────────────────┤
            │ 0x004001d0           │
            └──────────────────────┘

// PUSH 0x2
            ┌──────────────────────┐
ESP ──►     │ 0x2                  │
            ├──────────────────────┤
            │ 0x00405015 (ret addr)│
            ├──────────────────────┤
            │ 0x004001d0           │
            └──────────────────────┘

// POP EBX
            ┌──────────────────────┐
ESP ──►     │ 0x00405015 (ret addr)│
            ├──────────────────────┤
            │ 0x004001d0           │
            └──────────────────────┘

If I am following this correctly then call [ESP] is going to turn into call 0x00405015. How cool!
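The same walk can be replayed in a few lines of Python. This is only a sketch of the diagram above, with the stack modeled as a list whose last element is the top (ESP); the variable names are made up for illustration.

```python
# Model the stack as a Python list; the last element is the top (ESP).
stack = []

stack.append(0x004001D0)   # 0040500f: PUSH EBX
stack.append(0x00405015)   # 00405010: CALL pushes its return address
stack.append(0x2)          # 00405023: PUSH 0x2
ebx = stack.pop()          # 00405025: POP EBX -> EBX = 2

call_target = stack[-1]    # 00405026: CALL dword ptr [ESP] reads the top entry
print(hex(ebx), hex(call_target))
```

The value read by CALL dword ptr [ESP] is the return address left behind by the original CALL, which is why the stub at 0x00405015 ends up being used as a subroutine.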

JNC LAB_00405022 -- JNC means "jump if not carry" (https://www.felixcloutier.com/x86/jcc). Essentially, if the carry flag CF equals 0, then we jump. The carry flag is set in a number of ways, but typically it is set in the following cases:
- Arithmetic that results in an overflow, (e.g. when the result of an ADD is unable to fit in the register).
- Shifting or rotating bits that results in a bit getting pushed out of the register.
So essentially, if this carry flag is NOT set due to some other instructions, then we will jump back up to the MOVSB instruction and repeat.

With that, we have almost reversed this first little loop in the binary. Hopefully it's not totally useless! To complete this portion though, I need to look at the code found at 0x00405015 since this gets called during the loop. Let's look at that:

00405015 02 d2           ADD        DL,DL
00405017 75 05           JNZ        LAB_0040501e
00405019 8a 16           MOV        DL,byte ptr [ESI]=>DAT_00404000                  = 83h
0040501b 46              INC        ESI
0040501c 12 d2           ADC        DL,DL
                        LAB_0040501e                                    XREF[1]: 00405017(j)  
0040501e c3              RET

Easy enough, let's step by step it again:

ADD DL,DL -- Double DL; Remember this got set to 0x80 at 0x00405020 at the start of the loop. Doubling this means that the resultant value will be 0x100 which is too large to fit in DL, so DL will truncate to 0x00 and then set the CF flag to 1. Keep this in mind as I imagine it is directly important to the JNC instruction at 0x00405029. As another note, ZF (the zero flag) will be set to 1. I discovered this becomes pertinent to the following instruction after starting to look into it.
JNZ LAB_0040501e -- Jump if not zero (https://www.felixcloutier.com/x86/jcc). This will take the jump only if the ZF flag is set to 0. If the prior instruction produces a result of 0x00, then this jump will NOT be taken. If it produces a value other than 0x00, then this will jump directly to the RET instruction located at 0x0040501e.
MOV DL, [ESI] -- This one is interesting as it will take the value found at ESI and then copy it into DL. We need to remember that ESI at this point has already incremented because of the MOVSB located at 0x00405022. So logically, the 0th byte of ESI is moved into EDI during MOVSB. Then the values are incremented. Then at this point we are taking the 1st byte from ESI and placing it in DL. For what purpose... I know not... yet.
INC ESI -- Okay, so we are going to increment ESI AGAIN. This means (just from taking a brief glance at the rest of the instructions above), that ESI is now on the 2nd byte (I'm working off 0 indexing here, just as a reminder). This implies that the 0th, 2nd, 4th, 6th, etc. bytes are going to be moved using MOVSB into EDI while the others are used by DL for some currently unknown purpose.
ADC DL,DL -- Add with carry (https://www.felixcloutier.com/x86/adc). This adds the destination, source, and carry flag and stores the result inside the destination. Documentation specifies that "the state of the CF flag represents a carry from a previous addition." Essentially the value of CF is derived from what it is prior to this instruction being run. In our case, this is whatever value CF is set to after running ADD DL,DL at 0x00405015. Note that this will also set CF after it executes.
RET -- Yup, we're finally returning to 0x00405029 which is the jump instruction that determines whether we loop or not.
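To double check the flag behavior described above, here is a small Python sketch of 8-bit addition with the carry and zero flags. The helper name add8 and the reloaded byte 0xA5 are made up for illustration; only the 0x80 starting value comes from the disassembly.

```python
def add8(a, b, carry_in=0):
    """Add two 8-bit values (plus an optional carry) like ADD/ADC on DL.

    Returns (result, cf, zf): the truncated 8-bit result, the carry flag,
    and the zero flag.
    """
    total = a + b + carry_in
    result = total & 0xFF            # DL only keeps the low 8 bits
    cf = 1 if total > 0xFF else 0    # carry out of bit 7
    zf = 1 if result == 0 else 0
    return result, cf, zf

# ADD DL,DL with DL = 0x80: the result wraps to 0x00, so CF = 1 and ZF = 1.
dl, cf, zf = add8(0x80, 0x80)
print(hex(dl), cf, zf)

# ZF = 1 means the JNZ is not taken and DL is reloaded from [ESI].
# 0xA5 is a hypothetical reloaded byte, not a value from the sample.
dl, cf, zf = add8(0xA5, 0xA5, carry_in=cf)   # ADC DL,DL
print(hex(dl), cf, zf)
```

Note that the carry from the ADD is shifted into the bottom of DL by the ADC, while the top bit of the reloaded byte becomes the new CF.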

WOW! A lot to keep track of here -- and this is just the start! To summarize, so far, the code essentially clears DF so MOVSB increments rather than decrements. It then loads DL with 0x80 and enters a loop where it:

Copies one byte from ESI to EDI.
Calls 0x00405015.
Performs ADD DL,DL and, if that result is zero, "reloads" DL from the next byte and does ADC DL,DL, setting CF whenever DL overflows.
Returns to the loop and tests CF. If there’s no carry, it goes back and copies the next byte using MOVSB. If there is a carry, the jump fails and we fall through to the next chunk of code.
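The loop summarized above can be modeled in Python to see how it consumes its input. This is a sketch under my reading of the disassembly, not code taken from FSG or aPLib; TagBits and copy_literals are hypothetical names and the input bytes in the usage note are made up.

```python
class TagBits:
    """Stateful model of the stub at 0x00405015."""

    def __init__(self, src, pos=0):
        self.src = src    # packed bytes (what ESI walks over)
        self.pos = pos    # offset of the next unread byte (ESI)
        self.dl = 0x80    # MOV DL,0x80

    def getbit(self):
        total = self.dl + self.dl                  # ADD DL,DL
        self.dl, cf = total & 0xFF, total >> 8
        if self.dl == 0:                           # JNZ not taken: reload DL
            total = 2 * self.src[self.pos] + cf    # MOV DL,[ESI] / ADC DL,DL
            self.pos += 1                          # INC ESI
            self.dl, cf = total & 0xFF, total >> 8
        return cf                                  # CF is what JNC tests


def copy_literals(src):
    """Run the MOVSB / CALL / JNC loop until the stub returns with CF = 1."""
    bits = TagBits(src)
    out = bytearray()
    while True:
        out.append(src[bits.pos])   # MOVSB
        bits.pos += 1
        if bits.getbit():           # carry set: fall through, leave the loop
            return bytes(out), bits.pos


print(copy_literals(b"A\x20BCD"))
```

With the made-up input b"A\x20BCD", the byte 0x20 is read as a tag byte whose bits 0, 0, 1 let two more literals through before the loop exits. In this model each reloaded byte supplies eight decisions, one bit at a time from the top, rather than one decision per byte.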

Just looking at this makes me feel like we are looking at some kind of compression algorithm which would make sense based on what I noted at the beginning -- FSG uses aPLib for compression.

Well... come to find out, I believe this is, in fact, the "depacking" function from aPLib.

aPLib `depack.asm`

Thankfully, we can go look at the source for aPLib. After downloading the library, I decided to look at the source a bit, and what do you know -- there is a depack.asm file. Upon opening the code, I immediately noticed something that looked... familiar:

aP_depack_asm:
    ; aP_depack_asm(const void *source, void *destination)

    _ret$  equ 7*4
    _src$  equ 8*4 + 4
    _dst$  equ 8*4 + 8

    pushad

    mov    esi, [esp + _src$] ; C calling convention
    mov    edi, [esp + _dst$]

    cld
    mov    dl, 80h
    xor    ebx,ebx

literal:
    movsb
    mov    bl, 2
nexttag:
    call   getbit
    jnc    literal
...

Wait... doesn't this look like the lines that we just reversed? The only difference is that in our code, there is no xor ebx,ebx and no mov bl, 2. Instead we have push 0x2 and pop ebx. I imagine this is due to the compiler making this decision, or perhaps the version of aPLib used to make FSG 1.0 just had different code at the time. Regardless, the functionality there is the same. This is awesome! Using the source, I was able to put names to variables and labels inside of Ghidra. I believe this find has saved me a substantial amount of time.

With that out of the way, it seems that a large portion of the depacking stub is just this compression code from aPLib. I believe from just a quick glance that the rest of the depacking stub is the part that repairs the IAT and such so that the unpacked binary functions properly.

The next portion of `FUN_0040501f`

The following is the raw, unedited disassembly from Ghidra of the next chunk of instructions we need to reverse:

LAB_004050a0                        XREF[1]:     0040505d(j)  
004050a0 5f              POP        EDI
004050a1 5b              POP        EBX
004050a2 0f b7 3b        MOVZX      EDI,word ptr [EBX]
004050a5 4f              DEC        EDI
004050a6 74 08           JZ         LAB_004050b0
004050a8 4f              DEC        EDI
004050a9 74 13           JZ         LAB_004050be
004050ab c1 e7 0c        SHL        EDI,0xc
004050ae eb 07           JMP        LAB_004050b7

LAB_004050b0                        XREF[1]:     004050a6(j)  
004050b0 8b 7b 02        MOV        EDI,dword ptr [EBX + 0x2]
004050b3 57              PUSH       EDI
004050b4 83 c3 04        ADD        EBX,0x4

LAB_004050b7                        XREF[1]:     004050ae(j)  
004050b7 43              INC        EBX
004050b8 43              INC        EBX
004050b9 e9 51 ff        JMP        LAB_0040500f
            ff ff

This to me seems quite manageable! Let's dig in!!!

Well... where do we start? Looking at the cross references for each of the labels, there is only 1 label that has a cross-reference from the aPLib code snippet -- LAB_004050a0. You can see that there is a JZ instruction at 0x0040505d that takes us to this label. In the original aPLib depack.asm, this instruction would normally take us to the donedepacking label but it appears that the FSG packer has replaced this with a jump to its own code! This is where we will start.

To make this easier for any future readers, myself included, here is the specific section we are going to focus on:

LAB_004050a0                        XREF[1]:     0040505d(j)  
004050a0 5f              POP        EDI
004050a1 5b              POP        EBX
004050a2 0f b7 3b        MOVZX      EDI,word ptr [EBX]
004050a5 4f              DEC        EDI
004050a6 74 08           JZ         LAB_004050b0
004050a8 4f              DEC        EDI
004050a9 74 13           JZ         LAB_004050be
004050ab c1 e7 0c        SHL        EDI,0xc
004050ae eb 07           JMP        LAB_004050b7

POP EDI -- Pop the top of the stack into EDI; Believe it or not, using x32dbg, I set a breakpoint at this instruction and the stack is still exactly the same as the diagram I made earlier. This means that the address 0x00405015 will be popped off the stack and placed into EDI. Remember that this address takes us to a little stub of code, but interestingly enough, we don't use this address at all, as you will see in a few instructions.
POP EBX -- Same as the last point, except it will be the address 0x004001D0. I believe this is the true reason for the POP instructions being run. I say that because, looking ahead, EDI is immediately overwritten in the next instruction with the value stored at the address inside of EBX. This to me suggests that the POP EDI instruction was purely meant to get rid of the top value of the stack so this address could be placed in EBX. Exploring to the address in memory shows that there is some data there. What this data is for I am unsure at this point, but the data looks like the following:
```
004001D0  04 04 05 04 01 00 62 21 40 00 02 00 00 00 00 00
```
Let's keep rolling so we can maybe figure out what this data is for -- but if I had to guess right now, I imagine this is some kind of header or struct information.
MOVZX EDI, word ptr [EBX] -- Move with zero extend; In this case, this is going to take the bytes 04 04, place them in EDI and then change the rest of the bytes to 00, so the resulting value should be EDI = 0x00000404. Since the assembly was particular about grabbing these two bytes specifically, I imagine they represent something important. 0x404 is 1028 in decimal.
DEC EDI -- Yup, reduce that value in EDI by one. Why? I know not... yet.
JZ LAB_004050b0 -- Ahhh here is why we decrement EDI -- at least part of the reason why. If the result of DEC EDI is 0, then ZF will be set to 1 and we will jump to this particular label. Glancing at the label, it appears that the instructions within this label will take us BACK to the beginning of the aPLib "depacking" routine! Take a look:
```
LAB_004050b0                        XREF[1]:     004050a6(j)  
004050b0 8b 7b 02        MOV        EDI,dword ptr [EBX + 0x2]
004050b3 57              PUSH       EDI
004050b4 83 c3 04        ADD        EBX,0x4

LAB_004050b7                        XREF[1]:     004050ae(j)  
004050b7 43              INC        EBX
004050b8 43              INC        EBX
004050b9 e9 51 ff        JMP        LAB_0040500f ---> Back to depacking routine
            ff ff
```
A few other points of interest in this label:
- We are grabbing 4-bytes at [EBX + 0x2] and moving them into EDI. This seems to confirm to me that there is something important about this data located at EBX.
- EBX is incremented by 6 bytes which makes sense -- the first two bytes were used previously then the next 4 bytes were moved into EDI here and placed on the stack. That's a total of 6 bytes, so perhaps there are pairs of some kind of data here at this address? Perhaps a struct that contains a WORD followed by a DWORD?
- It is possible to jump directly to the second label here, skipping the moving of data from EBX into EDI and incrementing EBX by 4, so perhaps some kind of terminating condition will lead us to the second label?
Whatever the case might be, it seems that there is more data that is being "depacked". Considering that the source for aPLib has this function signature as aP_depack_asm(const void *source, void *destination), we can assume that the value being stored in EDI is likely becoming the destination and the value currently stored in EBX will become the source once it is pushed onto the stack at the beginning of the depacking routine.

Regardless, we can't really know for sure unless we run the code -- for now, we aren't taking this branch so let's march on.

DEC EDI; JZ LAB_004050be -- looks like similar logic to the previous instructions, decrement EDI and if the result is 0, jump to the given label. Taking a brief look at this label shows us the following:

LAB_004050be                        XREF[1]:     004050a9(j)  
004050be 5f              POP        EDI
004050bf bb 28 51        MOV        EBX, PTR_LoadLibraryA_00405128
            40 00

LAB_004050c4                        XREF[1]:     004050d3(j)  
004050c4 47              INC        EDI
004050c5 8b 37           MOV        ESI,dword ptr [EDI]
004050c7 af              SCASD      ES:EDI
004050c8 57              PUSH       EDI
004050c9 ff 13           CALL       dword ptr [EBX]=>KERNEL32.DLL::LoadLibraryA
004050cb 95              XCHG       EAX,EBP
                        
...

LAB_004050e8                        XREF[1]:     004050dd(j)  
004050e8 55              PUSH       EBP
004050e9 ff 53 04        CALL       dword ptr [EBX + 0x4]=>KERNEL32.DLL::GetProcAd
004050ec 09 06           OR         dword ptr [ESI],EAX
004050ee ad              LODSD      ESI
004050ef 75 db           JNZ        LAB_004050cc
004050f1 8b ec           MOV        EBP,ESP
004050f3 c3              RET

WOW! A lot to unpack here, but we are not going to do that right now. I put this code snippet here because it appears that taking this branch leads us straight to all the logic that handles calling LoadLibraryA and GetProcAddress! Quickly scanning the XREFs shows me that once you are in this branch, there is no exiting this part of the code -- it runs until it completes and hits the ret instruction at the end. As much as I want to get ahead of myself and start getting into this part of the binary, I'm going to go back.

SHL EDI,0xc; JMP LAB_004050b7 -- Shift EDI left by 0xc (12) bits, then jump to the label we discussed previously.

Based on this logic, it looks like we will NOT take the first two branches until the specific condition of DEC EDI produces 0 at some point. This suggests to me that the depacking routine will be run a handful of times.

Running this code in x32dbg gave me a lot more context to what is happening with this code but I am unsure the best way to explain what I have found. Here is what appears to be happening:

Two bytes are grabbed from 0x004001D0 (EBX) and stored in EDI
That value is decremented by 2, so, as an example, 0x0404 becomes 0x0402
EDI is shifted left by 12 bits. This means values such as 0x0402 will turn into values like 0x00402000 -- hmmmmm looks VERY much like an address
Increments EBX by 2 (so if it was 0x004001D0, it is now 0x004001D2)
Goes to the top of the depacking, using this address in EDI as the destination. (Perhaps building out the different sections??) and depacks into the address.
This repeats, grabbing the next two bytes, determining the destination address, depacking, etc. until the value that gets placed into EDI is 0x0001. This will trigger the first branch.
At this point, the DWORD value at EBX + 2 is grabbed and stored in EDI. In this case, that value is 0x00402162 and that is pushed onto the stack.
EBX is incremented by 6

When the depacker routine runs THIS time, using the new EDI, this is what gets depacked:

00402162  01 48 20 40 00 6F 6C 65 33 32 2E 64 6C 6C 00 52  .H @.ole32.dll.R  
00402172  6C 65 49 6E 69 74 69 61 6C 69 7A 65 00 46 6F 43  leInitialize.FoC  
00402182  72 65 61 74 65 49 6E 73 74 61 6E 63 65 00 52 6C  reateInstance.Rl  
00402192  65 55 6E 69 6E 69 74 69 61 6C 69 7A 65 00 01 38  eUninitialize..8  
004021A2  20 40 00 4F 4C 45 41 55 54 33 32 2E 64 6C 6C 00   @.OLEAUT32.dll.  
004021B2  02 08 00 00 00 00 02 02 00 00 00 00 02 06 00 00  ................  
004021C2  00 00 01 00 20 40 00 4D 53 56 43 52 54 2E 64 6C  .... @.MSVCRT.dl  
004021D2  6C 00 62 5F 67 65 74 6D 61 69 6E 61 72 67 73 00  l.b_getmainargs.  
004021E2  62 63 6F 6E 74 72 6F 6C 66 70 00 62 65 78 63 65  bcontrolfp.bexce  
004021F2  70 74 5F 68 61 6E 64 6C 65 72 33 00 62 5F 73 65  pt_handler3.b_se  
00402202  74 5F 61 70 70 5F 74 79 70 65 00 62 5F 70 5F 5F  t_app_type.b_p__  
00402212  66 6D 6F 64 65 00 62 5F 70 5F 5F 63 6F 6D 6D 6F  fmode.b_p__commo  
00402222  64 65 00 62 65 78 69 74 00 62 58 63 70 74 46 69  de.bexit.bXcptFi  
00402232  6C 74 65 72 00 68 78 69 74 00 62 5F 70 5F 5F 5F  lter.hxit.b_p___  
00402242  69 6E 69 74 65 6E 76 00 62 69 6E 69 74 74 65 72  initenv.binitter  
00402252  6D 00 62 5F 73 65 74 75 73 65 72 6D 61 74 68 65  m.b_setusermathe  
00402262  72 72 00 62 61 64 6A 75 73 74 5F 66 64 69 76 00  rr.badjust_fdiv.  
00402272  03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................

Looks like some kind of import information -- we'll keep our eye on this. This information to me seems to suggest that it will be working with COM (ole32.dll) and possibly automation and control of other services (OLEAUT32.dll).

The depacker hits this loop again, grabbing bytes from EBX and placing the value 0x0002 into EDI. You might be able to guess what this means...
EDI hits the first DEC instruction, but since the resulting value is 0x01, we do not take the JZ branch. The next DEC instruction takes us to the value 0x00 which means that...
We hit the JZ that jumps us to the instructions that handle LoadLibraryA and GetProcAddress calls.

Awesome! We have broken down this chunk of instructions. After working through all of this, it would seem that some of my initial thoughts were proven correct and some were proven wrong.

It does seem like the data stored at 0x004001D0 is in fact some kind of header/struct used by FSG to give it the information it needs to unpack the rest of the binary, so I'll take that as a win for making that assumption earlier.
I was wrong about the labels LAB_004050b0 and LAB_004050b7 -- I hypothesized that the first label was the code that would be run normally and that there may be 6-byte sized structs located at 0x004001D0 and that some terminating condition would get us to LAB_004050b7, but really I had it backward! The special condition where EDI contains 0x01 and then the decrement makes it hit 0 takes us to LAB_004050b0 and the terminating condition is actually having EDI contain 0x02 so that we can jump to the label LAB_004050be that contains the next logical portion of the code -- the LoadLibraryA and GetProcAddress loop.
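The table-walking behavior worked out in this section can be sketched in Python as follows. This is a model of what I observed, not the tool's actual code; walk_fsg_table and first_dest are names I made up, and the only input it has been checked against is the table bytes from this one sample.

```python
import struct

def walk_fsg_table(table, first_dest=0x00401000):
    """Return the destination address of each depack pass, in order.

    `table` holds the bytes found at 0x004001D0. `first_dest` is the EDI
    value loaded at entry, used for the pass that runs before the table
    is consulted.
    """
    dests = [first_dest]
    pos = 0
    while True:
        (word,) = struct.unpack_from("<H", table, pos)   # MOVZX EDI,word ptr [EBX]
        if word == 1:
            # LAB_004050b0: the destination is the DWORD that follows the WORD
            (dest,) = struct.unpack_from("<I", table, pos + 2)
            dests.append(dest)
            pos += 6                                     # ADD EBX,4 + INC + INC
        elif word == 2:
            # LAB_004050be: nothing left to depack, imports are resolved next
            return dests
        else:
            # DEC, DEC, SHL EDI,0xc (masked to 32 bits like the register)
            dests.append(((word - 2) << 12) & 0xFFFFFFFF)
            pos += 2                                     # INC + INC


table = bytes.fromhex("040405040100622140000200")
print([hex(d) for d in walk_fsg_table(table)])
```

For the sample's table this gives 0x401000, 0x402000, 0x403000, and finally 0x402162, which matches the order of destinations seen in x32dbg.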

The final portion of `FUN_0040501f`

Let's take a look at the last portion of instructions:

LAB_004050be                        XREF[1]:     004050a9(j)  
004050be 5f              POP        EDI
004050bf bb 28 51        MOV        EBX,PTR_LoadLibraryA_00405128
        40 00

LAB_004050c4                        XREF[1]:     004050d3(j)  
004050c4 47              INC        EDI
004050c5 8b 37           MOV        ESI,dword ptr [EDI]
004050c7 af              SCASD      ES:EDI
004050c8 57              PUSH       EDI
004050c9 ff 13           CALL       dword ptr [EBX]=>KERNEL32.DLL::LoadLibraryA
004050cb 95              XCHG       EAX,EBP

LAB_004050cc                        XREF[1]:     004050ef(j)  
004050cc 33 c0           XOR        EAX,EAX

LAB_004050ce                        XREF[1]:     004050cf(j)  
004050ce ae              SCASB      ES:EDI
004050cf 75 fd           JNZ        LAB_004050ce
004050d1 fe 0f           DEC        byte ptr [EDI]
004050d3 74 ef           JZ         LAB_004050c4
004050d5 fe 0f           DEC        byte ptr [EDI]
004050d7 75 06           JNZ        LAB_004050df
004050d9 47              INC        EDI
004050da ff 37           PUSH       dword ptr [EDI]
004050dc af              SCASD      ES:EDI
004050dd eb 09           JMP        LAB_004050e8

LAB_004050df                        XREF[1]:     004050d7(j)  
004050df fe 0f           DEC        byte ptr [EDI]
004050e1 0f 84 a9        JZ         LAB_00401090
        bf ff ff
004050e7 57              PUSH       EDI

LAB_004050e8                        XREF[1]:     004050dd(j)  
004050e8 55              PUSH       EBP
004050e9 ff 53 04        CALL       dword ptr [EBX + 0x4]=>KERNEL32.DLL::GetProcAd
004050ec 09 06           OR         dword ptr [ESI],EAX
004050ee ad              LODSD      ESI
004050ef 75 db           JNZ        LAB_004050cc
004050f1 8b ec           MOV        EBP,ESP
004050f3 c3              RET

The end is in sight! Let's take it from the top.

POP EDI -- When we hit this instruction the value at the top of the stack is 0x00402162. That address looks familiar... remember what was unpacked there? The data that looked like imports! EDI will contain this address.
MOV EBX, PTR_LoadLibraryA_00405128 -- The address for LoadLibraryA is contained at this pointer. EBX will contain 0x00405128. Looking at that address, we find the bytes 40 0E E2 75. Since this is little endian, that'll be 0x75E20E40 which resolves to -- you guessed it -- kernel32.LoadLibraryA.
INC EDI; MOV ESI,dword ptr [EDI] -- Take a look at the data located at 0x00402162. Incrementing EDI will move EDI to the next byte (0x00402163) and then the next instruction will grab the 4-byte value stored at the address in EDI and place it into ESI. ESI will then have the value 0x00402048. This looks like an address to me!
SCASD ES:EDI; PUSH EDI; CALL dword ptr [EBX]=>KERNEL32.DLL::LoadLibraryA -- The instruction SCASD essentially runs CMP EAX, dword ptr [EDI] and then increments EDI by 4 bytes. The result of the comparison is not used here, so this is just a one-byte way of advancing EDI. This means following SCASD, EDI will contain 0x00402167 -- this is the start of the string ole32.dll. This address is then pushed onto the stack and LoadLibraryA is then called. For reference, the function signature for LoadLibraryA is:
```
HMODULE LoadLibraryA([in] LPCSTR lpLibFileName);
```
LoadLibraryA returns the handle to the module on success or a NULL on failure inside of EAX.

So tl;dr: we are loading libraries. :)
XCHG EAX,EBP; XOR EAX,EAX; SCASB; JNZ LAB_004050ce -- After calling LoadLibraryA, we will store the return into EBP, then clear EAX. This prepares EAX for the next instruction, SCASB. SCASB will go byte by byte over the string at EDI (remember that it currently points to the string ole32.dll) and compare the byte to the value in AL (the low byte of EAX). If the bytes do not equal each other, this will jump back to the SCASB instruction and keep moving until it is equal. In this case, since EAX contains 0x00, this little loop is looking for a 0 byte -- it's looking for the end of the string!
The next bit requires a bigger picture to understand. Check this out:
```
DEC        byte ptr [EDI]
JZ         LAB_004050c4
DEC        byte ptr [EDI]
JNZ        LAB_004050df
INC        EDI
PUSH       dword ptr [EDI]
SCASD      ES:EDI
JMP        LAB_004050e8
DEC        byte ptr [EDI]
JZ         LAB_00401090
PUSH       EDI
PUSH       EBP
CALL       dword ptr [EBX + 0x4]=>KERNEL32.DLL::GetProcAddress
OR         dword ptr [ESI],EAX
LODSD      ESI
JNZ        LAB_004050cc
MOV        EBP,ESP
RET
```
What in the world is happening here??? Interestingly enough, there is a little bit of obfuscation going on with the imported function names! Check it out:
```
RleInitialize
FoCreateInstance
RleUninitialize
```
If you look up these function names, they don't exist! But you know what does exist?
```
OleInitialize
CoCreateInstance
OleUninitialize
```
The first letter of the imported function names has been shifted up by 3.

The first time this logic is hit EDI points to the string RleInitialize. DEC byte ptr [EDI] will change this string to QleInitialize. It performs a check -- if the result of this operation produced a 0, it would then jump back up to restart the LoadLibraryA loop. Since that is not the case it keeps going and executes DEC again, producing the string PleInitialize.

The next check will jump if the result is NOT zero. Of course the character P is not 0x00, so the result is not zero. We jump to 0x004050df which is ANOTHER DEC byte ptr [EDI]. This produces the string OleInitialize -- the string it actually needs!
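For a static unpacker, the three DEC instructions boil down to subtracting 3 from the first byte of each name. Here is a minimal C sketch of that one step, assuming the mangled name has already been copied into a writable buffer. The helper name `fsg_demangle_name` is made up for illustration and is not part of FSG or of this tool.

```c
#include <stddef.h>

/* Hypothetical helper: undo the +3 shift that FSG 1.0 applies to the
 * first character of an imported function name. The buffer is modified
 * in place and returned. NULL and empty strings are left untouched. */
char *fsg_demangle_name(char *name)
{
    if (name != NULL && name[0] != '\0') {
        name[0] = (char)((unsigned char)name[0] - 3);
    }
    return name;
}
```

Passing a buffer holding RleInitialize should hand back OleInitialize, which is the same result the stub reaches after its third DEC.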

With that string, it then moves on to setting up the stack for the call to GetProcAddress. The signature is:
```
FARPROC GetProcAddress([in] HMODULE hModule, [in] LPCSTR  lpProcName);
```
Remember that EBP contains the handle to the module and EDI is the address of the string containing the name of the function the code wants to use.

GetProcAddress gets called, and then upon return, performs an OR between the 4 bytes at [ESI] and EAX (the result of GetProcAddress). The 4 bytes located at the address contained in ESI are then placed into EAX by the LODSD instruction and ESI is incremented by 4 bytes.

With regards to the purpose of these instructions, the first time we hit these instructions ESI contains the address 00402048 which points to 4-bytes that contain no data. It would seem that GetProcAddress is being run, and then the result is being stored here. Since ESI is incrementing with each passthrough, I believe that this is meant to be an array of function addresses.

Note if the result of the OR instruction from earlier is not zero, then the JNZ instruction will loop us back to the LoadLibraryA functionality. If the result of the OR is zero, then we will RET which should cause the application to close (if I am looking at it correctly). This to me seems like an error catching condition -- if this process gets messed up somehow, then exit.

This process will continue to repeat until all the libraries have been loaded and the function addresses have been retrieved. Once this happens, there is a terminating entry in the list of functions to import that contains only the value 0x03. Remember how the logic works -- we decrement [EDI] 3 times to get the correct first letter for each of the function names. So what happens if this first byte is 0x03? Well, the result will be 0x00 after these decrements occur. This means we will hit the final conditional that we haven't taken -- JZ LAB_00401090. Where the heck is that??? Let's take a look at what's there:
```
...
00401090 | 55                       | push ebp                                |
00401091 | 8BEC                     | mov ebp,esp                             |
00401093 | 6A FF                    | push FFFFFFFF                           |
00401095 | 68 78204000              | push 402078                             |
0040109A | 68 D0114000              | push <JMP.&_except_handler3>            |
0040109F | 64:A1 00000000           | mov eax,dword ptr fs:[0]                |
004010A5 | 50                       | push eax                                |
004010A6 | 64:8925 00000000         | mov dword ptr fs:[0],esp                |
004010AD | 83EC 20                  | sub esp,20                              |
004010B0 | 53                       | push ebx                                |
004010B1 | 56                       | push esi                                |
004010B2 | 57                       | push edi                                |
004010B3 | 8965 E8                  | mov dword ptr ss:[ebp-18],esp           |
004010B6 | 8365 FC 00               | and dword ptr ss:[ebp-4],0              |
004010BA | 6A 01                    | push 1                                  |
004010BC | FF15 0C204000            | call dword ptr ds:[<&__set_app_type>]   |
...
```
Look at that, it looks like the start of a real function! It even looks like we see some exception handling setup and we can see that the call to __set_app_type is setting this to a console app. This is code that you'd likely see run prior to main being executed in your code. For reference, here is the start of another binary's entry point:
```
00401820 | 55                       | push ebp                                |
00401821 | 8BEC                     | mov ebp,esp                             |
00401823 | 6A FF                    | push FFFFFFFF                           |
00401825 | 68 70204000              | push 402070                             |
0040182A | 68 60194000              | push <JMP.&_except_handler3>            |
0040182F | 64:A1 00000000           | mov eax,dword ptr fs:[0]                |
00401835 | 50                       | push eax                                |
00401836 | 64:8925 00000000         | mov dword ptr fs:[0],esp                |
0040183D | 83EC 20                  | sub esp,20                              |
00401840 | 53                       | push ebx                                |
00401841 | 56                       | push esi                                |
00401842 | 57                       | push edi                                |
00401843 | 8965 E8                  | mov dword ptr ss:[ebp-18],esp           |
00401846 | 8365 FC 00               | and dword ptr ss:[ebp-4],0              |
0040184A | 6A 01                    | push 1                                  |
0040184C | FF15 58204000            | call dword ptr ds:[<__set_app_type>]    |
```
They are essentially identical. Pretty cool!

I'd say it is safe to assume at this point that we have found the actual entry point of our packed binary! Now for the final part -- determining how to unpack this statically without running it and making the tool to do so.

The Breakdown

So what parts of this functionality are important for us to understand so that we can properly unpack the binary statically? Let's review the important overall steps of the FSG depacking stub:

Use aPLib to unpack the different data chunks. Note that all destinations after the first are derived from some kind of struct or header located at 4001D0:
- First destination (EDI) is 401000, first source is 404000
  - This is the .text section
- Second EDI is 402000, second ESI is 404156
  - 402000 is derived by getting the two bytes at 4001d0 and then subtracting 2 and shifting left 12 bits
  - 402000 - 402053 contain addresses to imported functions. I believe this is the .rdata section!
  - 402058 - 402083 contains data that, after some further debugging and setting breakpoints on accesses to it, I determined to be variables used for various function calls and operations throughout the code. This is still a continuation of the .rdata section.
- Third EDI is 403000, third ESI is 40417F -- This contains string resources -- I imagine this is the .data or .rsrc section.
- Fourth EDI is 402162, fourth ESI is 4041BF -- I believe this is the rest of the .rdata section, as it contains the rest of the imports information essentially just as you would see it in a normal .rdata section.
To get more clarity on this, I threw the binary into the classic PEview tool. There are three IMAGE_SECTION_HEADERs. The sections are as follows:
1. 00401000 - 00403FFF - Based on what we found, this is where the main program will be after it is fully unpacked. Interestingly enough, the section header marks this as having a "size of raw data" of 0. This is one of those indicators that would have told us that the binary is likely packed (if we didn't already know).
2. 00404000 - 00404FFF - This is the section that is used to store all the packed data. In the binary file on disk, this raw data is located at 0x1000.
3. 00405000 - 00405FFF - This is the section that contains the FSG depacking stub. In the binary file on disk, this raw data is located at 0xE00.
  - Just a note from my future self -- after working on this project further,
Note From My Future Self

I don't know why I didn't consider looking into it during this stage of my reversing, but how does the decompression algorithm know when to stop for each of these source/destination combos? Easy enough, take a look at an aPLib compressed binary and look at the section that contains the compressed/packed data. You might notice something -- the lack of 0x00 bytes. I'm not entirely sure what role null bytes play in the algorithm as a whole, but I noticed that seeing two null bytes (00 00) or a null byte followed by a 1 (00 01) were always the delimiters between the sections.

On another note, the depacking "header" seems to ALWAYS come after the IMAGE_SECTION_HEADERs. Every FSG 1.0 packed binary I have seen only has 3 IMAGE_SECTION_HEADERs. The DOS, DOS stub, and NT headers combined seem to end at 0x157, and the IMAGE_SECTION_HEADERs end at 0x1CF, which places the depacking "header" at 0x1D0 pretty consistently (as in I have never seen a case where this is not true, but that doesn't mean that it is ALWAYS true). It seems the packer is taking advantage of the empty space at the end of the section mapped for the PE header.

In a similar vein, I noticed in every binary that I looked at that the packed data section always immediately follows the section mapped for the unpacked data. I also noticed that the depacking stub seems to always immediately follow the packed data section, so determining the location of the "header", packed data, and the stub can all be derived by using just the PE header of the packed binary.

Note that these observations are for the binary while in memory (while it is being run). When it comes to being on disk, the data is laid out with the PE header, then depacking stub, then the packed data.
Once unpacked, the depacker iterates over the list of imports located at 402162, calling LoadLibraryA on all the DLLs, demangling the import names, and calling GetProcAddress on the functions we are interested in. Note that these functions passed to GetProcAddress could be string names or ordinals.
Addresses retrieved using GetProcAddress are stored at 00402000 in groupings based on DLL. These groupings are delimited by a 4-byte null terminator.
Once all function addresses have been retrieved, the depacking stub jumps to the OEP.
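The import step above hinges on the first byte of each entry that follows a DLL name. Going by the DEC/JZ/JNZ chain in the listing, that byte selects one of four paths. Below is a small C sketch of that decision as a static parser might make it. The enum and function names are hypothetical, and the exact byte layout that follows an ordinal marker is not covered here because I have not confirmed it.

```c
/* Hypothetical names; the values mirror how many DEC instructions it
 * takes for the stub to see a zero at [EDI]. */
typedef enum {
    FSG_ENTRY_NEXT_DLL, /* first byte 0x01: go back and load the next DLL   */
    FSG_ENTRY_ORDINAL,  /* first byte 0x02: import by ordinal               */
    FSG_ENTRY_END,      /* first byte 0x03: imports done, jump to the OEP   */
    FSG_ENTRY_NAME      /* anything else: name with first char shifted +3   */
} fsg_entry_kind;

/* Classify an import-list entry from its first byte. */
fsg_entry_kind fsg_classify_entry(unsigned char first_byte)
{
    switch (first_byte) {
    case 0x01: return FSG_ENTRY_NEXT_DLL;
    case 0x02: return FSG_ENTRY_ORDINAL;
    case 0x03: return FSG_ENTRY_END;
    default:   return FSG_ENTRY_NAME;
    }
}
```

A static unpacker can use this instead of mutating the bytes the way the stub does, which keeps the packed input read-only.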

With this, we have what we need to get started on a basic static depacker.

Confirming My Findings

After some digging, I was able to find an original fsg.exe binary -- maybe loaded with malware but who knows 🤷.

Either way, I created two quick sample applications that I can use to test the fsg packer. These are simple applications with no optimizations.

The first is a basic server that accepts connections, reports the connection, and then disconnects.

The second is an application that will spawn a command prompt terminal window.

I have specifically made a .def file for sample_02 so that I could import by ordinal rather than name and make sure I can catch that kind of functionality in the unpacker. I also removed any dependency on msvcrt or other libraries -- the binary only imports kernel32.dll and the necessary functions it needs to run.

Let's pack them and check results!

Using FSG and Results

The original sample_01.c file, when compiled on my machine, produces a ~115KB binary. After running fsg on it, the binary is now ~61KB in size -- Wow! That is nearly a 50% reduction in size right there. Impressive.

I had very similar results with sample_02.c.

When I attempted to run both binaries, I found that they were crashing! Opening up the binaries in x32dbg showed me that the entry point of the depacking stub is at 0042E000 in both binaries. Trying to run the depacking stub produces an EXCEPTION_ACCESS_VIOLATION on the very first instruction. Why?

Checking the logs, this is what I get:

Invalid relocation block for module sample_01.exe!
Breakpoint at 0042E000 (entry breakpoint) set!
DLL Loaded: 77310000 C:\Windows\SysWOW64\ntdll.dll
DLL Loaded: 74F70000 C:\Windows\SysWOW64\kernel32.dll
DLL Loaded: 75150000 C:\Windows\SysWOW64\KernelBase.dll
System breakpoint reached!
EXCEPTION_DEBUG_INFO:
           dwFirstChance: 1
           ExceptionCode: C0000005 (EXCEPTION_ACCESS_VIOLATION)
          ExceptionFlags: 00000000
        ExceptionAddress: <sample_01.OptionalHeader.AddressOfEntryPoint> (0042E000)
        NumberParameters: 2
ExceptionInformation[00]: 00000008 DEP Violation
ExceptionInformation[01]: <sample_01.OptionalHeader.AddressOfEntryPoint> (0042E000) Inaccessible Address
First chance exception on 0042E000 (C0000005, EXCEPTION_ACCESS_VIOLATION)!

Looks like we have a few problems here. First of all is the relocation block being invalid! I imagine that has something to do with fsg itself since the binary functions properly prior to being compressed. Let's get rid of that by adding the /FIXED flag to the linker when building. Now we get:

Breakpoint at 0042E000 (entry breakpoint) set!
DLL Loaded: 77310000 C:\Windows\SysWOW64\ntdll.dll
DLL Loaded: 74F70000 C:\Windows\SysWOW64\kernel32.dll
DLL Loaded: 75150000 C:\Windows\SysWOW64\KernelBase.dll
System breakpoint reached!
EXCEPTION_DEBUG_INFO:
           dwFirstChance: 1
           ExceptionCode: C0000005 (EXCEPTION_ACCESS_VIOLATION)
          ExceptionFlags: 00000000
        ExceptionAddress: <sample_01.OptionalHeader.AddressOfEntryPoint> (0042E000)
        NumberParameters: 2
ExceptionInformation[00]: 00000008 DEP Violation
ExceptionInformation[01]: <sample_01.OptionalHeader.AddressOfEntryPoint> (0042E000) Inaccessible Address
First chance exception on 0042E000 (C0000005, EXCEPTION_ACCESS_VIOLATION)!

That fix got rid of the relocation block error, but we are still left with the DEP problem. To fix this, let's add the /NXCOMPAT:NO linker flag to the build and try again. After doing that we get....

Breakpoint at 0042E000 (entry breakpoint) set!
DLL Loaded: 77310000 C:\Windows\SysWOW64\ntdll.dll
DLL Loaded: 74F70000 C:\Windows\SysWOW64\kernel32.dll
DLL Loaded: 75150000 C:\Windows\SysWOW64\KernelBase.dll
System breakpoint reached!
INT3 breakpoint "entry breakpoint" at <sample_01.OptionalHeader.AddressOfEntryPoint> (0042E000)!
DLL Loaded: 76AE0000 C:\Windows\SysWOW64\ws2_32.dll
DLL Loaded: 75B20000 C:\Windows\SysWOW64\rpcrt4.dll
Thread 1264 created, Entry: ntdll.7734EF90, Parameter: 00583B90
DLL Loaded: 70270000 C:\Windows\SysWOW64\mswsock.dll
Thread 1264 exit
Process stopped with exit code 0x0 (0)

Let's go! It worked! We now have two test binaries we can examine. Let's look at sample_02.exe first and see if the pattern for the fsg depacker is as we described it above.

Everything looks the same except for some changes in the addresses. This means I can't bank on addresses like 00405000 being the entry point of the depacking stub and 00401090 being the OEP -- I'll have to get the entry point from the PE header and then locate the jmp that takes us to the OEP. Thankfully it appears that all the instructions for the depacking logic are still the same other than the swapped out addresses.
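Getting the stub's entry point from the PE header is just two fields added together. As a rough sketch, assuming the whole file is already in a byte buffer and the image is 32-bit (PE32), the lookup can be done with plain offsets and no windows.h. The function names here are mine and are not taken from the tool's source.

```c
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Hypothetical helper: return ImageBase + AddressOfEntryPoint for a PE32
 * file held in memory, or 0 if the buffer does not look like PE32. */
uint32_t pe32_entry_point_va(const uint8_t *file, size_t size)
{
    if (size < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return 0;
    uint32_t nt = rd32(file + 0x3C);                 /* e_lfanew */
    if (nt > size || size - nt < 24 + 32)
        return 0;
    if (rd32(file + nt) != 0x00004550u)              /* "PE\0\0" */
        return 0;
    const uint8_t *opt = file + nt + 24;             /* optional header */
    if (((unsigned)opt[0] | ((unsigned)opt[1] << 8)) != 0x10Bu)
        return 0;                                    /* not PE32 */
    /* AddressOfEntryPoint is at +16, ImageBase at +28 in a PE32 header. */
    return rd32(opt + 28) + rd32(opt + 16);
}
```

For the samples above, an ImageBase of 0x00400000 and an AddressOfEntryPoint of 0x0002E000 would give the 0042E000 that x32dbg reported.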

On to making our static depacker!

Making the Tool

So... how do we do this? I am going to be using C. Based on the overall important steps performed by the depacking stub, here is how I think I can approach this problem:

Map a view of the target executable file
Get all the information we need from that file such as NT header information and section table information that will help us in the unpacking of the binary
Create a new output file, map a view of it, and retrieve all information we need for it.
Find the FSG "header" (I'm not sure what else I would call it) and gather the information I need from it to unpack the binary.
Create the PE header, starting with DOS then NT
Create the section table and unpack the sections using aPLib.
We're done! At least I think...
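For the "find the FSG header" step, my earlier observation was that the header sits right after the last IMAGE_SECTION_HEADER. If that holds, its file offset can be computed from the PE header alone. The sketch below does that with raw offsets on a byte buffer. The function name is hypothetical, and the assumption that the FSG header starts exactly where the section table ends is only something I have observed, not something I can guarantee.

```c
#include <stddef.h>
#include <stdint.h>

static uint32_t rd16(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8);
}

static uint32_t rd32(const uint8_t *p)
{
    return rd16(p) | (rd16(p + 2) << 16);
}

/* Hypothetical helper: file offset of the first byte after the section
 * table, or 0 if the buffer does not look like a PE file. */
uint32_t pe_end_of_section_table(const uint8_t *file, size_t size)
{
    if (size < 0x40 || file[0] != 'M' || file[1] != 'Z')
        return 0;
    uint32_t nt = rd32(file + 0x3C);          /* e_lfanew */
    if (nt > size || size - nt < 24)
        return 0;
    if (rd32(file + nt) != 0x00004550u)       /* "PE\0\0" */
        return 0;
    uint32_t nsec  = rd16(file + nt + 6);     /* NumberOfSections */
    uint32_t optsz = rd16(file + nt + 20);    /* SizeOfOptionalHeader */
    /* signature (4) + file header (20) + optional header + 40 per section */
    uint32_t end = nt + 24 + optsz + nsec * 40;
    return end <= size ? end : 0;
}
```

With e_lfanew at 0x60, a 0xE0 byte optional header, and 3 sections, this works out to 0x1D0, which lines up with where I kept finding the header.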

I've run into a few issues as I have been implementing this though.

First, specifics. My original reversing got me through a somewhat specific but also generalized description of what the depacking stub did to unpack the binary.

Ends up that generalizations do not work too well in code. Who would have thought?

When I say specifics, I mean SPECIFICS. We are rebuilding headers and section tables from scratch. Where is the raw data located in the binary? What is the RVA of all the different sections? The virtual size? The raw size? EVERYTHING. This alone isn't too bad, but add in that we actually don't know a lot of this information beforehand because all the sections are packed together in a big blob of data and then we start to have problems. Why? Well, to start...

We actually don't know how big each section is going to be until it has been decompressed. This means we can't reliably determine where to place all the sections in the binary or fix up the section headers until the data has been decompressed. For example, without knowing how big the .text section is, we don't know where we need to put the .rdata section and without the .rdata section, we don't know where to put the .data. This also means we need information from the first section to reliably place the second, and information from the second section to reliably place the third, and so on.
Second, given the blob of compressed data, we don't know where one section of compressed data ends and the other begins. You can call aP_depack_asm on the compressed data to decompress it. This will do its thing, unpacking the data from the source into the destination and then return the size of the decompressed data. Nice! Let's depack the next section! Oh wait... we can't because we don't know where the start of the next compressed section is. The depacking routine doesn't give us this information. Interestingly, aP_depack_asm keeps track of the current source byte location using ESI throughout the entirety of the function running. I thought maybe we could snag it after the function ends, but the register gets wiped out at the end of the function during the popad instruction. Without this information, we would have to manually parse the packed data until we found the end of the current section/start of the next section so we can start actually depacking that section. I've got a workaround for this, though, which I'll talk about later.
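The first problem is a chain of dependencies: each section's file offset depends on the decompressed size of the one before it. Once those sizes are known, the placement itself is simple. Here is a minimal C sketch, assuming a file alignment of 0x200 (the usual MSVC linker default) and a caller-supplied starting offset. The struct and function names are invented for this example.

```c
#include <stddef.h>
#include <stdint.h>

#define FILE_ALIGNMENT 0x200u /* assumed; read it from the header if it differs */

/* Round v up to the next multiple of a. */
static uint32_t align_up(uint32_t v, uint32_t a)
{
    return (v + a - 1) / a * a;
}

/* Hypothetical record of where one section lands in the output file. */
typedef struct {
    uint32_t raw_offset; /* candidate PointerToRawData */
    uint32_t raw_size;   /* candidate SizeOfRawData    */
} section_slot;

/* Place sections back to back, each padded to FILE_ALIGNMENT, starting
 * at first_offset. Sizes must be the decompressed sizes, in file order. */
void layout_sections(const uint32_t *unpacked_sizes, size_t count,
                     uint32_t first_offset, section_slot *out)
{
    uint32_t cursor = align_up(first_offset, FILE_ALIGNMENT);
    for (size_t i = 0; i < count; i++) {
        out[i].raw_offset = cursor;
        out[i].raw_size   = align_up(unpacked_sizes[i], FILE_ALIGNMENT);
        cursor += out[i].raw_size;
    }
}
```

The point of the sketch is the ordering constraint: nothing about section two can be filled in until section one's size is known, so all the sizes have to be collected before any section header is written.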

There are some good things though! We can make some assumptions that will help us with the output binary such as:

All FSG 1.0 packed binaries I have seen only have 3 sections regardless of how many sections the original binary had. Why is this important? We don't need to worry about decompressing anything other than these 3 sections and since this is the case, we know how big our resultant binary's section header table will be.
Thankfully, section data typically falls directly after the section headers, so this means if we know how big the section header table is (which we do due to it only ever containing three entries), we have a default location to put the decompressed data for the first section -- AKA the .text section. The alignment of the different sections of data in the binary is 0x200 bytes by default, so if our section header table ends around 0x230 - 0x250, we can put the text section at 0x400 in the binary and be pretty certain that we will not accidentally overwrite something important.
Oh yeah, aPLib is somewhat open-source. We don't have the packing code, but we definitely have the depacking code from depack.asm. With a little finagling, we can convert the FASM code to something that an MSVC __declspec(naked) __asm block will be able to run. Not only that, we can add a few of our own instructions to make it return ESI to us, which will help relieve a problem I mentioned earlier about not knowing where the end of one block and the start of the next is. One more addition I made here was allowing the functionality to run without actually doing the decompression, meaning we can get this information from the function without needing to actually decompress the data. Doing this also comes with the benefit of not needing to link aplib.lib or include aplib.h -- we can just add the ASM to the header file for this code.
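To make the "return ESI" and "dry run" ideas concrete, here is a rough C rendition of the same idea. This is not the ASM used in the tool. It is a sketch of the aPLib depacking loop as I understand the format, extended to report how many source bytes were consumed and to skip all writes when the destination is NULL. All names are made up, there are no bounds checks, and it should be compared against depack.asm before being trusted on real input.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    size_t out_len; /* bytes the block decompresses to */
    size_t in_len;  /* compressed bytes consumed; the next block starts here */
} ap_result;

typedef struct {
    const uint8_t *src;
    uint8_t *dst;   /* NULL means dry run: count only, write nothing */
    size_t out;
    unsigned tag, bits;
} ap_state;

/* Pull the next control bit, reloading the tag byte every 8 bits. */
static unsigned ap_bit(ap_state *s)
{
    if (!s->bits--) { s->tag = *s->src++; s->bits = 7; }
    unsigned b = (s->tag >> 7) & 1u;
    s->tag <<= 1;
    return b;
}

/* Gamma-coded number: value bits interleaved with continue bits. */
static unsigned ap_gamma(ap_state *s)
{
    unsigned r = 1;
    do { r = (r << 1) + ap_bit(s); } while (ap_bit(s));
    return r;
}

static void ap_put(ap_state *s, uint8_t v)
{
    if (s->dst) s->dst[s->out] = v;
    s->out++;
}

static void ap_copy(ap_state *s, unsigned offs, unsigned len)
{
    for (; len; len--) {
        if (s->dst) s->dst[s->out] = s->dst[s->out - offs];
        s->out++;
    }
}

ap_result ap_depack(const uint8_t *src, uint8_t *dst)
{
    ap_state s = { src, dst, 0, 0, 0 };
    unsigned offs, len, r0 = (unsigned)-1, lwm = 0;
    int done = 0;

    ap_put(&s, *s.src++);                       /* first byte is verbatim */
    while (!done) {
        if (!ap_bit(&s)) {                      /* 0: literal byte */
            ap_put(&s, *s.src++);
            lwm = 0;
            continue;
        }
        if (!ap_bit(&s)) {                      /* 10: long match */
            offs = ap_gamma(&s);
            if (lwm == 0 && offs == 2) {        /* reuse last offset */
                offs = r0;
                len = ap_gamma(&s);
            } else {
                offs -= lwm ? 2 : 3;
                offs = (offs << 8) + *s.src++;
                len = ap_gamma(&s);
                if (offs >= 32000) len++;
                if (offs >= 1280) len++;
                if (offs < 128) len += 2;
                r0 = offs;
            }
            ap_copy(&s, offs, len);
            lwm = 1;
        } else if (!ap_bit(&s)) {               /* 110: short match or end */
            offs = *s.src++;
            len = 2 + (offs & 1);
            offs >>= 1;
            if (offs) ap_copy(&s, offs, len); else done = 1;
            r0 = offs;
            lwm = 1;
        } else {                                /* 111: one byte, 4-bit offset */
            offs = 0;
            for (int i = 0; i < 4; i++) offs = (offs << 1) + ap_bit(&s);
            if (offs) ap_copy(&s, offs, 1); else ap_put(&s, 0);
            lwm = 0;
        }
    }
    return (ap_result){ s.out, (size_t)(s.src - src) };
}
```

Calling `ap_depack(block, NULL)` first gives both the decompressed size and the offset of the next compressed block, so the output buffer and section layout can be sized before a second call does the real decompression.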

After ALL of this work, something is still preventing the unpacked binary from running. I am assuming that I messed up during the unpacking and patching of the section header tables and/or the PE header.

The error I am getting is "The unpacked.exe application cannot be run in Win32 mode." PEView shows that I have definitely goofed up. For some reason, the IMPORT Name Table and Address Table are showing up before the DOS stub despite the addresses being correct in the rdata section header. Also, the IMPORT directory table in the rdata section is displaying wonky information. I definitely did something not quite right while building the executable.

TO BE CONTINUED
