A nearly functional tool for statically unpacking FSG 1.0 packed binaries -- this is a personal project I decided to take on after starting the Practical Malware Analysis book. Yes, the book is a bit outdated, but the content is still relevant and good. The first chapter introduces a challenge that involves analyzing a binary packed with FSG 1.0. After doing the challenges for a bit, I wasn't satisfied that the only way to unpack an FSG packed binary seemed to be doing it dynamically. Surely a tool out there exists to unpack FSG packed binaries statically, right? If there is, I didn't find one. Thus, I decided to make my own.
This project is nearly complete. I have entirely reversed the functionality of the depacking stub and have figured out how to pull the decompressed/depacked data out of the original packed binary and place the unpacked contents into a new binary that is free of the FSG depacking stub -- all of this is done statically, without the need to run the FSG packed binary! Any roadblocks are described in my write-up, but the gist of it is that I need to correct some errors I made when creating the PE header from scratch, and then the tool should be good to go!
Windows Defender will pick up just about any FSG packed binary as malware. The samples I provide with this ARE NOT MALWARE. I provide the source code so you can build and pack them yourself as well if you want. Windows will likely see them, toss them out, and screech at you about the binaries being "malware". They are NOT.
You can build the project in two ways:
- Open the Visual Studio solution and build.
- Run the build.bat script with no arguments. Running the script with "clean" as the argument will delete the build folder. Running with "samples" as the argument will build all the sample code so you can have original executables to compare against the packed binaries and the binaries unpacked by the tool.
To use the tool:
fsg_unpacker.exe <input_filename> [output_filename]
Input filename is required, output is optional. If no output file name is given, it defaults to unpacked.exe.
What you are going to read here is almost entirely the step-by-step process I took while working through reverse engineering the FSG 1.0 depacking stub functionality and developing the tool to unpack it statically. I rarely went back and corrected any of my notes, and this is on purpose. When reversing, sometimes I'm on point, sometimes I backtrack, sometimes I'm just plain wrong, but I wanted to keep detailed notes of each step that I took so that future me and others can see what exactly I did and what path I took to get there. Hopefully others can learn from my mistakes, my successes, and maybe even the methodologies I used for this project. With that being said, know that not everything I write is 100% accurate -- sometimes I make assumptions, write those assumptions down as if they are truth (such as saying something will "always" or "never" happen), and then I learn more information later on and realize what I thought/assumed/said was not quite right. Live and learn, yeah? Have fun parsing through my notes :D
I found at https://defacto2.net/f/b224796 that FSG features aPLib compression. Perhaps I can use the aPLib library to decompress what was compressed. I still need to know more about exactly how it works. Let's throw it in Ghidra.
Upon opening the binary in Ghidra, entry looks like the following:
entry
00405000 bb d0 01 MOV EBX,DAT_004001d0
40 00
00405005 bf 00 10 MOV EDI,DAT_00401000
40 00
0040500a be 00 40 MOV ESI,DAT_00404000
40 00
LAB_0040500f
0040500f 53 PUSH EBX=>DAT_004001d0
00405010 e8 0a 00 CALL FUN_0040501f
00 00
00405015 02 d2 ADD DL,DL
00405017 75 05 JNZ LAB_0040501e
00405019 8a 16 MOV DL,byte ptr [ESI]=>DAT_00404000
0040501b 46 INC ESI
0040501c 12 d2 ADC DL,DL
LAB_0040501e
0040501e c3 RET
There are a few things that immediately stand out:
- There are three registers that have addresses moved into them -- EBX, EDI, ESI. I'll keep my eye out for these to make sure I don't miss something important.
- Only one value is being pushed onto the stack prior to the call -- EBX. That suggests that EBX is the only parameter to the function. I don't know of any calling conventions that use EDI or ESI so I'm assuming they are not parameters to the function.
- Since I know this binary is packed using FSG 1.0 (at least based on what PEiD told me), it seems to me that FUN_0040501f is likely what contains or sets off the logic responsible for unpacking the binary. This also means that the data in EBX is likely important to the function.
With that in mind, let's see what data resides at the address 0x004001d0:
DAT_004001d0 XREF[2]: entry:00405000(*),
entry:0040500f(*)
004001d0 04 ?? 04h
004001d1 04 ?? 04h
004001d2 05 ?? 05h
004001d3 04 ?? 04h
004001d4 01 ?? 01h
004001d5 00 ?? 00h
004001d6 62 ?? 62h b ? -> 00402162
004001d7 21 ?? 21h !
004001d8 40 ?? 40h @
004001d9 00 ?? 00h
004001da 02 ?? 02h
004001db 00 ?? 00h
...
I can't seem to glean much information from what is here just by looking at it, but it does seem like there is some data here. I am going to jump into looking at FUN_0040501f -- perhaps it can give me some more insight into what DAT_004001d0 is used for.
Let's look at the start of the function:
FUN_0040501f XREF[1]: entry:00405010(c)
0040501f fc CLD
00405020 b2 80 MOV DL,0x80
LAB_00405022 XREF[1]: 00405029(j)
00405022 a4 MOVSB ES:EDI,ESI
00405023 6a 02 PUSH 0x2
00405025 5b POP EBX
LAB_00405026 XREF[2]: 00405048(j), 0040508e(j)
00405026 ff 14 24 CALL dword ptr [ESP]
00405029 73 f7 JNC LAB_00405022
...
What's happening here?
- CLD -- Clearing the direction flag DF (https://www.felixcloutier.com/x86/cld). Essentially this means that string operations will INCREMENT ESI and EDI rather than decrement them.
- MOV DL,0x80 -- This is putting 0x80 into DL, which is the lower 8 bits of EDX. Since we don't know what EDX will be at this point, we just need to keep this in mind.
- MOVSB ES:EDI,ESI -- This instruction copies the byte at the address in ESI to the address in EDI. ESI and EDI were set in the entry, so this is copying bytes from 0x00404000 and placing them in 0x00401000. This is likely paired with some kind of loop, and looking a bit further down (or just looking at the XREF) we can see that there is in fact a jump to this instruction.
- PUSH 0x2 -- Pushing 0x2 onto the stack.
- POP EBX -- Removing 0x2 from the stack and placing it into EBX.
- CALL [ESP] -- An address stored on the stack is being called. This one is interesting because there has been very little stack interaction, so we can likely figure out what it is calling just by walking through what we have seen. Let's look at every time we have interacted with the stack:

      0040500f PUSH EBX
      00405010 CALL FUN_0040501f
      00405023 PUSH 0x2
      00405025 POP EBX

  With this, we can build what the stack looks like:

      // PUSH EBX
              ┌──────────────────────┐
      ESP ──► │ 0x004001d0           │
              └──────────────────────┘
      // CALL FUN_0040501F
              ┌──────────────────────┐
      ESP ──► │ 0x00405015 (ret addr)│
              ├──────────────────────┤
              │ 0x004001d0           │
              └──────────────────────┘
      // PUSH 0x2
              ┌──────────────────────┐
      ESP ──► │ 0x2                  │
              ├──────────────────────┤
              │ 0x00405015 (ret addr)│
              ├──────────────────────┤
              │ 0x004001d0           │
              └──────────────────────┘
      // POP EBX
              ┌──────────────────────┐
      ESP ──► │ 0x00405015 (ret addr)│
              ├──────────────────────┤
              │ 0x004001d0           │
              └──────────────────────┘

  If I am following this correctly then CALL [ESP] is going to turn into CALL 0x00405015. How cool!
- JNC LAB_00405022 -- JNC means "jump if not carry" (https://www.felixcloutier.com/x86/jcc). Essentially, if the carry flag CF equals 0, then we jump. The carry flag is set in a number of ways, but typically it is set in the following cases:
  - Arithmetic that results in an overflow (e.g. when the result of an ADD is unable to fit in the register).
  - Shifting or rotating bits that results in a bit getting pushed out of the register.

  So essentially, if this carry flag is NOT set due to some other instructions, then we will jump back up to the MOVSB instruction and repeat.
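The stack walk-through for CALL [ESP] can also be replayed in a few lines of Python. This is only a sketch of the bookkeeping, not an emulator: the list `stack` and the variable names are made up for illustration, and the addresses are the ones from the Ghidra listing.

```python
# Replays the four stack-touching instructions seen so far.
# `stack` is a hypothetical model: the last element is where ESP points.
stack = []

stack.append(0x004001D0)  # 0040500f PUSH EBX
stack.append(0x00405015)  # 00405010 CALL FUN_0040501f pushes its return address
stack.append(0x2)         # 00405023 PUSH 0x2
ebx = stack.pop()         # 00405025 POP EBX

# 00405026 CALL dword ptr [ESP] reads the top of the stack without popping it.
call_target = stack[-1]

print(hex(ebx), hex(call_target))
```

The value left on top is the return address of the original call, which is why the stub can reuse it as a cheap "function pointer" to the code at 0x00405015.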
With that, we have almost reversed this first little loop in the binary. Hopefully it's not totally useless! To complete this portion though, I need to look at the code found at 0x00405015 since this gets called during the loop. Let's look at that:
00405015 02 d2 ADD DL,DL
00405017 75 05 JNZ LAB_0040501e
00405019 8a 16 MOV DL,byte ptr [ESI]=>DAT_00404000 = 83h
0040501b 46 INC ESI
0040501c 12 d2 ADC DL,DL
LAB_0040501e XREF[1]: 00405017(j)
0040501e c3 RET
Easy enough, let's step by step it again:
- ADD DL,DL -- Double DL; Remember this got set to 0x80 at 0x00405020 at the start of the loop. Doubling this means that the resultant value will be 0x100 which is too large to fit in DL, so DL will truncate to 0x00 and then set the CF flag to 1. Keep this in mind as I imagine it is directly important to the JNC instruction at 0x00405029. As another note, ZF (the zero flag) will be set to 1. I discovered this becomes pertinent to the following instruction after starting to look into it.
- JNZ LAB_0040501e -- Jump if not zero (https://www.felixcloutier.com/x86/jcc). This will take the jump only if the ZF flag is set to 0. If the prior instruction produces a result of 0x00, then this jump will NOT be taken. If it produces a value other than 0x00, then this will jump directly to the RET instruction located at 0x0040501e.
- MOV DL, [ESI] -- This one is interesting as it will take the value found at the address in ESI and then copy it into DL. We need to remember that ESI at this point has already incremented because of the MOVSB located at 0x00405022. So logically, the 0th byte of ESI is moved into EDI during MOVSB. Then the values are incremented. Then at this point we are taking the 1st byte from ESI and placing it in DL. For what purpose... I know not... yet.
- INC ESI -- Okay, so we are going to increment ESI AGAIN. This means (just from taking a brief glance at the rest of the instructions above), that ESI is now on the 2nd byte (I'm working off 0 indexing here, just as a reminder). This implies that the 0th, 2nd, 4th, 6th, etc. bytes are going to be moved using MOVSB into EDI while the others are used by DL for some currently unknown purpose.
- ADC DL,DL -- Add with carry (https://www.felixcloutier.com/x86/adc). This adds the destination, source, and carry flag and stores the result inside the destination. Documentation specifies that "the state of the CF flag represents a carry from a previous addition." Essentially the value of CF is derived from what it is prior to this instruction being run. In our case, this is whatever value CF is set to after running ADD DL,DL at 0x00405015. Note that this will also set CF after it executes.
- RET -- Yup, we're finally returning to 0x00405029 which is the jump instruction that determines whether we loop or not.
WOW! A lot to keep track of here -- and this is just the start! To summarize, so far, the code essentially clears DF so MOVSB increments rather than decrements. It then loads DL with 0x80 and enters a loop where it:
- Copies one byte from ESI to EDI.
- Calls 0x00405015.
- Performs ADD DL,DL and, if that result is zero, "reloads" DL from the next byte and does ADC DL,DL, setting CF whenever DL overflows.
- Returns to the loop and tests CF. If there's no carry, it goes back and copies the next byte using MOVSB. If there is a carry, the jump fails and we fall through to the next chunk of code.
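To check my reading of the helper at 0x00405015, here is a minimal Python model of it. It is a sketch under my own assumptions: the class name `TagBitReader` and its fields are invented for illustration, `pos` stands in for ESI, and the byte 0x83 is just a sample value, not a claim about what the real stream contains at that position.

```python
class TagBitReader:
    """Models DL plus the ADD/ADC helper at 0x00405015 (hypothetical names)."""

    def __init__(self, data, pos=0):
        self.data = data
        self.pos = pos   # plays the role of ESI
        self.dl = 0x80   # MOV DL,0x80

    def getbit(self):
        total = self.dl + self.dl                    # ADD DL,DL
        carry, self.dl = total >> 8, total & 0xFF
        if self.dl == 0:                             # JNZ not taken: DL ran empty
            total = self.data[self.pos] * 2 + carry  # MOV DL,[ESI] then ADC DL,DL
            self.pos += 1                            # INC ESI
            carry, self.dl = total >> 8, total & 0xFF
        return carry                                 # the CF that JNC tests


reader = TagBitReader(bytes([0x83]))  # 0x83 is only an illustrative byte
bits = [reader.getbit() for _ in range(8)]
print(bits, reader.pos, hex(reader.dl))
```

In this model the helper hands back the bits of the loaded byte from most significant to least significant, and DL is back at 0x80 after eight calls. That suggests a new byte is only fetched into DL once every eight calls, rather than on every pass through the loop, which is worth keeping in mind when reading my earlier guess about alternating bytes.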
Just looking at this makes me feel like we are looking at some kind of compression algorithm which would make sense based on what I noted at the beginning -- FSG uses aPLib for compression.
Well... come to find out, I believe this is, in fact, the "depacking" function from aPLib.
Thankfully, we can go look at the source for aPLib. After downloading the library, I decided to look at the source a bit, and what do you know -- there is a depack.asm file. Upon opening the code, I immediately noticed something that looked... familiar:
aP_depack_asm:
; aP_depack_asm(const void *source, void *destination)
_ret$ equ 7*4
_src$ equ 8*4 + 4
_dst$ equ 8*4 + 8
pushad
mov esi, [esp + _src$] ; C calling convention
mov edi, [esp + _dst$]
cld
mov dl, 80h
xor ebx,ebx
literal:
movsb
mov bl, 2
nexttag:
call getbit
jnc literal
...
Wait... doesn't this look like the lines that we just reversed? The only difference is that in our code, there is no xor ebx,ebx and no mov bl, 2. Instead we have push 0x2 and pop ebx. I imagine this is due to the compiler making this decision, or perhaps the version of aPLib used to make FSG 1.0 just had different code at the time. Regardless, the functionality there is the same. This is awesome! Using the source, I was able to put names to variables and labels inside of Ghidra. I believe this find has saved me a substantial amount of time.
With that out of the way, it seems that a large portion of the depacking stub is just this compression code from aPLib. I believe from just a quick glance that the rest of the depacking stub is the part that repairs the IAT and such so that the unpacked binary functions properly.
The following is the raw, unedited disassembly from Ghidra of the next chunk of instructions we need to reverse:
LAB_004050a0 XREF[1]: 0040505d(j)
004050a0 5f POP EDI
004050a1 5b POP EBX
004050a2 0f b7 3b MOVZX EDI,word ptr [EBX]
004050a5 4f DEC EDI
004050a6 74 08 JZ LAB_004050b0
004050a8 4f DEC EDI
004050a9 74 13 JZ LAB_004050be
004050ab c1 e7 0c SHL EDI,0xc
004050ae eb 07 JMP LAB_004050b7
LAB_004050b0 XREF[1]: 004050a6(j)
004050b0 8b 7b 02 MOV EDI,dword ptr [EBX + 0x2]
004050b3 57 PUSH EDI
004050b4 83 c3 04 ADD EBX,0x4
LAB_004050b7 XREF[1]: 004050ae(j)
004050b7 43 INC EBX
004050b8 43 INC EBX
004050b9 e9 51 ff JMP LAB_0040500f
ff ff
This to me seems quite manageable! Let's dig in!!!
Well... where do we start? Looking at the cross references for each of the labels, there is only 1 label that has a cross-reference from the aPLib code snippet -- LAB_004050a0. You can see that there is a JZ instruction at 0x0040505d that takes us to this label. In the original aPLib depack.asm, this instruction would normally take us to the donedepacking label but it appears that the FSG packer has replaced this with a jump to its own code! This is where we will start.
To make this easier for any future readers, myself included, here is the specific section we are going to focus on:
LAB_004050a0 XREF[1]: 0040505d(j)
004050a0 5f POP EDI
004050a1 5b POP EBX
004050a2 0f b7 3b MOVZX EDI,word ptr [EBX]
004050a5 4f DEC EDI
004050a6 74 08 JZ LAB_004050b0
004050a8 4f DEC EDI
004050a9 74 13 JZ LAB_004050be
004050ab c1 e7 0c SHL EDI,0xc
004050ae eb 07 JMP LAB_004050b7
- POP EDI -- Pop the top of the stack into EDI; Believe it or not, using x32dbg, I set a breakpoint at this instruction and the stack is still exactly the same as the diagram I made earlier. This means that the address 0x00405015 will be popped off the stack and placed into EDI. Remember that this address takes us to a little stub of code, but interestingly enough, we don't use this address at all, as you will see in a few instructions.
- POP EBX -- Same as the last point, except it will be the address 0x004001D0. I believe this is the true reason for the POP instructions being run. I say that because, looking ahead, EDI is immediately overwritten in the next instruction with the value stored at the address inside of EBX. This to me suggests that the POP EDI instruction was purely meant to get rid of the top value of the stack so this address could be placed in EBX. Exploring to the address in memory shows that there is some data there. What this data is for I am unsure at this point, but the data looks like the following:

      004001D0 04 04 05 04 01 00 62 21 40 00 02 00 00 00 00 00

  Let's keep rolling so we can maybe figure out what this data is for -- but if I had to guess right now, I imagine this is some kind of header or struct information.
- MOVZX EDI, word ptr [EBX] -- Move with zero extend; In this case, this is going to take the bytes 04 04, place them in EDI and then change the rest of the bytes to 00, so the resulting value should be EDI = 0x00000404. Since the assembly was particular about grabbing these two bytes specifically, I imagine they represent something important. 0x404 is 1028 in decimal.
- DEC EDI -- Yup, reduce that value in EDI by one. Why? I know not... yet.
- JZ LAB_004050b0 -- Ahhh here is why we decrement EDI -- at least part of the reason why. If DEC EDI produces a result of 0, then ZF will be set to 1 and we will jump to this particular label. Glancing at the label, it appears that the instructions within this label will take us BACK to the beginning of the aPLib "depacking" routine! Take a look:

      LAB_004050b0 XREF[1]: 004050a6(j)
      004050b0 8b 7b 02 MOV EDI,dword ptr [EBX + 0x2]
      004050b3 57 PUSH EDI
      004050b4 83 c3 04 ADD EBX,0x4
      LAB_004050b7 XREF[1]: 004050ae(j)
      004050b7 43 INC EBX
      004050b8 43 INC EBX
      004050b9 e9 51 ff JMP LAB_0040500f ---> Back to depacking routine
      ff ff

  A few other points of interest in this label:
  - We are grabbing 4 bytes at [EBX + 0x2] and moving them into EDI. This seems to confirm to me that there is something important about this data located at EBX.
  - EBX is incremented by 6 bytes which makes sense -- the first two bytes were used previously, then the next 4 bytes were moved into EDI here and placed on the stack. That's a total of 6 bytes, so perhaps there are pairs of some kind of data here at this address? Perhaps a struct that contains a WORD followed by a DWORD?
  - It is possible to jump directly to the second label here, skipping the moving of data from EBX into EDI and incrementing EBX by 4, so perhaps some kind of terminating condition will lead us to the second label?

  Whatever the case might be, it seems that there is more data that is being "depacked". Considering that the source for aPLib has this function signature as aP_depack_asm(const void *source, void *destination), we can assume that the value being stored in EDI is likely becoming the destination and the value currently stored in EBX will become the source once it is pushed onto the stack at the beginning of the depacking routine.

  Regardless, we can't really know for sure unless we run the code -- for now, we aren't taking this branch so let's march on.
- DEC EDI; JZ LAB_004050be -- Looks like similar logic to the previous instructions: decrement EDI and if the result is 0, jump to the given label. Taking a brief look at this label shows us the following:

      LAB_004050be XREF[1]: 004050a9(j)
      004050be 5f POP EDI
      004050bf bb 28 51 MOV EBX, PTR_LoadLibraryA_00405128
      40 00
      LAB_004050c4 XREF[1]: 004050d3(j)
      004050c4 47 INC EDI
      004050c5 8b 37 MOV ESI,dword ptr [EDI]
      004050c7 af SCASD ES:EDI
      004050c8 57 PUSH EDI
      004050c9 ff 13 CALL dword ptr [EBX]=>KERNEL32.DLL::LoadLibraryA
      004050cb 95 XCHG EAX,EBP
      ...
      LAB_004050e8 XREF[1]: 004050dd(j)
      004050e8 55 PUSH EBP
      004050e9 ff 53 04 CALL dword ptr [EBX + 0x4]=>KERNEL32.DLL::GetProcAd
      004050ec 09 06 OR dword ptr [ESI],EAX
      004050ee ad LODSD ESI
      004050ef 75 db JNZ LAB_004050cc
      004050f1 8b ec MOV EBP,ESP
      004050f3 c3 RET

  WOW! A lot to unpack here, but we are not going to do that right now. I put this code snippet here because it appears that taking this branch leads us straight to all the logic that handles calling LoadLibraryA and GetProcAddress! Quickly scanning the XREFs shows me that once you are in this branch, there is no exiting this part of the code -- it runs until it completes and hits the ret instruction at the end. As much as I want to get ahead of myself and start getting into this part of the binary, I'm going to go back.
- SHL EDI,0xc; JMP LAB_004050b7 -- Shift EDI left by 0xc (12) bits, then jump to the label we discussed previously.
Based on this logic, it looks like we will NOT take the first two branches until the specific condition of DEC EDI produces 0 at some point. This suggests to me that the depacking routine will be run a handful of times.
Running this code in x32dbg gave me a lot more context to what is happening with this code but I am unsure the best way to explain what I have found. Here is what appears to be happening:
- Two bytes are grabbed from 0x004001D0 (EBX) and stored in EDI
- That value is decremented by 2, so, as an example, 0x0404 becomes 0x0402
- EDI is shifted left by 12 bits. This means values such as 0x0402 will turn into values like 0x00402000 -- hmmmmm looks VERY much like an address
- Increments EBX by 2 (so if it was 0x004001D0, it is now 0x004001D2)
- Goes to the top of the depacking, using this address in EDI as the destination (perhaps building out the different sections??), and depacks into the address.
- This repeats, grabbing the next two bytes, determining the destination address, depacking, etc. until the value that gets placed into EDI is 0x0001. This will trigger the first branch.
- At this point, the DWORD value at EBX + 2 is grabbed and stored in EDI. In this case, that value is 0x00402162 and that is pushed onto the stack.
- EBX is incremented by 6
- When the depacker routine runs THIS time, using the new EDI, this is what gets depacked:

      00402162 01 48 20 40 00 6F 6C 65 33 32 2E 64 6C 6C 00 52 .H @.ole32.dll.R
      00402172 6C 65 49 6E 69 74 69 61 6C 69 7A 65 00 46 6F 43 leInitialize.FoC
      00402182 72 65 61 74 65 49 6E 73 74 61 6E 63 65 00 52 6C reateInstance.Rl
      00402192 65 55 6E 69 6E 69 74 69 61 6C 69 7A 65 00 01 38 eUninitialize..8
      004021A2 20 40 00 4F 4C 45 41 55 54 33 32 2E 64 6C 6C 00  @.OLEAUT32.dll.
      004021B2 02 08 00 00 00 00 02 02 00 00 00 00 02 06 00 00 ................
      004021C2 00 00 01 00 20 40 00 4D 53 56 43 52 54 2E 64 6C .... @.MSVCRT.dl
      004021D2 6C 00 62 5F 67 65 74 6D 61 69 6E 61 72 67 73 00 l.b_getmainargs.
      004021E2 62 63 6F 6E 74 72 6F 6C 66 70 00 62 65 78 63 65 bcontrolfp.bexce
      004021F2 70 74 5F 68 61 6E 64 6C 65 72 33 00 62 5F 73 65 pt_handler3.b_se
      00402202 74 5F 61 70 70 5F 74 79 70 65 00 62 5F 70 5F 5F t_app_type.b_p__
      00402212 66 6D 6F 64 65 00 62 5F 70 5F 5F 63 6F 6D 6D 6F fmode.b_p__commo
      00402222 64 65 00 62 65 78 69 74 00 62 58 63 70 74 46 69 de.bexit.bXcptFi
      00402232 6C 74 65 72 00 68 78 69 74 00 62 5F 70 5F 5F 5F lter.hxit.b_p___
      00402242 69 6E 69 74 65 6E 76 00 62 69 6E 69 74 74 65 72 initenv.binitter
      00402252 6D 00 62 5F 73 65 74 75 73 65 72 6D 61 74 68 65 m.b_setusermathe
      00402262 72 72 00 62 61 64 6A 75 73 74 5F 66 64 69 76 00 rr.badjust_fdiv.
      00402272 03 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................

  Looks like some kind of import information -- we'll keep our eye on this. This information to me seems to suggest that it will be working with COM (ole32.dll) and possibly automation and control of other services (OLEAUT32.dll).
- The depacker hits this loop again, grabbing bytes from EBX and placing the value 0x0002 into EDI. You might be able to guess what this means...
- EDI hits the first DEC instruction, but since the resulting value is 0x01, we do not take the JZ branch. The next DEC instruction takes us to the value 0x00 which means that...
- We hit the JZ that jumps us to the instructions that handle LoadLibraryA and GetProcAddress calls.
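The steps above can be sketched as a small Python walker over the bytes at 0x004001D0. This is my interpretation of the table rather than a documented format: the function name `walk_fsg_table` and the labels `page_dest`, `explicit_dest`, and `imports` are invented here, and a word of 0 is not handled because I never saw one.

```python
import struct


def walk_fsg_table(table):
    """Yield (kind, value) for each entry the stub reads through EBX."""
    pos = 0
    while True:
        (word,) = struct.unpack_from("<H", table, pos)  # MOVZX EDI,word ptr [EBX]
        if word == 1:
            # First DEC hits zero: the destination is the DWORD that follows.
            (dest,) = struct.unpack_from("<I", table, pos + 2)
            yield ("explicit_dest", dest)
            pos += 6                                    # ADD EBX,0x4 + two INC EBX
        elif word == 2:
            # Second DEC hits zero: jump to the LoadLibraryA/GetProcAddress loop.
            yield ("imports", None)
            return
        else:
            yield ("page_dest", (word - 2) << 12)       # two DECs, then SHL EDI,0xc
            pos += 2                                    # two INC EBX


# The twelve bytes observed at 0x004001D0 in this sample.
table = bytes.fromhex("04 04 05 04 01 00 62 21 40 00 02 00")
entries = list(walk_fsg_table(table))
for kind, value in entries:
    print(kind, hex(value) if value is not None else "")
```

Note that the very first destination (0x00401000) never appears in the table; it comes from the MOV EDI at the entry point, so a static unpacker would have to read it from the stub itself.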
Awesome! We have broken down this chunk of instructions. After working through all of this, it would seem that some of my initial thoughts were proven correct and some were proven wrong.
- It does seem like the data stored at 0x004001D0 is in fact some kind of header/struct used by FSG to give it the information it needs to unpack the rest of the binary, so I'll take that as a win for making that assumption earlier.
- I was wrong about the labels LAB_004050b0 and LAB_004050b7 -- I hypothesized that the first label was the code that would be run normally and that there may be 6-byte sized structs located at 0x004001D0 and that some terminating condition would get us to LAB_004050b7, but really I had it backward! The special condition where EDI contains 0x01 and then the decrement makes it hit 0 takes us to LAB_004050b0, and the terminating condition is actually having EDI contain 0x02 so that we can jump to the label LAB_004050be that contains the next logical portion of the code -- the LoadLibraryA and GetProcAddress loop.
Let's take a look at the last portion of instructions:
LAB_004050be XREF[1]: 004050a9(j)
004050be 5f POP EDI
004050bf bb 28 51 MOV EBX,PTR_LoadLibraryA_00405128
40 00
LAB_004050c4 XREF[1]: 004050d3(j)
004050c4 47 INC EDI
004050c5 8b 37 MOV ESI,dword ptr [EDI]
004050c7 af SCASD ES:EDI
004050c8 57 PUSH EDI
004050c9 ff 13 CALL dword ptr [EBX]=>KERNEL32.DLL::LoadLibraryA
004050cb 95 XCHG EAX,EBP
LAB_004050cc XREF[1]: 004050ef(j)
004050cc 33 c0 XOR EAX,EAX
LAB_004050ce XREF[1]: 004050cf(j)
004050ce ae SCASB ES:EDI
004050cf 75 fd JNZ LAB_004050ce
004050d1 fe 0f DEC byte ptr [EDI]
004050d3 74 ef JZ LAB_004050c4
004050d5 fe 0f DEC byte ptr [EDI]
004050d7 75 06 JNZ LAB_004050df
004050d9 47 INC EDI
004050da ff 37 PUSH dword ptr [EDI]
004050dc af SCASD ES:EDI
004050dd eb 09 JMP LAB_004050e8
LAB_004050df XREF[1]: 004050d7(j)
004050df fe 0f DEC byte ptr [EDI]
004050e1 0f 84 a9 JZ LAB_00401090
bf ff ff
004050e7 57 PUSH EDI
LAB_004050e8 XREF[1]: 004050dd(j)
004050e8 55 PUSH EBP
004050e9 ff 53 04 CALL dword ptr [EBX + 0x4]=>KERNEL32.DLL::GetProcAd
004050ec 09 06 OR dword ptr [ESI],EAX
004050ee ad LODSD ESI
004050ef 75 db JNZ LAB_004050cc
004050f1 8b ec MOV EBP,ESP
004050f3 c3 RET
The end is in sight! Let's take it from the top.
- POP EDI -- When we hit this instruction the value at the top of the stack is 0x00402162. That address looks familiar... remember what was unpacked there? The data that looked like imports! EDI will contain this address.
- MOV EBX, PTR_LoadLibraryA_00405128 -- The address for LoadLibraryA is contained at this pointer. EBX will contain 0x00405128. Looking at that address, we find the bytes 40 0E E2 75. Since this is little endian, that'll be 0x75E20E40 which resolves to -- you guessed it -- kernel32.LoadLibraryA.
- INC EDI; MOV ESI,dword ptr [EDI] -- Take a look at the data located at 0x00402162. Incrementing EDI will move EDI to the next byte (0x00402163) and then the next instruction will grab the 4-byte value stored at the address in EDI and place it into ESI. ESI will then have the value 0x00402048. This looks like an address to me!
- SCASD ES:EDI; PUSH EDI; CALL dword ptr [EBX]=>KERNEL32.DLL::LoadLibraryA -- The instruction SCASD essentially compares EAX with the 4 bytes at the address in EDI and then increments EDI by 4 bytes. This means following SCASD, EDI will contain 0x00402167 -- this is the start of the string ole32.dll. This address is then pushed onto the stack and LoadLibraryA is then called. For reference, the function signature for LoadLibraryA is:

      HMODULE LoadLibraryA([in] LPCSTR lpLibFileName);

  LoadLibraryA returns the handle to the module on success or NULL on failure inside of EAX.

  So tl;dr: we are loading libraries. :)
- XCHG EAX,EBP; XOR EAX,EAX; SCASB; JNZ LAB_004050ce -- After calling LoadLibraryA, we will store the return into EBP, then clear EAX. This prepares EAX for the next instruction, SCASB. SCASB will go byte by byte over the string at EDI (remember that it currently points to the string ole32.dll) and compare the byte to the value in AL. If the bytes do not equal each other, this will jump back to the SCASB instruction and keep moving until it is equal. In this case, since EAX contains 0x00, this little loop is looking for a 0 byte -- it's looking for the end of the string!
- The next bit requires a bigger picture to understand. Check this out:

      DEC byte ptr [EDI]
      JZ LAB_004050c4
      DEC byte ptr [EDI]
      JNZ LAB_004050df
      INC EDI
      PUSH dword ptr [EDI]
      SCASD ES:EDI
      JMP LAB_004050e8
      DEC byte ptr [EDI]
      JZ LAB_00401090
      PUSH EDI
      PUSH EBP
      CALL dword ptr [EBX + 0x4]=>KERNEL32.DLL::GetProcAddress
      OR dword ptr [ESI],EAX
      LODSD ESI
      JNZ LAB_004050cc
      MOV EBP,ESP
      RET

  What in the world is happening here??? Interestingly enough, there is a little bit of obfuscation going on with the imported function names! Check it out:

      RleInitialize
      FoCreateInstance
      RleUninitialize

  If you look up these function names, they don't exist! But you know what does exist?

      OleInitialize
      CoCreateInstance
      OleUninitialize

  The first letter of the imported function names has been shifted up by 3.

  The first time this logic is hit, EDI points to the string RleInitialize. DEC byte ptr [EDI] will change this string to QleInitialize. It performs a check -- if the result of this operation produced a 0, it would then jump back up to restart the LoadLibraryA loop. Since that is not the case it keeps going and executes DEC again, producing the string PleInitialize.

  The next check will jump if the result is NOT zero. Of course the character P is not 0x00, so the result is not zero. We jump to 0x004050df which is ANOTHER DEC byte ptr [EDI]. This produces the string OleInitialize -- the string it actually needs!

  With that string, it then moves on to setting up the stack for the call to GetProcAddress. The signature is:

      FARPROC GetProcAddress([in] HMODULE hModule, [in] LPCSTR lpProcName);

  Remember that EBP contains the handle to the module and EDI is the address of the string containing the name of the function the code wants to use. GetProcAddress gets called, and then upon return, the code performs an OR between the 4 bytes at [ESI] and EAX (the result of GetProcAddress). The 4 bytes located at the address contained in ESI are then placed into EAX by the LODSD instruction and ESI is incremented by 4 bytes.

  With regards to the purpose of these instructions, the first time we hit these instructions ESI contains the address 00402048 which points to 4 bytes that contain no data. It would seem that GetProcAddress is being run, and then the result is being stored here. Since ESI is incrementing with each pass, I believe that this is meant to be an array of function addresses.

  Note if the result of the OR instruction from earlier is not zero, then the JNZ instruction will loop us back to the LoadLibraryA functionality. If the result of the OR is zero, then we will RET which should cause the application to close (if I am looking at it correctly). This to me seems like an error catching condition -- if this process gets messed up somehow, then exit.

  This process will continue to repeat until all the libraries have been loaded and the function addresses have been retrieved. Once this happens, there is a terminating entry in the list of functions to import that contains only the value 0x03. Remember how the logic works -- we decrement [EDI] 3 times to get the correct first letter for each of the function names. So what happens if this first byte is 0x03? Well, the result will be 0x00 after these decrements occur. This means we will hit the final conditional that we haven't taken -- JZ LAB_00401090. Where the heck is that??? Let's take a look at what's there:

      ...
      00401090 | 55               | push ebp                              |
      00401091 | 8BEC             | mov ebp,esp                           |
      00401093 | 6A FF            | push FFFFFFFF                         |
      00401095 | 68 78204000      | push 402078                           |
      0040109A | 68 D0114000      | push <JMP.&_except_handler3>          |
      0040109F | 64:A1 00000000   | mov eax,dword ptr fs:[0]              |
      004010A5 | 50               | push eax                              |
      004010A6 | 64:8925 00000000 | mov dword ptr fs:[0],esp              |
      004010AD | 83EC 20          | sub esp,20                            |
      004010B0 | 53               | push ebx                              |
      004010B1 | 56               | push esi                              |
      004010B2 | 57               | push edi                              |
      004010B3 | 8965 E8          | mov dword ptr ss:[ebp-18],esp         |
      004010B6 | 8365 FC 00       | and dword ptr ss:[ebp-4],0            |
      004010BA | 6A 01            | push 1                                |
      004010BC | FF15 0C204000    | call dword ptr ds:[<&__set_app_type>] |
      ...

  Look at that, it looks like the start of a real function! It even looks like we see some exception handling setup and we can see that the call to __set_app_type is setting this to a console app. This is code that you'd likely see run prior to main being executed in your code. For reference, here is the start of another binary's entry point:

      00401820 | 55               | push ebp                              |
      00401821 | 8BEC             | mov ebp,esp                           |
      00401823 | 6A FF            | push FFFFFFFF                         |
      00401825 | 68 70204000      | push 402070                           |
      0040182A | 68 60194000      | push <JMP.&_except_handler3>          |
      0040182F | 64:A1 00000000   | mov eax,dword ptr fs:[0]              |
      00401835 | 50               | push eax                              |
      00401836 | 64:8925 00000000 | mov dword ptr fs:[0],esp              |
      0040183D | 83EC 20          | sub esp,20                            |
      00401840 | 53               | push ebx                              |
      00401841 | 56               | push esi                              |
      00401842 | 57               | push edi                              |
      00401843 | 8965 E8          | mov dword ptr ss:[ebp-18],esp         |
      00401846 | 8365 FC 00       | and dword ptr ss:[ebp-4],0            |
      0040184A | 6A 01            | push 1                                |
      0040184C | FF15 58204000    | call dword ptr ds:[<__set_app_type>]  |

  They are essentially identical. Pretty cool!
I'd say it is safe to assume at this point that we have found the actual entry point of our packed binary! Now for the final part -- determining how to unpack this statically without running it and making the tool to do so.
So what parts of this functionality are important for us to understand so that we can properly unpack the binary statically? Let's review the important overall steps of the FSG depacking stub:

- Use aPLib to unpack the different data chunks. Note that all destinations after the first are derived from some kind of struct or header located at 4001D0:
  - First destination (EDI) is 401000, first source is 404000
    - This is the .text section
  - Second EDI is 402000, second ESI is 404156
    - 402000 is derived by getting the two bytes at 4001D0 and then subtracting 2 and shifting left 12 bits
    - 402000 - 402053 contain addresses to imported functions. I believe this is the .rdata section!
    - 402058 - 402083 contains data that, after some further debugging and setting breakpoints on accesses to it, I determined to be variables used for various function calls and operations throughout the code. This is still a continuation of the .rdata section.
  - Third EDI is 403000, third ESI is 40417F -- This contains string resources -- I imagine this is the .data or .rsrc section.
  - Fourth EDI is 402162, fourth ESI is 4041BF -- I believe this is the rest of the .rdata section, as it contains the rest of the imports information essentially just as you would see it in a normal .rdata section.

  To get more clarity on this, I threw the binary into the classic PEview tool. There are three IMAGE_SECTION_HEADERs. The sections are as follows:
  - 00401000 - 00403FFF - Based on what we found, this is where the main program will be after it is fully unpacked. Interestingly enough, the section header marks this as having a "size of raw data" of 0. This is one of those indicators that would have told us that the binary is likely packed (if we didn't already know).
  - 00404000 - 00404FFF - This is the section that is used to store all the packed data. In the binary file on disk, this raw data is located at 0x1000
  - 00405000 - 00405FFF - This is the section that contains the FSG depacking stub. In the binary file on disk, this raw data is located at 0xE00.
    - Just a note from my future self -- after working on this project further,
Note From My Future Self
I don't know why I didn't consider looking into it during this stage of my reversing, but how does the decompression algorithm know when to stop for each of these source/destination combos? Easy enough: take a look at an aPLib compressed binary and look at the section that contains the compressed/packed data. You might notice something -- the lack of 0x00 bytes. I'm not entirely sure what role null bytes play in the algorithm as a whole, but I noticed that two null bytes (00 00) or a null byte followed by a 1 (00 01) were always the delimiters between the sections.

On another note, the depacking "header" seems to ALWAYS come after the IMAGE_SECTION_HEADERs. Every FSG 1.0 packed binary I have seen only has 3 IMAGE_SECTION_HEADERs. The DOS, DOS stub, and NT headers combined seem to end at 0x157, and the IMAGE_SECTION_HEADERs end at 0x1CF, which places the depacking "header" at 0x1D0 pretty consistently (as in I have never seen a case where this is not true, but that doesn't mean it is ALWAYS true). It seems the packer is taking advantage of the empty space at the end of the section mapped for the PE header.

In a similar vein, I noticed in every binary that I looked at that the packed data section always immediately follows the section mapped for the unpacked data. I also noticed that the depacking stub seems to always immediately follow the packed data section, so the locations of the "header", packed data, and the stub can all be derived using just the PE header of the packed binary.

Note that these observations are for the binary while in memory (while it is being run). On disk, the data is laid out with the PE header first, then the depacking stub, then the packed data.
- Once unpacked, the depacker iterates over the list of imports located at 402162, calling LoadLibraryA on all the DLLs, demangling the import names, and calling GetProcAddress on the functions we are interested in. Note that the functions passed to GetProcAddress could be string names or ordinals.
- Addresses retrieved using GetProcAddress are stored at 00402000 in groupings based on DLL. These groupings are delimited by a 4-byte null terminator.
- Once all function addresses have been retrieved, the depacking stub jumps to the OEP.
With this, we have what we need to get started on a basic static depacker.
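Two bits of arithmetic from the review above are easy to sanity check in code. This is only a sketch in portable C: the function names are made up for illustration, the field offsets are the standard PE ones, and the word value 0x0404 used below is simply what the "subtract 2, shift left 12" rule needs in order to produce 402000 (an assumption, not a value read out of the sample).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read little-endian values from a raw file image. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Offset of the first byte after the section table. In every FSG 1.0
 * sample described above, this is where the depacking "header" sits. */
static uint32_t end_of_section_table(const uint8_t *file, size_t size)
{
    assert(file != NULL && size >= 0x40);
    uint32_t nt = rd32(file + 0x3C);           /* e_lfanew */
    assert((size_t)nt + 24 <= size);
    uint16_t nsections = rd16(file + nt + 6);  /* FileHeader.NumberOfSections */
    uint16_t opt_size  = rd16(file + nt + 20); /* FileHeader.SizeOfOptionalHeader */
    /* "PE\0\0" (4) + IMAGE_FILE_HEADER (20) + optional header + 40 per section */
    return nt + 4 + 20 + opt_size + nsections * 40u;
}

/* Destination address from a 16-bit header word, taking the
 * "subtract 2, shift left 12 bits" description literally. */
static uint32_t fsg_dest_from_word(uint16_t word)
{
    return ((uint32_t)word - 2u) << 12;
}
```

With e_lfanew = 0x60, a 0xE0 byte optional header, and 3 sections (values picked to match the offsets quoted above), the first function lands on 0x1D0.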
After some digging, I was able to find an original fsg.exe binary -- maybe loaded with malware but who knows 🤷.
Either way, I created two quick sample applications that I can use to test the fsg packer. These are simple applications with no optimizations.
The first is a basic server that accepts connections, reports the connection, and then disconnects.
The second is an application that will spawn a command prompt terminal window.
I have specifically made a .def file for sample_02 so that I could import by ordinal rather than name and make sure I can catch that kind of functionality in the unpacker. I also removed any dependency on msvcrt or other libraries -- the binary only imports kernel32.dll and the necessary functions it needs to run.
Let's pack them and check results!
The original sample_01.c file, when compiled on my machine, produces a ~115KB binary. After running fsg on it, the binary is now ~61KB in size -- Wow! That is a nearly 50% reduction in size right there. Impressive.
I had very similar results with sample_02.c.
When I attempted to run both binaries, I found that they were crashing! Opening up the binaries in x32dbg showed me that the entry point of the depacking stub is at 0042E000 in both binaries. Trying to run the depacking stub produces an EXCEPTION_ACCESS_VIOLATION on the very first instruction. Why?
Checking the logs, this is what I get:
Invalid relocation block for module sample_01.exe!
Breakpoint at 0042E000 (entry breakpoint) set!
DLL Loaded: 77310000 C:\Windows\SysWOW64\ntdll.dll
DLL Loaded: 74F70000 C:\Windows\SysWOW64\kernel32.dll
DLL Loaded: 75150000 C:\Windows\SysWOW64\KernelBase.dll
System breakpoint reached!
EXCEPTION_DEBUG_INFO:
dwFirstChance: 1
ExceptionCode: C0000005 (EXCEPTION_ACCESS_VIOLATION)
ExceptionFlags: 00000000
ExceptionAddress: <sample_01.OptionalHeader.AddressOfEntryPoint> (0042E000)
NumberParameters: 2
ExceptionInformation[00]: 00000008 DEP Violation
ExceptionInformation[01]: <sample_01.OptionalHeader.AddressOfEntryPoint> (0042E000) Inaccessible Address
First chance exception on 0042E000 (C0000005, EXCEPTION_ACCESS_VIOLATION)!
Looks like we have a few problems here. First is the invalid relocation block! I imagine that has something to do with fsg itself since the binary functions properly prior to being compressed. Let's get rid of that by adding the /FIXED flag to the linker when building. Now we get:
Breakpoint at 0042E000 (entry breakpoint) set!
DLL Loaded: 77310000 C:\Windows\SysWOW64\ntdll.dll
DLL Loaded: 74F70000 C:\Windows\SysWOW64\kernel32.dll
DLL Loaded: 75150000 C:\Windows\SysWOW64\KernelBase.dll
System breakpoint reached!
EXCEPTION_DEBUG_INFO:
dwFirstChance: 1
ExceptionCode: C0000005 (EXCEPTION_ACCESS_VIOLATION)
ExceptionFlags: 00000000
ExceptionAddress: <sample_01.OptionalHeader.AddressOfEntryPoint> (0042E000)
NumberParameters: 2
ExceptionInformation[00]: 00000008 DEP Violation
ExceptionInformation[01]: <sample_01.OptionalHeader.AddressOfEntryPoint> (0042E000) Inaccessible Address
First chance exception on 0042E000 (C0000005, EXCEPTION_ACCESS_VIOLATION)!
That fix got rid of the relocation block error, but we are still left with the DEP problem. To fix this, let's add the /NXCOMPAT:NO linker flag to the build and try again. After doing that we get....
Breakpoint at 0042E000 (entry breakpoint) set!
DLL Loaded: 77310000 C:\Windows\SysWOW64\ntdll.dll
DLL Loaded: 74F70000 C:\Windows\SysWOW64\kernel32.dll
DLL Loaded: 75150000 C:\Windows\SysWOW64\KernelBase.dll
System breakpoint reached!
INT3 breakpoint "entry breakpoint" at <sample_01.OptionalHeader.AddressOfEntryPoint> (0042E000)!
DLL Loaded: 76AE0000 C:\Windows\SysWOW64\ws2_32.dll
DLL Loaded: 75B20000 C:\Windows\SysWOW64\rpcrt4.dll
Thread 1264 created, Entry: ntdll.7734EF90, Parameter: 00583B90
DLL Loaded: 70270000 C:\Windows\SysWOW64\mswsock.dll
Thread 1264 exit
Process stopped with exit code 0x0 (0)
Let's go! It worked! We now have two test binaries we can examine. Let's look at sample_02.exe first and see if the pattern for the fsg depacker is as we described it above.
Everything looks the same except for some changes in the addresses. This means I can't bank on addresses like 00405000 being the entry point of the depacking stub and 00401090 being the OEP -- I'll have to get the entry point from the PE header and then locate the jmp that takes us to the OEP. Thankfully it appears that all the instructions for the depacking logic are still the same other than the swapped out addresses.
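Getting the stub's entry point from the PE header instead of hardcoding it could look roughly like this. It is a sketch, not the tool's actual code: entry_point_va is a made-up name, it only handles 32-bit (PE32) images, and it works on a plain byte buffer rather than a mapped view.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read little-endian values from a raw file image. */
static uint16_t rd16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t rd32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Virtual address of the entry point, i.e. where the depacking stub
 * starts once the packed binary is loaded at its preferred base. */
static uint32_t entry_point_va(const uint8_t *file, size_t size)
{
    assert(file != NULL && size >= 0x40);
    uint32_t nt = rd32(file + 0x3C);         /* e_lfanew */
    assert((size_t)nt + 24 + 32 <= size);
    assert(rd32(file + nt) == 0x00004550u);  /* "PE\0\0" signature */

    const uint8_t *opt = file + nt + 24;     /* IMAGE_OPTIONAL_HEADER32 */
    assert(rd16(opt) == 0x10B);              /* PE32 magic */

    uint32_t entry_rva  = rd32(opt + 16);    /* AddressOfEntryPoint */
    uint32_t image_base = rd32(opt + 28);    /* ImageBase */
    return image_base + entry_rva;
}
```

For the samples above, an entry RVA of 0x2E000 on top of an image base of 0x400000 (the base is my assumption, it is the usual default for 32-bit executables) gives the 0042E000 that x32dbg reported.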
On to making our static depacker!
So... how do we do this? I am going to be using C. Based on the overall important steps performed by the depacking stub, here is how I think I can approach this problem:
- Map a view of the target executable file
- Get all the information we need from that file, such as NT header information and section table information, that will help us in the unpacking of the binary
- Create a new output file, map a view of it, and retrieve all information we need for it.
- Find the FSG "header" (I'm not sure what else I would call it) and gather the information I need from it to unpack the binary.
- Create the PE header, starting with DOS then NT
- Create the section table and unpack the sections using aPLib.
- We're done! At least I think...
I've run into a few issues as I have been implementing this though.
First, specifics. My original reversing got me to a description of what the depacking stub does that was specific in places but still fairly generalized.
It turns out that generalizations do not work too well in code. Who would have thought?
When I say specifics, I mean SPECIFICS. We are rebuilding headers and section tables from scratch. Where is the raw data located in the binary? What is the RVA of each section? The virtual size? The raw size? EVERYTHING. This alone isn't too bad, but add in that we actually don't know a lot of this information beforehand, because all the sections are packed together in a big blob of data, and then we start to have problems. Why? Well, to start...
- We actually don't know how big each section is going to be until it has been decompressed. This means we can't reliably determine where to place all the sections in the binary or fix up the section headers until the data has been decompressed. For example, without knowing how big the .text section is, we don't know where we need to put the .rdata section, and without the .rdata section, we don't know where to put the .data. This also means we need information from the first section to reliably place the second, and information from the second section to reliably place the third, and so on.
- Second, given the blob of compressed data, we don't know where one section of compressed data ends and the other begins. You can call aP_depack_asm on the compressed data to decompress it. This will do its thing, unpacking the data from the source into the destination and then returning the size of the decompressed data. Nice! Let's depack the next section! Oh wait... we can't, because we don't know where the start of the next compressed section is. The depacking routine doesn't give us this information. Interestingly, aP_depack_asm keeps track of the current source byte location using ESI for the entire time the function runs. I thought maybe we could snag it after the function ends, but the register gets wiped out at the end of the function by the popad instruction. Without this information, we would have to manually parse the packed data until we found the end of the current section/start of the next section before we could actually depack that section. I've got a workaround for this, though, which I'll talk about later.
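The ordering problem in the first point can be sketched like this: once the decompressed sizes are known, each section's raw offset follows from the one before it. The names and the example sizes are hypothetical, and a file alignment of 0x200 is assumed (the usual linker default).

```c
#include <assert.h>
#include <stdint.h>

/* Round value up to the next multiple of a power-of-two alignment. */
static uint32_t align_up(uint32_t value, uint32_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (value + alignment - 1) & ~(alignment - 1);
}

/* Assign a raw file offset to each section, in order.
 * headers_end : first byte after the rebuilt section table
 * sizes[i]    : decompressed size of section i (only known after depacking)
 * raw_offs[i] : receives the file offset chosen for section i
 * Returns the total size of the output file. */
static uint32_t layout_sections(uint32_t headers_end, const uint32_t *sizes,
                                uint32_t *raw_offs, int count,
                                uint32_t file_align)
{
    uint32_t cursor = align_up(headers_end, file_align);
    for (int i = 0; i < count; i++) {
        raw_offs[i] = cursor;  /* section i starts here */
        cursor = align_up(cursor + sizes[i], file_align);
    }
    return cursor;
}
```

For example, with headers ending at 0x250 and made-up sizes of 0x1234, 0x200, and 0x10, the sections land at 0x400, 0x1800, and 0x1A00, and the file ends at 0x1C00. Nothing past the first offset can be computed until the earlier sizes are known, which is exactly the dependency described above.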
There are some good things though! We can make some assumptions that will help us with the output binary such as:
- All FSG 1.0 packed binaries I have seen only have 3 sections regardless of how many sections the original binary had. Why is this important? We don't need to worry about decompressing anything other than these 3 sections, and since this is the case, we know how big our resultant binary's section header table will be.
- Thankfully, section data typically falls directly after the section headers, so if we know how big the section header table is (which we do, since it only ever contains three entries), we have a default location to put the decompressed data for the first section -- AKA the .text section. The alignment of the different sections of data in the binary is 0x200 bytes by default, so if our section header table ends around 0x230 - 0x250, we can put the text section at 0x400 in the binary and be pretty certain that we will not accidentally overwrite something important.
- Oh yeah, aPLib is somewhat open-source. We don't have the packing code, but we definitely have the depacking code from depack.asm. With a little finagling, we can convert the FASM code to something that an MSVC __declspec(naked) __asm block will be able to run. Not only that, we can add a few of our own instructions to make it return ESI to us, which relieves the problem I mentioned earlier about not knowing where one block ends and the next begins. One more addition I made here was allowing the function to run without actually doing the decompression, meaning we can get this information without needing to actually decompress the data. Doing this also comes with the benefit of not needing to link aplib.lib or include aplib.h -- we can just add the ASM to the header file for this code.
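To show the idea behind those two additions without the inline assembly, here is a portable C sketch. It is not the code the tool uses: ap_depack and ap_state are made-up names, the decoding logic follows my understanding of the public aPLib reference depacker (check it against depack.asm before trusting it), and there are no bounds checks. The `consumed` out-parameter plays the role of the returned ESI, and passing NULL as the destination is the "don't actually decompress" mode.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *src;  /* next unread source byte (what ESI tracks) */
    uint8_t *dst;        /* output buffer, or NULL for a dry run */
    size_t out;          /* bytes produced so far */
    unsigned tag, bits;  /* current tag byte and unread bits left in it */
} ap_state;

static unsigned ap_bit(ap_state *s)
{
    if (s->bits == 0) { s->tag = *s->src++; s->bits = 8; }
    s->bits--;
    unsigned b = (s->tag >> 7) & 1u;
    s->tag = (s->tag << 1) & 0xFFu;
    return b;
}

static unsigned ap_gamma(ap_state *s)
{
    unsigned v = 1;
    do { v = (v << 1) + ap_bit(s); } while (ap_bit(s));
    return v;
}

static void ap_put(ap_state *s, uint8_t b)
{
    if (s->dst) s->dst[s->out] = b;
    s->out++;
}

static void ap_copy(ap_state *s, unsigned offs, unsigned len)
{
    while (len--) {
        if (s->dst) s->dst[s->out] = s->dst[s->out - offs];
        s->out++;
    }
}

/* Returns the decompressed size; *consumed gets the compressed size. */
static size_t ap_depack(const uint8_t *src, uint8_t *dst, size_t *consumed)
{
    assert(src != NULL);
    ap_state s = { src, dst, 0, 0, 0 };
    unsigned offs, len, r0 = (unsigned)-1, lwm = 0;
    int done = 0;

    ap_put(&s, *s.src++);                      /* first byte is stored raw */
    while (!done) {
        if (!ap_bit(&s)) {                     /* 0: literal byte */
            ap_put(&s, *s.src++);
            lwm = 0;
        } else if (!ap_bit(&s)) {              /* 10: long match */
            offs = ap_gamma(&s);
            if (lwm == 0 && offs == 2) {       /* reuse the last offset */
                offs = r0;
                len = ap_gamma(&s);
            } else {
                offs -= lwm ? 2u : 3u;
                offs = (offs << 8) + *s.src++;
                len = ap_gamma(&s);
                if (offs >= 32000) len++;
                if (offs >= 1280) len++;
                if (offs < 128) len += 2;
                r0 = offs;
            }
            ap_copy(&s, offs, len);
            lwm = 1;
        } else if (!ap_bit(&s)) {              /* 110: short match or end */
            offs = *s.src++;
            len = 2 + (offs & 1u);
            offs >>= 1;
            if (offs) ap_copy(&s, offs, len); else done = 1;
            r0 = offs;
            lwm = 1;
        } else {                               /* 111: one byte, 4-bit offset */
            offs = 0;
            for (int i = 0; i < 4; i++) offs = (offs << 1) + ap_bit(&s);
            if (offs) ap_copy(&s, offs, 1); else ap_put(&s, 0);
            lwm = 0;
        }
    }
    if (consumed) *consumed = (size_t)(s.src - src);
    return s.out;
}
```

Chaining then becomes straightforward: depack (or dry-run) the block at `blob`, add `*consumed` to the source pointer, and the next block starts there. A dry run first also gives the decompressed size up front, so the destination for each section can be placed before any data is written.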
After ALL of this work, something is still preventing the unpacked binary from running. I am assuming that I messed up during the unpacking and patching of the section header tables and/or the PE header.
The error I am getting is "The unpacked.exe application cannot be run in Win32 mode." PEView shows that I have definitely goofed up. For some reason, the IMPORT Name Table and Address Table are showing up before the DOS stub despite the addresses being correct in the rdata section header. Also, the IMPORT directory table in the rdata section is displaying wonky information. I definitely did something not quite right while building the executable.
TO BE CONTINUED