Many security products monitor process trees very carefully to detect when, for instance, Office applications spawn PowerShell, cmd or other suspicious subprocesses. But is that enough?
Many organisations are unable to disable macros in Office documents, as macros are still widely used. Hence they introduce compensating controls to detect malicious macros as soon as they execute. Many security products mainly focus on process trees to detect when macros execute PowerShell code or behave strangely in other ways. At first, that looks like a good idea, and it actually is. But is it enough? So let’s see what it takes to get around pure process tree monitoring and how to still detect what’s going on.
I’m really bad at writing PowerShell code, so please bear with me. My goal was to write a simple PowerShell backdoor that calls back to an even simpler PHP-based C2 server. The reason I chose PHP for the C2 is that you could run it on any compromised shared webspace and don’t need to set up a full server.
Essentially, every victim gets a unique ID. As soon as the malware is running, it asks the C2 server for new commands every 4 seconds. If the command is anything other than “wait”, it executes the command in cmd.exe and transfers the results to the C2 server. Note that I used GET as the method and transferred the results in a GET parameter. I did that to test a network-based detection mechanism I’m working on. In a real attack, I’d use POST requests, as the full URI, including GET parameters, usually ends up in proxy logs, while POST parameters in the body don’t.
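The 4-second polling described above leaves a very regular rhythm in proxy logs, regardless of the HTTP method. As an illustration, here is a minimal Python sketch of a generic beaconing check. It is not the network-based detection mechanism mentioned above, which the post doesn’t describe. The log tuple format, the thresholds and the hostnames are assumptions.

```python
from collections import defaultdict
from statistics import mean, pstdev


def find_beacons(entries, min_requests=5, max_jitter=0.5):
    """Flag (client, host) pairs that poll at a near-constant interval.

    entries: iterable of (timestamp_in_seconds, client_ip, host) tuples, e.g.
    parsed from proxy logs (hypothetical format). Returns a dict mapping
    (client_ip, host) to the mean interval between requests in seconds.
    """
    times = defaultdict(list)
    for ts, client, host in entries:
        times[(client, host)].append(ts)

    beacons = {}
    for key, stamps in times.items():
        if len(stamps) < min_requests:
            continue  # too few requests to judge regularity
        stamps.sort()
        gaps = [later - earlier for earlier, later in zip(stamps, stamps[1:])]
        # A small spread of the gaps means a very regular, machine-like rhythm.
        if pstdev(gaps) <= max_jitter:
            beacons[key] = mean(gaps)
    return beacons


if __name__ == "__main__":
    # Hypothetical log: one client polling every 4 seconds, one browsing normally.
    log = [(i * 4.0, "10.0.0.5", "c2.example") for i in range(10)]
    log += [(1.0, "10.0.0.7", "news.example"), (30.0, "10.0.0.7", "news.example")]
    print(find_beacons(log))  # {('10.0.0.5', 'c2.example'): 4.0}
```

Real beacons often add random jitter, so `max_jitter` would need tuning against your own traffic.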
Ok, that part was easy. Now I needed a crappy backend to control the malware. It’s not beautiful, but it gets the job done.
So at that point, it’s all about getting that into a production system that’s set up with all the usual security measures and more. Assuming you will always be able to get an employee to execute a macro as long as macros aren’t blocked, I created a fake Word document. It looks something like this; you know the drill, right?
There are two things I want to prevent when my code fires:
I don’t want Word to spawn a subprocess
I don’t want to load System.Management.Automation.dll into any process other than powershell.exe (some products monitor that as well)
Honouring the points above, I can load a WScript.Shell object, but I can’t use its .Run or .Exec methods, as that would spawn a cmd.exe process as a child of the Word process. So instead, I created a batch script that loads my PowerShell backdoor and placed it in the startup folder. That won’t give me a shell right away, but as soon as the user logs off and on again, I’m in. So this is the macro I ended up using.
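Because the macro never spawns a child process, the artifact it leaves behind is the batch file in the startup folder. Here is a minimal Python sketch of how a defender might hunt for such files. The per-user Startup path is the standard Windows location. The list of script extensions and the simple “powershell” keyword check are assumptions, not details taken from the macro.

```python
import os
from pathlib import Path

# Script types that Windows runs at logon when placed in a Startup folder
# (an assumption-based, non-exhaustive list; .lnk shortcuts are ignored here).
SCRIPT_SUFFIXES = {".bat", ".cmd", ".vbs", ".js"}


def user_startup_dir():
    """Return the per-user Startup folder, or None if APPDATA is not set."""
    appdata = os.environ.get("APPDATA")
    if not appdata:
        return None
    return Path(appdata, "Microsoft", "Windows", "Start Menu", "Programs", "Startup")


def scan_startup(folder):
    """Return (file, mentions_powershell) for every script in the folder."""
    findings = []
    for entry in sorted(Path(folder).iterdir()):
        if entry.is_file() and entry.suffix.lower() in SCRIPT_SUFFIXES:
            text = entry.read_text(errors="ignore").lower()
            findings.append((entry, "powershell" in text))
    return findings
```

For example, `scan_startup(user_startup_dir())` on a Windows host lists every script in the current user’s Startup folder and flags the ones that mention PowerShell.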
Today I had the pleasure of dissecting ShadowHammer together with our top malware analyst at InfoGuard (@InfoGuardAG), Stefan Rothenbuehler (@creative83).
ShadowHammer is a piece of malware that was distributed in a supply chain attack mimicking ASUS security updates. Once the malicious update detonates on the target system, it loads various libraries that it uses to determine the MAC addresses of the machine. Only if one of the MAC addresses matches a set of predefined addresses does it actually load the next stage and infect the machine.
While some of the functionality is already well documented, we feel it makes sense to show how you can get the hashes out of the malware. Stefan and I also want to make sure to do this write-up in an educational fashion. That might lead to some “overdocumenting” in places, so skilled malware analysts out there, please bear with us.
Finding the malicious part
This sample is a little more complex, as it contains benign code as well as the malware code.
When starting the analysis with static tools like Ghidra, the functionality of ShadowHammer is not immediately observable, as it only executes once the benign part of the software update has finished executing. The malware author placed the entry point of the malware just before the process exits, in the __crtExitProcess function. We tagged the function accordingly in x32dbg.
So how do we get there? Using the Symbols tab in x32dbg, we see that the file uses the VirtualAlloc and VirtualProtect functions from kernel32.dll. Whenever a binary uses those functions, we want to look at them more closely. Stefan pointed out a nice trick: set the breakpoint at the end of VirtualAlloc rather than at its start, because when the function returns, the eax register holds the address of the newly allocated memory. So once you have found VirtualAlloc, jump into its code and set a breakpoint right before the function returns. The breakpoint is hit more than once, but the second hit is the one you want to look at (just to save you some time).
If you start single-stepping through the code from there, stepping into all functions that are not API functions, you will eventually end up at a place with two calls to GetAdaptersAddresses. After the first call, eax is compared to 0x6f, which is 111 in decimal. This return code (ERROR_BUFFER_OVERFLOW) indicates that the provided buffer is too small. We believe this fires if the machine has more interfaces than fit into the provided buffer; calling the function once to learn the required buffer size and a second time to fill the buffer is the usual pattern for GetAdaptersAddresses.
The second call is followed by test eax, eax, which sets the zero flag when eax is 0x0. This is the desired return code (ERROR_SUCCESS), as it indicates that the interface structures were successfully loaded into the buffer.
After that, we are in a loop that creates the MD5 sum of every interface’s MAC address and stores it in memory using memcpy. In my case, when I dump the source of the memcpy operation, the first 16 bytes should be the MD5 sum of one of my adapters.
The bytes I get are:
93 16 DF 71 02 90 78 CB 6E 33 9C BE 12 0B 1F 49
The mac address for my only adapter is:
Checking that with CyberChef proves that we got the right data.
So now that we know the malware stores the MD5 sums of our MAC addresses, the question is: what does it compare them to?
To figure that out, we need to keep stepping. I put a breakpoint shortly before my current function returns and single-step from there. Eventually, you will find a set of memcmp calls in several loops. They compare the memory filled with the hashes of your adapters’ MAC addresses to the hashes of the MAC addresses the attacker is looking for. Each call compares 0x10 (16 decimal) bytes at two memory locations, which is the size of an MD5 sum.
The first two arguments are eax and esi. They sit on top of the stack, so we can easily load them into the dump window. esi points to our values and eax to the value hardcoded in the malware.
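To repeat the CyberChef check in Python, the sketch below hashes a MAC address and normalizes the bytes copied from the debugger’s dump window. It assumes that the malware hashes the six raw MAC bytes rather than a textual form of the address. The helper names are made up for this example.

```python
import hashlib


def mac_md5(mac):
    """MD5 over the 6 raw bytes of a MAC such as 'AA-BB-CC-DD-EE-FF' or 'aa:bb:...'."""
    raw = bytes.fromhex(mac.replace("-", "").replace(":", ""))
    if len(raw) != 6:
        raise ValueError("expected a 6-byte MAC address")
    return hashlib.md5(raw).hexdigest()


def dump_to_hex(dump):
    """Turn debugger bytes like '93 16 DF 71 ...' into a lowercase hex string."""
    return bytes.fromhex(dump).hex()


# Compare your own adapter's hash with the 16 bytes from the memcpy source:
# mac_md5("<your MAC>") == dump_to_hex("93 16 DF 71 02 90 78 CB 6E 33 9C BE 12 0B 1F 49")
```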
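The comparison logic boils down to checking each 16-byte hash of a local adapter against a list of hardcoded 16-byte hashes. Here is a minimal Python sketch of that logic. It assumes the hardcoded hashes have been copied out of the dump window as one contiguous blob, which may not hold: the malware may spread them over several buffers. The function names are hypothetical.

```python
DIGEST_LEN = 0x10  # memcmp length observed in the debugger = MD5 digest size


def split_digests(blob):
    """Split a contiguous blob of hardcoded hashes into 16-byte digests."""
    if len(blob) % DIGEST_LEN:
        raise ValueError("blob length is not a multiple of 16 bytes")
    return [blob[i:i + DIGEST_LEN] for i in range(0, len(blob), DIGEST_LEN)]


def targeted_digests(local_digests, target_blob):
    """Return the local adapter digests that match one of the hardcoded targets."""
    targets = set(split_digests(target_blob))
    return [d for d in local_digests if d in targets]
```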
In this post I’ll describe an approach on how to leverage Excel to dump dynamically created Shellcode from a Macro.
I’m always looking for new challenges for our team that they can solve in slow times. During my research, I stumbled upon a nice sample in @0xffff0800’s malware archive (find the current link to the archive at 0day.coffee). The sample itself was not that complex, but getting the potential shellcode out required a technique I had never used before. So let’s cut to the chase.
The sample is a Word document with a macro. According to 0xffff0800’s directory structure, it’s out of the Lazarus group’s tool chest (Wikipedia). The THOR APT scanner by BSK Consulting supports that assumption, as it flags the document with the YARA rule “APT_MalDoc_SharpShooter_Lazarus_Campaign_Dec18_1”.
The extracted macro is slightly obfuscated. The first five declarations look interesting, though.
Attribute VB_Name = "NewMacros"
Private Declare PtrSafe Function SharpShooter Lib "msvcrt" Alias "_beginthread" (ByVal StartAddress As LongPtr, StackSize As Long, ByVal ArgList As LongPtr) As Long
Private Declare PtrSafe Function efasdv Lib "kernel32" Alias "VirtualAlloc" (ByVal address As Long, ByVal size As Long, ByVal aloctype As Long, ByVal fprot As Long) As LongPtr
Private Declare PtrSafe Function gzsdfasd Lib "kernel32" Alias "RtlMoveMemory" (ByVal dest As LongPtr, ByRef src As Any, ByVal dlen As Long) As LongPtr
Private Declare PtrSafe Function ennfiaje Lib "kernel32" Alias "LoadLibraryA" (ByVal libname As String) As LongPtr
Private Declare PtrSafe Function dnnaigej Lib "kernel32" Alias "GetProcAddress" (ByVal module As LongPtr, ByVal pname As String) As LongPtr
The macro defines strange aliases for well-known functions leveraged by malware, VirtualAlloc being only one of them. We also see that there is a two-dimensional array called llsodiplo.
To make reading easier I went through the code and gave the variables and functions more meaningful names. The result below shows a clearer picture of what’s going on. You can also download the full deobfuscated code here and the original macro here.
As this blog post is not about detailing exactly how the macro works, I’ll just point out some key points. The macro stores bytes in a 2D byte array, which I called shellcode for now. It then allocates 3224 bytes using VirtualAlloc. The allocation is requested with the protection flag 0x40, which according to Microsoft’s documentation is PAGE_EXECUTE_READWRITE, so whatever the macro puts there will be executable. It then flattens the 2D array into a 1D byte array using two nested loops; I called that variable binbuffer. Now comes the tricky part, and the reason why I’m posting about this sample: in memory, the macro replaces two sections of the flat byte array with the addresses of LoadLibraryA and GetProcAddress. I assume the resulting in-memory code uses these functions and needs their addresses. This lets the code load libraries and resolve further functions easily even if ASLR is enabled (which it usually is for Office products). Unfortunately, it also makes our job more difficult. We can’t just dump the byte array and treat it as runnable shellcode, as we would be missing the actual addresses of those functions. That requires another frequently used extraction technique: alter the code until it spits out whatever we need.
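The following Python sketch mirrors the two steps that make a naive dump incomplete: flattening the 2D array and patching function addresses into the flat buffer. It illustrates the technique under assumptions and is not a reimplementation of the macro. The flattening order (outer loop over the first dimension), the patch offsets, the pointer size and the address values are all hypothetical.

```python
import struct

PAGE_EXECUTE_READWRITE = 0x40  # protection flag passed to VirtualAlloc


def flatten(rows):
    """Flatten a 2D byte array row by row, like two nested loops would."""
    return bytearray(b for row in rows for b in row)


def patch_pointer(buf, offset, address, ptr_size=4):
    """Overwrite ptr_size bytes at offset with a little-endian address."""
    fmt = {4: "<I", 8: "<Q"}[ptr_size]
    struct.pack_into(fmt, buf, offset, address)
    return buf


if __name__ == "__main__":
    buf = flatten([[0x90, 0x90, 0x00, 0x00], [0x00, 0x00, 0xC3, 0xCC]])
    # Hypothetical: a placeholder at offset 2 receives a resolved function address.
    patch_pointer(buf, 2, 0x76543210)
    print(buf.hex())  # 909010325476c3cc
```

A dump taken before the patching step still contains the placeholder bytes. That is why the array has to be dumped from inside Office after the macro has patched it.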
VBA ≠ VBS
There is one very important thing you need to know about macros: VBA is not VBS. While you can run VBS code using cscript.exe, VBA code generally won’t run there, and for this particular code it fails. VBA is exclusive to Microsoft applications and a few third-party vendors who licensed VBA for their products, one of them being AutoCAD. For us, that means our best bet is to use Word to execute the macro. What I’m interested in is the exact byte array it loads into memory in the last for loop.
For eIndex1 = 0 To size_count - 1
    eValue = binbuffer(eIndex1)
    Result = RtlMoveMemory(vAddress + eIndex1, eValue, 1)
Next eIndex1
So let’s fire up Excel without allowing the macros to run. We first need to make running the macro safe. Obviously, everything from now on happens on a safe lab VM.
So before I allow macros to run, I edit the code a little. Generally, commenting out one line is enough. Line 75 would execute the in-memory code; SharpShooter was declared as an alias for msvcrt._beginthread().
Private Declare PtrSafe Function SharpShooter Lib "msvcrt" Alias "_beginthread" (ByVal StartAddress As LongPtr, StackSize As Long, ByVal ArgList As LongPtr) As Long
LMCooperator = SharpShooter(vAddress, 0, 0)
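So I just comment that line out. In addition, I want to dump the resulting byte array to a file. The easiest way for me was to use the StrConv function and the Open statement. The resulting code section looks something like this (it uses the original obfuscated names: yefawfq is size_count, grqwasf is binbuffer and gzsdfasd is RtlMoveMemory).
For eIndex1 = 0 To yefawfq - 1
    eValue = grqwasf(eIndex1)
    Result = gzsdfasd(vAddress + eIndex1, eValue, 1)
Next eIndex1
' Dump the final byte array (after the address patching) to a file
Open "C:\Users\abc\Desktop\shellcode.txt" For Output As #1
' StrConv turns the raw bytes into a string (despite the name, this is not hex)
hexstr = StrConv(grqwasf, vbUnicode)
' The trailing semicolon stops Print # from appending a CRLF
Print #1, hexstr;
Close #1
Dim LMCooperator As Long
' Disabled so the shellcode is never started
'LMCooperator = SharpShooter(vAddress, 0, 0)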
This is a fast and easy way to get the final version of the binary code out by leveraging Word.
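Once the file exists, it helps to turn it into a clean binary for further analysis. Here is a minimal Python sketch. It assumes the dumped array is exactly as long as the 3224-byte allocation, and that a plain Print # statement without a trailing semicolon may have appended a CRLF. The file names are placeholders.

```python
from pathlib import Path


def normalize_dump(data, expected_size):
    """Drop a trailing CRLF added by VBA's Print # and check the final size."""
    if len(data) == expected_size + 2 and data.endswith(b"\r\n"):
        data = data[:-2]
    if len(data) != expected_size:
        raise ValueError(f"dump has {len(data)} bytes, expected {expected_size}")
    return data


def convert_dump(src, dst, expected_size=3224):
    """Write the cleaned bytes to dst and return them as a hex string."""
    data = normalize_dump(Path(src).read_bytes(), expected_size)
    Path(dst).write_bytes(data)
    return data.hex()


# Example: convert_dump("shellcode.txt", "shellcode.bin")
```

The StrConv/Print round trip relies on the system’s ANSI code page. Writing the byte array with Open ... For Binary and Put avoids that conversion entirely.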
Another story on how you might discover new artifacts to help your investigation – MFT Carving.
It’s been some time since I wrote my last blog post. Like every year, the last quarter is very busy. Still, I’ve got something new I want to share. This week I have been teaching SANS FOR508 with Francesco Picasso (@dfirfpi) in Paris. When Francesco talked about $MFT entries, I was curious where on a drive single $MFT entries, or groups of MFT entries, might end up other than in the currently active $MFT. I briefly googled whether there are solutions that support carving single, potentially corrupted $MFT entries and couldn’t find any. There are many solutions that parse a complete and active $MFT and solutions that carve files, but that’s not what I was looking for. So back in my room, I started to do some research.
As the $MFT is literally filled with timestamps, I figured it might come in handy to have an MFT carver and parser that also handles half-corrupted MFT entries. If you just want the tool I wrote for this and don’t want to read the whole blog post, feel free to download it at https://github.com/cyb3rfox/MFTEntryCarver/
I did not expect to find many entries in unallocated space, but I gave it a try anyhow. First of all, I dumped the unallocated space of a test Windows 7 image using The Sleuth Kit’s blkls.
blkls image.ewf > unallocated.blocks
This produced a file of around 7 GB. To see whether it would even make sense to start writing something, I just did a strings search on the file.
strings -a unallocated.blocks | grep FILE0 | wc -l
Note that FILE (\x46\x49\x4C\x45) is the actual signature of an entry, but searching for it alone also hits many strings that are part of text files and scripts. Still, most if not all entries start with FILE0 (\x46\x49\x4C\x45\x30). The 0x30 byte is the low byte of the offset to the fixup array, which is the same for most entries, and since 0x30 is the character 0 in ASCII, it’s easy to grep. Anyway, the above command leads to the following output.
So it looks like we have 5026 potential $MFT entries. Next, I wanted to understand how those artifacts are distributed across the dump file. Hence, I searched for the magic bytes again, only this time also getting the offsets of the hits.
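The FILE0 trick follows from the layout of the record header. The 4-byte FILE signature is followed by a little-endian 16-bit offset to the update sequence (fixup) array, commonly 0x0030, and its low byte 0x30 is the character 0. Here is a minimal Python sketch that parses the first fixed header fields of a candidate record. The field offsets follow the commonly documented NTFS layout, and the function name is made up.

```python
import struct

HEADER_LEN = 0x20  # enough bytes to cover the fields parsed below


def parse_record_header(buf, off=0):
    """Return the fixed header fields of an $MFT FILE record, or None."""
    if len(buf) < off + HEADER_LEN or buf[off:off + 4] != b"FILE":
        return None
    usa_offset, usa_count = struct.unpack_from("<HH", buf, off + 0x04)
    first_attr, flags = struct.unpack_from("<HH", buf, off + 0x14)
    used_size, allocated_size = struct.unpack_from("<II", buf, off + 0x18)
    return {
        "usa_offset": usa_offset,          # typically 0x30, hence "FILE0"
        "usa_count": usa_count,
        "first_attr": first_attr,          # offset of the first attribute
        "in_use": bool(flags & 0x01),      # 0x01 = record in use
        "used_size": used_size,
        "allocated_size": allocated_size,  # usually 1024 bytes
    }
```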
strings -a -t d unallocated.blocks | grep FILE0
I then grouped the hit offsets into a manageable number of buckets and plotted a histogram.
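Grouping the hits takes only a few lines. Here is a minimal Python sketch that reads the output of the strings command above and counts hits per fixed-size bucket. The bucket size is arbitrary. Note that strings reports the offset of the whole printable string, which may start a few bytes before FILE0.

```python
from collections import Counter


def parse_offsets(lines):
    """Extract the decimal offsets from `strings -a -t d ... | grep FILE0` output."""
    offsets = []
    for line in lines:
        parts = line.split(None, 1)
        if parts and parts[0].isdigit():
            offsets.append(int(parts[0]))
    return offsets


def bucket_counts(offsets, bucket_size):
    """Return sorted (bucket_start_offset, hit_count) pairs for a histogram."""
    counts = Counter(offset // bucket_size for offset in offsets)
    return sorted((bucket * bucket_size, n) for bucket, n in counts.items())
```

For example, feeding the saved strings output into `parse_offsets` and passing the result to `bucket_counts(offsets, 100 * 1024 * 1024)` yields one (offset, count) pair per 100 MiB bucket, ready to be plotted.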
The histogram shows that the entries are not all clustered in one place but in at least three separate locations (there are some less populated sections that don’t show on the graph).
That suggests that the entries might come from different sources. Interesting enough for me to start writing a little tool. The $MFT is very well documented; I used two resources to understand what I needed to do.
I guess this is the time to point out that I’m obviously not a full-time coder and Python is quite new to me. If you don’t believe me, look at the code and you’ll see what I mean. So please excuse my spaghetti code. In the end, it works and is even reasonably fast.
I wanted the tool to be able to do the following things:
Find potential file entries
Parse as many of their $FN attributes (long and short) as possible
Parse $STDInfo and $FN timestamps
For resident $DATA attributes, recover the data as well
Still work if half of the information is corrupted
Output everything in CSV format
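Both $STDInfo and $FN store their timestamps as Windows FILETIME values: 64-bit little-endian counts of 100-nanosecond intervals since 1601-01-01 UTC. Here is a minimal Python sketch of the conversion. It adds a plausibility window so that garbage from corrupted entries is reported as “corrupted” instead of crashing the parser; the window and the function names are assumptions.

```python
import struct
from datetime import datetime, timedelta, timezone

FILETIME_EPOCH = datetime(1601, 1, 1, tzinfo=timezone.utc)


def filetime_to_datetime(raw8, min_year=1990, max_year=2100):
    """Convert 8 raw bytes to an aware UTC datetime, or None if implausible."""
    (ticks,) = struct.unpack("<Q", raw8)
    try:
        value = FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
    except OverflowError:
        return None  # beyond year 9999: certainly garbage
    if not (min_year <= value.year <= max_year):
        return None
    return value


def format_timestamp(raw8):
    """Return an ISO 8601 string, or 'corrupted' for implausible values."""
    value = filetime_to_datetime(raw8)
    return value.isoformat() if value else "corrupted"
```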
So, first of all, I needed to find all potential $MFT entries. As the input files for this kind of tool can get quite big, methods that need to load the whole file into memory might fail. In the end, I decided to use Python’s mmap module. It’s fast, lets you open really big files and move a pointer over the data, and supports searching for byte patterns.
The pattern search for FILE (\x46\x49\x4C\x45) returned 54583 potential entries. So how do I decide which are legitimate and which are false positives? My approach was to try to parse further attributes and sanity-check the results as well as possible. For example, I use size checks a lot. All attributes in $MFT entries store their size. I parse that, and if the parsed value is smaller than the minimum size of the attribute or bigger than the size of the whole entry, the bytes I parsed are probably not part of a real $MFT entry. I don’t want to go into that too deeply; please look at the code if you want to understand my approach. Suggestions and contributions are highly welcome.
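Here is a minimal Python sketch of such an mmap-based search. It is an illustration of the approach, not the code of MFTEntryCarver.py, and the function name is hypothetical.

```python
import mmap
import os


def find_pattern(path, pattern=b"FILE"):
    """Yield every offset of pattern in a file without loading it into memory."""
    if os.path.getsize(path) == 0:
        return  # mmap cannot map an empty file
    with open(path, "rb") as f, mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        pos = mm.find(pattern)
        while pos != -1:
            yield pos
            pos = mm.find(pattern, pos + 1)
```

For example, `sum(1 for _ in find_pattern("unallocated.blocks"))` counts the candidate entries without ever holding the 7 GB file in memory.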
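The size checks can be illustrated with a short Python sketch. This is not the code of MFTEntryCarver.py, just the idea under common NTFS layout assumptions: every attribute starts with a 4-byte type and a 4-byte total length, the attribute list ends with the marker 0xFFFFFFFF, and a resident attribute header is at least 0x18 bytes long.

```python
import struct

END_MARKER = 0xFFFFFFFF
MIN_ATTR_LEN = 0x18  # smallest possible (resident) attribute header
ATTR_NAMES = {0x10: "$STANDARD_INFORMATION", 0x30: "$FILE_NAME", 0x80: "$DATA"}


def walk_attributes(record, first_attr, used_size):
    """Yield (type, offset, length) until the end marker or a failed size check."""
    limit = min(used_size, len(record))
    off = first_attr
    while off + 8 <= limit:
        attr_type, length = struct.unpack_from("<II", record, off)
        if attr_type == END_MARKER:
            return
        if length < MIN_ATTR_LEN or off + length > limit:
            return  # implausible size: probably not part of a real entry
        yield attr_type, off, length
        off += length
```

Mapping the yielded types through `ATTR_NAMES` shows at a glance whether a candidate contains the $FILE_NAME attribute that the carver requires before it keeps parsing.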
So after a couple of hours of work, I get to the following results.
python MFTEntryCarver.py -s unallocated.blocks
Essentially, MFTEntryCarver.py gives you all the artifacts mentioned above if it can find them. It only keeps parsing an entry when it finds at least one valid $FN attribute. If certain artifacts are missing, it puts “corrupted” in the respective field. Below are the statistics the tool shows after processing the test dataset.
The good news: in the end, there were not only 5026 entries to parse but a total of 54583 candidates, since searching for FILE rather than FILE0 casts a wider net. Out of those, MFTEntryCarver.py recovered 14975 entries. So let’s take a look at one of them.
So, in this case, we found a .URL file called “Get Windows Live.url”, and its data seems to be resident. Some timestamps couldn’t be parsed, but MFTEntryCarver.py still gets you as much as it can. I usually transform and analyze raw hex data using CyberChef. Throwing the hex contents of the resident data attribute at it produces readable results (at least for ASCII characters).
I’ll try the tool some more in investigations, but it looks promising and offers a viable way to recover older $MFT entries, including timestamps. Artifacts like these usually help an investigation when the attacker has cleaned up and/or is long gone. If you use it, please give some feedback. You can easily reach out to me on Twitter @mathias_fuchs. You can get the code at https://github.com/cyb3rfox/MFTEntryCarver
As people quite frequently ask me how I triage potentially malicious Microsoft Office documents, I decided to run through a quick analysis here.
Our specimen for this tutorial is a Word document from the malware collection published by @0xffff0800 at http://iec56w4ibovnb4wc.onion (the URL might change; check the current address at 0day.coffee). @0xffff0800 attributes the file to an Iranian threat actor dubbed APT34 by Mandiant/FireEye. You can download the file directly from the repository or from VirusTotal (https://www.virustotal.com/#/file/db53b4157868fffd0331c1498e2209c11499b14f5aa980fe4fb3453858ed90b5/detection)
As you can see, they didn’t really care about crafting a convincing fake document. I trust your red team does better than that.
Triage vs. Full Analysis
The main goal of triage is to allow a moderately experienced forensic analyst, who probably has no background in malware reverse engineering, to figure out whether a document is malicious and even get some IOCs out of it. The approach I’m suggesting here is low-risk, as it works entirely without opening the file in Microsoft Word or executing any PowerShell code. For the sake of completeness, I’ll give some hints on how a malware analyst could continue dissecting the malware.
We are looking at the legacy Word document format, as indicated by the .doc suffix. Those documents are stored in the OLECF file format (further reading). oledump.py is a nice tool written in Python that allows you to extract the various streams contained in an OLECF file. So let’s take a closer look at the file and see whether it is even malicious.
Running oledump.py against the document lists all sub-elements of the compound file. For the given specimen, it shows the following output. Note stream number 7: it is the only stream that contains a macro (oledump.py marks macro streams with an M).
oledump.py offers a fast and easy way to extract individual streams with the -s option. As the VBA source code in these streams is compressed, we also need the -v flag to decompress the content.
Looking at the macro, we see typical PowerShell parameters and a lot of base64-encoded sections. Let’s dump the macro into a separate file to look at it more closely.
oledump.py -s7 -v MagicHoundAPT34.doc > macro.txt
2.) Revealing the actual code
Looking at the macro in the text editor of your choice (I use Sublime), preferably one that supports syntax highlighting, we see that the main payload seems to be PowerShell-based; the VBA macro only executes PowerShell and shows an error message.
So apparently we need to decode the base64 PowerShell source next. You probably noticed that it is not a single string block but multiple concatenated string blocks. So we need to clean that up a bit, save the pure base64 string as base64.txt and then decode it into a new file. The decode flag is -D for the base64 tool on macOS and -d for GNU coreutils on Linux:
base64 -D base64.txt > powershell1.txt
base64 -d base64.txt > powershell1.txt
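The clean-up and decoding can also be scripted. Here is a minimal Python sketch. It assumes you first cut the macro down to the lines that build the base64 string, because other string literals in the macro would otherwise be concatenated too. Whether the payload decodes as UTF-16LE (what powershell.exe -EncodedCommand expects) or as plain text depends on the sample, so the sketch checks for the typical zero bytes. The function names are made up.

```python
import base64
import re


def join_string_literals(vba_lines):
    """Concatenate all double-quoted VBA string literals in order of appearance."""
    return "".join(re.findall(r'"([^"]*)"', vba_lines))


def decode_payload(b64_text):
    """Decode base64 and guess the text encoding of the result."""
    raw = base64.b64decode(b64_text)
    # UTF-16LE ASCII text has a zero byte after every character.
    if len(raw) >= 2 and raw[1] == 0:
        return raw.decode("utf-16-le", errors="replace")
    return raw.decode("utf-8", errors="replace")
```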
That gives us an interesting piece of PowerShell code. First, we see that there seems to be some additional PowerShell code in a string variable called $G8t. At the end of the file, that same PowerShell code gets base64-encoded again and is then executed with the 32-bit version of PowerShell (note the location of powershell.exe in a subfolder of SysWOW64). A very common reason to use 32-bit binaries is that the attacker has 32-bit shellcode to execute, which wouldn’t run in a 64-bit process. So let’s look for some shellcode. I’m sure you spotted it already: the variable called $z is an array of byte values. If you look at the code more closely, you see that the attacker uses the memset function to write the shellcode byte by byte into memory allocated with VirtualAlloc. The malware is flexible about the shellcode size: normally it allocates $g bytes, which would be 1000 bytes, but if the shellcode is longer, it changes $g to reflect the actual size of the shellcode.
For an incident responder with no malware analysis background, that would be the right moment to hand the sample over to a malware analyst.
3.) Extracting and analysing the shellcode
If you don’t know what shellcode is, Wikipedia has an easy-to-read article on that topic. So let’s get the shellcode out in hex first. It looks like this.
I want to use CyberChef to create a binary file from the shellcode. For that to work, I need to get rid of either all the 0x prefixes or the commas, as CyberChef only accepts one separator. I’ll remove the commas and paste the result into CyberChef’s input window. Selecting the “From Hex” recipe gives me some gibberish characters in the output window. That’s exactly what machine code looks like when rendered as ASCII. So let’s save that to a file by clicking the save icon. I chose shellcode.bin.
So now that we have the shellcode, what do we do next? Shellcode is not a complete binary. It usually uses functions provided by the operating system, in this case Windows, to do whatever it needs to do. There is more than one way to run it anyhow. For a first glance, I’ll use a tool called scdbg. It allows us to run shellcode without first wrapping it in a complete Windows executable. The downside is that we can’t really debug it from there. One note: if you do the same thing, be aware that scdbg actually executes the code. This can definitely harm your system. So let’s see if we can get it to run.
Ok, so now we know a bit more about the shellcode. It seems to use WSASocket to open a connection to a local IP address on port 5555. This IP is not in the subnet of the analysis workstation, so there is no way it would get a response. Wouldn’t it be nice if we could look more closely at what it is trying to do there? I guess it is worth a try. There is a nice little tool written by Adam Kramer that can help us out. It is called jmp2it, and it allows us to debug the shellcode. But that’s already way beyond simple malware triage for incident responders. I’ll put up a separate blog entry on how to proceed with this sample as soon as I have time. So happy hunting and have a great weekend.
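If you prefer scripting over CyberChef, the same conversion takes a few lines of Python. This minimal sketch assumes the shellcode is written as a comma-separated list of 0x-prefixed byte values, as in the $z array. The function names are hypothetical.

```python
import re
from pathlib import Path

# 0x followed by one or two hex digits; longer values such as 0x1000 are skipped.
BYTE_TOKEN = re.compile(r"0x([0-9a-fA-F]{1,2})\b")


def parse_byte_array(text):
    """Turn a PowerShell byte list such as '0xfc,0xe8,0x89' into raw bytes."""
    return bytes(int(token, 16) for token in BYTE_TOKEN.findall(text))


def save_shellcode(text, path="shellcode.bin"):
    """Write the parsed bytes to path and return how many were written."""
    data = parse_byte_array(text)
    Path(path).write_bytes(data)
    return len(data)
```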
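As a static complement to emulation, connection details can sometimes be read straight out of the shellcode bytes. The Python sketch below assumes a Metasploit-style x86 reverse TCP stub. Such a stub builds its sockaddr_in with two push instructions: 68 followed by 4 IP bytes, then 68 02 00 followed by 2 port bytes. The post does not say which framework produced this shellcode, so treat a hit as a lead and an empty result as inconclusive.

```python
import ipaddress
import re

# Assumed pattern: push <IPv4, network order>; push <port (big-endian) | AF_INET (2)>
SOCKADDR_PUSHES = re.compile(rb"\x68(.{4})\x68\x02\x00(.{2})", re.DOTALL)


def find_sockaddrs(shellcode):
    """Return candidate (ip, port) pairs encoded as sockaddr_in push immediates."""
    results = []
    for match in SOCKADDR_PUSHES.finditer(shellcode):
        ip = str(ipaddress.IPv4Address(match.group(1)))
        port = int.from_bytes(match.group(2), "big")
        results.append((ip, port))
    return results
```

For example, `find_sockaddrs(Path("shellcode.bin").read_bytes())` (with `pathlib.Path` imported) would list candidate IP and port pairs that you can compare with the scdbg output.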