Mathias Fuchs alias CyberFox blogging about DFIR and Cyber Security.

Quick Office Document Triage

As people quite frequently ask me how I triage potentially malicious Microsoft Office documents, I decided to run through a quick analysis here. 

Our specimen for this tutorial is a Word document from the malware collection published by @0xffff0800 on http://iec56w4ibovnb4wc.onion (the URL might change). The collection attributes the file to an Iranian threat actor dubbed APT34 by Mandiant/FireEye. You can download the file directly from the repository or from VirusTotal.

Specimen Details

Size: 39’936 bytes
This is what the file looks like when you open it in Word (don’t do it, but if you really have to, don’t enable macros 😜).

As you can see, they didn’t really care about crafting a nice fake document. I trust your red team does better than that.

Triage vs. Full Analysis 

The main goal of triage is to allow a moderately experienced forensic analyst, who probably has no background in malware reverse engineering, to figure out whether a document is malicious and even get some IOCs out of it. The approach I’m suggesting here is low-risk, as it works entirely without opening the file in Microsoft Word or executing any PowerShell code. For the sake of completeness, I’ll also give some hints on how a malware analyst could continue dissecting the malware.

Tools used

1.) Finding the Macro

We are looking at the old Word document format, as indicated by the .doc suffix. Such documents are stored in the OLECF (OLE Compound File) format. Oledump is a nice tool written in Python that allows you to extract the various streams contained in an OLECF file. So let’s take a closer look at the file and see if it is even malicious. MagicHoundAPT34.doc

This will scan the file for all subelements in the compound file. For the given specimen it shows the following output. Note stream number 7: it is the only stream that contains a macro.

Scanning the document
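The file suffix alone is weak evidence, so it can be worth confirming the OLECF claim from the file header before going further. The following is a minimal sketch, assuming only that the file can be read from disk; the function names are mine and not part of oledump. It compares the first eight bytes with the well-known compound file signature and neither parses nor executes anything.

```python
# Signature that starts every OLE2 / Compound File Binary document.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def is_ole_compound_file(data: bytes) -> bool:
    """Return True if the data starts with the compound file signature."""
    return data[:8] == OLE_MAGIC


def check_file(path: str) -> bool:
    """Read only the first 8 bytes of the file; nothing is parsed or run."""
    with open(path, "rb") as handle:
        return is_ole_compound_file(handle.read(8))
```

Calling check_file("MagicHoundAPT34.doc") should return True for a genuine OLECF document. A newer .docx file is a ZIP archive and starts with the bytes PK instead.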

Oledump offers a fast and easy way to extract individual streams. As macros are usually compressed, we need the -v flag as well to decompress the content.

Stream 7 in uncompressed form

Looking at the macro, we see typical PowerShell parameters and a lot of base64-encoded sections. Let’s dump the macro into a separate file to look at it more closely. -s7 -v MagicHoundAPT34.doc > macro.txt

2.) Revealing the actual code

Looking at the macro in the text editor of your choice (I use Sublime), preferably one that supports syntax highlighting, we see that the main payload seems to be PowerShell-based and that the VBA macro only executes PowerShell and shows an error message.

powershell.exe called using the VBA Shell function
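If you prefer a quick automated first pass over reading the whole macro, a simple keyword scan of macro.txt can point you to the interesting lines. This is only a rough sketch: the keyword list is my own small illustrative selection, not an exhaustive or authoritative set, and a clean result does not prove that a macro is harmless.

```python
import re

# Illustrative selection of VBA keywords that deserve a closer look.
SUSPICIOUS_KEYWORDS = {
    "AutoOpen": "runs automatically when the document is opened",
    "Document_Open": "runs automatically when the document is opened",
    "Shell": "starts an external program",
    "CreateObject": "instantiates a COM object",
    "powershell": "invokes PowerShell",
}


def flag_keywords(macro_source: str) -> dict:
    """Return the keywords found in the macro text with a short explanation."""
    hits = {}
    for keyword, meaning in SUSPICIOUS_KEYWORDS.items():
        # Whole-word, case-insensitive match ("Shell" does not hit "powershell").
        pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
        if pattern.search(macro_source):
            hits[keyword] = meaning
    return hits
```

Reading macro.txt into a string and passing it to flag_keywords gives you a short list of reasons to look closer, which is usually enough to justify the next step.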

So apparently we need to decode the base64 PowerShell source next. You probably realized that it is not a single string block, but multiple concatenated string blocks. So we need to clean that up a bit, save the joined string in a new file called base64.txt, and then decode it.

Mac OS:
base64 -D base64.txt > powershell1.txt
Most Linux:
base64 -d base64.txt > powershell1.txt
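The manual clean-up can also be scripted. The sketch below assumes that the payload is spread over double-quoted VBA string literals that only contain base64 characters; the 16-character minimum is a heuristic of mine to skip short unrelated strings, so lower it if the last chunk of your sample is shorter. The function names are hypothetical. The decoder also handles UTF-16LE output, which is what PowerShell expects for its -EncodedCommand parameter.

```python
import base64
import re

# Assumption: each payload chunk is a double-quoted VBA string literal.
STRING_LITERAL = re.compile(r'"([^"]*)"')
# Characters of the standard base64 alphabet, including padding.
BASE64_ONLY = re.compile(r"[A-Za-z0-9+/=]+")


def join_base64_chunks(macro_source: str, min_chunk: int = 16) -> str:
    """Collect all string literals that look like base64 and join them."""
    chunks = []
    for literal in STRING_LITERAL.findall(macro_source):
        if len(literal) >= min_chunk and BASE64_ONLY.fullmatch(literal):
            chunks.append(literal)
    return "".join(chunks)


def decode_payload(encoded: str) -> str:
    """Decode the joined base64 string into readable text."""
    raw = base64.b64decode(encoded)
    # A zero as second byte suggests UTF-16LE text (used by -EncodedCommand).
    if raw[1:2] == b"\x00":
        return raw.decode("utf-16-le", errors="replace")
    return raw.decode("utf-8", errors="replace")
```

Applied to the content of macro.txt, decode_payload(join_base64_chunks(text)) should give the same result as the base64 commands above, without the manual editing step.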

That gives us an interesting piece of PowerShell code. First, we see that there seems to be some additional PowerShell code in a string variable called $G8t. At the end of the file, that same PowerShell code gets base64-encoded again and then executed with the 32-bit version of PowerShell (note the location of powershell.exe in a subfolder of syswow64). A very common reason to use 32-bit binaries is that the attacker happens to have 32-bit shellcode he wants to execute. This code wouldn’t run using 64-bit binaries.

So let’s look for some shellcode. I’m sure you spotted it already: the variable called $z is an array of byte values. If you look at the code more closely, you see that the attacker leverages the memset function to write the shellcode byte by byte into memory he allocated using VirtualAlloc. The malware seems to be flexible when it loads shellcode. Normally it allocates $g bytes, which would be 1000 bytes. If the shellcode is longer, it changes $g to reflect the actual size of the shellcode.

PowerShell code in powershell1.txt

For an incident responder with no malware analysis background, that would be the right moment to hand the sample over to a malware analyst. 

3.) Extracting and analysing the shellcode 

If you don’t know what shellcode is, Wikipedia has an easy-to-read article on the topic. So let’s extract the shellcode in hex first. It looks like this.


I want to use CyberChef to create a binary file out of the shellcode. For that to work, I need to get rid of either all the 0x prefixes or the commas, as CyberChef only accepts one separator. I’ll get rid of the commas and paste the rest into CyberChef’s input window. Selecting the “From Hex” recipe gives me some gibberish characters in the output window. That’s exactly what machine code is supposed to look like in ASCII. So let’s save that to a file by clicking on the save icon. I choose shellcode.bin.
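If you would rather stay on the command line, the same conversion can be done with a few lines of Python. This is a minimal sketch, assuming the byte array has been copied out of the script as text in the form 0xNN,0xNN,...; the function names and the default output file name are my choices. It only writes the bytes to disk and never executes them.

```python
import re

# One byte written as 0x followed by one or two hex digits.
HEX_BYTE = re.compile(r"0x([0-9a-fA-F]{1,2})\b")


def hex_list_to_bytes(text: str) -> bytes:
    """Convert text such as '0xfc,0xe8' into the corresponding raw bytes."""
    return bytes(int(value, 16) for value in HEX_BYTE.findall(text))


def save_shellcode(text: str, out_path: str = "shellcode.bin") -> int:
    """Write the converted bytes to a file and return how many were written."""
    data = hex_list_to_bytes(text)
    with open(out_path, "wb") as handle:
        handle.write(data)
    return len(data)
```

Commas, spaces, and line breaks between the values do not matter here, so no clean-up of separators is needed. The returned byte count is a quick sanity check against the length of the array in the script.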

Creating binary shellcode from the HEX string

So now that we have the shellcode, what do we do next? Shellcode is not a complete binary. It usually uses functions provided by the operating system, in this case Windows, to do whatever it needs to do. We have more than one option to run it anyhow. For a first glance, I’ll use a tool called scdbg. It allows us to run shellcode without first putting it into a complete Windows executable. The downside is that we can’t really debug it from there. One note: if you do the same thing, be aware that scdbg actually executes the code. This can definitely harm your system. So let’s see if we can get it to run.

Shellcode loaded into scdbg
Results after running the shellcode through scdbg

Ok, so now we know a bit more about the shellcode. It seems to leverage a WSASocket to open a connection to a local IP address on port 5555. This IP is not in the subnet of the analysis workstation so there is no way it would get a response. So wouldn’t it be nice if we could look at what it is trying to do there more closely? I guess it is worth a try. There is a nice little tool written by Adam Kramer that can help us out. It is called jmp2it. It allows us to debug the shellcode. But that’s already way beyond simple malware triage for Incident Responders. I’ll put up a separate blog entry on how to proceed with that sample as soon as I have time. So happy hunting and have a great weekend.