Carving $MFT (MFTEntryCarver.py)

Another story on how you might discover new artifacts to help your investigation – MFT Carving.

It’s been some time since I wrote my last blog post. Like every year, the last quarter is very busy. Still, I have something new I want to share. This week I have been teaching SANS FOR 508 with Francesco Picasso @dfirfpi in Paris. When Francesco talked about $MFT entries, I was curious where on a drive single $MFT entries or groups of $MFT entries might end up other than in the currently active $MFT. I briefly googled whether there are solutions that support carving single, potentially corrupted $MFT entries and couldn’t find any. There are many solutions which parse a complete and active $MFT and solutions which carve files, but that’s not what I was looking for. So back in my room I started to do some research.

As the $MFT is literally filled with timestamps, I figured it might come in handy to have an MFT-Carver-Parser that also handles half-corrupted $MFT entries. If you just want to get the tool I wrote to do that and not read the whole blog post, feel free to download it at https://github.com/cyb3rfox/MFTEntryCarver/

I did not expect to find many entries in unallocated space, but I gave it a try anyhow. First of all, I dumped the unallocated space of a test Windows 7 image using The Sleuth Kit’s blkls.

blkls image.ewf > unallocated.blocks

This produced a file of around 7 GB. To see if it would even make sense to start writing something, I just did a strings search on the file.

strings -a unallocated.blocks | grep FILE0 | wc -l

Counting the potential number of $MFT entries

Note that the header of an entry is really just FILE (\x46\x49\x4C\x45), but searching for that alone also matches many strings that are part of text files and scripts. Still, most if not all entries start with FILE0 (\x46\x49\x4C\x45\x30). The \x30 is the low byte of the offset to the fixup array, and that value is usually the same for many entries. As it represents 0 in ASCII, it’s easy to grep. Anyway, the above command leads to the following output.
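To make the FILE0 observation a bit more concrete, here is a minimal sketch (Python 3, not taken from MFTEntryCarver.py) that reads the two-byte little-endian fixup offset stored right after the signature. The function name read_fixup_offset is made up for illustration.

```python
import struct

def read_fixup_offset(entry):
    """Return the fixup (update sequence) array offset, or None without a FILE signature."""
    if len(entry) < 6 or entry[:4] != b"FILE":
        return None
    # Bytes 4-5 of the entry hold the offset as a little-endian 16-bit value.
    (offset,) = struct.unpack_from("<H", entry, 4)
    return offset

# 0x30 0x00 is the value 0x0030; its first byte is the ASCII character "0".
sample = b"FILE" + b"\x30\x00"
print(read_fixup_offset(sample))  # 48
```

Because the low byte comes first on disk, an offset of 0x30 makes the entry start with the printable string FILE0, which is what the grep above relies on.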

So it looks like we have 5026 potential $MFT entries. Next, I wanted to understand how those artifacts are distributed across the dump file. Hence, I searched for the magic bytes again, only this time I also printed the offset of each hit.

strings -a -t d unallocated.blocks | grep FILE0

Figuring out the entry offset for plotting

I then grouped the hit offsets into a manageable number of buckets and plotted a histogram.
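The grouping step could look roughly like the sketch below. It assumes the input lines have the shape that strings -t d prints (a decimal offset, whitespace, then the string); the bucket size and the function name bucket_offsets are my own choices for illustration, not the values used for the plot in this post.

```python
from collections import Counter

def bucket_offsets(lines, bucket_size):
    """Count hits per bucket from lines shaped like '<decimal offset> <text>'."""
    counts = Counter()
    for line in lines:
        fields = line.split(None, 1)
        # Skip empty lines and anything that does not start with a decimal offset.
        if fields and fields[0].isdigit():
            counts[int(fields[0]) // bucket_size] += 1
    return counts

sample = ["   1024 FILE0", "   2048 FILE0", "5000000 FILE0"]
for bucket, hits in sorted(bucket_offsets(sample, 1_000_000).items()):
    # One '#' per hit gives a crude text histogram.
    print(f"{bucket * 1_000_000:>12} {'#' * hits}")
```

The resulting counts can be handed to any plotting tool; the text output is only there to keep the sketch free of dependencies.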

Histogram of $MFT entries in the dumped unallocated space

The histogram shows that the entries are not all clustered in one place but sit in at least 3 separate locations (there are some less populated sections that don’t show on the graph).
That implies that the entries might be coming from different sources. Interesting enough for me to start writing a little tool. The $MFT is very well documented. I used two resources to understand what I needed to do.

I guess this is the time to point out that I’m obviously not a full-time coder and Python is quite new to me. If you don’t believe me, look at the code and you will see what I mean. So please excuse my spaghetti code. In the end, it works and is even reasonably fast.

I wanted the tool to be able to do the following things:

  • Find potential file entries
  • Parse as many of their $FN attributes (long and short) as possible
  • Parse $STDInfo and $FN timestamps
  • For resident $DATA attributes, recover the data as well
  • Still work if half of the information is corrupted
  • Output everything in CSV format
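For the timestamp item in the list above, here is a small sketch of how an 8-byte NTFS timestamp can be turned into a readable date. NTFS stores these as a count of 100-nanosecond intervals since 1601-01-01 (the Windows FILETIME format). This is written in Python 3 and is not the code from MFTEntryCarver.py; the name filetime_to_datetime is hypothetical.

```python
import struct
from datetime import datetime, timedelta

FILETIME_EPOCH = datetime(1601, 1, 1)

def filetime_to_datetime(raw):
    """Convert 8 little-endian FILETIME bytes to a datetime, or None if implausible."""
    if len(raw) != 8:
        return None
    (ticks,) = struct.unpack("<Q", raw)
    try:
        # One tick is 100 ns, so ten ticks make one microsecond.
        return FILETIME_EPOCH + timedelta(microseconds=ticks // 10)
    except OverflowError:
        # Garbage bytes can encode dates far beyond what datetime supports.
        return None

print(filetime_to_datetime(b"\x00" * 8))  # 1601-01-01 00:00:00
```

A field that contains only zero bytes therefore comes out as 1601-01-01 00:00:00, and returning None for out-of-range values is one way to keep going on half-corrupted entries.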

So, first of all, I needed to find all potential $MFT entries. As the input files for this kind of tool can get quite big, methods that need to load the whole file into memory might fail. In the end, I decided to use Python’s mmap module. It’s fast, and you can open really big files and move a pointer over the data. Searching for byte patterns is supported as well.
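As a rough illustration of the idea (Python 3, and not the actual code from MFTEntryCarver.py), a memory-mapped signature search can be as short as the sketch below. The function name find_signatures is made up; mmap.mmap and its find method are part of the standard library.

```python
import mmap

def find_signatures(path, signature=b"FILE"):
    """Yield the byte offset of every occurrence of signature in the file at path."""
    with open(path, "rb") as handle:
        # Length 0 maps the whole file; note that mmap refuses to map an empty file.
        with mmap.mmap(handle.fileno(), 0, access=mmap.ACCESS_READ) as view:
            pos = view.find(signature)
            while pos != -1:
                yield pos
                # Continue one byte further so overlapping hits are not skipped.
                pos = view.find(signature, pos + 1)
```

Calling list(find_signatures("unallocated.blocks")) would collect all candidate offsets; because the function is a generator, candidates can also be processed one by one without holding the full list in memory.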

The pattern search for FILE (\x46\x49\x4C\x45) returned 54583 potential entries. So how do I decide which are legitimate ones and which are false positives? My approach was to try to parse further attributes and sanity check the results as well as possible. For example, I use size checks a lot. All attributes in $MFT entries store their size. I parse that, and if the parsed value is smaller than the minimum size of the attribute or bigger than the size of the whole entry, the bytes I parsed are probably not part of a real $MFT entry. I don’t want to go into that too deeply; please look at the code if you want to understand my approach. Suggestions and contributions are highly welcome.
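To give an idea of what such a size check can look like, here is a simplified sketch in Python 3. It is not the logic from MFTEntryCarver.py. It relies on the documented attribute header layout (a 4-byte type followed by a 4-byte total length, both little-endian, with type 0xFFFFFFFF marking the end). The minimum of 24 bytes (the size of a resident attribute header) and the name walk_attributes are my own assumptions, and the check is slightly stricter than described above because it rejects any attribute that would run past the end of the entry.

```python
import struct

MIN_ATTR_SIZE = 24        # assumption: size of a resident attribute header
END_MARKER = 0xFFFFFFFF   # attribute type that terminates the attribute list

def walk_attributes(entry, first_attr_offset):
    """Return (type, offset, length) per plausible attribute, stopping at the first bad one."""
    found = []
    pos = first_attr_offset
    while pos + 8 <= len(entry):
        attr_type, attr_len = struct.unpack_from("<II", entry, pos)
        if attr_type == END_MARKER:
            break
        # Too small to be an attribute, or it would run past the end of the entry.
        if attr_len < MIN_ATTR_SIZE or pos + attr_len > len(entry):
            break
        found.append((attr_type, pos, attr_len))
        pos += attr_len
    return found
```

A candidate that yields no plausible attributes at all is most likely a false positive, while a candidate that yields some attributes before the walk stops may be a partially overwritten entry that is still worth parsing.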

So after a couple of hours of work, I get to the following results.

python MFTEntryCarver.py -s unallocated.blocks

MFTEntryCarver.py output; the -s flag shows statistics at the end

So essentially, MFTEntryCarver.py will give you all the artifacts mentioned above if it can find them. It only keeps on parsing an entry when it finds at least one valid $FN attribute. If certain artifacts are not there, it will put “corrupted” in the respective field. Below are the statistics the tool shows after processing the test dataset.

The good news here: in the end, there were not just 5026 entries to parse but a total of 54583 candidates. Out of those, MFTEntryCarver.py recovered 14975 entries. So let’s take a look at one of them.

[u'GETWIN~1.URL', u'Get Windows Live.url'];2010-11-10 08:22:22.646273;2010-11-10 08:22:22.646273;2010-11-10 08:22:22.646273;2010-11-10 08:22:22.646273;1601-01-01 00:00:00;1601-01-01 00:00:00;1601-01-01 00:00:00;1601-01-01 00:00:00;0d0a50726f70333d31392c320d0a5b496e7465726e657453686f72746375745d0d0a55524c3d3c0174703a2f2f676f2e6d6963726f736f66742e636f6d2f66776c696e6b2f3f4c696e6b49643d36393137320d0a49444c6973743d0d0a794711

So, in this case, we found a .URL file called “Get Windows Live.url”, and the data seems to be resident. Some timestamps couldn’t be parsed, but MFTEntryCarver.py still gets you as much as it can. I usually transform and analyze raw hex data using CyberChef. Throwing the hex contents of the resident data attribute at it produces readable results (at least for ASCII characters).

CyberChef can parse the resident data
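If you prefer to stay on the command line, the same hex-to-text conversion can be approximated with a few lines of Python 3. This is only a sketch of the idea, not a CyberChef replacement; the name hex_to_printable and the choice of "." as a placeholder are mine.

```python
def hex_to_printable(hex_string, placeholder="."):
    """Decode a hex string and replace non-printable bytes with a placeholder."""
    raw = bytes.fromhex(hex_string)  # raises ValueError on malformed hex
    return "".join(
        # Keep printable ASCII plus CR and LF, mask everything else.
        chr(b) if 32 <= b < 127 or b in (10, 13) else placeholder
        for b in raw
    )

# A fragment taken from the resident data shown above.
print(hex_to_printable("5b496e7465726e657453686f72746375745d"))  # [InternetShortcut]
```

Masking instead of dropping unreadable bytes keeps the character positions aligned with the original data, which helps when you want to go back to a specific offset.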

I’ll try the tool some more in investigations, but it looks promising. It offers a viable way to get older $MFT entries, including timestamps. Artifacts like these usually help an investigation when the attacker has cleaned up and/or is long gone. If you use it, please give some feedback. You can easily reach out to me on Twitter @mathias_fuchs. You can get the code at https://github.com/cyb3rfox/MFTEntryCarver


Copyright Cyberfox 2018