The Evolution of Evidence Extraction: From ASCII Files to Artificial Intelligence

Tod Ewasko

Jun 14 2018

For those of us who have been in the trenches of digital forensics for some time, the technological advancements have been constant. This evolution in the tools available to us for digital evidence extraction has completely transformed the way investigations are conducted and promises to take those investigations to a new level in the years ahead.

Beginnings of Evidence Extraction

As recently as the early 2000s, the landscape of digital evidence extraction was dominated by the common structure of data types such as those found in Microsoft®, Lotus® or WordPerfect® software applications. Encryption was not a serious consideration because computing power was outpacing encryption strength, and it was common to simply brute-force every possible password, since the maximum number of possible password hashes in a word-processing document was 65K. In fact, it’s fairly well known that no Word document created before 2003 has an unbreakable password.
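To see why a hash space of roughly 65K values offers so little protection, here is a minimal sketch. It assumes a made-up 16-bit verifier (toy_hash16, built on CRC-32 purely for illustration); it is not the algorithm any real word processor used, and the function names are hypothetical.

```python
import itertools
import string
import zlib


def toy_hash16(password: str) -> int:
    """Hypothetical 16-bit verifier standing in for a weak legacy scheme."""
    return zlib.crc32(password.encode("utf-8")) & 0xFFFF


def find_matching_password(target_hash: int, max_len: int = 4):
    """Try short lowercase candidates until one produces the target hash."""
    for length in range(1, max_len + 1):
        for letters in itertools.product(string.ascii_lowercase, repeat=length):
            candidate = "".join(letters)
            if toy_hash16(candidate) == target_hash:
                return candidate
    return None  # no candidate of this length and alphabet matched


# The "document" stores only the 16-bit value, never the password itself.
stored_hash = toy_hash16("zeta")
recovered = find_matching_password(stored_hash)
```

In this toy model the recovered string does not have to be the original password. With only 65,536 possible values, any candidate that lands on the same value passes the check, which is why the search finishes almost instantly on modern hardware.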

A good example of the beginnings of evidence extraction is how we would approach an ASCII-encoded text file. The basic information in this file is the file name, key dates (created, modified, accessed), location of storage, body, size and file type. One could easily open this text file and find all of the key information needed with a simple right-click; this would be considered manual evidence extraction. This method of manual interaction and copying has clear shortcomings, as human error can compromise the forensic integrity of the data.

Soon, however, our industry began to develop programmatic methods of gathering this information in an automated fashion for the purpose of searching, sorting, filtering and culling datasets. This was made possible by companies such as Oracle, which created tools to extract and identify files, and Microsoft, which developed its own interoperable tools that connected with its installed software products. Meanwhile, AccessData built the first generation of our digital forensics tools, providing investigators with less expensive and more precise tools to extract documents. That method of taking data from its native form and indexing it would come to be known as evidence extraction (or “parsing”).
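As a rough illustration of what gathering this information programmatically can look like, the sketch below collects the same attributes described for the text file using only the Python standard library. The function name extract_metadata and the field names are assumptions made for this example; it is not how any particular forensic product works.

```python
import hashlib
import os
from datetime import datetime, timezone


def extract_metadata(path: str) -> dict:
    """Collect basic file attributes without opening the file in an editor."""
    info = os.stat(path)  # read timestamps before touching the contents
    with open(path, "rb") as handle:
        digest = hashlib.sha256(handle.read()).hexdigest()

    def as_utc(timestamp: float) -> str:
        return datetime.fromtimestamp(timestamp, tz=timezone.utc).isoformat()

    return {
        "name": os.path.basename(path),
        "location": os.path.dirname(os.path.abspath(path)),
        "extension": os.path.splitext(path)[1].lower(),
        "size_bytes": info.st_size,
        "modified": as_utc(info.st_mtime),
        "accessed": as_utc(info.st_atime),
        # st_ctime is creation time on Windows but metadata-change time on Unix.
        "changed_or_created": as_utc(info.st_ctime),
        # A content hash lets a later step confirm the file was not altered.
        "sha256": digest,
    }
```

Records like this can then be indexed for searching, sorting and filtering. The hash is included because it gives a repeatable way to show that the data did not change between collection and review.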

Modern Investigations

As data volumes exploded, our industry was forced to adapt by creating an architecture that could scale and keep up with that demand. This involved leveraging Big Data and machine learning across the continuum of data collection, review and analysis. It also required significant advancements in the use of technology for evidence extraction.

We now face much higher levels of encryption and new file structures that are much more difficult for investigators to penetrate. Moreover, the proliferation of mobile devices has created a tremendous new challenge for law enforcement professionals who need to unlock and extract potentially valuable digital evidence. In 2017, there were more than 5 million mobile apps across devices powered by Apple® and Google™ technology combined, and most offered some form of chat capabilities for users.

Today’s digital forensics investigators are attempting to tackle this enormous challenge by using automated evidence extraction tools that parse the most popular applications. Unfortunately, that only includes a small percentage of the available apps and only works consistently on a fairly small sample of devices. Due to voice translation software and other technologies, our mobile chats and other communications contain far more typos and missing context than in the past, problems that a keyword document search or standard indexing functions will not resolve. Mobile application data is typically stored in multiple locations, so a conversation is not as easy to define as a document; it is a joining of relevant information. Keyword searches retrieve only a portion of the conversation and lack human-readable chat handles. The investigator is tasked with remedying this by piecing together different files or databases. So for the vast number of modern digital forensics investigations, we are thrown right back to manual parsing. Yes, right where we started.
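The "joining of relevant information" can be pictured with a small sketch. It assumes a hypothetical app that keeps message rows and contact rows separately; the table and column names (messages, contacts, sender_id, handle) are invented for this example, and both tables sit in one in-memory SQLite database for simplicity, whereas real apps often spread them across several files.

```python
import sqlite3


def rebuild_conversation(conn: sqlite3.Connection, thread_id: int) -> list:
    """Join message rows to contact rows so each line shows a readable handle."""
    rows = conn.execute(
        """
        SELECT m.sent_at, COALESCE(c.handle, 'unknown:' || m.sender_id), m.body
        FROM messages AS m
        LEFT JOIN contacts AS c ON c.contact_id = m.sender_id
        WHERE m.thread_id = ?
        ORDER BY m.sent_at
        """,
        (thread_id,),
    ).fetchall()
    return [f"{sent_at} {handle}: {body}" for sent_at, handle, body in rows]


# Hypothetical sample data, stored out of order and with one unknown sender.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contacts (contact_id INTEGER PRIMARY KEY, handle TEXT);
    CREATE TABLE messages (thread_id INTEGER, sender_id INTEGER,
                           sent_at TEXT, body TEXT);
    INSERT INTO contacts VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO messages VALUES
        (7, 2, '2017-03-01T09:01', 'see u at teh dock'),
        (7, 1, '2017-03-01T09:00', 'where r we meeting'),
        (7, 3, '2017-03-01T09:02', 'ok');
    """
)
conversation = rebuild_conversation(conn, 7)
```

The LEFT JOIN keeps messages whose sender has no contact record, so nothing silently drops out of the conversation. A keyword hit on a single message row would show neither the surrounding messages nor who sent them, which is the gap the join closes.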

Looking to the Future

These challenges are large and they are sobering. In order to facilitate advanced evidence extraction in the future, we need a new generation of tools and approaches. Here comes the exciting news: there are two specific areas of innovation that I believe will help us achieve this next evolution in evidence extraction so investigators can effectively find the data needed from these rapidly changing apps.

  1. Crowdsourcing
    There is tremendous opportunity to leverage involvement across the digital forensics community to create a parsing API. We are committed to leading the way with this approach at AccessData. It will require a lot of work to achieve success with extracting the relevant data and then creating standardization around that technology, but the systems and tools needed to enable this crowdsourcing are here now.
  2. Intelligent Parsing
    The next major leap in evidence extraction will be the advent of automated intelligent parsing. This will likely be a cloud-based tool so aggregation of data types can be performed across larger datasets. The first step down this path will be the combination of machine predictions and human validation, which will remove the need to manually interact and hunt for the encrypted data. At AccessData we continue to push the envelope and lead in parsing technologies and methods.
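One way to picture the combination of machine predictions and human validation is a simple confidence threshold, sketched below. The Prediction class, the route_predictions function and the 0.9 cutoff are all assumptions made for illustration; this is not a description of any existing or planned product.

```python
from dataclasses import dataclass


@dataclass
class Prediction:
    record_id: str
    label: str         # what the model thinks the record is, e.g. "chat_message"
    confidence: float  # model score between 0.0 and 1.0


def route_predictions(predictions, threshold: float = 0.9):
    """Split predictions into auto-accepted items and a human review queue."""
    accepted, needs_review = [], []
    for prediction in predictions:
        if prediction.confidence >= threshold:
            accepted.append(prediction)
        else:
            needs_review.append(prediction)
    # Reviewers see the least certain items first.
    needs_review.sort(key=lambda p: p.confidence)
    return accepted, needs_review


# Hypothetical model output for three extracted records.
sample = [
    Prediction("rec-1", "chat_message", 0.97),
    Prediction("rec-2", "contact", 0.55),
    Prediction("rec-3", "chat_message", 0.82),
]
accepted, needs_review = route_predictions(sample)
```

In a setup like this the investigator only handles the uncertain records, and each human decision could in principle be fed back to improve the model over time.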

We have a number of challenges to overcome before we can write the next chapter in the evolution of evidence extraction, but we should be encouraged by the knowledge that we have made enormous strides in digital forensics over just the past 20 years. The future is going to be both exciting and complex, the kind of environment in which we always rise to the occasion.


About the Author
Tod Ewasko is director of product management at AccessData, a leading provider of integrated digital forensics and e-discovery software.
