Email Threading: Can it Help with Review and Processing?
09-08-2010, Avansic - Corporate
Avansic recently released a whitepaper covering the advantages and disadvantages of email threading for e-discovery processing and review.

Email Threading in E-Discovery
by Dr. Gavin W. Manes

The characteristics and subtleties of email can substantially complicate any e-discovery project. However, knowing more about the various email formats and functions can help attorneys properly scope any email processing and review project.

Email is frequently requested during discovery, but it is a very different animal from what most people think of as a “document.” First, email has essentially no fixed format. A Microsoft Word document, by contrast, has a constrained file format and a large amount of embedded metadata that supports all the functions users are accustomed to. An email is essentially a text document with minimal header information, and it can appear in .pst, .edb, .eml, .msg, and various other disparate formats; extracting metadata and threading information from each requires different tools and techniques. To add to the confusion, any particular email may have additional emails or attachments associated with it.
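To make the "text document with minimal header information" point concrete, the following is a minimal sketch of pulling review-relevant metadata out of a single message in .eml (plain RFC 5322 text) form, using only Python's standard library. The sample message, the addresses, and the function name extract_metadata are hypothetical and not taken from the whitepaper; container formats such as .pst, .edb, and .msg would need other tools before a step like this could run.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr, parsedate_to_datetime

# Hypothetical sample message, invented for illustration only.
RAW = """\
From: john@example.com
To: nancy@example.com, paula@example.com
Subject: Lunch?
Date: Wed, 08 Sep 2010 11:30:00 -0500
Message-ID: <1@example.com>

Anyone up for lunch today?
"""


def extract_metadata(raw):
    """Return the header fields most useful for threading and review."""
    msg = message_from_string(raw)
    return {
        "from": parseaddr(msg.get("From", ""))[1],
        "to": [addr for _, addr in getaddresses([msg.get("To", "")])],
        "subject": msg.get("Subject"),
        "sent": parsedate_to_datetime(msg["Date"]),
        # These headers, when present, link a message to its conversation.
        "message_id": msg.get("Message-ID"),
        "in_reply_to": msg.get("In-Reply-To"),
        "references": (msg.get("References") or "").split(),
        "body": msg.get_payload(),
    }


if __name__ == "__main__":
    for key, value in extract_metadata(RAW).items():
        print(f"{key}: {value!r}")
```

The In-Reply-To and References headers are optional, so a real collection will contain messages where they are missing; the sketch returns None or an empty list in that case rather than guessing.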

Second, unlike a traditional “document,” email often represents a conversation among anywhere from two to hundreds of individuals. Email’s back-and-forth nature captures a large amount of information, including the sender, recipient, date sent, body text, and even the servers through which the message passed, all of which may be present for each successive email in the thread. Side conversations among recipients and the ability to blind carbon copy another user add another layer of complexity.

Figure 1 shows a sample email thread demonstrating that even a conceptually simple conversation (lunch plans among office colleagues) creates many different email threads. In this instance, John initiates the main thread by emailing Nancy and Paula about going to lunch. Paula then replies directly to John, who replies back to her. But when Nancy responds to the initial email, she replies to all (John and Paula). John then forwards that email to Marc, who replies to John and to Paula separately. Marc gets another forward from Paula. John replies to Nancy’s email, copying Paula as well. This single email spawned five distinct sub-threads, each with its own context. Even without the email body text, it is clear how much contextual information is contained in any given email.

If these conversation “threads” are teased away from each other, this background information may be lost. However, reviewing an entire email thread requires a different mindset from reading each email individually, one that may lead to faster and more accurate document review. In the above example, which represents a very simple email conversation, reviewing the entire set would give a sense of the relationships among John, Paula, Nancy and Marc as well as a basic understanding of their email conversation.

In response to these intricacies, many attorneys have chosen to request and review every email in a thread that contains a search term, rather than only the specific email with that term; this results in a larger review set. Many of those same attorneys use “threading” software or processes that group messages based on the flow of the conversation, allowing for context-sensitive review.
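As a rough illustration of what grouping messages by conversation flow can look like, here is a minimal sketch that follows each message's parent link back to the first message of its conversation. The message identifiers, the parent links, and the function names are assumptions made for illustration: they are loosely modeled on the lunch example, and in particular the parent assigned to Paula's forward is a guess. Commercial threading tools typically combine several signals (reply headers, subject lines, quoted text) and this sketch uses only one.

```python
from collections import defaultdict

# (message_id, parent_id, sender). All values are hypothetical.
MESSAGES = [
    ("m1", None, "John"),    # initial email to Nancy and Paula
    ("m2", "m1", "Paula"),   # reply to John
    ("m3", "m2", "John"),    # reply back to Paula
    ("m4", "m1", "Nancy"),   # reply to all
    ("m5", "m4", "John"),    # forward to Marc
    ("m6", "m5", "Marc"),    # reply to John
    ("m7", "m5", "Marc"),    # separate reply to Paula
    ("m8", "m1", "Paula"),   # forward to Marc (parent is assumed)
    ("m9", "m4", "John"),    # reply to Nancy, copying Paula
    ("x1", None, "Marc"),    # unrelated message
]


def find_root(message_id, parent_of):
    """Walk parent links upward until a message with no known parent."""
    seen = set()
    while parent_of.get(message_id) is not None and message_id not in seen:
        seen.add(message_id)  # guards against malformed, cyclic links
        message_id = parent_of[message_id]
    return message_id


def group_into_threads(messages):
    """Map each root message id to the ids of all messages in its thread."""
    parent_of = {mid: parent for mid, parent, _ in messages}
    threads = defaultdict(list)
    for mid, _, _ in messages:
        threads[find_root(mid, parent_of)].append(mid)
    return dict(threads)


if __name__ == "__main__":
    for root, members in group_into_threads(MESSAGES).items():
        print(root, "->", members)
```

With a grouping like this, a search hit on any one message can be expanded to the full thread it belongs to, which is the larger review set described above.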

Now presented with the entire email thread, reviewers must choose how to tackle the set: a linear approach may not work because email threads generally do not take a direct path. A “leftmost” review is the closest approximation to linear review, where the reviewer chooses the leftmost element as the next email to review (see Figure 1). Riskier approaches propose reviewing only the leaf nodes (the newest email at the end of each thread); this assumes that all previous information is wholly contained in the last communication of the email chain. However, this assumption is inaccurate, since attachments may not be included in replies, alterations may be made to previous text, and some users may not carry earlier content forward in their replies.
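One way to read the “leftmost” approach is as a depth-first walk of the thread tree that always takes the leftmost unvisited reply next; the leaf-node approach then keeps only the messages that have no replies. The sketch below shows both under that interpretation. The tree itself, the message identifiers, and the function names are hypothetical, and the left-to-right order of replies is an assumption rather than something taken from the figure.

```python
# Hypothetical thread tree: each message id maps to its direct replies
# or forwards, listed left to right.
CHILDREN = {
    "m1": ["m2", "m4", "m8"],
    "m2": ["m3"],
    "m4": ["m5", "m9"],
    "m5": ["m6", "m7"],
}


def leftmost_order(root, children):
    """Return message ids in depth-first order, leftmost reply first."""
    order = []
    stack = [root]
    while stack:
        node = stack.pop()
        order.append(node)
        # Push replies in reverse so the leftmost one is popped next.
        stack.extend(reversed(children.get(node, [])))
    return order


def leaf_nodes(root, children):
    """Return the messages that end a branch (no replies or forwards)."""
    return [n for n in leftmost_order(root, children) if not children.get(n)]


if __name__ == "__main__":
    print("leftmost review order:", leftmost_order("m1", CHILDREN))
    print("leaf nodes only:      ", leaf_nodes("m1", CHILDREN))
```

In this assumed tree, a leaf-only review would read five of the nine messages. Whether the other four are fully reproduced in those five depends on how each person replied, which is exactly the risk described above.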

Unfortunately, none of these solutions is a silver bullet for the complications that arise when processing and reviewing email during e-discovery. The problem with reviewing each message individually is duplication of effort and lack of context, which often translates to longer review time. The problem with threading and ESI processing software is that email is notoriously tricky to handle due to the variety of content and formats, so automation may miss key information. While choosing a path to review email threads is difficult, it is clear that having a single reviewer look over an entire thread is more efficient than having multiple people review the emails within that thread piecemeal. Email threading is a good first step in addressing the time and effort necessary to review and produce useful results from email communication in e-discovery.