Data Clustering & Technology Assisted Review

This case involved several hundred thousand documents that needed to be reviewed within a short, month-long timeframe leading up to late-stage depositions. The client did not want to spend the time necessary to train dozens of additional reviewers, nor did they want to incur the cost of laying eyes on every document. They turned to Avansic to apply technology that would make the review more efficient and effective. This was particularly important because it was not just a relevance/privilege review: they knew the information within this document set had the power to shape a winning strategy for their case; they just had to find it.

Complicating Factors
The enormous volume of data and the extremely short timeframe were the biggest barriers to success in this case. Even if the client had been willing to pay for a review of every single document, the timeframe did not allow for that option.

The biggest challenge in any project involving technology assisted review or predictive coding is creating the appropriate workflow. When to use the technology is almost as important as which technology is used; in this case, applying Brainspace Discovery to cluster the documents by concept proved very valuable in the initial stages.

Right after the data was processed (de-NISTed to remove known system files, and de-duplicated), Avansic loaded the entire set into Brainspace and applied its clustering, which uses unsupervised machine learning. The lead attorneys on the case then examined the resulting “wheel” visualizer. They identified two particular concepts of interest on the wheel and drilled down to a set of approximately 7,000 emails and attachments of critical importance. In addition, the remaining documents were sampled and coded, and those codes were propagated to the rest of the set using supervised machine learning.
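
The sample-and-propagate step can be illustrated with a small sketch. This is not Brainspace's method, which is not described here; it substitutes a simple nearest-centroid classifier over word counts as a stand-in for supervised machine learning. The function names, document IDs, codes, and sample texts are all hypothetical and invented for illustration.

```python
import math
import re
from collections import Counter, defaultdict


def vectorize(text):
    """Turn a document into a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def propagate_codes(coded_sample, uncoded_docs):
    """Assign each uncoded document the code whose centroid it most resembles.

    coded_sample: list of (text, code) pairs produced by human reviewers.
    uncoded_docs: dict of doc_id -> text.
    Returns a dict of doc_id -> (predicted_code, similarity).
    """
    centroids = defaultdict(Counter)
    for text, code in coded_sample:
        # Summed counts serve as the centroid; cosine ignores overall scale.
        centroids[code].update(vectorize(text))

    results = {}
    for doc_id, text in uncoded_docs.items():
        vec = vectorize(text)
        best = max(centroids, key=lambda code: cosine(vec, centroids[code]))
        results[doc_id] = (best, cosine(vec, centroids[best]))
    return results


# Hypothetical reviewer-coded sample and uncoded remainder (illustration only).
sample = [
    ("pricing agreement with the distributor", "relevant"),
    ("revised pricing terms for the contract", "relevant"),
    ("office holiday party schedule", "not_relevant"),
    ("lunch menu and party planning", "not_relevant"),
]
remaining = {
    "DOC-001": "updated contract pricing for distributor",
    "DOC-002": "holiday lunch schedule",
}
predictions = propagate_codes(sample, remaining)
```

In a real review, the similarity score would typically be used to rank the uncoded documents so that the most likely matches reach reviewers first, and low-scoring predictions would be spot-checked rather than trusted outright.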

Avansic then exported the documents, with their concept tags in place, to the XERA online review tool. The attorneys gathered their litigation team and reviewed this critical subset in a matter of days. They then crafted a strategy for moving forward and continued, at a slower pace, to review the remaining prioritized sections of the Brainspace “wheel.”