AI / TAR & CAL
Artificial Intelligence (AI) is frequently used in popular platforms like Netflix, Amazon, and Google, but it can be very helpful in eDiscovery and litigation support projects as well.
An eDiscovery example of AI is mining a database to find a document similar to the one you're already looking at. AI can be helpful on projects of any size, provided that quality document text is available.
One confusing aspect of AI is the number of terms used for the processes that fall under its umbrella: advanced analytics, clustering, predictive coding, find similar, email threading, near-duplication, data mining, and data aggregation. In general, all of these analytics rely on "machine learning." There are two types of machine learning: supervised and unsupervised.
Supervised learning is where a human teaches the computer by training the system, for example by coding documents. Each time the user makes a training decision, the computer learns additional information, adjusts its algorithm, and is better able to predict what a human might choose next.
Predictive coding, Technology Assisted Review (TAR), and Continuous Active Learning (CAL) are examples of supervised machine learning. For instance, if an attorney codes documents about American football as responsive and documents about soccer as non-responsive, the computer will learn from the context of the coded documents that the attorney is interested in American football and not soccer. It would then indicate that a document about soccer is non-responsive even if it contains the word "football."
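The football example can be sketched in a few lines of Python. This is only an illustration, assuming a very small naive Bayes word-count classifier; it is not how any particular review platform implements predictive coding, and the function names, sample documents, and codes are all hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return text.lower().split()


def train(coded_docs):
    """Build per-code word counts from (text, code) pairs coded by a reviewer."""
    word_counts = {}        # code -> Counter of words seen under that code
    doc_counts = Counter()  # code -> number of documents given that code
    for text, code in coded_docs:
        doc_counts[code] += 1
        word_counts.setdefault(code, Counter()).update(tokenize(text))
    vocab = {word for counts in word_counts.values() for word in counts}
    return word_counts, doc_counts, vocab


def predict(model, text):
    """Return the most likely code for an uncoded document."""
    word_counts, doc_counts, vocab = model
    total_docs = sum(doc_counts.values())
    best_code, best_score = None, -math.inf
    for code in doc_counts:
        score = math.log(doc_counts[code] / total_docs)
        total_words = sum(word_counts[code].values())
        for word in tokenize(text):
            if word in vocab:  # words never seen in training are ignored
                # Add-one smoothing so an unseen word/code pair is not zero.
                score += math.log(
                    (word_counts[code][word] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_code, best_score = code, score
    return best_code


# Hypothetical training set: every document mentions "football".
coded = [
    ("quarterback threw a touchdown in the football game", "responsive"),
    ("football team drafted a new quarterback", "responsive"),
    ("the striker scored a goal in the football match", "non-responsive"),
    ("goalkeeper saved a penalty in the football league match", "non-responsive"),
]
model = train(coded)

soccer_prediction = predict(model, "football striker scored a late goal")
gridiron_prediction = predict(model, "quarterback football touchdown")
```

Under these assumptions, the soccer document is predicted as non-responsive even though it contains the word "football," because the surrounding words (striker, scored, goal) appeared only in documents the reviewer coded as non-responsive.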
CAL belongs in a unique category of supervised learning. CAL trains on every action the user takes, one code at a time, rather than learning on discrete sets. CAL can suggest the next document to review in order to further its training rather than just choosing the next document in the set. This allows a CAL system to train quickly and efficiently.
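A rough sketch of that loop follows. It assumes a deliberately simple word-weight model and assumes the system suggests whichever unreviewed document currently scores highest; real CAL tools use more sophisticated models and selection strategies. The class name CalSketch, the sample documents, and the truth table standing in for the human reviewer are all hypothetical.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into words."""
    return text.lower().split()


class CalSketch:
    """Toy model that updates after every single coding decision."""

    def __init__(self):
        # word -> running weight; positive leans responsive, negative does not
        self.weights = Counter()

    def learn(self, text, responsive):
        """Adjust word weights from one coded document."""
        delta = 1 if responsive else -1
        for word in set(tokenize(text)):
            self.weights[word] += delta

    def score(self, text):
        """Sum the weights of the distinct words in a document."""
        return sum(self.weights[word] for word in set(tokenize(text)))

    def suggest_next(self, unreviewed):
        """Return the id of the highest-scoring unreviewed document."""
        return max(unreviewed, key=lambda doc_id: self.score(unreviewed[doc_id]))


# Hypothetical document set; ids are arbitrary.
documents = {
    1: "quarterback throws touchdown football",
    2: "striker scores goal football match",
    3: "quarterback touchdown replay",
    4: "goalkeeper saves penalty football match",
    5: "coach praises quarterback",
}
# Stand-in for the reviewer's decisions (True means responsive).
truth = {1: True, 2: False, 3: True, 4: False, 5: True}

model = CalSketch()
unreviewed = dict(documents)
review_order = []
next_id = 1  # the reviewer picks the first document to code

while unreviewed:
    text = unreviewed.pop(next_id)
    model.learn(text, truth[next_id])  # train on this one code immediately
    review_order.append(next_id)
    if unreviewed:
        next_id = model.suggest_next(unreviewed)
```

In this toy run the model is retrained after each code rather than after a batch, and the documents about American football are suggested ahead of the soccer documents because each responsive code raises the score of similar unreviewed documents.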
Supervised learning is best applied when you are certain which codes you want to train the system on and that their definitions will not change during training. The performance of unsupervised learning techniques is not affected by changing codes because it does not rely on human input.
Unsupervised learning is where the computer, using predetermined algorithms or training, attempts to categorize data without the need for human training. In eDiscovery, this technique is used for document clustering, find similar, and text near-duplication. The most visual example of unsupervised learning is clustering, where you can see groups of documents automatically categorized together. A cluster that was labeled as "sports" might have sub-groups of football, soccer, or tennis, each of which might include further subgroups about teams or locations. For eDiscovery purposes, think of this as a curated library of your documents. The visual depiction of documents allows you to navigate to the section of interest. This technique is also used to create samples of documents in order to determine what type of data is in the set.
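To make the idea concrete, here is a minimal sketch of find similar and clustering with no human coding involved. It assumes documents are compared by Jaccard similarity of their word sets and grouped with a simple greedy pass; production tools generally rely on richer text models. The function names, thresholds, and sample documents are hypothetical.

```python
def tokens(text):
    """Return the set of lowercase words in a document."""
    return set(text.lower().split())


def jaccard(a, b):
    """Share of distinct words two documents have in common (0.0 to 1.0)."""
    ta, tb = tokens(a), tokens(b)
    union = ta | tb
    return len(ta & tb) / len(union) if union else 0.0


def find_similar(target, corpus, threshold=0.5):
    """Return other documents whose similarity to target meets the threshold."""
    return [doc for doc in corpus
            if doc != target and jaccard(target, doc) >= threshold]


def cluster(corpus, threshold=0.3):
    """Greedy grouping: join the first cluster whose first document is similar."""
    clusters = []
    for doc in corpus:
        for group in clusters:
            if jaccard(doc, group[0]) >= threshold:
                group.append(doc)
                break
        else:
            clusters.append([doc])  # no similar cluster found, start a new one
    return clusters


corpus = [
    "football season schedule released today",
    "tennis tournament draw announced",
    "football season schedule released yesterday",
    "tennis tournament final draw announced",
]

similar_to_first = find_similar(corpus[0], corpus)
groups = cluster(corpus)
```

With these sample documents, the two football schedules end up in one group and the two tennis announcements in another, without anyone having coded a single document.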