Avansic's "E-Discovery Searching Strategies" Whitepaper Published in "On the Record"
12-06-2010, Avansic - Corporate
http://dallasparalegals.org/default.asp
Dr. Gavin Manes' most recent whitepaper, "E-Discovery Searching Strategies: Crafting Keyword Lists," was published in the December 2010 issue of "On the Record" (a publication of the Dallas Area Paralegals Association). The paper covers approaches to searching large document sets and the most strategic ways to arrive at a targeted document set.

See below for the full text:

Choosing the right keywords during the processing stage of e-discovery means a greater chance of finding target documents, thereby reducing review time. Many attorneys have had the experience of sending what they thought was a comprehensive and specific list to a vendor and receiving thousands of irrelevant documents in return.

Here are some tips and tricks for crafting a search term list that can help instead of hurt. Knowing a little bit about how computers name files and how automated programs “search” for documents can empower attorneys to make effective keyword lists.

1) Use “wildcards” to your advantage – A wildcard, commonly represented by a “*” symbol, tells the computer “any character or characters can be here.” For instance, to find both the singular and plural forms of “dog”, you can search for “dog*”. The computer will return every word that begins with “dog”, including “dog”, “dogs”, “dogged” and “doghouse”, so a wildcard can also pull in words you did not intend. Careful use of the wildcard allows a single search to cover multiple tenses, plurals, and sometimes misspellings.
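
To see why “dog*” catches more than plurals, here is a minimal sketch in Python of how a wildcard term might be turned into a search pattern. The function name `wildcard_to_regex` is made up for illustration, and real e-discovery tools may treat wildcards differently (for example, by matching against an index rather than raw text).

```python
import re

def wildcard_to_regex(term: str) -> re.Pattern:
    """Translate a keyword containing '*' into a whole-word, case-insensitive pattern."""
    # Escape each literal piece, then let '*' stand for any run of word characters.
    parts = [re.escape(piece) for piece in term.split("*")]
    return re.compile(r"\b" + r"\w*".join(parts) + r"\b", re.IGNORECASE)

text = "The dog chased two dogs past the doghouse; a dogged endeavor. No dig here."
hits = wildcard_to_regex("dog*").findall(text)
print(hits)  # every word that starts with "dog", wanted or not
```

Running a pattern like this against a sample of documents before finalizing a keyword list is one way to spot unintended matches early.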

2) Short words and single characters are not your friend – Many e-discovery products won’t index words of fewer than three characters, and often don’t index the most common words such as “the”, “any”, and “because”. Even when a short word is indexed, it is much more likely to return junk results. If you have to search for a very short word or abbreviation, consider pairing it with another word that generally appears in the same document, or ask that the abbreviation be searched for in a case-sensitive manner.
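
A keyword list can be screened for these problems before it goes to a vendor. The sketch below is illustrative only: the stop-word list, the three-character minimum, and the function names are assumptions, since each product has its own indexing settings.

```python
import re

# Assumed settings for illustration; real tools define their own lists and limits.
STOP_WORDS = {"the", "any", "because"}
MIN_LENGTH = 3

def flag_risky_terms(terms):
    """Return (term, reason) pairs for keywords an index might skip."""
    flagged = []
    for term in terms:
        word = term.strip().lower()
        if len(word) < MIN_LENGTH:
            flagged.append((term, "shorter than the minimum indexed length"))
        elif word in STOP_WORDS:
            flagged.append((term, "common word that is often not indexed"))
    return flagged

def find_abbreviation(abbrev, text):
    """Case-sensitive, whole-word search for a short abbreviation."""
    return re.findall(r"\b" + re.escape(abbrev) + r"\b", text)

risky = flag_risky_terms(["IT", "because", "invoice"])
abbrev_hits = find_abbreviation("IT", "Ask IT if it is fixed; IT said it was.")
print(risky)
print(abbrev_hits)  # only the upper-case "IT" is matched, not the pronoun "it"
```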

3) Phrases – The more distinctive and context-specific the search terms, the better the results. Combined with the wildcards mentioned above, phrases give much more specific results than searching for each word separately. Most e-discovery programs also have a “within” feature, which returns a hit only when one keyword occurs within a specified number of words or characters of another. For instance, a search for “quick” within two of “fox” would return a hit on the phrase “the quick brown fox” but would not return “the quick sand”, because “fox” does not appear near “quick”.
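
The “within” idea can be sketched in a few lines of Python. This is a simplified illustration, not any vendor's implementation: the function name `within` is hypothetical, distance is counted in words, and tools may count distance or handle punctuation differently.

```python
import re

def within(text, first, second, max_distance):
    """True if `first` and `second` occur within `max_distance` words of each other."""
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    # A hit needs at least one pair of occurrences that are close enough together.
    return any(abs(i - j) <= max_distance
               for i in first_positions
               for j in second_positions)

print(within("the quick brown fox", "quick", "fox", 2))
print(within("the quick sand", "quick", "fox", 2))
print(within("the quick and very clever fox", "quick", "fox", 2))
```

The third example shows the trade-off in choosing the distance: a small window keeps results specific but can miss documents where the two words are separated by a longer phrase.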

4) Tenses – For words whose tenses cannot be captured with a wildcard, fuzzy searching can be used. Fuzzy searching uses both a dictionary of common tenses and misspellings and an algorithm that determines which words are “near” the word being sought. For example, a fuzzy search for “receive” would find “recieve”, and a search for “find” would locate “found.” Fuzzy searching can be powerful, but it should be used in moderation because it can produce false positives.
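
The “nearness” half of fuzzy searching can be illustrated with Python's standard `difflib` module. This is only a sketch of the similarity idea; it does not include the dictionary of tenses described above (so it would not connect “find” to “found”), and the 0.8 cutoff, the word list, and the helper name `fuzzy_hits` are assumptions chosen for the example.

```python
import difflib

def fuzzy_hits(keyword, vocabulary, cutoff=0.8):
    """Return vocabulary words whose similarity to `keyword` meets the cutoff."""
    return difflib.get_close_matches(
        keyword, vocabulary, n=len(vocabulary), cutoff=cutoff
    )

# Words assumed to appear in the document set.
vocabulary = ["recieve", "received", "recipe", "retrieve", "deceive"]
matches = fuzzy_hits("receive", vocabulary)
print(matches)
```

With a similarity measure like this one, an unrelated word such as “deceive” can score as close to “receive” as the misspelling “recieve” does, which is the kind of false positive that makes moderation advisable.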

5) De-NISTing (or NSRLing) – This process, also known as “known file filtering,” compares an e-discovery set against a list of known files kept by the National Software Reference Library (NSRL), a project of the National Institute of Standards and Technology (NIST). It removes common operating system and application files in order to reduce a data set, but it should be used thoughtfully, since the presence of certain known files may be relevant to a case. For example, the installation of a remote desktop program might be important if the case involves a theft of intellectual property by a terminated employee, but that program information might be removed from the set during de-NISTing.
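
Known file filtering is typically done by comparing file “fingerprints” (hash values) rather than file names. The sketch below assumes the reference list is available as a set of SHA-1 hash values and uses small in-memory stand-ins for files; the function names, the sample file names, and the `always_keep` exception list are hypothetical, included to show how a relevant known file could be preserved.

```python
import hashlib

def sha1_hex(content: bytes) -> str:
    """Return the SHA-1 fingerprint of a file's content as a hex string."""
    return hashlib.sha1(content).hexdigest()

def filter_known_files(files, known_hashes, always_keep=frozenset()):
    """Split files into (kept, removed) using a set of known-file hashes.

    files: dict mapping file name to its content as bytes
    always_keep: names to retain even when their hash is on the known list
    """
    kept, removed = {}, {}
    for name, content in files.items():
        is_known = sha1_hex(content) in known_hashes
        if is_known and name not in always_keep:
            removed[name] = content
        else:
            kept[name] = content
    return kept, removed

# Stand-in data: two "known" program files and one user document.
files = {
    "notepad.exe": b"operating system binary",
    "remote_desktop_setup.exe": b"remote desktop installer",
    "contract_draft.doc": b"draft agreement text",
}
known_hashes = {sha1_hex(files["notepad.exe"]),
                sha1_hex(files["remote_desktop_setup.exe"])}

kept, removed = filter_known_files(
    files, known_hashes, always_keep={"remote_desktop_setup.exe"}
)
print(sorted(kept))
print(sorted(removed))
```

Without the exception list, the remote desktop installer would be filtered out along with the operating system file, which is the risk described above.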

Any expert e-discovery vendor will be happy to consult with you regarding keyword lists. Those with significant experience with e-discovery processing tools know the ins and outs of those programs and can refine your concepts, phrases, and words in order to produce the most relevant data set. In the end, a refined data set will take less time and money to review, leading to lower e-discovery and overall litigation costs.


About the Author
Dr. Gavin W. Manes, President and CEO of Avansic, is a nationally recognized expert in e-discovery and digital forensics. Dr. Manes has published over fifty papers on computer security and digital forensics, and has given hundreds of presentations to attorneys, executives, students, professors, law enforcement, schools, and professional groups on topics ranging from ESI processing to cyber law.