Posted on: September 10, 2015 in Blog
4 Methods to Help Improve Your Keyword Search Results
This post explains four best practices for developing better functioning keywords: visibility, dictionary access, iteration/testing, and keyword expansion via concept analytics.
In 1985, the Blair and Maron study found that keywords alone identified less than 20 percent of the relevant documents in a test case of roughly 350,000 pages of text. Thirty years and many conversations later, the majority of the cases we work on still rely on keyword searching as the primary means of reducing a document collection to a relevant document-review set. Despite these findings and judicial statements about the inefficacy of keyword searching in eDiscovery, the prevailing practice continues to be search, process, review, produce.
So, given that keywords are not very effective and yet remain the dominant culling tool in eDiscovery, how do we make them function better? There are four keys to developing better-functioning keywords: visibility, dictionary access, iteration/testing, and keyword expansion via concept analytics.
Too often, search terms are created in a vacuum, with no insight into what words or phrases actually exist in the dataset. In the past, when processing rates exceeded $2,000 per gigabyte, terms were used to reduce collections prior to processing, thereby saving vast amounts of money. Today the market has changed and processing rates are at historic lows, so it is practical to process first and search in the review tool, which affords greater visibility into search results. Case teams have immediate access to search hits and can quickly determine which terms are successful and which need refinement. Visibility allows you to see keywords in context rather than as a number on a report.
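To make "keywords in context" concrete, here is a minimal sketch of a keyword-in-context view. It is not how any particular review tool displays hits; the function name `keyword_in_context`, the `width` parameter, and the sample text are all hypothetical and used only for illustration.

```python
import re

def keyword_in_context(text, term, width=30):
    """Return each hit for `term` with up to `width` characters of context on either side."""
    # Whole-word, case-insensitive match on the literal term.
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    snippets = []
    for match in pattern.finditer(text):
        start = max(match.start() - width, 0)
        end = min(match.end() + width, len(text))
        snippets.append(text[start:end])
    return snippets

# Hypothetical sample text for illustration only.
document = "The Falcon prototype shipped in May. Marketing later renamed falcon to GL4800."

for snippet in keyword_in_context(document, "falcon", width=15):
    print(snippet)
```

Reading the snippets, rather than a hit count, is what tells you whether a term is catching the usage you intended.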
All keyword indexes are built from a dictionary, or list of terms, in a particular database. Some review tools expose this dictionary to end users so that you can look up the number of instances of any specific term and the number of documents that contain it. This is a powerful tool because, before negotiating terms with the opposition, you can get an understanding of what terms actually exist. Dictionaries also let you run fuzzy searches to find terms with variable spellings. Fuzzy searching is a good strategy for commonly misspelled last names. For databases with a high volume of OCR, it may not be enough to simply search for the name “Johnson.” A dtSearch dictionary will tell you how many variations of the word “Johnson” exist, while allowing you to exclude legitimate but distinct names such as “Johanson” and “Johnston.”
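The dictionary lookup described above can be sketched in a few lines. This is an illustration only: it uses Python's standard `difflib` similarity ratio as a stand-in and is not dtSearch's fuzzy-matching algorithm. The dictionary contents, document counts, `threshold` value, and the function name `fuzzy_variants` are assumptions made up for the example.

```python
from difflib import SequenceMatcher

def fuzzy_variants(term, dictionary, threshold=0.8, exclude=()):
    """List dictionary terms that look like spelling variants of `term`.

    `dictionary` maps each indexed term to the number of documents containing it.
    `exclude` holds legitimate, distinct names that should not be treated as variants.
    """
    term_lower = term.lower()
    excluded = {name.lower() for name in exclude}
    variants = []
    for word, doc_count in dictionary.items():
        candidate = word.lower()
        if candidate in excluded:
            continue
        score = SequenceMatcher(None, term_lower, candidate).ratio()
        if score >= threshold:
            variants.append((word, doc_count, round(score, 2)))
    # Closest spellings first.
    return sorted(variants, key=lambda v: v[2], reverse=True)

# Hypothetical index dictionary: term -> document count.
index_dictionary = {
    "johnson": 412,
    "j0hnson": 9,    # typical OCR error (zero for "o")
    "johnsan": 3,
    "johnston": 57,
    "johanson": 21,
    "jackson": 130,
}

for word, doc_count, score in fuzzy_variants(
    "Johnson", index_dictionary, exclude=("Johnston", "Johanson")
):
    print(word, doc_count, score)
```

The exclusion list matters: by similarity alone, “Johnston” and “Johanson” would score as close matches, so they have to be removed deliberately once you have seen them in the dictionary.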
Although keyword expansion requires the addition of an analytics engine, it can be a powerful tool for creating search term lists to run against the dtSearch index. When you submit a keyword to the expansion engine, the analytics index identifies the terms that are “conceptually” related to your term. By conceptually related we don’t mean synonyms, but terms that share conceptual meaning within your specific dataset. For example, an R&D team may be working on a project code-named “falcon,” while the product name is GL4800. The case team probably knows about GL4800, but may not know to also search for “falcon.” Keyword expansion will help you find those conceptually similar terms.
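As a rough illustration of the idea, the sketch below ranks terms by how often they appear in the same documents as a seed term. This simple co-occurrence score is a stand-in chosen for brevity; commercial analytics engines use more sophisticated models, and nothing here reflects a specific product. The function name `related_terms`, the sample documents, and the stopword list are hypothetical.

```python
import re
from collections import defaultdict

def related_terms(documents, seed, top_n=3, stopwords=frozenset()):
    """Rank terms by the overlap between their documents and the seed term's documents."""
    # Build a map from each term to the set of documents that contain it.
    postings = defaultdict(set)
    for doc_id, text in enumerate(documents):
        for token in set(re.findall(r"[a-z0-9]+", text.lower())):
            if token not in stopwords:
                postings[token].add(doc_id)

    seed_key = seed.lower()
    seed_docs = postings.get(seed_key, set())
    scores = []
    for token, docs in postings.items():
        if token == seed_key:
            continue
        overlap = len(seed_docs & docs)
        if overlap:
            # Jaccard similarity: shared documents / documents containing either term.
            scores.append((token, overlap / len(seed_docs | docs)))
    scores.sort(key=lambda item: (-item[1], item[0]))
    return scores[:top_n]

# Hypothetical documents for illustration only.
documents = [
    "GL4800 launch timeline and falcon test results",
    "Falcon firmware update for the GL4800 board",
    "Quarterly budget review for facilities",
    "Falcon team offsite agenda",
]

print(related_terms(documents, "GL4800", top_n=3, stopwords=frozenset({"and", "for", "the"})))
```

In this toy dataset “falcon” ranks first for the seed “GL4800” because it appears in both documents that mention the product name, which is the kind of dataset-specific relationship a synonym list would never surface.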
Finally, it is important to test your terms, and to run multiple iterations of them, to ensure that your recall is what you expect. Too often, we run terms only to have case teams want to refine them after review has started because of the large number of false positives. Testing and sampling terms is an iterative process. Most review tools allow you to sample search hits, so you can confirm the results of your search terms before finalizing them and building a review. Building this practice into your future reviews may require a little time at the outset, but will save you time in the end.
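A sampling pass like the one described above might look like the following sketch. It draws a random sample of hits for one term and estimates what fraction are relevant, with a simple normal-approximation margin of error. Note that sampling hits measures precision (false positives); checking recall would also require sampling documents the terms did not hit. The function names, the sample size, and the stand-in `reviewer_says_relevant` callback are assumptions for illustration; in practice that judgment comes from a human reviewer.

```python
import math
import random

def sample_hits(hit_ids, sample_size, seed=None):
    """Draw a simple random sample of document IDs from a term's hits."""
    rng = random.Random(seed)
    hits = list(hit_ids)
    return rng.sample(hits, min(sample_size, len(hits)))

def estimate_precision(sample_ids, is_relevant, z=1.96):
    """Estimate the share of relevant hits, with an approximate 95% interval."""
    n = len(sample_ids)
    if n == 0:
        raise ValueError("sample is empty")
    relevant = sum(1 for doc_id in sample_ids if is_relevant(doc_id))
    p = relevant / n
    # Normal approximation to the binomial; rough for very small samples.
    margin = z * math.sqrt(p * (1 - p) / n)
    return p, max(p - margin, 0.0), min(p + margin, 1.0)

# Hypothetical data: 1,000 hits for a term, and a stand-in for the reviewer's call.
term_hits = range(1000)

def reviewer_says_relevant(doc_id):
    return doc_id % 4 == 0  # placeholder for a human coding decision

sample = sample_hits(term_hits, sample_size=200, seed=42)
precision, low, high = estimate_precision(sample, reviewer_says_relevant)
print(f"precision ~ {precision:.2f} (approx. 95% interval {low:.2f} to {high:.2f})")
```

If the estimate for a term is low, refine the term and sample again before the review set is built.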
While search terms remain the primary means by which case teams cull and identify documents to review for litigation, the fact is that they are terribly unreliable. That being said, the use of keywords as a review strategy is not going away any time soon. The four keys noted here will help to improve keyword recall and make sure that you are making the most of your terms. The success of your search terms is predicated upon the time you put into development and testing before you finalize them.