May 12, 2014
Joe Hodnicki: I’ve been fairly critical of law professors who venture into the field of bibliometrics without an understanding of the theory and techniques information scientists use to produce citation studies, and in many cases without even an inkling of the work of Eugene Garfield, the founder of modern citation indexing and citation analysis. Most such studies are more info antics than metrics, but not your landmark study, The Web of Law. What made you venture into these murky waters?
Fred Shapiro, Origins of Bibliometrics, Citation Indexing, and Citation Analysis: The Neglected Legal Literature, 43 Journal of the American Society for Information Science 337 (1992)
James F. Spriggs II and Thomas G. Hansford, Measuring Legal Change: The Reliability and Validity of Shepard’s Citations, 53 Political Research Quarterly 327 (2000)
J. H. Fowler, T. R. Johnson, J. F. Spriggs II, S. Jeon, and P. J. Wahlbeck, Network Analysis and the Law: Measuring the Legal Importance of Precedents at the U.S. Supreme Court, 15 Political Analysis 324 (2007)
Tom Smith: In 2005, I read Albert-László Barabási’s book Linked, and it struck me that legal citations must form a network like those Barabási discussed in his book. In fact, citation networks, especially of scientific articles, have been studied for many years, but I did not know this at the time. I decided I wanted to write a paper on this topic and went searching for data. I was able to contact a sympathetic person at LexisNexis, and after much negotiation I obtained a license for some citation frequency data on US federal and state cases. This was strictly numerical data: it told me, for example, how many California cases had been cited once, how many twice, how many three times, and so on. I did not get any network data showing which case cited which. Using the citation frequency data, I wrote The Web of Law, which got a gratifying response from other law professors and law librarians on SSRN.
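The kind of frequency data described here can be illustrated with a small sketch. The per-case counts below are hypothetical (the licensed LexisNexis data is not reproduced in the interview), and the function name is illustrative; it simply tallies how many cases were cited exactly once, twice, three times, and so on.

```python
from collections import Counter


def citation_frequency_table(citation_counts):
    """Map each citation count k to the number of cases cited exactly k times.

    `citation_counts` holds one integer per case (hypothetical data).
    """
    tally = Counter(citation_counts)
    return dict(sorted(tally.items()))


# Hypothetical per-case citation counts for a small jurisdiction.
counts = [1, 1, 1, 1, 2, 2, 3, 5, 1, 2, 8]
print(citation_frequency_table(counts))  # {1: 5, 2: 3, 3: 1, 5: 1, 8: 1}
```

A table like this records only how often cases are cited, not which case cites which, which is why it supports a study of the distribution but not of the network itself.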
Joe Hodnicki: Your Web of Law research led you to develop your search engine, PreCYdent. How did that start?
Tom Smith: It occurred to me that if Google worked by mining the information in the link structure of the Web, why couldn’t a search engine be built that mined the information latent in the Web of Law? I wrote many emails to mathematicians around the country and received one kind response, from Prof. Steve Strogatz at Cornell, who said that while he was no longer working on complex networks himself, he had a graduate student who might be interested. He introduced me to Antonio Tomarchio, a visiting student from the Politecnico di Milano, Italy’s largest and probably most prestigious technical university. Antonio and I (mostly Antonio) wrote a long, technical monograph on the mathematical properties of the Web of Law, using what limited data we had. At the end of 2005 I suggested we build a legal search engine, and Antonio agreed enthusiastically.
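Google’s PageRank is the best-known example of mining a link structure. A minimal power-iteration sketch, run on a hypothetical four-case citation graph, shows the idea: a case ranks highly when it is cited by cases that themselves rank highly. This is textbook PageRank, not the PreCYdent algorithm, which the interview does not disclose; the damping factor of 0.85 is the conventional choice, not a value from the source.

```python
def pagerank(graph, damping=0.85, iterations=100, tol=1e-10):
    """PageRank by power iteration over a citation graph.

    `graph` maps each case to the list of cases it cites, so an edge
    A -> B means "A cites B" and rank flows from citing to cited cases.
    """
    nodes = set(graph)
    for cited_list in graph.values():
        nodes.update(cited_list)
    nodes = sorted(nodes)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Cases that cite nothing spread their rank evenly over all cases.
        dangling = sum(rank[node] for node in nodes if not graph.get(node))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        # Each citing case splits its damped rank equally among the cases it cites.
        for citing, cited_list in graph.items():
            if cited_list:
                share = damping * rank[citing] / len(cited_list)
                for cited in cited_list:
                    new_rank[cited] += share
        delta = sum(abs(new_rank[node] - rank[node]) for node in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical toy citation network: each case lists the cases it cites.
citations = {
    "Case A": [],
    "Case B": ["Case A"],
    "Case C": ["Case A"],
    "Case D": ["Case A", "Case B"],
}
scores = pagerank(citations)
for case, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{case}: {score:.3f}")
```

Case A, cited by every other case, comes out on top, and Case B outranks Case C because it is cited at all. The scores always sum to 1, since the rank of cases that cite nothing is redistributed rather than lost.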
We went looking for financing. Within a few months we were fortunate to find, in San Diego, a former aerospace engineer and entrepreneur who was willing to provide some seed money. Antonio recruited friends from his network of computer science and mathematics graduate students in Italy, and Piero Fraternali, a computer science professor at the Politecnico, agreed to be our scientific advisor and arranged for incubator space in a Politecnico building in Como.
Antonio and I formed PreCYdent as an LLC in April 2006. Antonio and our team began designing and testing our algorithm on a database of US Supreme Court and US Court of Appeals cases assembled from various public domain sources.
Joe Hodnicki: The key to any search engine is the algorithm used to generate search results. What can you tell us about it?
Tom Smith: The PreCYdent algorithm was developed over about 18 months of intensive work by team members. Because it is the core of our intellectual property, I obviously cannot describe it in detail, but it is fair to say that it is another branch off the trunk of Cornell Professor and MacArthur Prize winner Jon Kleinberg’s path-breaking work in the mathematics of networks, just as many would argue Google’s PageRank is. However, we found that no straightforward application of Kleinberg authority scores, for instance, worked at all well on the legal citation network, and PreCYdent team members devoted much effort to coming up with a mix of mathematical techniques suited to legal search. These efforts paid off handsomely, though: in our tests, the PreCYdent algorithm outperforms the Westlaw natural language search engine, clearly the natural language search engine to beat in the industry, by a wide margin in recall and precision (standard measures in the industry). This means that if you were to ask an expert to list the top 20 cases he would want to see in response to a five-word query in a “Google-style” search, you would be a lot more likely to find a number of those cases in the first 20 PreCYdent search results than in the Westlaw natural language results. Of course, this is subject to the caveat that search quality is difficult to measure and is better measured by an independent consultant. So this is just our good faith, best efforts measure of our relative search quality.
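Because the PreCYdent algorithm is not disclosed, the following is only a generic sketch of the two ideas this answer mentions, on hypothetical data: Kleinberg’s HITS authority scores computed on a toy citation graph, and precision and recall measured on the first k results against an expert’s list of cases. Precision at k is the fraction of the first k results that appear on the expert’s list; recall at k is the fraction of the expert’s list found among those k results. The graph, the expert list, and the function names are illustrative assumptions.

```python
import math


def hits_authorities(graph, iterations=50):
    """Kleinberg's HITS algorithm; returns an authority score per case.

    `graph` maps each citing case to the list of cases it cites.
    """
    nodes = set(graph)
    for cited_list in graph.values():
        nodes.update(cited_list)
    hubs = {node: 1.0 for node in nodes}
    auths = {node: 0.0 for node in nodes}
    for _ in range(iterations):
        # A case's authority is the sum of the hub scores of the cases citing it.
        auths = {node: 0.0 for node in nodes}
        for citing, cited_list in graph.items():
            for cited in cited_list:
                auths[cited] += hubs[citing]
        norm = math.sqrt(sum(v * v for v in auths.values())) or 1.0
        auths = {node: v / norm for node, v in auths.items()}
        # A case's hub score is the sum of the authority scores of the cases it cites.
        hubs = {node: sum(auths[c] for c in graph.get(node, [])) for node in nodes}
        norm = math.sqrt(sum(v * v for v in hubs.values())) or 1.0
        hubs = {node: v / norm for node, v in hubs.items()}
    return auths


def precision_recall_at_k(results, relevant, k):
    """Precision and recall of the first k results against a set of relevant cases."""
    hits = sum(1 for case in results[:k] if case in relevant)
    precision = hits / k
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical toy citation network and expert list.
citations = {
    "Case A": [],
    "Case B": ["Case A"],
    "Case C": ["Case A"],
    "Case D": ["Case A", "Case B"],
}
expert_list = {"Case A", "Case C"}

auths = hits_authorities(citations)
# Rank by authority score, breaking ties alphabetically.
ranking = sorted(auths, key=lambda case: (-auths[case], case))
print(ranking)  # ['Case A', 'Case B', 'Case C', 'Case D']
print(precision_recall_at_k(ranking, expert_list, k=2))  # (0.5, 0.5)
```

Only one of the expert’s two cases appears in the first two results, so both precision and recall at k = 2 are 0.5. In the interview’s terms, k would be 20 and the expert list would hold the top 20 cases for a query.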
Joe Hodnicki: What else should we know about PreCYdent?
Tom Smith: Currently our website has all US Supreme Court cases and US Court of Appeals cases back to the 1950s. We plan to soon have state cases going back 10 years or more from all 50 states, and ultimately all state and federal cases back to the beginning, as well as statutory and administrative materials.
You may notice that the site has ads in the margins. This is because our plan is to make all US federal and state law available free to users, and to generate revenue by advertising. We believe legal materials in the public domain ought to be public in practice as well as in theory, and to us that means available free and with effective search to anybody with an internet connection. We believe this model is commercially viable, but we also think it will make American law, and later the law of other countries, available to interested persons all over the world and promote the spread of the rule of law.
Joe Hodnicki: PreCYdent is currently in alpha. It already contains over 300,000 opinions and 2,500 statutes. In addition to growing the database, what’s on your To Do list?
Tom Smith: This is a true alpha; we are very much still in development and we will study the comments we receive carefully. We think law librarians and law professors play a critical role in the American legal system and value their feedback highly.
We are very interested in vendor-neutral citation systems, and that is on our to do list. You may also notice various Web 2.0 features of the site, such as tagging and rating of cases. We believe users may contribute useful content to a site such as ours.
Joe Hodnicki: Legal citation indexing originated with tables of cases cited which, according to Fred Shapiro, date at least as far back as 1743. Joseph Story contributed to early American efforts by supplying toolmakers with citations in the 19th century. In your own way, you are asking the legal community to follow in Story’s footsteps. I hope my colleagues answer your call by checking out PreCYdent. Good luck with your project and thank you for this interview.