Data on Patent Law: Sources and Uses Explained

Feb 1, 2024

Sometimes the most useful litigation tools are ones you assemble on your own: you can tailor them to your needs, and occasionally they are even free. Here are a few such resources covering the Federal Circuit, the PTAB, and trial-level patent litigation. They can give you a sense of judicial behavior, which can help set expectations for case outcomes and timelines. The three resources I will quickly run through in this post are the Compendium of Federal Circuit Decisions compiled by the University of Iowa Law School (what I will call the "Iowa Database"), the USPTO's datasets and case resources, and CourtListener's RECAP Archive. Each of these resources is free, and each can significantly assist you in developing a patent rights strategy.

Federal Circuit Database

The Iowa Database covers Federal Circuit decisions comprehensively since 2004 and records multiple pieces of information for each case. It contains 19,761 cases and is regularly updated. The information one can derive from this dataset is invaluable: anything from the likelihood that rehearing en banc will be granted (136 granted and 13,237 denied, a rate of approximately 1%) to the number of appeals adjudicated from the PTAB (1,777) is readily available.
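The roughly 1% figure is simply grants divided by all decided requests. The short sketch below reproduces that arithmetic, under the assumption that granted plus denied makes up the whole denominator.

```python
# Counts quoted above from the Iowa Database.
granted = 136
denied = 13_237

# Assumes granted + denied covers every en banc request with a recorded outcome.
decided = granted + denied
grant_rate = granted / decided

print(f"{granted} of {decided} granted ({grant_rate:.2%})")
```

This prints a rate of 1.02%, consistent with the approximate 1% figure.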

Since the Iowa Database contains information on all decisions from the Federal Circuit, some sorting is required to isolate particular types of appeals, such as those relating to patents. If you have software that can easily create crosstabs, such as Tableau (my favorite), you can organize and synthesize the information to derive useful outputs. The 19,761 records, for instance, can be sorted by dispute type. Although many of the records relate to orders that are not coded for several of the case outcome variables, among the cases that are labeled by type, 3,270 deal with patent infringement, 1,059 with inter partes review, 517 with contract claims, and so on.
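For readers without Tableau, the same kind of tally and crosstab can be sketched in plain Python. This is a minimal illustration on toy rows: the field names `dispute_type` and `year` are hypothetical stand-ins, not the Iowa Database's actual column names, and the counts have nothing to do with the real figures above.

```python
from collections import Counter

# Toy rows standing in for an export of the Iowa Database. The field names
# "dispute_type" and "year" are hypothetical, not the database's real columns.
records = [
    {"dispute_type": "patent infringement", "year": 2016},
    {"dispute_type": "inter partes review", "year": 2016},
    {"dispute_type": "patent infringement", "year": 2017},
    {"dispute_type": None, "year": 2017},  # an order with no dispute-type label
    {"dispute_type": "contract", "year": 2017},
    {"dispute_type": "patent infringement", "year": 2017},
]

# Only records that carry a dispute-type label enter the tallies.
labeled = [r for r in records if r["dispute_type"] is not None]

# One-way tally: number of labeled records per dispute type.
by_type = Counter(r["dispute_type"] for r in labeled)

# Two-way crosstab: one count per (dispute type, year) cell.
crosstab = Counter((r["dispute_type"], r["year"]) for r in labeled)

for dispute_type, n in by_type.most_common():
    print(f"{dispute_type}: {n}")
```

Dropping unlabeled rows before counting mirrors the point above that many records are orders without outcome or type coding.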

The patent cases are coded for whether they involve 35 U.S.C. § 102 or § 103, along with other issues such as claim construction and definiteness. Once a specific area is nailed down, say patent infringement, more targeted analyses become possible. If we isolate the cases from 2015 forward, for example, we can see which judges have most frequently authored majority opinions (Stoll with 99, then Prost with 93, and Lourie with 87). We can also see who authored dissents most frequently (Newman with 31 and Reyna with 12). Or perhaps we want to know which lower courts appear most often (the District Court for the District of Delaware with 246, the District Court for the Eastern District of Texas with 163, and the District Court for the Northern District of California with 158). We might even want to know who is or was most likely to dissent from an opinion authored by Judge Stoll: Judges Dyk, Hughes, Lourie, and Newman each did so twice.
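The filter-then-count workflow described above can be sketched as follows. The rows are toy data with placeholder judge names (Judge A, Judge B, and so on), and the fields `type`, `author`, and `dissenters` are hypothetical; they would need to be mapped onto the database's real variables.

```python
from collections import Counter

# Toy decisions with placeholder judge names; the fields are hypothetical
# and would need to be mapped to the database's actual variables.
decisions = [
    {"year": 2014, "type": "patent infringement", "author": "Judge B", "dissenters": []},
    {"year": 2016, "type": "patent infringement", "author": "Judge A", "dissenters": ["Judge D"]},
    {"year": 2018, "type": "patent infringement", "author": "Judge A", "dissenters": ["Judge E"]},
    {"year": 2019, "type": "patent infringement", "author": "Judge C", "dissenters": ["Judge D"]},
    {"year": 2020, "type": "inter partes review", "author": "Judge A", "dissenters": []},
]

# Step 1: narrow to patent infringement decisions from 2015 forward.
subset = [d for d in decisions
          if d["type"] == "patent infringement" and d["year"] >= 2015]

# Step 2: most frequent majority authors and most frequent dissenters.
authors = Counter(d["author"] for d in subset)
dissenters = Counter(j for d in subset for j in d["dissenters"])

# Step 3: who dissented from opinions written by one particular author.
def dissents_against(author):
    return Counter(j for d in subset if d["author"] == author for j in d["dissenters"])

print(authors.most_common(3))
print(dissenters.most_common(2))
print(dissents_against("Judge A"))
```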

USPTO Website

The USPTO also has a treasure trove of free resources for the legal data enthusiast. Some of the information is useful for practitioners' current matters, while other data are mostly historical. Even the backward-looking data can inform current decisions, to the extent that they reflect litigation before judges who are still active.

The historical element is quite fascinating. While unfortunately updated only through 2016, the Patent Litigation Docket Reports have case-level information on 81,350 district court cases filed between 1963 and 2016. One nice feature of the Docket Reports is that they track litigation timing, which can be broken down by other variables such as the judge or court of interest. There are also multiple datasets that link to one another, so you can look at observations based on the attorneys on the cases, patents, case names, or documents.
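Because the Docket Reports come as several related tables, answering a question such as "which courts has this attorney appeared in?" means joining them on a shared case key. A minimal sketch with toy CSV text follows; the column names `case_id`, `court`, and `attorney_name` are hypothetical, so check the documentation that ships with the actual files for the real keys.

```python
import csv
import io
from collections import defaultdict

# Toy stand-ins for two related tables. The column names are hypothetical.
cases_csv = """case_id,court,date_filed
1,txed,2012-03-01
2,cand,2013-07-15
"""
attorneys_csv = """case_id,attorney_name
1,Attorney X
1,Attorney Y
2,Attorney X
"""

# Index cases by their key so attorney rows can be matched to them.
cases = {row["case_id"]: row for row in csv.DictReader(io.StringIO(cases_csv))}

# Group attorneys by case.
attorneys_by_case = defaultdict(list)
for row in csv.DictReader(io.StringIO(attorneys_csv)):
    attorneys_by_case[row["case_id"]].append(row["attorney_name"])

# Inverse view: the set of courts each attorney appeared in.
courts_by_attorney = defaultdict(set)
for case_id, names in attorneys_by_case.items():
    for name in names:
        courts_by_attorney[name].add(cases[case_id]["court"])

print(dict(courts_by_attorney))
```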

We might, for instance, be interested in the magistrate judges to whom these cases were referred, in order to gauge how long proceedings take in their courts. Here is an output of the magistrate judges with more than 200 proceedings in this dataset.
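A list like this comes from counting cases per referred magistrate judge and keeping those above a cutoff. The sketch below assumes a hypothetical `referred_judge` column and uses a tiny toy threshold; on the real data the cutoff described above would be 200.

```python
from collections import Counter

def judges_above(case_rows, threshold):
    """Return {judge: case count} for referred judges with more than `threshold` cases."""
    counts = Counter(r["referred_judge"] for r in case_rows if r.get("referred_judge"))
    return {judge: n for judge, n in counts.items() if n > threshold}

# Toy rows; "referred_judge" is a hypothetical column name.
rows = (
    [{"referred_judge": "Magistrate A"}] * 3
    + [{"referred_judge": "Magistrate B"}]
    + [{"referred_judge": None}]  # case never referred to a magistrate judge
)

print(judges_above(rows, threshold=2))
```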

Judge Roy S. Payne of the Eastern District of Texas has the lion's share of these cases; every other judge on the list handled only a fraction of Judge Payne's count. If we are interested in how long it takes these judges to move a case from opened to closed, we can use the dataset's time parameters to calculate the duration of each case and then average those durations by judge. Here is what the averages look like for these judges.

Judge Payne cleared his cases the quickest of the group at just under 250 days while, at the other end of the spectrum, Judge Trumbull of the District Court for the Northern District of California averaged over 535 days per case.
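The per-case duration and per-judge average described above can be sketched as follows. The field names `judge`, `date_filed`, and `date_terminated` are hypothetical, the rows are toy data, and cases without a termination date are skipped because they have no duration yet.

```python
from collections import defaultdict
from datetime import date

def average_days_to_close(case_rows):
    """Mean days from filing to termination, per judge; open cases are skipped."""
    durations = defaultdict(list)
    for r in case_rows:
        if r["date_terminated"] is None:
            continue  # still open, so no duration to measure
        days = (r["date_terminated"] - r["date_filed"]).days
        durations[r["judge"]].append(days)
    return {judge: sum(d) / len(d) for judge, d in durations.items()}

# Toy rows with hypothetical field names.
rows = [
    {"judge": "Magistrate A", "date_filed": date(2014, 1, 1), "date_terminated": date(2014, 7, 1)},
    {"judge": "Magistrate A", "date_filed": date(2014, 2, 1), "date_terminated": date(2014, 10, 1)},
    {"judge": "Magistrate B", "date_filed": date(2015, 1, 1), "date_terminated": None},
]

print(average_days_to_close(rows))
```

Here the two closed cases last 181 and 242 days, so Magistrate A averages 211.5 days, and Magistrate B drops out entirely. Whether to exclude open cases or treat them differently is a design choice that can shift the averages for judges with many pending matters.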

The USPTO site offers other datasets as well, including the Patent Examination Research Dataset (PatEx), which covers "13 million publicly-viewable provisional and non-provisional patent applications to the USPTO and over 1 million Patent Cooperation Treaty (PCT) applications."

CourtListener’s RECAP Archive

The RECAP Archive is a freely accessible tool that compiles PACER records. It is an extremely useful resource and was used to derive some of the data points for the USPTO measures.

RECAP is primarily a qualitative data source, though it can be used to put together quantitative statistics. It also lets you dive into case dockets and, in some instances, view the documents filed in a case.

The RECAP archive can also be filtered by PACER codes. If, for instance, you were interested in patent cases, you could enter nature of suit code 830 (Patent) and find that since the beginning of 2015 there are 28,257 cases under this code, with 1,867,132 docket entries. If you were interested in the cases referred to Judge Roy S. Payne in the Eastern District of Texas, you could refine your search by judge and find 2,611 relevant cases since the beginning of 2015.
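Once docket metadata has been downloaded, the same filters can be reapplied locally, for example to combine them with other criteria. This is a sketch under assumptions: the field names (`nature_of_suit`, `date_filed`, `referred_to_str`) are modeled loosely on RECAP docket metadata and should be verified against CourtListener's API documentation, and the nature-of-suit value is assumed to begin with the numeric code. The rows are toy data with placeholder judge names.

```python
from datetime import date

def patent_dockets(dockets, since=date(2015, 1, 1), referred_to=None):
    """Keep dockets with nature of suit 830 (Patent) filed on or after `since`,
    optionally limited to a single referred judge."""
    out = []
    for d in dockets:
        # Assumes the stored value starts with the numeric code, e.g. "830 Patent".
        if not str(d.get("nature_of_suit", "")).startswith("830"):
            continue
        filed = d.get("date_filed")
        if filed is None or date.fromisoformat(filed) < since:
            continue
        if referred_to is not None and d.get("referred_to_str") != referred_to:
            continue
        out.append(d)
    return out

# Toy docket metadata; field names are assumptions, judges are placeholders.
dockets = [
    {"nature_of_suit": "830 Patent", "date_filed": "2016-05-02", "referred_to_str": "Magistrate A"},
    {"nature_of_suit": "830 Patent", "date_filed": "2014-12-31", "referred_to_str": "Magistrate A"},
    {"nature_of_suit": "840 Trademark", "date_filed": "2017-01-10", "referred_to_str": "Magistrate A"},
    {"nature_of_suit": "830 Patent", "date_filed": "2019-03-15", "referred_to_str": "Magistrate B"},
]

print(len(patent_dockets(dockets)))
print(len(patent_dockets(dockets, referred_to="Magistrate A")))
```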

Another useful feature of RECAP, presumably used in creating the USPTO dataset, is its docket metadata, which corresponds to the variables on the USPTO site. These variables include the judge assigned to and referred to the case, the citation, the dates filed and terminated, the date of the last known filing, the cause of action and nature of suit, the jury demand, and the jurisdiction type. There are also data on the parties and attorneys where available through PACER.

The upside of these data is that they allow you to extend the USPTO numbers, which run only through 2016, and they provide information the USPTO dataset lacks. The downside is that putting the data into a usable format requires either scraping and parsing skills or the time to enter the data manually. If you are trying to assemble specific information rather than general raw data, though, this is a good place to begin.
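After scraping or parsing, one practical step is flattening docket records into a CSV with a fixed set of columns that a crosstab tool can open. A minimal sketch follows; the column names loosely mirror the metadata listed above but are hypothetical, and any fields a record lacks are written as blanks while extra fields are dropped.

```python
import csv
import io

# Columns loosely mirroring the metadata described above; names are hypothetical.
FIELDS = ["case_name", "court", "assigned_to", "referred_to", "date_filed",
          "date_terminated", "nature_of_suit", "jury_demand"]

def dockets_to_csv(dockets):
    """Write docket dicts to CSV text in a fixed column order.
    Missing fields become blank cells; fields not in FIELDS are ignored."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, extrasaction="ignore")
    writer.writeheader()
    for d in dockets:
        writer.writerow(d)
    return buf.getvalue()

# Toy record; "docket_entries" is an extra field that will be dropped.
sample = [{"case_name": "Doe v. Roe", "court": "txed", "assigned_to": "Judge A",
           "date_filed": "2016-05-02", "nature_of_suit": "830 Patent",
           "docket_entries": 57}]

print(dockets_to_csv(sample))
```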

Concluding Thoughts

Legal data help with generating predictions, following trends, and understanding changes in the legal landscape.  The data described in this article are all readily available and relatively easy to use and navigate. These are great starting points for research and comparisons and provide context to those interested in specific cases. Another big upside is that these resources are free.

While the resources I described generally relate to patent law, they are just one example of the legal data freely available on the web, and there are many other resources for other areas. If you already understand the value of data, the raw material for building novel datasets abounds. Furthermore, there are experts in legal data analysis who can help you develop the skills to use these resources and to find answers and solutions to complex legal questions that doctrine alone cannot resolve. For claimholders, litigators, litigation funders, and insurers, such data often lend themselves to probabilistic determinations that can help forecast potential outcomes and generate likelihood intervals around the probability that particular outcomes will come to fruition.
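As one illustration of putting an interval around a probability (the post does not specify a method), the sketch below computes a Wilson score confidence interval for a proportion, applied to the en banc counts quoted earlier. It treats each past request as an independent draw with the same grant probability, which is a strong simplification for real litigation.

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 gives roughly 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half_width = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half_width, center + half_width

# En banc counts from the Iowa Database: 136 granted out of 136 + 13,237 decided.
low, high = wilson_interval(136, 136 + 13_237)
print(f"{low:.2%} to {high:.2%}")
```

With this many observations the interval is narrow, roughly 0.9% to 1.2%; for a single judge or court with only a few dozen cases it would be far wider, which is worth keeping in mind before drawing conclusions from small subsets.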

Adam Feldman is the editor of Empirical SCOTUS, a blog that conducts data analysis of the United States Supreme Court, and the Principal of Optimized Legal, a legal data/statistical consultancy. He is also an adjunct professor of political science and public law at California State University, Northridge. You can reach Adam for specific data and analyses related to your own litigation questions in this and other areas.
