Credit: Dr Leo (UTS). You will be redirected to Leo’s GitHub page for the following repositories.

Leo maintains a few datasets to save other researchers effort; most of them are used in his own papers.

CIK to CUSIP Mapping

Provides files linking CIK and CUSIP identifiers, built from 13G and 13F filings.

USPTO full text database

Provides OCR full-text data for pre-1975 USPTO patents, with much better quality and coverage than the text available in Google Patents.

Name Matching

An algorithm that matches firm names based on string similarity.
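
To give a rough idea of what string-similarity name matching looks like, here is a minimal sketch using Python's standard `difflib` module. It is not the code from Leo's repository: the suffix list, the function names (`normalize`, `similarity`, `best_match`) and the 0.8 threshold are all assumptions made for illustration.

```python
from difflib import SequenceMatcher

# Hypothetical list of legal suffixes to ignore; not taken from the repository.
SUFFIXES = {"inc", "corp", "corporation", "co", "ltd", "llc", "plc"}


def normalize(name):
    """Lowercase, replace punctuation with spaces, and drop common legal suffixes."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in name.lower())
    tokens = [t for t in cleaned.split() if t not in SUFFIXES]
    return " ".join(tokens)


def similarity(a, b):
    """Similarity ratio in [0, 1] between two normalized firm names."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def best_match(name, candidates, threshold=0.8):
    """Return (candidate, score) for the most similar candidate,
    or None if there are no candidates or the best score is below the threshold."""
    if not candidates:
        return None
    scored = [(c, similarity(name, c)) for c in candidates]
    best = max(scored, key=lambda pair: pair[1])
    return best if best[1] >= threshold else None
```

For example, `best_match("Apple Inc.", ["APPLE INC", "Alphabet Inc"])` picks `"APPLE INC"`, because both names normalize to the same string. A real matcher would likely need a richer suffix list and a similarity measure tuned to firm names.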

Replace and Delete (rd)

An extremely fast command-line utility for replacing and deleting strings in text files.

Fuzzy Process (fuzzprocess)

A deep-learning approach to finding the K nearest matches between two sets of names.