MyHeritage recently announced the publication of a massive new collection of 982 million names, extracted from its U.S. and Canadian historical newspaper collections. The following excerpt is from the MyHeritage blog.
Historical newspapers are among the most important sources of genealogical information because they are so rich in detail. Newspapers can add color and personality to the dry facts that other genealogical sources, such as census records, typically provide.
About the collection
The collection is an index of names that were extracted from existing free-text U.S. and Canadian newspaper collections on MyHeritage. The free text in these collections was generated from the scanned images of newspapers using Optical Character Recognition (OCR) technology, which converts images into text.
The new Newspaper Name Index does not replace the free-text newspaper collections. Instead, it is added on top of them as a separate collection. What's more, this name index covers only half of our newspapers. The index for the other half is currently being generated and will be published soon, adding nearly one billion more records.
Records in the index include a person's name, a snippet of newspaper text that mentions them, and the newspaper's title, publication date, and place of publication. Each record also includes a scanned image of the original newspaper article. Some records include additional searchable information extracted by machine learning algorithms, such as the name of a spouse or the place of residence. Year range and place coverage in this collection vary greatly.