Another important tool in our products is fuzzy text matching: finding strings that are not identical to a given input, but textually similar to it, in a large set of candidate strings. Textkernel has developed FuzzServer, a software module that integrates high-speed fuzzy text matching into any application.
Grasping the letters
In many cases documents originate as scanned images. The first step in processing these documents is to turn them into text using OCR (Optical Character Recognition) technology. The output of an OCR process, however, often contains many characters that have not been recognized correctly, especially when the scans are of low quality. This is a problem for further processing. Our fuzzy matching solution relies on highly hardware-optimized brute-force string matching: for every input string, FuzzServer is able to retrieve the most similar record from millions of candidates in a database in less than one tenth of a second. That’s fast!
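To make the brute-force idea concrete, here is a minimal Python sketch. It is not FuzzServer's implementation: the similarity measure (plain Levenshtein edit distance), the function names `edit_distance` and `nearest`, and the sample city names are all assumptions chosen for illustration, and nothing here is hardware-optimized.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance computed with a two-row dynamic program."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or match when cost is 0
            ))
        previous = current
    return previous[-1]


def nearest(query: str, candidates: list[str]) -> tuple[str, int]:
    """Brute force: score every candidate and keep the closest one."""
    scored = [(edit_distance(query, candidate), candidate) for candidate in candidates]
    distance, best = min(scored)
    return best, distance


# Hypothetical data: an OCR engine has read each "m" as "rn".
candidates = ["Amsterdam", "Rotterdam", "Utrecht", "Eindhoven"]
match, distance = nearest("Arnsterdarn", candidates)
print(match, distance)
```

Every query is compared against every candidate, so the cost grows linearly with the size of the database. That is why a production system of this kind depends on heavy low-level optimization to stay fast on millions of records.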
During the nearest-string search, FuzzServer automatically aligns the strings using a range of similarity metrics. On the resulting aligned strings, powerful validation and classification algorithms can be tuned to deliver confidence scores that suit any desired trade-off between precision and recall.
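The alignment and scoring steps can be illustrated with Python's standard `difflib` module. This is only a sketch under stated assumptions: FuzzServer's own metrics and classifiers are not described here, so `SequenceMatcher` stands in for them, and the helper names `align`, `confidence` and `accept` as well as the 0.8 threshold are hypothetical.

```python
from difflib import SequenceMatcher


def align(a: str, b: str) -> list[tuple[str, str, str]]:
    """Return (operation, segment_of_a, segment_of_b) triples mapping a onto b."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()]


def confidence(a: str, b: str) -> float:
    """Similarity in [0, 1]; 1.0 means the strings are identical."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def accept(a: str, b: str, threshold: float = 0.8) -> bool:
    """Accept a match only if its score reaches the chosen threshold."""
    return confidence(a, b) >= threshold


# Hypothetical OCR error: the final letter "l" was read as the digit "1".
for operation, left, right in align("Textkernel", "Textkerne1"):
    print(operation, repr(left), repr(right))
print(accept("Textkernel", "Textkerne1"))
```

Raising the threshold accepts fewer, more certain matches (higher precision), while lowering it accepts more matches at the risk of more mistakes (higher recall).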
The FuzzServer engine is an OEM product that other software vendors can integrate into their text mining, search engine, product and customer ID search, directory search and database cleaning products. The technology is also used in many other Textkernel products to power mapping to taxonomies and information extraction from noisy OCR data.