What could you do with the web in your pocket? A lot, probably. The web holds a great deal of data that is relevant to your business, but sifting through search engine results does not give you the kind of overview you need to analyse that information and act on it.
Crawling and Extracting
Textkernel applies powerful document understanding technology to data that is spidered from the web. Our technology is optimised for collecting structured data feeds from very large numbers of unstructured online sources.
Web mining involves the aggregation of information from the web, and many separate technologies are involved. The first step is fetching web pages from the huge number of websites out there, also known as wide coverage crawling. For some sources, the so-called ‘deep web’ has to be accessed by customised crawlers. After the pages are collected, they must be classified according to whether they contain information about the domain of interest. The information is then extracted, checked for freshness, and deduplicated based on fuzzy matching of its content. The result is a domain-specific structured database of incredible breadth, timeliness and value. Without manual data entry!
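To make the deduplication step more concrete, here is a minimal sketch of fuzzy matching on content. It is an illustration under stated assumptions, not a description of Textkernel's actual method: the technique (Jaccard similarity over word shingles), the function names `shingles`, `jaccard` and `deduplicate`, and the threshold of 0.5 are all chosen for this example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles of a text (lowercased, punctuation dropped)."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Very short texts are represented by a single shingle.
        return {tuple(words)}
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def deduplicate(records, threshold=0.5):
    """Keep the first record of every group of near-duplicates.

    The threshold is a hypothetical value for illustration; a real system
    would tune it on labelled data. The pairwise comparison is quadratic,
    which is fine for a sketch but not for web-scale collections.
    """
    kept = []
    kept_shingles = []
    for record in records:
        s = shingles(record)
        if any(jaccard(s, t) >= threshold for t in kept_shingles):
            continue  # close enough to an earlier record: treat as a duplicate
        kept.append(record)
        kept_shingles.append(s)
    return kept


if __name__ == "__main__":
    ads = [
        "Senior Java developer wanted in Amsterdam, full time position",
        "Senior Java developer wanted in Amsterdam, full time job",
        "Registered nurse needed for night shifts in Utrecht hospital",
    ]
    for ad in deduplicate(ads):
        print(ad)
```

In this example the first two texts differ only in their last word, so they share most of their shingles and the second one is dropped, while the unrelated third text is kept. At the scale of a wide coverage crawl, an approximate technique such as MinHash with locality-sensitive hashing would typically replace the pairwise comparison.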