Document Understanding
There are many types of documents where full text search by keywords
simply does not work well enough. These types of documents are
characterised by the fact that the information they contain only
becomes useful when entered into a database with domain specific
search fields. Examples of such documents are CVs and vacancies, real
estate descriptions, classified ads, email or fax orders, invoices,
scanned business cards, or legal contracts. What is important is not
which keywords the documents contain, but their classification by
categories, the identification of names, prices, codes, phone numbers,
locations, and their relationships.
Eliminate manual data entry
Normally such documents are entered into databases manually. This is costly, and the process does not scale well. Textkernel has developed software that can eliminate data entry through intelligent interpretation of documents. Our software does not rely on fixed formats, patterns or rules to extract key information from documents. Instead, its processing of text can be described as a context-sensitive interpretation of many local patterns across the whole document, followed by an optimization step that assembles the most likely document analysis given all these local patterns. The local patterns can refer to words, phrases, context, layout, external knowledge and consistency across the whole document. If similar patterns have been seen in the training data, the software will be able to make a sensible (similar) interpretation of the concepts in the document. Since this process tries to mirror human understanding, we refer to it as Document Understanding, to differentiate it from character recognition or fixed-format data extraction.
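As a rough illustration of how many local patterns can be combined into one most likely analysis, the sketch below uses Viterbi decoding over per-token label scores. This is a generic textbook technique chosen for illustration, not a description of Textkernel's actual software; the labels, probabilities, and the `viterbi` function are all hypothetical.

```python
import math

def viterbi(local_scores, transition, labels):
    """Return the label sequence with the highest total log score.

    local_scores: one dict per token, mapping label -> log score (local evidence)
    transition:   dict mapping (previous_label, label) -> log score (context)
    """
    # best[i][label] = (score of the best path ending in label at token i, backpointer)
    best = [{lab: (local_scores[0][lab], None) for lab in labels}]
    for i in range(1, len(local_scores)):
        row = {}
        for lab in labels:
            prev, score = max(
                ((p, best[i - 1][p][0] + transition[(p, lab)]) for p in labels),
                key=lambda pair: pair[1],
            )
            row[lab] = (score + local_scores[i][lab], prev)
        best.append(row)
    # Follow the backpointers from the best final label.
    last = max(labels, key=lambda lab: best[-1][lab][0])
    path = [last]
    for i in range(len(best) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]

# Hypothetical scores for the tokens "John", "Smith", "Amsterdam".
# Taken on its own, "Smith" looks slightly more like a city than a name.
LABELS = ["NAME", "CITY"]
LOCAL = [
    {"NAME": math.log(0.9), "CITY": math.log(0.1)},
    {"NAME": math.log(0.45), "CITY": math.log(0.55)},
    {"NAME": math.log(0.1), "CITY": math.log(0.9)},
]
TRANSITION = {
    ("NAME", "NAME"): math.log(0.8),
    ("NAME", "CITY"): math.log(0.2),
    ("CITY", "NAME"): math.log(0.5),
    ("CITY", "CITY"): math.log(0.5),
}

analysis = viterbi(LOCAL, TRANSITION, LABELS)
```

With these made-up numbers the context (a name token is usually followed by another name token) outweighs the weak local evidence, so the middle token is labelled as part of the name.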
Save costs and eliminate workflow bottlenecks
Using Document Understanding technology from Textkernel allows you to:
- save costs on manual data entry processes by replacing typing with confirmation, and allowing parts of the input stream to be processed automatically at very high confidence.
- eliminate information bottlenecks by lowering thresholds for structured completion of profiles by your business partners.
- start on new business opportunities where the sheer quantity of data entry seemed to block the feasibility of the project.
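The idea of replacing typing with confirmation, from the first point above, can be sketched as a simple confidence threshold: fields extracted with very high confidence are accepted automatically, and the rest are shown to a person for confirmation. The field names, the confidence scale, and the 0.98 threshold below are assumptions made for illustration only.

```python
from dataclasses import dataclass

@dataclass
class ExtractedField:
    name: str
    value: str
    confidence: float  # assumed to lie between 0 and 1

def route(fields, auto_threshold=0.98):
    """Split fields into (accepted automatically, needs human confirmation)."""
    auto = [f for f in fields if f.confidence >= auto_threshold]
    confirm = [f for f in fields if f.confidence < auto_threshold]
    return auto, confirm

# Hypothetical extraction result for one document.
fields = [
    ExtractedField("email", "j.smith@example.com", 0.995),
    ExtractedField("phone", "+31 20 555 0100", 0.99),
    ExtractedField("job_title", "Account Manager", 0.82),
]
accepted, to_confirm = route(fields)
```

Raising the threshold sends more fields to a person; lowering it automates more of the stream at the cost of more errors.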
CV Parsing
Since its inception in 2001, Textkernel has gained a strong reputation with more than a thousand customers in the HR marketplace. Our product Textractor Enterprise for CV parsing is widely used by many large international staffing & recruitment agencies and corporate recruiters, and it is recommended or even implemented as standard by many world-leading recruitment software suppliers.
We believe the most important factors in this success are:
- Our proven technology and ability to integrate into large and complex systems
- Our ability to deliver a wide variety of languages at no additional cost
- Our competitive pricing and flexible licensing models
- Our extensive experience with complex IT projects both in and outside of CV parsing
The Textkernel CV Processing solution is able to automate the entire process of capturing candidate applications. The main benefits are:
- Captures fully coded candidate profiles from all incoming streams of unstructured CVs.
- Eliminates data entry backlogs
- Slashes handling cost as compared to manual processing.
- Enhances the value of the resulting CRM database through higher consistency and completeness of database content.
Textkernel Workflow Agents
Automated full-time monitoring of candidate input channels such as email, web, or file system. This component fills the processing pipeline of the system by submitting applications and their attachments to Sourcebox.
Sourcebox
CV processing web application that provides:
- Routing logic, exception rules, duplicate checking, and data filtering.
- An account and role based security and configuration scheme.
- Fully configurable interface to Textkernel's CV recognition engine (Textractor). Sourcebox delegates the parsing of documents to Textractor. Multiple language/culture/business-unit specific Textractor instances can be configured per account.
- Multilingual web interface for staff to manage the CV queue and manually check and correct exceptions.
- Interface to many leading CRM, ATS, and HRMS systems.
Textractor Enterprise
Multilingual CV parsing engine that performs:
- Document processing for DOC, DOCX, PDF, RTF, HTML, TIFF, TXT, XML, MSG, and EML type documents.
- Language identification for 70 languages.
- Document classification (CV, Cover Letter, other TBD document types).
- CV parsing for:
  - Contact Information,
  - Education,
  - Career History, and
  - Skills
- Candidate Coding (normalisation of extracted fields to customer specific codes, via synonym taxonomies).
- Formatted CV generation.
- Linear scaling of queue processing by parallel processing on multiple CPUs or servers.
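The Candidate Coding step listed above, which normalises extracted fields to customer-specific codes via synonym taxonomies, can be pictured as a lookup from many surface forms to one code. The sketch below is a minimal illustration under that assumption; the taxonomy contents, the code names, and the helper functions are hypothetical and do not reflect Textractor's actual data or interface.

```python
def normalise(text):
    """Lowercase and collapse whitespace so spelling variants compare equal."""
    return " ".join(text.lower().split())

def build_index(taxonomy):
    """Invert {code: [synonyms]} into {normalised synonym: code}."""
    index = {}
    for code, synonyms in taxonomy.items():
        for synonym in synonyms:
            index[normalise(synonym)] = code
    return index

def code_field(value, index):
    """Return the customer code for an extracted value, or None if unknown."""
    return index.get(normalise(value))

# Hypothetical customer-specific taxonomy for education levels.
TAXONOMY = {
    "EDU_BSC": ["BSc", "B.Sc.", "Bachelor of Science"],
    "EDU_MSC": ["MSc", "M.Sc.", "Master of Science"],
}
INDEX = build_index(TAXONOMY)

coded = code_field("  bachelor of   Science ", INDEX)
unknown = code_field("Diploma in Welding", INDEX)
```

Values that match no synonym come back as `None`, which a surrounding workflow could treat as an exception for manual checking.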
Search and Match engine
Textkernel offers interfaces with leading matching engines from our technology partners.
Order and Invoice Parsing
An order of magnitude more is spent worldwide on processing incoming business mail than on sending it!
Many types of documents in the mailstream, or still coming in through fax, seem highly structured to humans but are in essence free format and difficult to recognize for traditional OCR-based document capture systems. Textkernel has built a solution that is able to recognize free format orders and invoices, including line information, and code the results directly to a client's ERP system. This solution, called e-FAX, is currently marketed by TNT Post Billing and Document Solutions as part of the Scanpost product portfolio. See TNT Post BDS for more information.
Working with Textkernel to develop custom models
Textkernel can develop accurate extraction, recognition and coding engines for any domain where repetitive data entry is needed to enter information from textual documents into forms or databases. Due to our machine learning technology, annotation of a few thousand example documents is enough to train a highly accurate free format recognizer for a new domain or language. Please contact us to learn more about the possibilities in your organisation and business process.
- TextractorServer: A highly scalable state-of-the-art document recognition engine.
- Sourcebox: Web based document processing user interface and enterprise integration platform
- Textkernel Workflow Agents: Configurable tools for implementing document workflows

