Quality is an essential aspect of technology for our customers and is therefore one of Textkernel’s biggest priorities. To further improve the quality of its software, Textkernel continuously explores new techniques to optimise its extraction models.
Textkernel’s latest research revolves around Deep Learning technology. Deep Learning is a set of algorithms that learn from large amounts of data to automatically represent words, so that words used in similar contexts get similar representations. For instance, in a CV extraction system “Amsterdam” and “London” are both used in the “city” context. When a new word is found whose representation is similar to those of Amsterdam and London, it is likely to be a city. With Deep Learning, the extraction model becomes more robust because it can automatically learn the semantics of new words.
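The “similar representation” idea can be illustrated with a minimal sketch. The three-dimensional vectors, the labels and the guess_label helper below are invented for illustration and are not Textkernel’s model; real representations are learned from large amounts of text and have many more dimensions. The sketch simply gives a new word the label of its nearest known word by cosine similarity.

```python
import math

# Toy word vectors, made up for this example (not learned from data).
VECTORS = {
    "Amsterdam": [0.9, 0.1, 0.0],
    "London": [0.8, 0.2, 0.1],
    "engineer": [0.1, 0.9, 0.2],
    "manager": [0.0, 0.8, 0.3],
}

# Hypothetical context labels for the known words.
LABELS = {
    "Amsterdam": "city",
    "London": "city",
    "engineer": "job title",
    "manager": "job title",
}


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def guess_label(vector):
    """Return the label of the known word whose vector is most similar."""
    nearest = max(VECTORS, key=lambda word: cosine(vector, VECTORS[word]))
    return LABELS[nearest]


# A new, unseen word whose (made-up) vector lies close to the city vectors.
new_word_vector = [0.85, 0.15, 0.05]
print(guess_label(new_word_vector))
```

Because the new vector points in almost the same direction as the vectors for Amsterdam and London, the sketch labels it as a city without ever having seen the word before.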
Textkernel’s CV Parsing research team has performed groundbreaking research on applying Deep Learning technology to its extraction models. Although Textkernel has only just started with this cutting-edge technique, the results on CV extraction models are already very promising. With significant quality improvements for the English and French language models, Textkernel is proud to present Deep Learning technology as part of its Extract! resume parsing 2014.1 release!
Read the full Extract! CV parsing – 2014.1 release notes below.
Textkernel Extract! CV parsing – 2014.1 release
Support for parsing DoYouBuzz JSON profiles
This enables the “Apply with DoYouBuzz” widget. Remember the “Apply with” widget? With this widget, your applicants can apply with their medium of choice (such as their resume or their LinkedIn, Xing, Viadeo, Google+ or Facebook profile). At the same time, you receive their full profile details automatically structured in the format of your database. Applying with a DoYouBuzz resume is now also possible via the “Apply with” widget.
Improvements in handling PDF files with embedded fonts (OCR support is required)
Text in PDF files with embedded fonts was often converted into question marks instead of letters. Textkernel can now detect documents with embedded fonts and process them with OCR (when OCR is enabled in your account). This improves the extraction and therefore allows you to better search and retrieve these candidates.
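How Textkernel detects these documents is not described here. As a rough illustration of the routing decision only, the sketch below uses a different and much simpler heuristic: it looks at the text already extracted from a PDF and flags it for OCR when too many characters are question marks or replacement characters. The needs_ocr function and its threshold value are assumptions made for this example.

```python
def needs_ocr(extracted_text, threshold=0.3):
    """Guess whether extracted PDF text is garbled enough to warrant OCR.

    Assumption: a high share of '?' or U+FFFD characters among the
    non-whitespace characters signals a failed font mapping.
    """
    chars = [c for c in extracted_text if not c.isspace()]
    if not chars:
        # Assumption: no text at all is also treated as a case for OCR.
        return True
    garbled = sum(1 for c in chars if c in "?\ufffd")
    return garbled / len(chars) >= threshold


print(needs_ocr("???? ??? ????"))                # garbled text
print(needs_ocr("Software engineer in London"))  # readable text
```

A single question mark in otherwise readable text stays well below the threshold, so ordinary punctuation does not trigger OCR in this sketch.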
Spanish: improvements in all fields
In recent months, Textkernel has also successfully directed research efforts at the Spanish language model. The 2014.1 release brings significant improvements to all fields of the Spanish CV, with a strong focus on the personal and experience fields.
English: improvements in parsing of Indian CVs
For Indian-English CVs, improvements have been made in the following fields:
personal information, especially in name and address parsing
experience items that specify a duration instead of a date range (e.g. 1 year or 2 months)
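To make the duration case concrete, here is a minimal sketch of how a phrase such as “1 year” or “2 months” could be turned into a number of months. It is an illustration built on a simple regular expression, not a description of how Extract! parses experience items; the duration_in_months helper is a hypothetical name.

```python
import re

# Months per unit for the two units mentioned in the release notes.
UNIT_MONTHS = {"year": 12, "month": 1}

# Matches "1 year", "2 months", "3 Years" and similar.
DURATION = re.compile(r"(\d+)\s*(year|month)s?", re.IGNORECASE)


def duration_in_months(text):
    """Return the total duration in months, or None if no duration is found."""
    matches = DURATION.findall(text)
    if not matches:
        return None
    return sum(int(number) * UNIT_MONTHS[unit.lower()] for number, unit in matches)


print(duration_in_months("1 year"))
print(duration_in_months("1 year 2 months"))
print(duration_in_months("2010 - 2012"))
```

A date range such as “2010 - 2012” contains no unit word, so the sketch returns None and leaves it to date-range handling.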
English & French: improvements by means of the Deep Learning technique
With the implementation of Deep Learning, the 2014.1 release contains significant improvements in the English and French language models for all sections.
German: improvements in the following areas
page classification (detecting CV pages in scanned PDFs)
chronological CVs (CVs that mix education and experience items in one long section)
job title extraction
Italian: improvements in personal information fields