Posted on March 27, 2015

New: Hungarian CV parsing (in Extract! 2015.1)

Textkernel’s R&D team is happy to announce the Extract! 2015.1 CV parsing release. Version 2015.1 introduces Hungarian CV parsing and further improvements to the German, Dutch and English parsers.

New: Hungarian CV parsing

In late 2014, Textkernel started working on Hungarian CV extraction and is now proud to announce the new Hungarian CV parsing model. With the addition of Hungarian, Textkernel now offers CV parsing for 16 languages.

Development of the Hungarian CV parser
The development of a new language model is a complex process. First, a large set of resumes has to be annotated. Hungarian linguistics students were hired to identify the different sections in each CV, such as education and experience, as well as more specific information such as education level, position title, and company name.

Textkernel’s researchers then trained the CV parsing engine on these examples. The resulting Hungarian CV parsing model was optimised and fine-tuned on additional Hungarian CVs until the desired performance was achieved. Lastly, a Hungarian language guesser was added so that Hungarian CVs are routed to the new Hungarian CV parsing model.
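The routing step can be pictured with a minimal Python sketch of a language guesser dispatching each CV to a per-language parser. This is not Textkernel’s implementation: the stopword lists, `guess_language`, `route_cv` and the stub parsers are hypothetical stand-ins, and a production language guesser would rely on much richer statistics than a few common words.

```python
from typing import Callable, Dict, Set

# Hypothetical stopword lists; a real language guesser would use far richer
# statistics (for example character n-gram models), not a handful of words.
STOPWORDS: Dict[str, Set[str]] = {
    "hu": {"és", "a", "az", "egy", "hogy", "nem", "vagy", "tapasztalat"},
    "en": {"and", "the", "of", "to", "in", "with", "experience"},
    "de": {"und", "der", "die", "das", "mit", "von", "berufserfahrung"},
}


def guess_language(text: str) -> str:
    """Return the language whose stopwords overlap most with the text."""
    tokens = set(text.lower().split())
    scores = {lang: len(tokens & words) for lang, words in STOPWORDS.items()}
    return max(scores, key=scores.get)


def make_stub_parser(model_name: str) -> Callable[[str], dict]:
    """Build a hypothetical parser that stands in for a trained model."""
    def parse(text: str) -> dict:
        return {"model": model_name, "chars": len(text)}
    return parse


# One (stub) parsing model per supported language.
PARSERS: Dict[str, Callable[[str], dict]] = {
    lang: make_stub_parser(f"{lang}-cv-model") for lang in STOPWORDS
}


def route_cv(text: str) -> dict:
    """Guess the CV's language, then hand it to that language's parser."""
    return PARSERS[guess_language(text)](text)
```

For example, `route_cv("Szakmai tapasztalat és végzettség")` is dispatched to the stub `hu-cv-model`, because two of its words appear in the Hungarian list and none in the others. Keeping the guesser separate from the parsers means a new language only needs a new entry in the routing table.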

Improving German CV parsing with Deep Learning

Last year, Textkernel’s R&D team started applying Deep Learning techniques to further improve the quality of its CV parsers. Following successes with the English and French models, Deep Learning is now being used for the first time to improve the German model. This new technology makes the German CV parser more robust and improves the extraction of experience and education items (such as job title and company name).

Improvements to the Dutch and English CV parsers

Additional improvements have been made to the Dutch and English CV parsers. They include:

  • Dutch: improved multi-word city name extraction from addresses, such as ‘Den Helder’
  • Dutch: improved name extraction
  • English: improved extraction of Indian mobile phone numbers
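To give an idea of what recognising Indian mobile numbers involves, here is a minimal, regex-based Python sketch. It is not how Textkernel’s parser works; the pattern is a simplified assumption (an optional +91, 91 or 0 prefix followed by a 10-digit number beginning with 6 to 9, optionally split after the fifth digit by a space or hyphen), and `find_indian_mobiles` is a hypothetical helper.

```python
import re
from typing import List

# Simplified, hypothetical pattern: optional +91 / 91 / 0 prefix, then a
# 10-digit number starting with 6-9, allowing one space or hyphen after the
# first five digits (a common written format, e.g. "98765 43210").
# The lookarounds stop matches from starting or ending inside a longer digit run.
INDIAN_MOBILE = re.compile(
    r"(?<!\d)(?:\+91[\s-]?|91[\s-]?|0)?([6-9]\d{4})[\s-]?(\d{5})(?!\d)"
)


def find_indian_mobiles(text: str) -> List[str]:
    """Return every matched mobile number, normalised to its 10 digits."""
    return [first + second for first, second in INDIAN_MOBILE.findall(text)]
```

Normalising to the bare 10 digits lets differently written variants of the same number (with or without the country code) be compared directly.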

For more information on this release or about Textkernel’s CV parsers, please contact Textkernel.