Textkernel is happy to release a new version of Extract! CV parsing. This release contains a brand new parser for Greek CVs, as well as substantial improvements for all languages and language-specific improvements for English, Flemish, Slovak and French.
New: Greek resume parsing
In response to customer demand, Textkernel is introducing its 17th CV parsing language: Greek. To add a new language, Textkernel’s parsing engine needs to be trained and tuned on resumes in that language. Textkernel’s research engineers overcame the diversity, richness and complexity of the Greek language and developed a state-of-the-art language model.
For more detailed information on the development of Greek parsing, read the blog post “Greek CV parsing, an Odyssey”.
Resume parsing improvements for all languages
The R&D team at Textkernel has made several improvements to its parsing engine that result in parsing enhancements for all languages.
- New: extraction of Apple’s Pages file format (.pages)
In addition to the standard file types (such as .doc, .docx, .pdf, .html, .text), Textkernel’s parsers can now also process Apple Pages files.
- Support for even more Microsoft Word and PDF file subtypes
Textkernel has improved its preprocessor, which converts the original CV into the text that is then used for parsing. As a result, Textkernel can now accept even more subtypes of Microsoft Word and PDF resumes.
- Improved extraction of phone numbers
- Improved extraction of dates
- Improved extraction of all skills
Language-specific parsing improvements
- English: better extraction of candidate name, especially when only the first name is present in the CV
- English: improvements to the experience and education sections:
  - better segmentation of items
  - better recognition of English dates
- US: better extraction of city, region and country
- US: better extraction of company names and locations
- Belgian: better extraction of names from Flemish CVs
- Slovak: improved classification of education items and degrees
- French: improved splitting of addresses