With the new release of Extract!, Textkernel's CV parsing module, Chinese becomes the 15th supported language. Developing the Chinese model was a challenging task, but Textkernel succeeded in building a good parser for Chinese.
Differences between Chinese and Western languages
Written Chinese is very different from Western languages. A text is a series of Chinese characters without spaces. Certain combinations of characters form meaningful units, comparable to words in Western languages. Which combinations form a unit depends on the context.
Extra step for parsing
To parse Chinese CVs, the parser must first identify words and phrases in the string of characters. To make this even more challenging, simplified Chinese contains about 7,000 characters, of which about 2,500 are in common use.
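To make the segmentation step concrete, here is a minimal sketch of one classic and very simple technique, forward maximum matching. It scans the text from left to right and, at each position, takes the longest string that appears in a word list. This is not Textkernel's method, which the article does not describe. The function name `forward_max_match`, the four-character limit and the tiny sample dictionary are all hypothetical and chosen only for illustration.

```python
def forward_max_match(text: str, dictionary: set[str], max_len: int = 4) -> list[str]:
    """Greedy left-to-right segmentation (forward maximum matching).

    At each position, try the longest candidate first (up to max_len
    characters) and shrink it until it is found in the dictionary.
    A single character is always accepted as a fallback, so unknown
    characters still produce a token.
    """
    if max_len < 1:
        raise ValueError("max_len must be at least 1")
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Hypothetical toy dictionary, for illustration only; a real system
# would rely on a large lexicon and/or a trained statistical model.
SAMPLE_DICTIONARY = {"北京大学", "北京", "大学", "计算机", "专业"}

# "Peking University, computer science major", as it might appear in a CV
print(forward_max_match("北京大学计算机专业", SAMPLE_DICTIONARY))
# → ['北京大学', '计算机', '专业']
```

A greedy matcher like this chooses the wrong split whenever a longer dictionary entry overlaps a better reading. That is exactly the dependence on context described above. Production segmenters therefore typically use statistical models trained on annotated text rather than a pure dictionary lookup.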
Overcoming the obstacles
Textkernel uses advanced segmentation techniques to identify these semantic units with high accuracy. By combining them with the information extraction technology it also uses to parse other languages, Textkernel developed a parser that can analyse Chinese CVs and automatically structure them in your database.
Request a web demo
Does your company receive applications in Chinese? Or would you like to experience the magic of Chinese CV parsing for yourself? Contact us for a free web demo.