Ontology Mining

Keeping HR domain knowledge up to date

HR domain knowledge is what makes the difference between a generic search engine and a search engine that understands recruiters and provides meaningful search results. Textkernel’s technology makes use of an extensive knowledge library. But how does Textkernel add new terms and make sure its ontologies and taxonomies stay up to date?

Textkernel’s software works with so-called taxonomies and ontologies. A taxonomy can be imagined as a hierarchical, tree-like structure with parent-child relationships between terms, whereas an ontology can be understood as a web that connects terms based on their relationships with each other. Both concepts are important for delivering semantic search and matching software for HR.

Knowledge about skills and professions is typically part of the HR domain knowledge that recruiters use in their daily routines when they dive into CVs and vacancies, trying to understand and match them. It is not necessarily part of the computer systems that try to do the same. Data Scientist, for example, is a (very trendy) profession, and Hadoop is a skill that most data scientists need to have. Macroeconomics and microeconomics are both branches of economics, while the job title “Investigator” may mean either a scientific researcher or a police investigator.

How do we make sure a machine understands the subtle differences in HR?

This ambiguity typically results in suboptimal performance of CV and vacancy parsers as well as search engines: without domain knowledge, these systems can only process strings rather than real-world concepts, and cannot reason about them. Textkernel’s software modules, on the other hand, do have access to such knowledge in the form of Textkernel’s ontology, a large knowledge graph that defines and interrelates concepts and entities in the HR and recruiting domain, such as professions, skills, qualifications, educational institutes and companies.

The HR domain knowledge is represented and stored within the ontology according to a unified schema with clear and well-documented semantics. Each concept has a unique identifier (URI) and rich linguistic information (synonyms and spelling variations) in multiple languages. This information allows our parsers to map terms and keywords found in CVs and vacancies to entities and concepts with very high precision and coverage.
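To make the idea of concepts with URIs and multilingual labels more concrete, here is a minimal Python sketch. It is an illustration only, not Textkernel’s actual schema: the `Concept` class, the `example.org` URIs, the sample labels and the `lookup` function are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Concept:
    uri: str            # unique identifier (hypothetical URI scheme)
    concept_type: str   # e.g. "profession" or "skill"
    # language code -> synonyms and spelling variations (illustrative data)
    labels: Dict[str, List[str]] = field(default_factory=dict)


def normalize(term: str) -> str:
    """Lowercase a term and collapse whitespace so spelling variants line up."""
    return " ".join(term.lower().split())


def build_label_index(concepts: List[Concept]) -> Dict[str, Set[str]]:
    """Map every normalized label, in every language, to the URIs that carry it."""
    index: Dict[str, Set[str]] = {}
    for concept in concepts:
        for synonyms in concept.labels.values():
            for label in synonyms:
                index.setdefault(normalize(label), set()).add(concept.uri)
    return index


CONCEPTS = [
    Concept(
        "http://example.org/concept/data-scientist",
        "profession",
        {"en": ["Data Scientist", "data science specialist"],
         "nl": ["datawetenschapper"]},
    ),
    Concept(
        "http://example.org/concept/hadoop",
        "skill",
        {"en": ["Hadoop", "Apache Hadoop"]},
    ),
]

LABEL_INDEX = build_label_index(CONCEPTS)


def lookup(term: str) -> List[str]:
    """Return the URIs of all concepts whose labels match the term."""
    return sorted(LABEL_INDEX.get(normalize(term), set()))
```

In this sketch, a keyword found in a CV, such as "Apache Hadoop", is first normalized and then resolved to a concept URI, so that later processing works with the concept rather than with the raw string.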

Taxonomies and ontologies

Concepts of the same type are organised in taxonomical hierarchies where the meaning of a child concept is narrower than the meaning of its parent (e.g., microeconomics is narrower than economics). Concepts of the same or different types are also related in an associative way via domain-specific relations (e.g., professions are associatively related to the skills they most often demand in the job market). Both hierarchical and associative relations enable our systems to a) disambiguate ambiguous terms when parsing CVs or vacancies and b) determine the semantic similarity between these terms when searching or matching them.
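The following Python sketch shows, under simplifying assumptions, how hierarchical and associative relations could support disambiguation. The relation tables, the skills listed for each profession and the overlap-based scoring are invented for illustration; they do not describe how Textkernel’s systems actually work.

```python
# Hierarchical relation: child concept -> broader parent concept.
BROADER = {
    "microeconomics": "economics",
    "macroeconomics": "economics",
}

# Associative relation: profession -> skills it typically demands (illustrative).
REQUIRES_SKILL = {
    "scientific researcher": {"statistics", "academic writing", "laboratory work"},
    "police investigator": {"criminal law", "interrogation", "forensics"},
}

# Ambiguous job title -> candidate professions it may refer to.
CANDIDATES = {
    "investigator": ["scientific researcher", "police investigator"],
}


def is_narrower_than(concept, ancestor):
    """Walk up the taxonomy and report whether `ancestor` is reached."""
    current = BROADER.get(concept)
    while current is not None:
        if current == ancestor:
            return True
        current = BROADER.get(current)
    return False


def disambiguate(term, context_skills):
    """Pick the candidate whose associated skills overlap most with the context.

    Returns None when the term is unknown or no candidate shares any skill
    with the context.
    """
    context = {skill.lower() for skill in context_skills}
    scored = [
        (len(REQUIRES_SKILL.get(candidate, set()) & context), candidate)
        for candidate in CANDIDATES.get(term.lower(), [])
    ]
    best_score, best_candidate = max(scored, default=(0, None))
    return best_candidate if best_score > 0 else None
```

Here a CV that mentions forensics and criminal law next to the title "Investigator" resolves to the police investigator sense, while a CV that mentions statistics resolves to the scientific researcher sense.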

Furthermore, an important feature of Textkernel’s ontology is its interlinking with external knowledge graphs and taxonomies/vocabularies, both recruiting-related ones (e.g., ISCO, ESCO, O*NET, ROME) and general-purpose ones (e.g., DBpedia, EuroVoc). By mapping our own concepts to these external knowledge resources, we facilitate a) the flow of knowledge from our ontology to these resources and vice versa, and b) semantic interoperability with systems that already use these resources. The latter is important because many of these models are standards (either internationally or in their respective countries) and are therefore widely used by our clients.

Building and evolving Textkernel’s ontology

Constructing and maintaining a large knowledge graph about the recruitment domain is a big challenge, not only because the domain is quite large but also because it is very heterogeneous (different industries and business areas, languages, labour markets, educational systems etc.) and changes at a very fast pace. A fully manual approach (with human experts defining all necessary knowledge) is too costly and cannot scale, while a fully automatic one (with data mining and machine learning techniques used to extract the knowledge from text) may suffer from low quality. To deal with this challenge, we create new content through data-driven ontology mining but keep humans in the loop to assure quality. The combination of automatic mining of new concepts and human quality assurance ensures the best outcome in terms of semantic technology for the HR domain.
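As a rough illustration of the human-in-the-loop idea, the Python sketch below proposes new terms automatically and only adds those a reviewer has approved. The mining techniques Textkernel actually uses are not described here; a simple document-frequency threshold stands in for them, and the function names, the `min_count` parameter and the sample data are all hypothetical.

```python
from collections import Counter


def mine_candidates(documents, known_terms, min_count=2):
    """Propose terms that occur in at least `min_count` documents and are not
    yet in the ontology. Each document is a list of already extracted terms."""
    counts = Counter()
    for terms in documents:
        # Count each term at most once per document.
        counts.update({term.strip().lower() for term in terms})
    known = {term.lower() for term in known_terms}
    return sorted(
        term for term, count in counts.items()
        if term and count >= min_count and term not in known
    )


def apply_review(ontology_terms, candidates, approved):
    """Add only those mined candidates that a human reviewer approved."""
    return set(ontology_terms) | (set(candidates) & set(approved))


# Illustrative data: terms extracted from three vacancies.
documents = [
    ["Hadoop", "Spark"],
    ["spark", "Kubernetes"],
    ["Kubernetes", "Excel"],
]
ontology = {"hadoop"}

candidates = mine_candidates(documents, ontology)
updated_ontology = apply_review(ontology, candidates, approved={"spark"})
```

The automatic step keeps the effort scalable, while the review step means that a frequent but unsuitable candidate never enters the ontology without a human decision.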

If you want to find out more about ontology mining, read our blog post!