Improving matching results with Textkernel’s learning-to-rank research

High-quality semantic search means highly relevant search results. The modern approach to tuning a complex ranking function is called Learning to Rank (LTR): machine learning techniques are used to learn from user feedback which search results are good and which are not. As outlined in a blog post by my colleague Ruben, at Textkernel we use Elasticsearch as the underlying engine for Search! and Match! to keep the product highly scalable. Incorporating more elaborate domain knowledge into a simple linear weighted ranking function is hard, however. In this blog post I will explain how we use LTR algorithms to improve the matching results.

By Agnes van Belle

Say you want to give a match on location a higher score when searching for construction workers than when searching for IT professionals. You know this will put more relevant documents at the top of your result list. You could hack this into the search engine, but what if you have a multitude of such rules? And what if you know these rules may change, or differ per user?

At Textkernel we have the specific task of matching jobs and CVs, for which many such rules exist. These more complex and costly ranking rules are incorporated in a reranking model that re-orders the top K results returned by the search engine.
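To make this two-stage setup concrete, here is a minimal sketch of the flow in Python. The names (`search_top_k`, `model.score`) are illustrative placeholders, not Textkernel’s actual API: the search engine cheaply returns the K best candidates, and only those are rescored by the more expensive model.

```python
# Minimal sketch of the two-stage ranking flow (illustrative names only).

def rerank_top_k(query, top_k_results, model):
    """Rescore the top-K search-engine results with the (costlier)
    reranking model and return them in the new order."""
    rescored = [(model.score(query, doc), doc) for doc in top_k_results]
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in rescored]

# Usage (hypothetical helpers):
#   top_docs = search_top_k(query, k=100)          # e.g. Elasticsearch top 100
#   final_ranking = rerank_top_k(query, top_docs, reranking_model)
```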

Why is learning to rank important?

Learning to rank (LTR) simply refers to training such a reranking model. Machine learning can discover elaborate and non-linear dependencies in the data and use them to generate models that can improve the relevance of search results beyond what can be conceived by human inspection. Being able to machine-learn a model also allows for automatic tailoring of the ranking to a certain user’s preferences.

For learning to rank, you first need a set of queries and their results, plus a relevance label for each of these documents indicating how good a result it is for that query. Then you need to extract features from this data. An LTR algorithm then trains a model that uses the relevance labels and the features of the queries and documents to infer a new scoring function that, when applied, produces a ranking in which relevant documents appear higher and less relevant documents lower.
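As an illustration of what such training data looks like (this layout is common LTR practice, not Textkernel’s internal format), each row below is one query-document pair with its feature vector and relevance label:

```python
# Illustrative LTR training data layout: one row per query-document pair.

import numpy as np

# query_ids group rows by query, so the learner knows which documents
# compete against each other in the same result list.
query_ids = np.array([0, 0, 0, 1, 1])

# One feature vector per (query, document) pair, e.g.
# [jobtitle match score, #matched skills, years of experience].
features = np.array([
    [0.9, 4, 10.0],
    [0.4, 1,  2.0],
    [0.7, 3,  6.0],
    [0.2, 0,  1.0],
    [0.8, 5, 12.0],
])

# Graded relevance labels from the annotators (higher = more relevant).
labels = np.array([2, 0, 1, 0, 2])

# An LTR algorithm fits a scoring function f(features) such that sorting
# each query's documents by f reproduces the label ordering as well as possible.
```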

Textkernel’s learning to rank research project

First we needed to know, given a certain query, which resulting documents should be ranked higher and which lower. We generated relevance-labeled data for learning by letting expert users in the HR domain judge each result, from “completely irrelevant” to “would directly hire”.

The input for an LTR algorithm is a single feature vector per result, plus the aforementioned relevance label. The features can be based on the query, the document, and/or the other documents in the same result set. We extract five different types of features for each result (a sketch of such a feature extractor follows the list):

  • Document features (e.g. important jobgroup-field related keywords),
  • Document-Resultset features (e.g. average of the number of experience years over all documents in the result set),
  • Query features (e.g. number of IT skills in the query),
  • Query-Document features (e.g. a document’s match score for the query’s jobtitle field), and
  • Query-Document-Resultset features (e.g. average number of language skills matched).
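Below is a sketch of a feature extractor covering these five types. The field names and the exact computations are invented for the example; they are not the actual features used in Search! and Match!.

```python
# Illustrative feature extractor for the five feature types listed above.

from statistics import mean

def extract_features(query, doc, result_set):
    features = {}
    # Document feature: important jobgroup-field related keywords in the document.
    features["doc_jobgroup_keywords"] = len(doc.get("jobgroup_keywords", []))
    # Document-Resultset feature: average experience years over the result set.
    features["avg_experience_in_resultset"] = mean(
        d.get("experience_years", 0) for d in result_set
    )
    # Query feature: number of IT skills requested in the query.
    features["query_it_skills"] = len(query.get("it_skills", []))
    # Query-Document feature: the document's match score on the jobtitle field.
    features["jobtitle_match_score"] = doc.get("field_scores", {}).get("jobtitle", 0.0)
    # Query-Document-Resultset feature: language skills matched by this document,
    # relative to the average number matched over the whole result set.
    matched = len(set(query.get("languages", [])) & set(doc.get("languages", [])))
    avg_matched = mean(
        len(set(query.get("languages", [])) & set(d.get("languages", [])))
        for d in result_set
    )
    features["language_match_vs_resultset_avg"] = matched - avg_matched
    return features
```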

The algorithm can then use these features to learn new rules. For example: if the query has more than 5 IT skills, and the average number of experience years in the complete result set is lower than 3 (i.e. we’re selecting from starters), the jobtitle match should be weighted lower than when the average number of experience years in the result set is higher than 15.
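Written out by hand, such a rule looks like the kind of split a regression tree could learn; in practice the thresholds and weights below are learned from the data rather than hard-coded, and the feature names come from the illustrative extractor above:

```python
# Hand-written version of the example rule, for illustration only.

def jobtitle_match_weight(features):
    if features["query_it_skills"] > 5:
        if features["avg_experience_in_resultset"] < 3:
            return 0.5   # selecting from starters: jobtitle match matters less
        if features["avg_experience_in_resultset"] > 15:
            return 1.5   # senior candidate pool: jobtitle match matters more
    return 1.0           # default weight
```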

Figure: Distribution of score differences per query of our reranking model compared to the baseline (no-reranking) score for each query

We first experimented with several established LTR algorithms using common open-source libraries (RankLib, JForests, RankPy). After that we implemented some of the best-performing algorithms ourselves, to incorporate modifications based on recent research. We conducted a grid search to find the best algorithm and its best hyperparameters for our annotated data set. In the end, a bagged version of a boosted regression trees algorithm, with an optimal set of hyperparameters, gave the best results for our data: a 22% increase in retrieval effectiveness, as measured by the NDCG metric.
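Our implementation is in-house, but a rough, pointwise approximation of the bagged boosted-regression-trees idea can be sketched with scikit-learn; the actual algorithm, features and hyperparameters differ, so treat this purely as an illustration:

```python
# Pointwise sketch of bagged boosted regression trees with NDCG evaluation.
# An approximation for illustration, not Textkernel's implementation.

import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import ndcg_score
from sklearn.utils import resample

def train_bagged_gbrt(features, labels, n_bags=5, **gbrt_params):
    """Fit boosted regression trees on bootstrap samples of the training
    data; averaging their predictions (bagging) reduces variance."""
    models = []
    for seed in range(n_bags):
        X_boot, y_boot = resample(features, labels, random_state=seed)
        models.append(
            GradientBoostingRegressor(random_state=seed, **gbrt_params).fit(X_boot, y_boot)
        )
    return models

def ndcg_for_query(models, query_features, query_labels, k=10):
    """NDCG@k for one query's result list under the bagged model."""
    scores = np.mean([m.predict(query_features) for m in models], axis=0)
    return ndcg_score([query_labels], [scores], k=k)

# A grid search loops over candidate hyperparameter settings (tree depth,
# learning rate, number of trees, ...), trains on held-out folds, and keeps
# the setting with the highest mean NDCG over the validation queries.
```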

How do we use this in Search!?

For usage within Search! and Match! we have created a pluggable Reranking-Library. Internally it consists of two parts: a reranking part that re-orders the results according to the current reranking model, and a machine learning toolkit that can learn new reranking models.
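Conceptually, the library exposes something like the following two components. The class and method names are invented for illustration and do not reflect the library’s real API:

```python
# Illustrative sketch of the Reranking-Library's two parts; names are invented.

class Reranker:
    """Applies a trained reranking model to a query's top-K results."""

    def __init__(self, model, feature_extractor):
        self.model = model
        self.feature_extractor = feature_extractor

    def rerank(self, query, results):
        scored = [
            (self.model.score(self.feature_extractor(query, doc, results)), doc)
            for doc in results
        ]
        return [doc for _, doc in sorted(scored, key=lambda p: p[0], reverse=True)]


class RerankerTrainer:
    """Learns a new reranking model from relevance-labeled queries."""

    def train(self, labeled_queries):
        # Extract features, fit a model (e.g. bagged boosted regression
        # trees), and return something usable by Reranker.
        raise NotImplementedError
```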

Search! has an integrated assessment mode that allows people to assign relevance labels to the results of a query. We automate the learning of new rerankers with our Reranking-Library, which takes care of feature extraction, learning and parameter settings. In other words, with our Reranking-Library any user can have their own reranker based on their specific data and preferences, provided that they regularly assess the documents resulting from their queries.

Looking into the future!

We have built a new framework in our product for learning and applying rerankers. This already gives serious improvements in overall search quality for all our customers, and allows us to build specialised ranking improvements for specific client needs. While it is tempting to keep experimenting with the latest algorithms and optimisations, we found that real gains can be made by simply increasing the amount of labeled data to train on.

We are therefore currently creating more human-annotated data, but also working on extracting relevance data from our Search! user logs. This will provide us with a continuous, customer-tailored supply of log data for which customers don’t have to do anything, and from which we will be able to further automate the learning of robust, customer-specific rerankers.

We are looking forward to the release of our Reranking-Library with Search!, and even more to this next step of feeding it with data extracted from logs. We believe that incorporating refined as well as personalised knowledge about matching jobs and CVs will be a major leap for Textkernel’s semantic search.

About the author

Agnes van Belle is a research engineer at Textkernel, working on machine learning research related to (re)ranking and software development projects to improve the retrieval performance of the Search! and Match! products. In her free time she also likes to draw (comics), read novels, and play some chess.

Curious about Textkernel? We are growing and hiring!