How Google’s Job Search Engine Uses Machine Learning

Posted Apr 26, 2018

I’ve had some people ask me recently why I think it is important to share patents that describe things a search engine might offer, like the recently added job search engine. The legal purpose behind a patent is to give the patent holder the right to exclude others from using the same process and to keep them from infringing upon the invention. The tradeoff for that exclusion is a requirement that patents be published. Publication gives others examples of how people are innovating to overcome problems, which can be inspirational, and it offers some insight into the assumptions inventors may have about search, searchers, and the Web.

Google’s Job Search Engine

It’s always fun when Google introduces a new feature and then, while searching through new patents from the search engine, I find that the feature is the subject of one of them. For example, you may have noticed that Google introduced a Google Job Search engine, which shows results like these:

Google Job Search Results

There have been articles about that job search engine:

Google’s jobs search engine gets salary ranges, a better location filter and more

There are also Google Help pages about the Google Job Search:

Search for jobs on Google

If Google Started A Search Engine Today…

The new patent has a very simple name, “Search Engine”. Because it took that approach, it made me wonder what Google might be like if the people working on it started building it today. I did find it interesting that the description of the patent starts with these definitions of what a search engine is:

A search engine may generally be described as any program that executes a search and retrieves stored data. However, based on the task at hand, a search engine can be configured in a variety of different ways. For example, some search engines may be configured to perform keyword-based search and retrieval. Such search engines may identify relevant search results based, at least in part, on the number of times a search term appears in a particular resource, or the particular resource’s metadata. Alternatively, or in addition, some search engines may identify search results as being responsive to a query because a resource provider paid the search engine provider a sum of money to return the provider’s resource(s) in response to search queries that include a particular search term. However, the aforementioned ways that a search engine can identify search results responsive to a query are merely exemplary.

A search engine can also be configured to identify search results responsive to a query in a variety of other ways. A custom configuration of a search engine can be employed, as necessary, to solve particular problems related to search and retrieval. Customizing a search engine can include altering the way a search engine executes a search, identifies relevant search results, ranks identified search results, or the like.

Expanding upon Keyword Searches in Google’s Job Search Engine

This patent focuses on job search rather than on all searches. It goes into more depth about the details behind what Google is offering with job search, and why Google’s implementation might be an improvement upon job searches offered elsewhere:

In some implementations, a job identification model is provided that enhances job search by improving the quality of search results provided in response to a job search query. The search results are improved because the job identification model is capable of identifying relevant job postings that would otherwise go unnoticed by conventional algorithms due to the inherent limitations of keyword-based searching. By employing additional methods other than, or in addition to, conventional keyword-based searching, the job identification model can identify relevant job postings that include job titles that do not match the keywords of a received job search query. For example, in response to a job search query that seeks job opportunities for a “Patent Guru,” the job identification model may identify job postings related to a “Patent Attorney,” an “Intellectual Property Attorney,” an “Attorney,” or the like.

The patent gives us a glimpse of this in this drawing that accompanies it:

It is interesting that this search expands upon keyword-based searching like that. The inventors give us some insight into how machine learning helps go beyond matching the keywords in a query to job postings, as they describe here:

According to one implementation, the subject matter of this specification may be embodied in a method to facilitate job searching. The method may include actions of defining a vector vocabulary, defining an occupation taxonomy that includes multiple different occupations, obtaining multiple labeled training data items, wherein each labeled training data item is associated with at least (i) a job title, and (ii) an occupation, generating, for each of the respective labeled training data items, an occupation vector that includes a feature weight for each respective term in the vector vocabulary, associating each respective occupation vector with an occupation in the occupation taxonomy based on the occupation of the labeled training data item used to generate the occupation vector, receiving a search query that includes a string related to a characteristic of one or more potential job opportunities, generating a first vector based on the received query, determining, for each respective occupation of the multiple occupations in the occupation taxonomy, a confidence score that is indicative of whether the query vector is correctly classified in the respective occupation, selecting the particular occupation that is associated with the highest confidence score, obtaining one or more job postings using the selected occupation, and providing the obtained job postings in a set of search results in response to the search query.
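To make the training side of that description a little more concrete, here is a rough sketch of how a vector vocabulary and occupation vectors could be built from labeled job titles. It is not taken from the patent: the training items, the function names, and the use of raw term counts as feature weights are all assumptions made for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical labeled training items: (job title, occupation) pairs.
TRAINING_ITEMS = [
    ("patent attorney", "Attorney"),
    ("intellectual property attorney", "Attorney"),
    ("software engineer", "Software Developer"),
    ("senior software developer", "Software Developer"),
]


def tokenize(text):
    """Lowercase a string and split it on whitespace."""
    return text.lower().split()


def build_vocabulary(items):
    """Vector vocabulary: every distinct term in the training job titles, sorted."""
    return sorted({term for title, _ in items for term in tokenize(title)})


def title_vector(title, vocabulary):
    """One feature weight per vocabulary term.

    Assumption: the weight is a raw term count. The patent describes
    richer weights (term frequency, inverse occupation frequency, etc.).
    """
    counts = Counter(tokenize(title))
    return [float(counts[term]) for term in vocabulary]


def build_occupation_vectors(items, vocabulary):
    """Generate a vector per training item and file it under the item's occupation."""
    by_occupation = defaultdict(list)
    for title, occupation in items:
        by_occupation[occupation].append(title_vector(title, vocabulary))
    return dict(by_occupation)


vocabulary = build_vocabulary(TRAINING_ITEMS)
occupation_vectors = build_occupation_vectors(TRAINING_ITEMS, vocabulary)
```

Each occupation in this toy taxonomy ends up with one vector per labeled title, which is the association step the patent describes before any query is received.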

An Occupational Taxonomy for the Job Search Engine

The patent also tells us how the occupation taxonomy that job searches are based upon might be put to use when a query is received:

The operations may include receiving a search query that includes a string related to a characteristic of one or more job opportunities, generating, based on the received query, a query vector that includes a feature weight for each respective term in a predetermined vector vocabulary, determining, for each respective occupation of the multiple occupations in the occupation taxonomy, a confidence score that is indicative of whether the query vector is correctly classified in the respective occupation, selecting the particular occupation that is associated with the highest confidence score, obtaining one or more job postings using the selected occupation, and providing the obtained job postings in a set of search results in response to the search query.
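The query side of those operations can be sketched in a few lines. This is only an illustration under stated assumptions: the occupation vectors and postings below are made up, and cosine similarity stands in for the confidence score, since the passage quoted above does not say how that score is calculated.

```python
import math

# Hypothetical occupation vectors (term -> feature weight), as if produced by training.
OCCUPATION_VECTORS = {
    "Attorney": {"patent": 1.0, "attorney": 2.0, "intellectual": 0.5, "property": 0.5},
    "Software Developer": {"software": 2.0, "engineer": 1.0, "developer": 1.0},
}

# Hypothetical index of job postings keyed by occupation.
POSTINGS_BY_OCCUPATION = {
    "Attorney": ["Patent Attorney", "Intellectual Property Attorney"],
    "Software Developer": ["Software Engineer"],
}

# The predetermined vector vocabulary: every term that has a weight somewhere.
VOCABULARY = {term for vec in OCCUPATION_VECTORS.values() for term in vec}


def query_vector(query):
    """Count query terms, keeping only terms that are in the vocabulary."""
    vec = {}
    for term in query.lower().split():
        if term in VOCABULARY:
            vec[term] = vec.get(term, 0.0) + 1.0
    return vec


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search_jobs(query):
    """Score every occupation, pick the highest, and return its postings."""
    qv = query_vector(query)
    scores = {occ: cosine(qv, vec) for occ, vec in OCCUPATION_VECTORS.items()}
    best = max(scores, key=scores.get)
    return best, scores, POSTINGS_BY_OCCUPATION.get(best, [])


best_occupation, confidence_scores, postings = search_jobs("Patent Guru")
```

In this toy setup a query for “Patent Guru” is classified under the Attorney occupation even though no posting has “Guru” in its title, which is the kind of result the patent says plain keyword matching would miss.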

Feature Weights in Occupation Terms in the Job Search Engine

The patent also expands upon the feature weights for terms in queries for the job search engine:

In some implementations, the feature weight may be based, at least in part, on a first value representing a term frequency that is determined, based at least in part, on a number of occurrences of each respective term in the job title of the respective training data item. Alternatively, or in addition, the feature weight may be based, at least in part, on a second value representing an inverse occupation frequency that is determined based, at least in part, on a number of occupations in the occupation taxonomy where each respective term in the job title of the respective training data item is present. Alternatively, or in addition, the feature weight may be based, at least in part, on a third value representing an occupation derivative that is based, at least in part, on a density of each respective term in the job title of the respective training data item across each of the respective occupations in the occupation taxonomy.

In some implementations, the feature weight may be based, at least in part, on both (i) a second value representing the inverse occupation frequency that is determined based, at least in part, on a number of occupations in the occupation taxonomy where each respective term in the job title of the respective training data item is present and (ii) a third value representing an occupation derivative that is based, at least in part, on a density of each respective term in the job title of the respective training data item across each of the respective occupations in the occupation taxonomy. Alternatively, the feature weight may be based on a sum of (i) the second value representing the inverse occupation frequency, and (ii) one-third of the third value representing the occupation derivative.
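Here is a small sketch of how those values might be computed. The term frequency and the final sum follow the wording quoted above. The logarithmic form of the inverse occupation frequency is an assumption borrowed from ordinary inverse document frequency, and the occupation derivative is simply passed in as a number, because the quoted text describes it only as a density and gives no formula. The sample titles and function names are hypothetical.

```python
import math

# Hypothetical job titles grouped by occupation in a tiny taxonomy.
TITLES_BY_OCCUPATION = {
    "Attorney": ["patent attorney", "intellectual property attorney"],
    "Software Developer": ["software engineer", "software developer"],
    "Patent Agent": ["patent agent"],
}


def term_frequency(term, job_title):
    """First value: number of occurrences of the term in a job title."""
    return job_title.lower().split().count(term)


def inverse_occupation_frequency(term, titles_by_occupation):
    """Second value: based on how many occupations contain the term.

    Assumption: log(total occupations / occupations containing the term),
    by analogy with inverse document frequency.
    """
    total = len(titles_by_occupation)
    containing = sum(
        1
        for titles in titles_by_occupation.values()
        if any(term in title.lower().split() for title in titles)
    )
    if containing == 0:
        return 0.0
    return math.log(total / containing)


def combined_weight(iof, occupation_derivative):
    """Sum of the inverse occupation frequency and one-third of the occupation derivative."""
    return iof + occupation_derivative / 3.0


iof_patent = inverse_occupation_frequency("patent", TITLES_BY_OCCUPATION)
iof_software = inverse_occupation_frequency("software", TITLES_BY_OCCUPATION)
```

With these sample titles, “software” appears in only one occupation while “patent” appears in two, so “software” receives the higher inverse occupation frequency and does more to separate occupations from each other.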

The Job Search Engine Patent

(US20180107983) SEARCH ENGINE

Application Number: 15296230
Application Date: 18.10.2016
Publication Number: 20180107983
Publication Date: 19.04.2018
Inventors: Seyed Reza Mir Ghaderi, Xuejun Tao, Ye Tian, Matthew Courtney, Pei-Chun Chen and Christian Posse

Abstract:

Methods, systems, and apparatus, including computer programs encoded on storage devices, for performing a job opportunity search. In one aspect, a system includes a data processing apparatus, and a computer-readable storage device having stored thereon instructions that, when executed by the data processing apparatus, cause the data processing apparatus to perform operations. The operations include defining a vector vocabulary, defining an occupation taxonomy that includes multiple different occupations, obtaining multiple labeled training data items, wherein each labeled training data item is associated with at least (i) a job title, and (ii) an occupation, generating, for each of the respective labeled training data items, an occupation vector that includes a feature weight for each respective term in the vector vocabulary, and associating each respective occupation vector with an occupation in the occupation taxonomy based on the occupation of the labeled training data item used to generate the occupation vector.

2 Comments

  1. Ryan Rodden

    April 27th, 2018 at 11:30 am

    Nice article Bill. I used to do SEO in the jobs/recruitment niche and it certainly was a challenge. Google’s new search feature really disrupted the market and brought forth a further need to bring in an experienced consultant when it comes to semantic search and developing technologies.

    It reminds me of this video from a few years ago – I think you would really like it considering the topic of this article:

    Glen Cathey – How Semantic Search Changes Recruitment
    https://www.youtube.com/watch?v=mHe9Sma1Y4w

    • Bill Slawski

      April 27th, 2018 at 11:46 am

      Thank you, Ryan.

      Very interesting video. I can see why you thought of it. I am recommending that people watch it. Thanks!
