User-Generated Content and Machine Learning at Google



Is Google the New Home for Machine Learning on the Web?

I recently wrote about a Google patent that describes using machine learning to identify opinions in news articles, in the post Opinion News Found By Machine Learning at Google.

The use of machine learning is growing at Google, which may apply it to types of content other than opinion pieces. This new patent tells us about the identification of user-generated content using machine learning.

We are told expressly that the patent is about “leveraging machine learning to predict user-generated content.”

Why look for user-generated content on the Web? The patent points out that:

User-generated content can be used to obtain information about various entities.

Such user-generated content can be obtained, for instance, by engaging a plurality of users in a contribution experience.

Machine learning can help us understand contribution experiences such as a question-and-answer system where questions and/or other prompts involving various entities are provided to the users. The users are then instructed to respond to the questions and/or prompts.


Those user responses can be used to make inferences associated with attributes of the various entities. (I like the mention of attributes in the patent here.)

Those contribution experiences could require a significant number of user responses before an accurate inference can be made regarding an entity.

So, many users may be required to indicate that a particular entity possesses an attribute before an accurate inference can be made about whether the entity has that attribute.

How do machine learning and these questions and answers help us learn about entities?

The method from the patent includes receiving, by one or more computing devices, first entity data associated with an entity.

This first entity data can include user-specified data associated with an attribute of the entity. (Attributes of entities can include dates and other values associated with them.)

The method then includes inputting, by one or more computing devices, the first entity data into a machine-learned content prediction model.

The patented method further includes receiving as the output of the machine-learned content prediction model, by the one or more computing devices, inferred entity data comprising inferred data descriptive of the entity’s attribute.

This patent can be found at the USPTO at:

Leveraging machine learning to predict user-generated content
Inventors: Arun Mathew, Kaleigh Smith, Per Anderson, and Ian Langmore
Assignee: Google LLC
US Patent: 10,878,339
Granted: December 29, 2020
Filed: January 27, 2017

Abstract

Systems and methods of leveraging machine learning to predict user-generated content are provided.

For instance, first entity data associated with an entity can be received.

The first entity data can include user-specified data associated with an attribute of the entity.
The first entity data can be input into a machine-learned content prediction model.

Inferred entity data can be received as the output of the machine-learned content prediction model.

The inferred entity data can include inferred data descriptive of the attribute of the entity.

One example from the patent tells us about using machine learning to better understand user-generated content about one or more entities.

The process can determine various attributes about one or more entities.

So user-specified entity data about an entity’s attribute can be received.

This entity data could include user responses to an information collection task provided to one or more users.

That information collection task may ask users to respond to questions about an entity.

That entity data may be provided as input to a machine-learned content prediction model.

The machine-learned content prediction model may include logistic regression.

Examples of Entities Asked About to Learn More About Them

This might make more sense to a reader of the patent if it provided some specific examples.

We are told that the machine-learned content prediction model can provide as output inferred entity data associated with the entity’s attribute.

The inferred entity data may include information associated with predicted or estimated results of the information collection task.

More particularly, the information collection task may be associated with a question and answer system or other contribution experience used to collect information associated with an entity.

This information collection task may be any suitable task that prompts users to provide an answer response associated with an entity’s attribute.

These entities can be any suitable entities, such as a business, restaurant, movie, song, book, video, product, or any other suitable entity for which descriptive information can be obtained.

That information collection task may be associated with a survey, questionnaire, etc., associated with the entity.

The information collection task can be a question provided to the users associated with the entity’s attribute.

The questions may be a “Boolean question” having possible answers of “true” or “false” (or “unsure”).

For example, an information task can include a question asking whether a particular restaurant provides a romantic atmosphere or whether the restaurant provides an accommodating atmosphere for large groups.

In this way, a user can provide a true response to indicate that the entity possesses the attribute or a false response to indicate that the entity does not possess the attribute.
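As a rough illustration of what the responses to a Boolean question might look like as data, here is a minimal Python sketch. The `Response` class, the field names, and the choice to leave "unsure" answers out of the observed rate are all assumptions made for this example; the patent does not describe a storage format.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """One user's answer to a Boolean question about an entity (hypothetical shape)."""
    entity_id: str
    attribute: str
    answer: str  # "true", "false", or "unsure"


def tally(responses, entity_id, attribute):
    """Count true / false / unsure answers for one entity and one attribute."""
    counts = Counter(
        r.answer
        for r in responses
        if r.entity_id == entity_id and r.attribute == attribute
    )
    return counts["true"], counts["false"], counts["unsure"]


def observed_true_rate(true_count, false_count):
    """Share of 'true' answers among decided answers (assumption: 'unsure' is excluded)."""
    total = true_count + false_count
    return true_count / total if total else None


responses = [
    Response("restaurant-1", "romantic", "true"),
    Response("restaurant-1", "romantic", "true"),
    Response("restaurant-1", "romantic", "false"),
    Response("restaurant-1", "romantic", "unsure"),
    Response("restaurant-1", "good_for_groups", "false"),
]
trues, falses, unsures = tally(responses, "restaurant-1", "romantic")
rate = observed_true_rate(trues, falses)
```

With only a handful of answers, an observed rate like this is noisy, which is why the patent describes predicting what the rate would be with many more responses.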

I have been asked questions like these about businesses, related to Local Guide questions in the Google My Business program.

The user-specified entity data can include data indicating the user response(s) to the provided information collection tasks.

That kind of user-specified entity data can be provided as input to the machine-learned content prediction model.

The user-specified entity data can be provided to the logistic regression of the content prediction model.

And, the logistic regression can further receive global entity data as input.

That global entity data may include any:

  • Suitable structured or unstructured data associated with the entity
  • One or more additional entities
  • A geographic area (e.g. city, county, area code, country, etc.) in which the entity is located
  • And/or other suitable data.

That global data can include:

  • Data indicative of user responses to various other information collection tasks associated with the entity and/or the additional entities
  • Various attributes of the entity and/or the additional entities
  • Keywords associated with the entity and/or the additional entities
  • Etc.
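To make the idea of global entity data more concrete, the sketch below turns a small entity profile into named numeric features that a model could take as input. The `GlobalEntityData` class, its fields, and the feature naming scheme are hypothetical; the patent only says this data can be structured or unstructured.

```python
from dataclasses import dataclass, field


@dataclass
class GlobalEntityData:
    """Hypothetical profile of an entity built from global entity data."""
    categories: list
    city: str
    keywords: list = field(default_factory=list)
    # True rates seen on other information collection tasks for this entity.
    other_task_true_rates: dict = field(default_factory=dict)


def to_features(data):
    """Flatten the profile into a dict of feature name -> numeric value."""
    features = {}
    for category in data.categories:
        features[f"category_{category}"] = 1.0
    features[f"city_{data.city}"] = 1.0
    for keyword in data.keywords:
        features[f"keyword_{keyword}"] = 1.0
    for task, rate in data.other_task_true_rates.items():
        features[f"rate_{task}"] = rate
    return features


profile = GlobalEntityData(
    categories=["microbrewery"],
    city="portland",
    keywords=["patio"],
    other_task_true_rates={"kid_friendly": 0.4},
)
features = to_features(profile)
```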


What Do People Learn about Entities?

The global entity data may be associated with a profile of the entities that describe various aspects of the entities.

Global entity data can be obtained from various suitable databases, such as databases associated with a geographic information system.

And the global entity data can be obtained from suitable websites.

That logistic regression can be configured to output a predicted or estimated “best guess” response rate.

The “best guess” response rate could predict user responses to the information collection task as the number of user responses approaches infinity.

The response rate could be a “true rate” that specifies a predicted ratio of “true” responses to the total number of responses to the information collection task as the number of responses approaches infinity.

The response rate to the questions may be determined based at least in part on the entity data and/or the global entity data.

Correlations may be identified within the global entity data. Those correlations may indicate the likelihood that an entity has an attribute based, at least in part, on various signals within the global entity data.

For example, it may be determined that an entity categorized as a microbrewery (as specified by the global entity data) generally receives high “true” response rates to an information collection task asking whether the entity provides a good atmosphere for groups.

A logistic regression associated with an information collection task asking whether a subject microbrewery is good for groups can consider such correlation when determining a predicted response rate for the information collection task.

In this way, the response rate can be determined based at least in part on attributes of the subject entity that are shared with or similar to attributes of various other entities.
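The microbrewery example can be sketched as a logistic regression that maps entity features to a predicted "true" rate. This is only an illustration: the feature names, the weights, and the bias below are invented for the example and hand-set rather than learned, whereas a real model would fit them from many entities' responses.

```python
import math


def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_true_rate(features, weights, bias):
    """Logistic regression: a weighted sum of features passed through a sigmoid."""
    z = bias + sum(weights.get(name, 0.0) * value for name, value in features.items())
    return sigmoid(z)


# Hypothetical weights for the question "Is this place good for groups?"
WEIGHTS = {
    "category_microbrewery": 1.2,   # microbreweries tend to get "true" answers
    "category_fine_dining": -0.8,
    "observed_true_rate": 2.0,      # signal from the entity's own responses so far
}
BIAS = -1.0

microbrewery = {"category_microbrewery": 1.0, "observed_true_rate": 0.5}
fine_dining = {"category_fine_dining": 1.0, "observed_true_rate": 0.5}

microbrewery_rate = predict_true_rate(microbrewery, WEIGHTS, BIAS)
fine_dining_rate = predict_true_rate(fine_dining, WEIGHTS, BIAS)
```

Both entities have the same observed responses here, so any difference in the predicted rate comes from the category signal shared with similar entities.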

The patent also tells us that the machine-learned content prediction model can further include a beta-binomial model coupled to the logistic regression.

In such a model, the logistic regression output (e.g., the predicted response rate) can be provided as input to the beta-binomial model.

And the user-specified entity data can further be provided to the beta-binomial model as input.

That beta-binomial model can be configured to infer entity data associated with the entity and the information collection task.

The beta-binomial model can be configured to use a confidence score associated with the predicted response rate.

The beta-binomial model can be configured to determine inferential statements based at least in part on the response rate and/or confidence score.

The beta-binomial model can be configured to output an inferential statement indicating a percentage of likelihood that a “true” response rate to the Boolean question of the information collection task will be greater than some threshold.

The beta-binomial model may output an inferential statement that indicates that the “true” response rate to the Boolean question of the information collection task will be under some threshold.

The beta-binomial model may be configured to generate a probability density function specifying the probabilities of various response rates associated with the information collection task.
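One way to picture the beta-binomial step is as a Beta prior centered on the predicted response rate, updated with the observed "true" and "false" counts. The sketch below makes that assumption explicit: treating the logistic regression output as the prior mean, and using a `prior_strength` pseudo-count as a stand-in for confidence, are choices made for this example and are not spelled out in the patent. The tail probability is computed with simple numerical integration, which assumes both Beta parameters are at least 1.

```python
import math


def beta_pdf(x, a, b):
    """Probability density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))


def posterior_params(predicted_rate, prior_strength, true_count, false_count):
    """Combine the predicted rate (as a prior) with observed responses."""
    a = predicted_rate * prior_strength + true_count
    b = (1 - predicted_rate) * prior_strength + false_count
    return a, b


def prob_rate_above(threshold, a, b, steps=20000):
    """P(true rate > threshold) under Beta(a, b), using the midpoint rule."""
    width = (1.0 - threshold) / steps
    total = sum(beta_pdf(threshold + (i + 0.5) * width, a, b) for i in range(steps))
    return total * width


# Predicted rate 0.75 with a prior worth 8 responses, plus 6 "true" and 2 "false".
a, b = posterior_params(0.75, 8, true_count=6, false_count=2)
p_above_half = prob_rate_above(0.5, a, b)
```

The value `p_above_half` corresponds to an inferential statement such as "the true rate is above 50 percent with this probability", and `beta_pdf` plays the role of the probability density function over possible response rates.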

That output of the beta-binomial model may be used to determine the attribute associated with the information collection task.

The inferred entity data can determine whether the entity possesses the attribute associated with the information collection task.

For example, the information collection task asking whether the entity provides a good atmosphere for groups can be answered positively or negatively based at least in part on the inferred entity data output by the beta-binomial model.

The attribute can be determined based on whether the inferred entity data meet some suitable criteria.

For instance, the criteria can be associated with the inferential statement output by the beta-binomial model.
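A simple version of such criteria could look like the following sketch. The function name, the 0.9 cutoff, and the option to leave an attribute undecided are assumptions for illustration; the patent only says the criteria can be tied to the inferential statement.

```python
def decide_attribute(prob_above, prob_below, min_confidence=0.9):
    """Turn inferential statements into a decision about an attribute.

    prob_above: probability the "true" rate is over the upper threshold.
    prob_below: probability the "true" rate is under the lower threshold.
    Returns True (has the attribute), False (does not), or None (undecided).
    """
    if prob_above >= min_confidence:
        return True
    if prob_below >= min_confidence:
        return False
    return None  # not enough evidence yet, so keep collecting responses


good_for_groups = decide_attribute(prob_above=0.98, prob_below=0.01)
romantic = decide_attribute(prob_above=0.40, prob_below=0.35)
```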

Crowdsourcing User Generated Content

The information collection task may be associated with a question and answer system where several information collection tasks are provided to many users to “crowdsource” information collection associated with a plurality of entities.

Examples of the present disclosure can be applied to a plurality of information collection tasks within the question and answer system.

The utilities determined for the information collection tasks can be used to prioritize and/or rank those tasks.

The information collection tasks can then be provided to subsequent users according to that prioritization.

For example, an information collection task having a higher utility can be provided to a user before a provision of an information collection task having a lower utility.

In this way, information collection tasks for which a subsequent additional response will have a greater effect on determining an attribute for an entity can be prioritized in the provision of the information collection tasks.
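The prioritization idea can be sketched with a stand-in utility score. The formula below, which scores a task higher the closer its current probability is to a coin flip, is an assumption chosen for this example; this post does not give the patent's actual utility calculation, and the `Task` class is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Task:
    """A Boolean question about an entity, with the model's current belief."""
    question: str
    prob_above_threshold: float  # P("true" rate exceeds the decision threshold)


def utility(task):
    """Hypothetical utility: highest when the outcome is most uncertain (p near 0.5)."""
    return 1.0 - abs(2.0 * task.prob_above_threshold - 1.0)


def prioritize(tasks):
    """Order tasks so the ones where another answer matters most come first."""
    return sorted(tasks, key=utility, reverse=True)


tasks = [
    Task("Is the microbrewery good for groups?", 0.98),
    Task("Is the restaurant romantic?", 0.55),
    Task("Does the cafe have outdoor seating?", 0.20),
]
ordered = [t.question for t in prioritize(tasks)]
```

Under this scoring, a question the model is already nearly certain about drops to the back of the queue, since one more answer is unlikely to change the inferred attribute.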

Leveraging Machine Learning to Predict User Generated Content

User-generated content can be used to obtain information about many entities.

We can find out information about those entities by engaging several users in a contribution experience.

For example, those contribution experiences can include a question-and-answer system wherein questions and/or other prompts relating to various entities are provided to the users. The users are instructed to respond to the questions and/or prompts.

Those user responses can be used to make inferences associated with attributes of the various entities.

The contribution experiences may require a significant number of user responses before an accurate inference can be made regarding an entity.

Many users may be required to indicate that a particular entity possesses an attribute so that an accurate inference regarding the possession of the attribute by the entity can be made.

For example, the question and answer system can be a survey, questionnaire, etc., provided to the users to obtain crowd-sourced information relating to the entity.

Entities involved can be any suitable entity, such as a:

  • Geographic location
  • Point of interest
  • Business
  • Restaurant
  • Landmark
  • Song
  • Movie
  • Video
  • Book
  • Product
  • Any other suitable entity for which information can be obtained via the question and answer system

In addition to having people answer questions about businesses and other entities, Google may find other ways to learn about entities, as I wrote about in the post Search Engine Queries May be Used to Identify Entity Attributes.

I also covered that topic in the post Google Adds Entity Attributes to its Knowledge Base from Queries.

It makes sense that Google may find other ways to ask questions and learn about entities, especially when they have a large pool of cooperative users to work with, such as Google Maps Local Guides.

It shouldn’t come as a surprise that a search engine will use as many sources of information as it can to learn more about the real-world entities that it might be indexing.

What Kind of Entity Data Might Google Collect?

It can include:

  • Any suitable structured or unstructured data associated with the entity
  • One or more additional entities
  • A geographic area (e.g. city, county, area code, country, etc.) in which the entity is located
  • Other suitable data

These can also include data that is indicative of user responses to many other information collection tasks associated with the entities, attributes of the entities, or keywords associated with the entities.

The patent provides many details about collecting information associated with entities related to a website. As I pointed out, Google may also learn about entities from the queries related to them and from what searchers look for when they create queries about a site or an entity associated with it.

I have written in the past about Google patents covering user-generated content, such as in the post Do Searchers Find Value in User-Generated Content Search Results? So it shouldn’t come as a surprise that Google will use a machine learning approach to collect more data about entities, especially since it has an audience that will help it learn about them.

We have seen Google using entity information through knowledge graphs more and more in search, as I wrote about in Ranked Entities in Search Results at Google.

In preparation for such an SEO change, it makes sense for someone working on a site to learn as much as possible about the entities involved in that site and to include as much positive information about those entities on the site as possible.
