We’ve been seeing a few patents from Google about how their automated assistant works. Recent posts I’ve written about those patents include:
- April 4, 2019 – Conversational Search Queries at Google (Context from Previous Sessions) – How Google may tag content to make it easier to respond to conversational queries, using contextual data from previous conversational queries.
- November 26, 2019 – Google Automated Assistant Search Results – About the limitations of dialog with a person using an automated assistant, how Google may try to work around those limitations, and some of the adaptations that Google has been making to present search results to searchers.
- December 13, 2019 – The Google Assistant and Context-Based Natural Language Processing – Introduces the concept of dialog systems when discussing the Automated Assistant. “Dialog system” is a technical term referring to a person’s interactions with a voice-based system; such systems are sometimes known as “chatbots.” Provides some insights into query templates, user-defined entities and contexts, and the rules that a dialog system may follow when responding to a user query.
A new patent from Google, granted in the last week of February, combines a number of ideas from some of those previous patents to explain more about how an automated assistant may work:
Humans may engage in human-to-computer dialogs with interactive software applications referred to herein as “automated assistants” (also referred to as “chatbots,” “interactive personal assistants,” “intelligent personal assistants,” “personal voice assistants,” “conversational agents,” etc.). For example, humans (which when they interact with automated assistants may be referred to as “users”) may provide commands, queries, and/or requests (collectively referred to herein as “queries”) using free form natural language input which may include vocal utterances converted into text and then processed and/or typed free form natural language input.
This patent can cover a range of different types of automated assistants, but it seems to focus primarily on smart speakers that respond vocally to questions and queries from humans.
This patent also tells us that it is geared towards interactions with children, and that the assistant may take steps to make that kind of interaction work well for children.
The patent describes the problems it was intended to solve by giving us some hypothetical examples:
The focus of assistant devices on vocal interaction makes them especially suitable for use by children. However, many features built into or otherwise accessible using commercially-available automated assistants may not be suitable for children.
- For example, if a child were to ask if the Tooth Fairy were real, a conventional automated assistant may, based on documents located online, reply, “No, the Tooth Fairy is an imaginary character evoked by parents to incentivize children to pull loose teeth.”
- As another example, an automated assistant may be configured to engage with independent agents, such as third party applications, that enable users to order goods/services, such as pizza, movies, toys, etc. This type of capability could be used by children who may not be able to judge all the consequences of their actions.
- Additionally, conventional automated assistants are designed to interact with people having fully-developed vocabularies. If a user’s input is not sufficiently clear, the automated assistant may request clarification and/or disambiguation, rather than attempt to resolve the user’s request based on a “best guess” as to the user’s intent. Such a long back and forth may cause excessive consumption of various computer and/or network resources (e.g., as a result of generating and rendering the requests for clarification and/or processing the resulting input) and/or may be frustrating for children with limited vocabularies.
Automated Assistants for Children
The patent tells us that the assistant may adjust how it behaves based on a detected age range or vocabulary level of the person engaging with it. It may use a specific mode, such as a “kid’s mode,” when interacting with children, and a “normal” or “adult” mode when interacting with someone who has not been deemed to be a child (teenagers and older). The patent also tells us that an automated assistant may be capable of transitioning between a series of modes, each associated with a particular age range or vocabulary level.
It may do this when it attempts to:
(i) Recognize the user’s intent
(ii) Resolve the user’s intent
(iii) Decide how the results of resolving the user’s intent are output.
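To make the idea of age-based modes a little more concrete, here is a minimal Python sketch, assuming a hypothetical `AssistantMode` record whose settings touch each of the three stages listed above. The mode names, age cutoffs, and numbers are illustrative assumptions; the patent doesn’t give specific values.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AssistantMode:
    """Hypothetical bundle of per-stage settings for one age range."""
    name: str
    min_age: int                 # inclusive lower bound of this mode's age range
    min_intent_conf: float       # (i) confidence needed to accept a recognized intent
    kid_safe_fulfillment: bool   # (ii) restrict how intents are resolved
    speech_rate: float           # (iii) relative text-to-speech speaking rate


# Illustrative modes ordered from youngest to oldest; the cutoffs and values
# are assumptions, not figures from the patent.
MODES = (
    AssistantMode("young_child", 0, 0.3, True, 0.8),
    AssistantMode("child", 7, 0.4, True, 0.9),
    AssistantMode("adult", 13, 0.7, False, 1.0),
)


def select_mode(estimated_age: float) -> AssistantMode:
    """Pick the oldest mode whose lower age bound the estimate reaches."""
    chosen = MODES[0]
    for mode in MODES:
        if estimated_age >= mode.min_age:
            chosen = mode
    return chosen
```

Keeping the modes in one ordered table means adding another age range is a one-line change, and every stage of the pipeline reads its setting from the same selected mode.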
An automated assistant may request clarification in some instances, such as when:
- A user’s speech is less clear than that of the average user of such devices (e.g., when the user is a young child or has a disability that affects the clarity of their speech)
- A user is a non-native speaker
Age and vocabulary level are not the only things an assistant may attempt to accommodate. The patent tells us that it may try to understand other user characteristics, such as gender and location, and those may also influence an assistant’s behavior. The automated assistant may also try to be aware of young users with more advanced vocabularies, and of older users with adult-sounding voices but limited vocabularies.
Like many patents, this one contains some options that may be implemented, and it tells us that:
In some implementations, parents or other adults (e.g., guardians, teachers) may manually transition the automated assistant into a kid’s mode, e.g., on-demand and/or during scheduled time intervals during which children are likely to be engaged with the automated assistant.
An automated assistant may try to automatically detect a user’s age range by looking at characteristics such as:
A machine learning model may be used to try to predict the age of a user as well.
We are also told that voice recognition may be used by automated assistants to distinguish between and identify individual speakers. (I have added a speaker to my house, and Google had me repeat some lines to train on my voice, so it seems that they are doing this.)
What Impact Might an Age Determination have on an Automated Assistant?
- If the speaker is determined to be a child, the automated assistant may be less rigid about which utterances qualify as invocation phrases than it would be if the speaker were determined to be an adult or otherwise a proficient speaker.
- One or more on-device models (e.g., trained artificial intelligence models) may be used, e.g., locally on the client device, to detect predetermined invocation phrases.
- If the speaker is detected to be a child, an invocation model specifically designed for children may be employed.
- If a single invocation model is used for all users, one or more thresholds that must be satisfied to classify a user’s utterance as a proper invocation may be lowered, e.g., so that a child’s mispronounced attempt at invocation may nonetheless be classified as a proper invocation phrase.
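A minimal sketch of the lowered-threshold idea follows. It assumes a single invocation model, substitutes Python’s `difflib.SequenceMatcher` string similarity for the on-device model’s confidence score, and uses made-up thresholds; a real assistant would score audio rather than text.

```python
from difflib import SequenceMatcher

INVOCATION_PHRASE = "hey google"

# Hypothetical thresholds: a lower bar when the speaker is judged to be a child.
ADULT_THRESHOLD = 0.85
CHILD_THRESHOLD = 0.70


def invocation_score(utterance: str) -> float:
    """Stand-in for an on-device model: string similarity to the invocation phrase."""
    return SequenceMatcher(None, utterance.lower().strip(), INVOCATION_PHRASE).ratio()


def is_invocation(utterance: str, speaker_is_child: bool) -> bool:
    """Accept the utterance as an invocation if it clears the speaker's threshold."""
    threshold = CHILD_THRESHOLD if speaker_is_child else ADULT_THRESHOLD
    return invocation_score(utterance) >= threshold
```

With these numbers, a mispronounced “hey giggle” scores 0.8, so it would be accepted from a child but rejected from an adult.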
I just asked the assistant on my phone what a “Giddy Gat” sounds like, and it recognized that I was asking about a kitty cat.
Query Understanding Models
An automated assistant may also interpret the intent behind a query differently depending on the age range of its user:
As another example, the user’s estimated age range and/or vocabulary level may be used in detecting the user’s intent. In various implementations, one or more candidate “query understanding models,” each associated with a specific age range, may be available for use by the automated assistant. Each query understanding model may be used to determine the user’s intent but may operate differently than other query understanding models. A “standard” query understanding model designed for adults may have a particular “grammatical tolerance” that is lower than, for instance, a grammatical tolerance associated with a “kid’s” query understanding model. For example, the kid’s query understanding model may have a grammatical tolerance (e.g., a minimum confidence threshold) that allows the automated assistant considerable leeway to “guess” the user’s intent even when the user’s grammar/vocabulary is imperfect, as would typically be the case with young children. By contrast, when the automated assistant selects the “standard” query understanding model, it may have a lower grammatical tolerance and therefore may be quicker to seek disambiguation and/or clarification from the user, rather than “guessing” or selecting a relatively low-confidence candidate intent as the user’s actual intent.
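The grammatical tolerance described in that passage works like a minimum confidence threshold. Here is a hedged Python sketch, assuming each query understanding model returns candidate intents with confidence scores; the intent names and threshold values are hypothetical.

```python
from typing import Dict

# Hypothetical minimum confidence ("grammatical tolerance") per query understanding model.
MIN_CONFIDENCE = {"standard": 0.7, "kids": 0.3}

ASK_FOR_CLARIFICATION = "ask_for_clarification"


def choose_intent(candidates: Dict[str, float], model: str) -> str:
    """Return the top-scoring intent if it clears the model's bar, else ask the user."""
    if not candidates:
        return ASK_FOR_CLARIFICATION
    intent, confidence = max(candidates.items(), key=lambda item: item[1])
    if confidence >= MIN_CONFIDENCE[model]:
        return intent
    return ASK_FOR_CLARIFICATION
```

The same candidate scores can lead to a “best guess” under the kid’s model and to a clarification request under the standard model.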
Speech to Text Processing
When I ask the assistant on my phone to “meow like a giddy,” I can see that it transcribes the request as “meow like a kitty,” and it gives me the sound of a cat.
The patent says that in some cases it might reject a request with a statement such as “I’m sorry, I didn’t catch that.”
However, the patent tells us that if the assistant detects that a child is making such a request, it may still understand it:
Likewise, a natural language understanding module may utilize a child-centric query understanding model to interpret the text “giggy” as “kitty,” whereas if an adult-centric query understanding model were used, the term “giggy” may not be interpretable.
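As a rough illustration, the sketch below uses a small lookup table as a stand-in for a child-centric query understanding model. The `CHILD_LEXICON` entries and the `interpret` function are hypothetical, and a production system would learn such mappings rather than list them by hand.

```python
# Hypothetical child-speech lexicon. The patent describes a child-centric query
# understanding model; a lookup table is only a stand-in for what such a model learns.
CHILD_LEXICON = {"giggy": "kitty", "giddy": "kitty", "wabbit": "rabbit"}


def interpret(text: str, child_model: bool) -> str:
    """Map known child pronunciations to intended words when the child model is active."""
    words = text.lower().split()
    if child_model:
        words = [CHILD_LEXICON.get(word, word) for word in words]
    return " ".join(words)
```

With the adult-centric path, “giggy” passes through unchanged, which is the point at which a conventional assistant might fall back to “I’m sorry, I didn’t catch that.”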
It is interesting that it understands me when I call a kitty a “giddy” and answers me.
The patent tells us that the assistant may be more proactive when working with children, and more willing to try to understand what is being asked of it:
Generally speaking, an automated assistant configured with selected aspects of the present disclosure may be more proactive when engaging with children than conventional automated assistants. For example, and as described previously, it may be more willing to “guess” what a child’s intent is. Additionally, the automated assistant may be more lax about requiring invocation phrases when it detects a child speaker. For example, in some implementations, if a child shouts an animal name, the automated assistant may, upon determining that the speaker is a child, forego the requirement that the child speak an invocation phrase, and may instead mimic a sound the animal makes. Additionally or alternatively, the automated assistant may attempt to “teach” a child proper grammar, pronunciation, and/or vocabulary, e.g., in response to a grammatically incorrect and/or mispronounced utterance.
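The relaxed invocation rule for children shouting animal names could be sketched like this, assuming the speaker’s age group has already been estimated. The animal table, reply strings, and `respond` function are hypothetical.

```python
from typing import Optional

INVOCATION_PHRASE = "hey google"
# Hypothetical table of animal names a child can shout without an invocation phrase.
ANIMAL_SOUNDS = {"cat": "Meow!", "kitty": "Meow!", "dog": "Woof!", "cow": "Moo!"}


def respond(utterance: str, speaker_is_child: bool) -> Optional[str]:
    """Return a reply, or None when the utterance should be ignored."""
    text = utterance.lower().strip(" !.?")
    # A child shouting an animal name gets the animal sound, no invocation needed.
    if speaker_is_child and text in ANIMAL_SOUNDS:
        return ANIMAL_SOUNDS[text]
    # Otherwise the utterance must start with the invocation phrase.
    if not text.startswith(INVOCATION_PHRASE):
        return None
    request = text[len(INVOCATION_PHRASE):].strip(" ,")
    return ANIMAL_SOUNDS.get(request, f"Handling request: {request}")
```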
Requests That Are Not Suitable for Children
While an automated assistant may be more tolerant of children, it may also avoid providing information that is not appropriate for them, based on a user’s predicted age range. It may:
- Limit the online corpora of data it uses to retrieve information responsive to a user’s request, whitelisting some kid-friendly sites and blacklisting some sites that are not kid-friendly
- Limit the music played in response to a request to play music to a library of kid-friendly music, rather than an adult-centric library that includes music commonly targeted towards older individuals
- Skip asking for a playlist or artist, and simply play music appropriate for the user’s detected age (by contrast, an adult’s request to “play music” may cause the automated assistant to seek additional information about what music to play)
- Refuse actions that may not be suitable for children when engaging with a child, such as ordering goods/services through third-party applications (for instance, actions that might cost money or facilitate engagement with strangers online)
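Here is a minimal sketch of these safeguards, assuming hypothetical whitelist/blacklist host sets and simple string-matched requests. The `example.org`/`example.com` hosts are placeholders, not real site lists, and the reply strings are made up.

```python
from typing import List
from urllib.parse import urlparse

# Hypothetical site lists; these hosts are placeholders.
KID_WHITELIST = {"kids.example.org"}
KID_BLACKLIST = {"mature.example.com"}


def _host(url: str) -> str:
    return urlparse(url).hostname or ""


def filter_sources(urls: List[str], is_child: bool) -> List[str]:
    """For a child: drop blacklisted hosts and rank whitelisted hosts first."""
    if not is_child:
        return list(urls)
    kept = [u for u in urls if _host(u) not in KID_BLACKLIST]
    # sorted() is stable, so non-whitelisted results keep their original order.
    return sorted(kept, key=lambda u: _host(u) not in KID_WHITELIST)


def handle_request(request: str, is_child: bool) -> str:
    """Route a few request types differently for children and adults."""
    if request == "play music":
        if is_child:
            return "Playing songs from the kid-friendly library."
        return "What would you like to listen to?"
    if request.startswith("order "):
        if is_child:
            return "Sorry, I can't do that."
        return f"Starting an order for {request[len('order '):]}."
    return f"Searching for {request}."
```

In this sketch, blacklisted hosts are dropped for a child and whitelisted hosts are moved to the front; showing only whitelisted sites would be a stricter, separate policy choice.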
The voice used by an automated assistant may be different when interacting with a child (it might be the voice of a cartoon character, for example), and it may speak at a slower pace.
Different natural language models may also be used depending on the predicted age of the user. For adults, longer and more complex sentences may be used. For children, the automated assistant may speak in more complete sentences to encourage children to use them too, and it may fully explain complex words.
The automated assistant may also choose to use slang and terms suitable for children.
A translation service, such as an “adult-English-to-simple-English” translation system could also be used when returning information to a younger user from a web page.
Data for Adults About Children Users of Automated Assistants
The patent tells us that the assistant could have a feature built in to tell an adult about a child’s use of an automated assistant:
In some implementations, the automated assistant may be configured to report on a child’s grammatical and/or vocabularic progress. For example, when the automated assistant determines that it is engaged with an adult, or especially when it recognizes a parent’s voice, the adult/parent user may ask the automated assistant about one or more children’s progress with interacting with the automated assistant. In various implementations, the automated assistant may provide various data in response to such inquiries, such as words or syllables the child tends to mispronounce or struggle with, whether a tendency to stutter is detected in a child, what questions the child has asked, how the child has progressed in interactive games, and so forth.
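A parent-facing report like this could be built from a simple per-child log. The Python sketch below is hypothetical: it assumes the assistant keeps both what it heard and what its child-centric model decided the child meant, and it counts words where the two differ.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ChildProgressLog:
    """Hypothetical per-child log an assistant might keep for a parent's report."""
    mispronounced: Counter = field(default_factory=Counter)
    questions: List[str] = field(default_factory=list)

    def record(self, heard: str, intended: str, is_question: bool = False) -> None:
        # Compare word by word; this simple pairing assumes equal word counts.
        for heard_word, intended_word in zip(heard.split(), intended.split()):
            if heard_word != intended_word:
                self.mispronounced[intended_word] += 1
        if is_question:
            self.questions.append(intended)

    def report(self, top_n: int = 3) -> Dict[str, List[str]]:
        """Summarize the words the child struggles with most and the questions asked."""
        return {
            "struggles_with": [w for w, _ in self.mispronounced.most_common(top_n)],
            "questions_asked": list(self.questions),
        }
```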
This Automated Assistant patent can be found at:
Automated assistants that accommodate multiple age groups and/or vocabulary levels
Inventors: Pedro Gonnet Anders, Victor Carbune, Daniel Keysers, Thomas Deselaers, and Sandro Feuz
Assignee: GOOGLE LLC
US Patent: 10,573,298
Granted: February 25, 2020
Filed: April 16, 2018
Techniques are described herein for enabling an automated assistant to adjust its behavior depending on a detected age range and/or “vocabulary level” of a user who is engaging with the automated assistant. In various implementations, data indicative of a user’s utterance may be used to estimate one or more of the user’s age range and/or vocabulary level. The estimated age range/vocabulary level may be used to influence various aspects of a data processing pipeline employed by an automated assistant. In various implementations, aspects of the data processing pipeline that may be influenced by the user’s age range/vocabulary level may include one or more of automated assistant invocation, speech-to-text (“STT”) processing, intent matching, intent resolution (or fulfillment), natural language generation, and/or text-to-speech (“TTS”) processing. In some implementations, one or more tolerance thresholds associated with one or more of these aspects, such as grammatical tolerances, vocabularic tolerances, etc., may be adjusted.
The detailed description of the patent provides many more details and examples of how an age- or vocabulary-related mode is selected, how the assistant might be trained on a user’s voice to better understand invocations and requests for information, and how it may respond to such requests with an appropriate level of language proficiency.
The patent also discusses text-to-speech processing involving voice synthesis in an automated assistant.
There is also information about natural language understanding, and I recommend reading the detailed description of the patent to better understand how the assistant tries to adapt the way it communicates to the person it is talking with. Seeing how an automated assistant might have flexibility built into it to make it usable by children shows the efforts that Google is undertaking to make such a system useful to families.