“Hey Google; New York, New York!”
Google hears a query for “New York, New York.” Does it give directions, play a Frank Sinatra song, or show tourist-style search results? That likely depends upon the context of the query.
As we are told in a Google patent:
User input can be identified as ambiguous for a variety of reasons. Generally, user input is identified as being ambiguous if the system interprets it as having more than one likely intended meaning, in the absence of attempts to disambiguate the input using the techniques described here. For instance, in the present example, the user input is identified as being ambiguous based on each of the commands possibly corresponding to the input–the user input “Go To New York, New York” can indicate a geographic location (the city of New York, N.Y.), a song (the song “New York, New York”), and a web page (a tourism web page for the city of New York, N.Y.). The commands can be identified as possibly corresponding to the input using any of a variety of techniques, such as polling an application and/or service corresponding to each command (e.g., querying a music player associated with the command “Go To [Song]” to determine whether “New York, New York” is an accessible song on the mobile computing device), accessing one or more groups of permissible terms for each command (e.g., accessing a group of permissible geographic location terms for the command “Go To [Geographic Location]”), etc.
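The "groups of permissible terms" idea in that passage can be pictured with a minimal Python sketch. The command names come from the patent's example, but the term sets and the `candidate_commands` and `is_ambiguous` helpers are hypothetical illustrations, not anything the patent specifies.

```python
# Hypothetical groups of permissible terms for each "Go To" command.
# The entries are illustrative only; a real device would poll its apps and services.
PERMISSIBLE_TERMS = {
    "Go To [Geographic Location]": {"new york, new york", "boston, massachusetts"},
    "Go To [Song]": {"new york, new york", "fly me to the moon"},
    "Go To [Web Page]": {"new york, new york", "example.com"},
}


def candidate_commands(user_input: str) -> list[str]:
    """Return every command whose permissible terms include the spoken target."""
    prefix = "go to "
    text = user_input.strip().lower()
    if not text.startswith(prefix):
        return []
    target = text[len(prefix):]
    return [cmd for cmd, terms in PERMISSIBLE_TERMS.items() if target in terms]


def is_ambiguous(user_input: str) -> bool:
    """Input is ambiguous when more than one command could correspond to it."""
    return len(candidate_commands(user_input)) > 1
```

Under these assumed term sets, "Go To New York, New York" matches all three commands and is flagged as ambiguous, while "Go To Fly Me to the Moon" matches only the song command.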
Google has been working to provide unambiguous answers to search queries. This recently granted Google patent looks at the context of a query to try to disambiguate user input, so that the results are not ambiguous either.
As the patent tells us, this is its purpose:
In the techniques described in this document, the context of a computing device, such as a mobile telephone (e.g., smartphone, or app phone) is taken into consideration in order to disambiguate ambiguous user inputs. Ambiguous user input is input that, in the absence of relevant disambiguating information, would be interpreted by the computing device or for the computing device (e.g., by a server system with which the computing device is in electronic communication) as corresponding to more than one query or command. The ambiguous input may be particularly common for spoken input, in part because of the presence of homophones, and in part because a speech-to-text processor may have difficulty differentiating words that are pronounced differently but sound similar to each other. For example, if a user says “search for sail/sale info” to a mobile computing device, this voice input can be ambiguous as it may correspond to the command “search for sail info” (e.g., information regarding a sail for a sailboat) or to the command “search for sale info” (information regarding a sale of goods). A device might even determine that the input was “search for sell info,” because “sell” and “sale” sound alike, particularly in certain dialects.
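To see how a single spoken phrase can fan out into several possible commands, here is a small Python sketch. The `HOMOPHONES` table and the `possible_transcriptions` function are assumptions made for illustration; the patent does not describe how a speech-to-text processor produces its alternatives.

```python
from itertools import product

# Hypothetical sound-alike groups; a real speech recognizer would supply
# its own scored alternatives rather than a fixed lookup table.
HOMOPHONES = {
    "sail": ["sail", "sale", "sell"],
}


def possible_transcriptions(heard: str) -> list[str]:
    """Expand each heard word into its sound-alike alternatives."""
    options = [HOMOPHONES.get(word, [word]) for word in heard.lower().split()]
    return [" ".join(words) for words in product(*options)]
```

With this table, `possible_transcriptions("search for sail info")` yields the three readings the patent mentions: "search for sail info," "search for sale info," and "search for sell info."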
How might this search input disambiguation work?
The patent tells us that ambiguous user input may be disambiguated based on a context associated with a mobile computing device (and/or a user of the mobile computing device) separate from the user input itself, such as:
- The physical location where the mobile computing device is located (e.g., home, work, car, etc.)
- Motion of the mobile computing device (e.g., accelerating, stationary, etc.)
- Recent activity on the mobile computing device (e.g., social network activity, emails sent/received, telephone calls made/received, etc.)
Examples of context being used to disambiguate input can include:
1. A device that is docked may determine the type of dock it is in, such as via physical electrical contacts on the dock and device that match each other, or via electronic communication (e.g., via Bluetooth or RFID) between the dock and the device. Based on that determination, the device could tell whether its context is “in car” or “at home.” Because of that,
…the device may then disambiguate spoken input such as “directions,” where the term could be interpreted as geographic directions (e.g., driving directions) in an “in car” context, and how-to directions (e.g., for cooking) in an “at home” mode.
2. In another example, when a mobile computing device receives ambiguous user input that may indicate multiple commands, it may determine a current context that indicates where the device is currently located. That context can then influence the results provided.
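The docking example above can be sketched in a few lines of Python. The dock type names and both lookup tables are hypothetical; the patent only says that the device works out its dock type and uses the resulting context to interpret input such as “directions.”

```python
# Hypothetical mapping from a detected dock type to a device context.
DOCK_CONTEXT = {"car_dock": "in car", "desk_dock": "at home"}

# Hypothetical interpretations of the ambiguous input "directions" per context.
DIRECTIONS_BY_CONTEXT = {
    "in car": "driving directions",
    "at home": "how-to directions",
}


def interpret_directions(dock_type):
    """Pick an interpretation of "directions" from the dock the device sits in.

    Returns None when the dock gives no usable context, which leaves the
    input ambiguous.
    """
    context = DOCK_CONTEXT.get(dock_type)
    return DIRECTIONS_BY_CONTEXT.get(context)
```

In this sketch a device in a car dock reads “directions” as driving directions, a device in a desk dock reads it as how-to directions, and an undocked device gets no answer from this signal alone.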
Advantage of Disambiguating Search Input Based Upon Context
The patent describes the advantage of following these processes as:
Permitting users to instruct a mobile computing device to perform the desired task without requiring the user to comply with all of the formalities of providing input for the desired task. As features provided by a mobile computing device have increased, users may be required to provide their input with greater specificity so that the input is properly associated with the intended feature. However, such specificity can be cumbersome and difficult to remember. The described methods, systems, techniques, and mechanisms described in this document can allow a user to provide input using less specificity than formally required for a feature yet still access the intended feature.
The patent is:
Disambiguating input based on context
Inventors: John Nicholas Jitkoff and Michael J. LeBeau
Assignee: Google LLC
US Patent: 9,966,071
Granted: May 8, 2018
Filed: July 1, 2016
In one implementation, a computer-implemented method includes receiving, at a mobile computing device, ambiguous user input that indicates more than one of a plurality of commands; and determining a current context associated with the mobile computing device that indicates where the mobile computing device is currently located. The method can further include disambiguating the ambiguous user input by selecting a command from the plurality of commands based on the current context associated with the mobile computing device; and causing output associated with performance of the selected command to be provided by the mobile computing device.
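The four steps in that summary (receive ambiguous input, determine where the device is, select a command, produce output) might look something like the Python sketch below. The `Command` class, the `preferred_locations` field, and the fallback of asking the user are all assumptions added for illustration; the patent does not say how the selection is implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Command:
    name: str
    preferred_locations: frozenset  # hypothetical: places where this command is likely


def disambiguate(candidates, current_location):
    """Select the one candidate command that fits where the device is located.

    Returns None if no candidate, or more than one, matches the location.
    """
    matches = [c for c in candidates if current_location in c.preferred_locations]
    return matches[0] if len(matches) == 1 else None


def handle_input(candidates, current_location):
    """Use location context to select a command, then describe the output."""
    selected = disambiguate(candidates, current_location)
    if selected is None:
        # Assumed fallback: the context did not settle the question.
        return "Which did you mean?"
    return f"Performing: {selected.name}"


GO_TO_PLACE = Command("Go To [Geographic Location]", frozenset({"car"}))
GO_TO_SONG = Command("Go To [Song]", frozenset({"home", "gym"}))
```

Calling `handle_input([GO_TO_PLACE, GO_TO_SONG], "car")` selects the geographic command, while the same input at "home" selects the song.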
I had a discussion with a Google speaker (device) this morning that started with a “Hey Google” but didn’t require me to repeat that hot word phrase afterward, now that Google has made some changes announced at the recent Google I/O conference. I asked for sports scores, and then asked questions about them. I’m still learning how best to interact with my speaker version of Google Now, but it is interesting. (Will saying please when we ask for something be helpful?) My morning conversation came to mind as I started reading this passage from the patent:
This document describes techniques, methods, systems, and mechanisms for disambiguating ambiguous user input on a mobile computing device (e.g., mobile feature telephone, smart telephone (e.g., IPHONE, BLACKBERRY), personal digital assistant (PDA), portable media player (e.g., IPOD), etc.). As the features provided by mobile computing devices have increased, the number of commands recognized by a mobile computing device can increase as well. For example, each feature on a mobile computing device may register one or more corresponding commands that a user can type, speak, gesture, etc. to cause the feature to be launched on the mobile computing device. However, as the number of recognized commands increases, commands can converge and make it more difficult to distinguish to which of multiple commands user input is intended to correspond. The problem is magnified for voice input. For example, voice input that is provided with loud background noise can be difficult to accurately interpret and, as a result, can map to more than one command recognized by the mobile computing device. For instance, voice input “example” could be interpreted as, among other things, “egg sample,” “example,” or “exam pull.” As another example, the command “go to” may represent “go to [geographic location]” for a mapping application, and “go to [artist/album/song]” for a media player.
As we try to learn how best to interact with our speakers and mobile devices to get the best results from Google, Google is also trying to learn how best to interact with us, and to make sure we are understood when we ask for something. This patent takes steps in that direction. As it tells us:
Using the techniques described here, in response to receiving ambiguous user input, a current context for the mobile device (and/or a user of the mobile computing device) can be determined and used to disambiguate the ambiguous user input. A current context for a mobile computing device can include a variety of information associated with the mobile computing device and/or a user of the mobile computing device. The context may be external to the device and represent a real-time status around the device, such as a current physical location (e.g., home, work, car, located near wireless network “testnet2010,” etc.), a direction and rate of speed at which the device is travelling (e.g., northbound at 20 miles per hour), a current geographic location (e.g., on the corner of 10th Street and Marquette Avenue), and ambient noise (e.g., low-pitch hum, music, etc.). The context may also be internal to the device, such as upcoming and/or recent calendar appointments (e.g., meeting with John at 2:30 pm on Jul. 29, 2010), a time and date on a clock in the device (e.g., 2:00 pm on Jul. 29, 2010), recent device activity (e.g., emails sent to John regarding the 2:30 meeting), and images from the mobile computing devices camera(s).
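The split between external and internal context in that passage could be modeled as a simple data structure. The Python sketch below is only one way to organize it: the class and field names are hypothetical, and the sample values are taken from the patent's own examples.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Optional


@dataclass
class ExternalContext:
    physical_location: Optional[str] = None  # e.g. "home", "work", "car"
    heading: Optional[str] = None            # e.g. "northbound"
    speed_mph: Optional[float] = None        # e.g. 20.0
    ambient_noise: Optional[str] = None      # e.g. "music"


@dataclass
class InternalContext:
    clock: Optional[datetime] = None
    calendar_appointments: list = field(default_factory=list)
    recent_activity: list = field(default_factory=list)


@dataclass
class DeviceContext:
    external: ExternalContext = field(default_factory=ExternalContext)
    internal: InternalContext = field(default_factory=InternalContext)

    def is_moving(self) -> bool:
        """Treat any known, nonzero speed as motion."""
        return bool(self.external.speed_mph)


# Sample context built from the examples in the patent passage.
ctx = DeviceContext(
    external=ExternalContext(physical_location="car", heading="northbound", speed_mph=20.0),
    internal=InternalContext(
        clock=datetime(2010, 7, 29, 14, 0),
        calendar_appointments=["meeting with John at 2:30 pm"],
    ),
)
```

A disambiguation step could then read whichever fields happen to be filled in, for example checking `ctx.is_moving()` before preferring driving directions.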
I often use my phone to navigate to places, and would like to be able to speak to my phone to change where I am navigating to. For example, if I decide to drive past my original destination to go to a different store first, I would like to turn off the navigation so that it stops telling me to take a U-turn to travel back to that first destination.
This patent is worth spending time on because it presents some interesting ideas about how context might influence the way devices work, as it tells us here:
With the ambiguous user input identified, at step B a current context for the mobile device can be determined. The current context includes information that describes the present state and/or surroundings of the mobile computing device and/or the user of the mobile computing device at the time the input is received. For instance, the current context can include a variety of information related to the mobile computing device and the user, such as information regarding the surrounding physical environment (e.g., available networks, connections to other nearby computing devices, geographic location, weather conditions, nearby businesses, volume of ambient noise, level of ambient light, image captured by the mobile device’s camera, etc.), the present state of the mobile computing device (e.g., rate of speed, touchscreen input activated, audio input activated, ringer on/off, etc.), time and date information (e.g., time of day, date, calendar appointments, day of the week, etc.), user activity (e.g., recent user activity, habitual user activity), etc. The current context can be determined by the mobile computing device using data and sensors that are local and/or remote to the mobile computing device.
Changes Depending Upon Context
Once upon a time, when you optimized a page for a query, it was likely a query performed by someone sitting at a desk using a desktop computer or a laptop computer. Now it might be someone in a car or on a bus or train, or in the aisles of a store or at a coffee house. When they search for “New York, New York” it may be because they want traffic directions, or to listen to a song, or to read a web page to find out what is happening downtown.
I remember visiting my sister when she went to school in Manhattan, and she suggested that we find out whether there were any street festivals going on in the city that day. She picked up the phone, dialed 411, and asked an operator. This was about 5 years before there was a World Wide Web to use to find out, and she did get answers from the operators, which surprised me tremendously. I didn’t expect those answers from that source. I would expect now to be able to find a Web page that could tell me about those, but back then I wouldn’t have expected to be able to find information like that using a computer, or a mobile phone, some day in the future. The world is changing.
How prepared are you for the changes that mobile devices and search engines will be bringing us?