The First Time I Remember Voice Search from Google
I remember going to a conference in New York City in 2007. I had taken a cab from Penn Station to my hotel, and the front of the hotel was filled with SEOs arriving to attend the conference. I ran into Loren Baker (the owner of Search Engine Journal), whom I used to work with, and he was making a phone call to something called Goog 411, an automated phone directory that Google ran for three years. I had no idea at that point in time how often I would end up using my phone to find information about businesses, or that voice search would become as popular as it is now with phones and smart speaker devices.
A Google Patent granted this week is about voice searches for business listings, and refers to an “automated 411 directory assistance system” which might interact with a person in a way that “mimics the manner in which a human operator would interact with a caller.”
Google Is Making Automated Phone Calls by Itself Now
The patent reminded me of the Duplex system that was demonstrated at the Google I/O conference earlier this year. While that demo showed off calls that Google Assistant could make on behalf of a person, this patent talks about calls that you could make to an automated system that could answer and give you information. The patent tells us that it involves:
A conventional automated system includes a speech recognition engine that recognizes the caller’s speech input. The automated system includes a search engine that searches a database for the phone number of the specific business requested by the caller. If the speech recognition engine cannot recognize the caller’s speech input, the recognition engine may ask the caller to repeat the input, ask the caller disambiguating questions, or transfer the call to a human operator.
So, how does voice search work when someone is looking for a business with a spoken query? The patent provides some details about the interactions we might have with a computer system that takes our queries and searches for information to return to us.
One of the first steps is to ask for “type of business or category information” in addition to location information and possibly an identifier of a specific business. That query may be responded to with a search engine searching a database to find information (e.g., phone number) about a specific business.
Business type information may come from user input, which could be information provided by users in past calls or drawn from users’ online search activities, such as keyword searches and click-throughs. The patent points out this example:
…the system may establish a new business type if a number of users typed in a certain keyword or phrase, and later clicked on specific businesses, indicating that the users associated the specific businesses with the keyword or phrase.
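To make that example concrete, here is a minimal sketch of how a new business type might be established from keyword searches followed by click-throughs. The patent does not give an algorithm for this, so the log format, the `infer_business_types` function, and the `min_users` threshold are all my own assumptions for illustration.

```python
from collections import defaultdict


def infer_business_types(click_log, min_users=3):
    """Group businesses under a keyword once enough distinct users
    searched for that keyword and then clicked on the business.

    click_log is an iterable of (user_id, keyword, business_id) tuples.
    Both the log format and the threshold are hypothetical.
    """
    users_by_pair = defaultdict(set)
    for user_id, keyword, business_id in click_log:
        # Normalize case so "Sushi" and "sushi" count as one keyword.
        users_by_pair[(keyword.lower(), business_id)].add(user_id)

    business_types = defaultdict(set)
    for (keyword, business_id), users in users_by_pair.items():
        if len(users) >= min_users:
            business_types[keyword].add(business_id)
    return dict(business_types)


# Made-up click log: three different users associate biz_a with "sushi",
# but only one user associates biz_b with it.
log = [
    ("u1", "sushi", "biz_a"),
    ("u2", "sushi", "biz_a"),
    ("u3", "Sushi", "biz_a"),
    ("u1", "sushi", "biz_b"),
]
business_types = infer_business_types(log)
print(business_types)
```

Counting distinct users, rather than raw clicks, keeps one very active searcher from creating a business type alone. That is a design choice in this sketch, not something the patent specifies.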
What we don’t see in this patent is what Google learned from the Goog 411 service, which they used to offer voice-based searches over the phone. In 2010, they told us on the Official Google Blog that they were ending that service, in the post Goodbye to an old friend: 1-800-GOOG-411. As that post tells us:
GOOG-411 was the first speech recognition service from Google and helped provide a foundation for more ambitious services now available on smartphones, such as:
- Voice Search – search Google by speaking instead of typing.
- Voice Input – fill in any text field on Android by speaking instead of typing.
- Voice Actions – control your Android phone with voice commands. For example, you can call any business quickly and easily just by saying its name.
This very recently granted patent on voice search of an automated directory doesn’t seem to describe something that is all that new. Reading about Goog 411, it appears that Google provided information about businesses to callers for free so that it could collect voice data. As Marissa Mayer stated in an interview with InfoWorld:
The speech recognition experts that we have say: If you want us to build a really robust speech model, we need a lot of phonemes, which is a syllable as spoken by a particular voice with a particular intonation. So we need a lot of people talking, saying things so that we can ultimately train off of that. … So 1-800-GOOG-411 is about that: Getting a bunch of different speech samples so that when you call up or we’re trying to get the voice out of video, we can do it with high accuracy.
This patent comes across as something that might have been filed so that something like Goog 411 could be launched, which is why the patent’s 2016 filing date was surprising. The patent is at:
Business or personal listing search
Inventors: Brian Strope, William J. Byrne and Francoise Beaufays
Assignee: GOOGLE LLC
US Patent: 10,026,402
Granted: July 17, 2018
Filed: October 3, 2016
A method of searching a business listing with voice commands includes receiving, over the Internet, from a user terminal, a query spoken by a user, which includes a speech utterance representing a category of merchandise, a speech utterance representing a merchandise item, and a speech utterance representing a geographic location. The method includes recognizing the geographic location with a speech recognition engine based on the speech utterance representing the geographic location, recognizing the category of merchandise with the speech recognition engine based on the speech utterance representing the category of merchandise, recognizing the merchandise item with a speech recognition engine based on the speech utterance representing the merchandise item, searching a business listing for businesses within or near the recognized geographic location to select businesses responsive to the query spoken by the user, and sending to the user terminal information related to at least some of the responsive businesses.
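The search step in that abstract can be sketched roughly as follows. This is only an illustration under my own assumptions: it takes for granted that a speech recognition engine has already turned the three utterances into text, it matches locations by exact city name (the patent says "within or near"), and the `Listing` class, `search_listings` function, and sample data are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Listing:
    name: str
    city: str
    category: str
    items: frozenset  # lowercase merchandise items the business carries
    phone: str


def search_listings(listings, location, category, item=None):
    """Return listings matching the recognized location and category,
    optionally narrowed to those carrying the recognized item."""
    loc = location.strip().lower()
    cat = category.strip().lower()
    wanted = item.strip().lower() if item is not None else None
    results = []
    for listing in listings:
        if listing.city.lower() != loc or listing.category.lower() != cat:
            continue
        if wanted is not None and wanted not in listing.items:
            continue
        results.append(listing)
    return results


# Made-up listings for illustration only.
LISTINGS = [
    Listing("Corner Hardware", "Springfield", "hardware store",
            frozenset({"hammer", "nails"}), "555-0101"),
    Listing("Main Street Tools", "Springfield", "hardware store",
            frozenset({"drill"}), "555-0102"),
    Listing("Shelbyville Hardware", "Shelbyville", "hardware store",
            frozenset({"hammer"}), "555-0103"),
]

# Pretend the recognizer returned these three pieces of text.
matches = search_listings(LISTINGS, "springfield", "Hardware Store", "hammer")
for match in matches:
    print(match.name, match.phone)
```

A real system would need fuzzy matching and a notion of distance to handle "near", and would have to cope with recognition errors. The sketch only shows how the three recognized pieces of the query narrow down a listing.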
Voice Search Takeaways
I looked this patent up in the PAIR (Patent Application Information Retrieval) Database on the USPTO website to find out more about it. The PAIR database contains dockets of actions in the prosecution of patents, including things such as rejections. There was a rejection of this patent, and an amendment of the claims, before it was granted. There was an earlier version of this patent that was filed in 2015 under the name Business Listing Search, which had been granted. Google withdrew that version of that patent, so that this version could be granted instead. There really didn’t seem to be many differences between the two. I had some expectations that Google might be using the voice data that they had collected from running Goog 411 (and they might have) and was surprised to not see anything about that mentioned in this patent.
There are other patents involving voice search, and those may be worth looking at, but this patent about an automated business listings approach does seem like the kind of thing that someone would file to try to stop others from running a Goog 411 service. If the Goog 411 service was really such a great way to collect voice-based data, then keeping others from collecting data like that may not be a bad idea.
A paper published by Google, which shares some authors with this patent, tells us about the role of Goog 411 in leading to voice search at Google, and is worth a look. It is at: Google Search by Voice: A case study. There is more from Google about voice search, and I tracked down some more patents that tell us more about what they have been looking at and working on.
Other Voice Search Patents
Instead of digging too deeply into those other patents, I’m just going to list a few of those here so that anyone interested in digging further into voice search can do so. There are more patents that focus on voice search, but I didn’t see any specifically about searching for businesses.
Automatic language model update
Inventors: Michael H. Cohen, Shumeet Baluja, Pedro J. Moreno Mengibar
Assignee: Google LLC
US Patent: 9,953,636
Granted: April 24, 2018
Filed: October 9, 2015
A method for generating a speech recognition model includes accessing a baseline speech recognition model, obtaining information related to recent language usage from search queries, and modifying the speech recognition model to revise probabilities of a portion of a sound occurrence based on the information. The portion of a sound may include a word. Also, a method for generating a speech recognition model, includes receiving at a search engine from a remote device an audio recording and a transcript that substantially represents at least a portion of the audio recording, synchronizing the transcript with the audio recording, extracting one or more letters from the transcript and extracting the associated pronunciation of the one or more letters from the audio recording, and generating a dictionary entry in a pronunciation dictionary.
Speech recognition with attention-based recurrent neural networks
Inventors: William Chan, Navdeep Jaitly, Quoc V. Le, Oriol Vinyals and Noam M. Shazeer
Assignee: Google Inc.
Granted: October 24, 2017
Filed: February 26, 2016
Methods, systems, and apparatus, including computer programs encoded on computer storage media for speech recognition. One method includes obtaining an input acoustic sequence, the input acoustic sequence representing an utterance, and the input acoustic sequence comprising a respective acoustic feature representation at each of a first number of time steps; processing the input acoustic sequence using a first neural network to convert the input acoustic sequence into an alternative representation for the input acoustic sequence; processing the alternative representation for the input acoustic sequence using an attention-based Recurrent Neural Network (RNN) to generate, for each position in an output sequence order, a set of substring scores that includes a respective substring score for each substring in a set of substrings; and generating a sequence of substrings that represent a transcription of the utterance.
Data driven word pronunciation learning and scoring with crowdsourcing based on the word’s phonemes pronunciation scores
Inventors: Fuchun Peng, Francoise Beaufays, Brian Strope, Xin Lei, Pedro J. Moreno Mengibar and Trevor D. Strohman
Assignee: Google Inc.
US Patent: 9,741,339
Granted: August 22, 2017
Filed: June 28, 2013
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for determining pronunciations for particular terms. The methods, systems, and apparatus include actions of obtaining audio samples of speech corresponding to a particular term and obtaining candidate pronunciations for the particular term. Further actions include generating, for each candidate pronunciation for the particular term and audio sample of speech corresponding to the particular term, a score reflecting a level of similarity between of the candidate pronunciation and the audio sample, wherein the said score for the particular term is obtained by using a minimum of individual scores of phonemes comprising the term. Additional actions include aggregating the scores for each candidate pronunciation and adding one or more candidate pronunciations for the particular term to a pronunciation lexicon based on the aggregated scores for the candidate pronunciations.
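The scoring described in that abstract can be illustrated with a small sketch. The abstract says the score for a candidate pronunciation against an audio sample uses the minimum of its phoneme scores, and that scores are then aggregated per candidate. It does not say how they are aggregated or what cutoff is used, so the mean, the `threshold` value, the function names, and the made-up numbers below are all my assumptions.

```python
from statistics import mean


def sample_score(phoneme_scores):
    """Score one audio sample against one candidate pronunciation as the
    minimum of its per-phoneme scores, so a single badly matching phoneme
    pulls the whole pronunciation down."""
    return min(phoneme_scores)


def select_pronunciations(scores_by_candidate, threshold=0.5):
    """Aggregate sample scores per candidate (mean is an assumption) and
    keep candidates whose aggregate meets the hypothetical threshold."""
    aggregated = {
        candidate: mean([sample_score(s) for s in samples])
        for candidate, samples in scores_by_candidate.items()
    }
    accepted = [c for c, score in aggregated.items() if score >= threshold]
    return aggregated, accepted


# Made-up per-phoneme similarity scores for two candidate pronunciations
# of one term, each compared against two audio samples.
scores = {
    "t ah m ey t ow": [
        [0.9, 0.8, 0.7, 0.9, 0.8, 0.9],
        [0.8, 0.9, 0.6, 0.9, 0.9, 0.8],
    ],
    "t ah m aa t ow": [
        [0.9, 0.8, 0.7, 0.2, 0.8, 0.9],
        [0.8, 0.7, 0.6, 0.3, 0.9, 0.8],
    ],
}
aggregated, accepted = select_pronunciations(scores)
print(accepted)
```

With these numbers, only the first candidate clears the cutoff, because the second one has a phoneme that matches poorly in both samples.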