Relevant Search Results for Longer Documents at Google

Posted Jun 09 2021

Ranking of Books and Other Long Documents in Search Results

A newly granted Google patent describes information retrieval and searcher interfaces for returning relevant search results for a query. It focuses on ranking longer content, such as books, newsletters, or catalogs, in search results. I have seen results from Google Books mixed in with organic results. Organic results usually cover one page at a time, but a book result is often one large document covering many pages, and it can show excerpts from that book in search results. Of course, these results don’t differ from ordinary organic results as much as news results, local results, or knowledge-based results do. But it was interesting to see a patent that focused on these larger documents in SERPs.

Having a specific process for longer documents like this tells us that content length isn’t specifically a ranking signal because Google wants both longer and shorter documents to show up in search results. This patent describes a way for those longer documents to show up in a meaningful way.

This patent tells us this about search engines, as it introduces the processes that make it work:

Modern computer networks and the Web have made information widely and easily available. Free search engines index many millions of web documents linked to the Internet. A person connected to the Web can enter a query to locate web documents relevant to that query.

A category of content that is not widely available on the Web involves more traditional printed works of authorship, such as books and magazines.

These works are not usually available because of difficulty converting printed versions of those works to a digital form.

Optical character recognition (OCR) is a known technique for converting printed text to a useful digital form. It involves using an optical scanning device to generate images of text, which are then converted to characters in a computer-readable format such as an ASCII file.

OCR systems generally include an optical scanner for generating images of printed pages and software for analyzing the images.

In the description that summarizes how this patent works, it breaks that patent down into features.

A method to return relevant search results from a search engine may involve:

  • Receiving a search query
  • Identifying a document based on the search query
  • Providing relevant search results based on the document

Relevant search results may include:

  • Images associated with the document
  • Excerpts from the documents associated with the query
  • Links to other excerpts in the document associated with the query

A graphical user interface (GUI) may include relevant search results associated with a set of documents.

The search results are likely generated based on a search query.

One of the search results may include:

  • An image associated with the document
  • An excerpt from the document that includes a search term of the search query
  • Links to other excerpts in the document that include a search term of the search query

The GUI may include:

  • Links to portions of a document
  • Excerpts from the document, where the excerpt may include an image of text from the document
  • Descriptions of content of the document
  • Information about web documents associated with the document
  • Bibliographic information associated with the document

The GUI may include a page of a document, which includes:

  • A search term
  • A set of links to portions of the document
  • A link to a next or previous page of the document that includes the search term

The GUI may include a:

  • First excerpt, with a portion of text and a thumbnail image
  • Second excerpt with a portion of text and a thumbnail image

The GUI may include:

  • Images from a document including a search term
  • Links associated with images, where the links may permit a larger view of the image
  • Links to other portions of the document

The GUI may include information about:

  • A page of a document
  • Links to previously accessed pages, where each link has been generated from a searcher accessing the previous page.

The GUI may include information about:

  • Previously accessed pages associated with a set of documents
  • An image associated with one of the documents

The information may be generated based on a searcher accessing the previously accessed pages.

A computer-readable medium may include instructions for:

  • Identifying a document based on the search query
  • Providing a search result based on the document

The search result may include:

  • An excerpt from the document that includes a search term associated with the search query
  • Links to other excerpts in the document that include a search term associated with the search query

This patent can be found at:

searcher interfaces for a document search engine
Inventors: Siraj Khaliq, Joe Sriver, Frederick G. M. Roebert, William Brougher, Adam Smith
Assignee: Google LLC
US Patent: 11,023,550
Granted: June 1, 2021
Filed: October 26, 2016

Abstract

A method includes receiving a search query, identifying a document based on the search query, and providing a search result based on the document.

The search result includes, for example, an image associated with the document, an excerpt from the document that is associated with the search query, and links to other excerpts in the document that are associated with the search query.
The method may also include providing other information associated with the document.

Returning Relevant Search Results from Larger Documents

More and more types of documents are becoming searchable via search engines.

This includes documents such as books, magazines, and/or catalogs, scanned with their text recognized via OCR.

This patent tells us that it is beneficial to present information about those and other types of documents in a way that is useful to searchers seeking such information. I have seen results from sources such as books included in search results, and this patent reminds me of many of them.

We are told that “systems and methods consistent with the principles of the invention may provide information regarding documents that may be identified as relevant to search queries in a manner that is useful to the searchers who provided the search queries.”

In many ways, these seem similar to other organic search results, but they show information from larger documents such as books, including where in a book content relevant to a query might be found. Illustrations from this patent show the process being used to return excerpts from books and other larger documents relevant to search queries.

Exemplary Processing

This patent shows processing beginning with a searcher providing a search term (or a group of search terms) as a search query for searching a document repository. The document repository can include documents available from the Web and/or a database, and the vehicle for searching this repository is a search engine. The searcher may provide the search query via web browser software on a client.

receive search query

The search query may be received by the search engine and used to identify documents (e.g., books, magazines, newspapers, articles, catalogs, etc.) related to a search query.

Many techniques exist for identifying documents related to a search query. For example, one might include identifying documents that contain the search term or synonyms of the search term. In addition, when the search query includes more than one search term, a technique might include identifying documents that contain the search terms as a phrase, that contain the search terms but not necessarily together, or that contain fewer than all of the search terms.
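The three multi-term cases above can be illustrated with a minimal Python sketch. This is not the patent's method: the function names `tokenize` and `match_level`, the simple word tokenizer, and the label strings are all assumptions made for illustration, and synonym matching is left out.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def match_level(query, document_text):
    """Classify how a document matches a multi-term query.

    Returns "phrase", "all_terms", "some_terms", or "none".
    These labels are hypothetical; the patent does not name them.
    """
    query_terms = tokenize(query)
    doc_terms = tokenize(document_text)
    if not query_terms:
        return "none"

    # Phrase match: the query terms appear consecutively, in order.
    n = len(query_terms)
    for i in range(len(doc_terms) - n + 1):
        if doc_terms[i:i + n] == query_terms:
            return "phrase"

    # Otherwise check how many query terms appear anywhere in the document.
    doc_vocab = set(doc_terms)
    found = [term for term in query_terms if term in doc_vocab]
    if len(found) == len(query_terms):
        return "all_terms"
    if found:
        return "some_terms"
    return "none"
```

A search engine could use a classification like this to decide which documents are candidates at all, before any scoring takes place.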

review search excerpts

Optionally, the documents may be scored in some manner. For example, the score for a document may be based on an information retrieval (IR) score. Several techniques exist for generating an IR score. For example, an IR score for a document may be generated based on the number of occurrences of the search terms in the document text, where the search terms occur within the document (e.g., title, content, footer, header, etc.), or characteristics of occurrences of the search terms (e.g., font, size, color, etc.).
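As a rough illustration of an IR score that combines occurrence counts with where the terms occur, here is a small Python sketch. The weights in `FIELD_WEIGHTS` are invented for the example (the patent names the locations but gives no numbers), and `ir_score` and `count_occurrences` are hypothetical names, not anything Google has described.

```python
import re

# Hypothetical weights: the patent lists these locations but assigns no values.
FIELD_WEIGHTS = {"title": 3.0, "header": 1.5, "content": 1.0, "footer": 0.5}

def count_occurrences(term, text):
    """Count whole-word, case-insensitive occurrences of term in text."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))

def ir_score(query_terms, fields):
    """Sum weighted term counts over a document's fields.

    fields maps a field name (e.g. "title") to that field's text.
    Unknown field names fall back to a weight of 1.0.
    """
    score = 0.0
    for field_name, text in fields.items():
        weight = FIELD_WEIGHTS.get(field_name, 1.0)
        for term in query_terms:
            score += weight * count_occurrences(term, text)
    return score
```

For example, a document whose title contains "memory" once and whose content contains it twice would score 3.0 + 2.0 = 5.0 under these assumed weights. Characteristics such as font or size could be folded in as additional multipliers.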

select relevant search results

Search results may be formed based on the documents and their optional scores and presented to the searcher. The search results may include information associated with the documents, such as links to the documents, that may optionally be sorted based on the document scores. The search results may be provided as an HTML document, similar to search results provided by conventional search engines. The search results may be provided according to another format agreed upon by the search engine and the client (e.g., Extensible Markup Language (XML) or PDF).
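To make the "sorted by score, delivered as HTML" step concrete, here is a minimal Python sketch. The tuple layout `(title, url, score)` and the function name `render_results` are assumptions for illustration; the patent does not specify a data format or markup.

```python
from html import escape

def render_results(results):
    """Render (title, url, score) tuples as an HTML list, best score first."""
    ranked = sorted(results, key=lambda result: result[2], reverse=True)
    items = [
        # Escape titles and URLs so they cannot break the markup.
        f'<li><a href="{escape(url, quote=True)}">{escape(title)}</a></li>'
        for title, url, _score in ranked
    ]
    return "<ol>\n" + "\n".join(items) + "\n</ol>"
```

The same ranked list could just as easily be serialized as XML or another format agreed upon by the search engine and the client.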

Searcher Interfaces for Presenting Search Results

Assume that a searcher provides a search query that includes the search term “memory,” and a search was performed based on the search query to identify a set of documents related to the search query.

A search result may include:

  • A document title
  • Author information
  • An excerpt from the document
  • An address associated with the document
  • Optionally, links to other relevant excerpts in the document
  • An image associated with the document

The document title may include a title associated with the document. Selecting the document title may cause detailed information associated with the document to be presented, possibly in the form of a reference page (described below) or an excerpt page (described below). The author information may include the name(s) of the author(s) of the document.

searching for memory

An excerpt may include a portion of the document that includes a search term of the search query. Optionally, occurrences of the search term may be visually distinguished (e.g., highlighted) in the portion of the document. An excerpt may also include a page number associated with the excerpt. The selection of the page number may result in the presentation of an excerpt page associated with the excerpt.
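A short Python sketch shows one way an excerpt with a highlighted search term and a page number might be produced. Everything here is an assumption for illustration: the `Excerpt` class, the `find_excerpts` function, the 40-character context window, and the use of `<b>` tags for highlighting are not taken from the patent.

```python
import re
from dataclasses import dataclass

@dataclass
class Excerpt:
    page_number: int  # 1-based page the excerpt came from
    text: str         # snippet with matches wrapped in <b>...</b>

def find_excerpts(pages, term, context=40):
    """Return one Excerpt per page whose text contains term.

    pages is a list of page texts; page numbers are 1-based.
    context is the number of characters kept on each side of the first match.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    excerpts = []
    for number, page_text in enumerate(pages, start=1):
        match = pattern.search(page_text)
        if match is None:
            continue
        start = max(0, match.start() - context)
        end = min(len(page_text), match.end() + context)
        snippet = page_text[start:end]
        # Highlight every occurrence of the term inside the snippet.
        highlighted = pattern.sub(lambda m: f"<b>{m.group(0)}</b>", snippet)
        excerpts.append(Excerpt(number, highlighted))
    return excerpts
```

The stored page number is what a result page could turn into a link to the corresponding excerpt page. A production version would also cut snippets at word boundaries and escape the page text before adding markup.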

relevant search results in excerpts

An address may include an address at which the document is stored. Links may permit one or more other excerpts from the document to be presented to the searcher. An image may include an image of a front cover (or another portion) of the document (if available). The image can include a thumbnail version of the front cover of the document.

A search result may include:

  • Document title and author information
  • A first excerpt from the document
  • A second excerpt from the document
  • Optionally, a link to other relevant excerpts in the document
  • An image associated with the document

Reference Pages That May Be Presented

Assume that a searcher provided a search query that included the search term “memory,” and a search was performed based on the search query to identify a set of documents related to the search query.

A reference page may include:

  • An excerpt from the document
  • A synopsis of the document
  • A jacket or flap description associated with the document
  • Related information
  • Bibliographic information
  • Links to different portions of the document

ranking relevant search results excerpts

An excerpt may include a portion of text from the document that contains a search term from the search query. The portion of text may correspond to an image of the document text or to the text version. Occurrences of the search term may be visually distinguished (e.g., highlighted) in the portion of text. The searcher can view three excerpts from the document by selecting a selectable object, such as “Next” or “Previous.” In such ways, the searcher may be permitted to view more or fewer excerpts.

Choosing relevant search results excerpts

A synopsis may include a brief description of the contents of the document. A jacket or flap description may include text from a jacket, cover, or flap associated with the document.

Related information may include information regarding web documents related to the document or an author associated with the document.

Related information may include:

  • Information relating to web document(s) with a review of the document
  • Web document(s) with a biography of the author
  • Other web document(s) related to the document
  • Web document(s) and/or image(s) related to the author
  • News article(s) related to the document or the author or product(s) related to the document

Bibliographic information may include information, such as the ISBN, ISSN, the name of the publisher, the category code that identifies a category of the topical content of the document, the publication date, the title, the name of an author associated with the document, and/or a format (e.g., hardcover, paperback, etc.) associated with the document. Bibliographic information may also include more, fewer, or different pieces of information. Links may include links to various portions of the document. The links may reference the front cover, the table of contents, the index, and/or the back cover of the document.

The reference page may also include an image and/or an advertisement (ad) associated with the document. The image may include, for example, an image of a front cover (or another portion) of the document (if available).

The image can include a thumbnail version of the front cover of the document. The advertisement may include a set of advertisements associated with a business that sells the document, other documents associated with the author, and/or documents related to this document. The advertisement may also include an advertisement associated with or derived from the search query, other (related) documents, or searcher behavior.

Ads associated with excerpts

A reference page may also include a synopsis about the document, a jacket or flap description associated with the document, related information, bibliographic information, a set of links to different portions of the document, an image associated with the document, and/or an advertisement associated with the document. The reference page may also include a set of excerpts from the document. The excerpts may include portions of text from the document that may include a search query search term. The portions of text may correspond to images of the document text or the text versions. Occurrences of the search term may be visually distinguished (e.g., highlighted) in the portions of text. In this implementation, three excerpts from the document may be presented, or more or fewer excerpts may be presented.

Previously Accessed Pages

The patent tells us that it may be beneficial to provide searchers with easy access to pages of a document that the searchers previously accessed. It may also be beneficial to provide searchers with easy access to pages from different documents that the searchers previously accessed. Either of these would assist searchers in finding information of interest. In addition, techniques exist for tracking pages accessed by searchers.

An excerpt page may also include a set of links associated with previously accessed pages. For example, the links may include links to particular previously accessed pages and a link to all previously accessed pages. Selecting a link to a particular previously accessed page may cause an excerpt page for that page to be presented. Selecting the link to all previously accessed pages may instead cause a page of previously accessed pages to be presented.

A Page of Previously Accessed Pages Associated with a Document

A page of previously accessed pages may include the document title and author information, an image associated with the document, links to different portions of the document, a set of excerpts associated with previously accessed pages from the document, and an advertisement for the document.

Document title and author information may include a title associated with the document and/or the name(s) of the author(s) of the document. The image may include an image of a front cover (or another portion) of the document (if available).

The image could include a thumbnail version of the front cover of the document. Links may include links to various portions of the document. For example, the links may reference the front cover, the table of contents, an excerpt, the index, and/or the back cover associated with the document. The links may also reference more, fewer, or different portions of the document. The advertisement may include a set of advertisements associated with a business that sells the document, other documents associated with the author, or documents related to this document. The advertisement may also include an advertisement associated with or derived from the search query, other (related) documents, or searcher behavior.

The excerpts may include portions of text from previously accessed pages of the document and may correspond to images of the document text or the text versions. Occurrences of a search term may be visually distinguished (e.g., highlighted) within the portions of text. Each of the excerpts may include a page number associated with the excerpt. In one implementation, selection of the page number may result in the presentation of an excerpt page, such as the excerpt page (FIG. 8), associated with the excerpt. The number of excerpts may be configurable based on time (e.g., all pages accessed within the last 10 hours) or number (e.g., the last 20 pages accessed).
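The time-based and count-based limits on previously accessed pages could be sketched in Python as follows. The `recent_pages` function, its parameters, and the `(page_number, accessed_at)` history layout are hypothetical; the patent says only that the number of excerpts may be configurable by time or by number.

```python
from datetime import datetime, timedelta

def recent_pages(history, now, max_age=None, max_count=None):
    """Select previously accessed pages by age and/or count.

    history is a list of (page_number, accessed_at) tuples in any order.
    Returns page numbers, most recently accessed first.
    """
    ordered = sorted(history, key=lambda entry: entry[1], reverse=True)
    if max_age is not None:
        # Keep only pages accessed within the allowed time window.
        cutoff = now - max_age
        ordered = [entry for entry in ordered if entry[1] >= cutoff]
    if max_count is not None:
        # Keep only the most recent max_count pages.
        ordered = ordered[:max_count]
    return [page for page, _accessed_at in ordered]
```

Calling `recent_pages(history, datetime.now(), max_age=timedelta(hours=10))` would mirror the "last 10 hours" example, and `max_count=20` would mirror the "last 20 pages" example.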
