Increased Role for Structured Data at Google
Recently I wrote about Trust Metrics at Google, which explored some of the ways that Google was using trust to rank pages in places like Google Custom Search Engines. I noticed recently that Google now lets people build Google Custom Search Engines around Schema types (see: Make a topical search engine with schema.org types), which reminded me of Google’s newer experimental searches. Those are worth knowing about.
Structured data and unstructured data are common on the Web, and in places like Web-based search engines, we often see results from unstructured content such as text and images on web pages. We’ve seen signs that Google might start returning results that combine both structured data and unstructured data in things like answer passages for featured snippets.
If you haven’t been including structured data on your pages, in the form of Schema or data in tables, it may be time to consider doing so.
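For data in tables, one low-effort starting point is making sure the table is published as a real HTML table with labeled header cells. The following is a minimal Python sketch of that idea. The function name, the sample products, and the assumption that clearly labeled header cells make a table easier for a search engine to interpret are mine for illustration; nothing here is a documented Google requirement.

```python
from html import escape


def rows_to_html_table(headers, rows, caption=None):
    """Render rows of data as an HTML table with explicit column headers.

    Hypothetical helper for illustration only.
    """
    parts = ["<table>"]
    if caption:
        parts.append(f"  <caption>{escape(caption)}</caption>")
    # Column labels go in <th> cells so the meaning of each column is explicit.
    header_cells = "".join(f'<th scope="col">{escape(h)}</th>' for h in headers)
    parts.append(f"  <thead><tr>{header_cells}</tr></thead>")
    parts.append("  <tbody>")
    for row in rows:
        if len(row) != len(headers):
            raise ValueError("each row needs one value per header")
        cells = "".join(f"<td>{escape(str(value))}</td>" for value in row)
        parts.append(f"    <tr>{cells}</tr>")
    parts.append("  </tbody>")
    parts.append("</table>")
    return "\n".join(parts)


# Made-up sample data.
table_html = rows_to_html_table(
    headers=["Model", "Weight (kg)", "Price (USD)"],
    rows=[["Example Tent A", 2.1, 199], ["Example Tent B", 3.4, 149]],
    caption="Hypothetical tent comparison",
)
print(table_html)
```

Escaping every value before it goes into the markup keeps stray characters in the data from breaking the table.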
Structured Data in Tables Search at Google
Google has been focusing upon understanding and using Structured Data such as Schema and Data in Tables, in things like featured snippets and knowledge panels and Web Search Results. It’s also possible to see structured data appearing in other searches.
One of these experimental searches is Table Search, which came out of Google’s WebTables project, started in 2008. I came across a white paper called Ten Years of WebTables, by Michael Cafarella, Alon Halevy, Hongrae Lee, Jayant Madhavan, Cong Yu, Daisy Zhe Wang, and Eugene Wu. The paper describes how the WebTables project started, how it turned out, and what it may lead to in the future. It’s worth reading through, and I love that Google is providing information to us about projects like this one.
Here is a look at an example table search result (see many like this one from ecommerce sites):
Structured Data in Datasets Search at Google
An even more recent experimental search at Google is one that searches through sites that have set up dataset schema on their pages. The Dataset Search Beta is something that many scientific sites (including sites run by individual scientists) might consider setting up and using. Google describes how to apply to be included in this experimental search at: Structured data markup for datasets. I was looking through the papers that will be presented this month at the Web Conference in San Francisco, and noticed that there was one about this Dataset search, titled Google Dataset Search: Building a search engine for datasets in an open Web ecosystem by Natasha Noy, Matthew Burgess, and Dan Brickley. It describes why Google decided to release a datasets search engine, and what they have learned while doing so. The abstract for the paper is a good introduction to it:
There are thousands of data repositories on the Web, providing access to millions of datasets. National and regional governments, scientific publishers and consortia, commercial data providers, and others publish data for fields ranging from social science to life science to high-energy physics to climate science and more. Access to this data is critical to facilitating reproducibility of research results, enabling scientists to build on others’ work, and providing data journalists easier access to information and its provenance. In this paper, we discuss Google Dataset Search, a dataset-discovery tool that provides search capabilities over potentially all datasets published on the Web. The approach relies on an open ecosystem, where dataset owners and providers publish semantically enhanced metadata on their own sites. We then aggregate, normalize, and reconcile this metadata, providing a search engine that lets users find datasets in the “long tail” of the Web. In this paper, we discuss both social and technical challenges in building this type of tool, and the lessons that we learned from this experience.
The ability to share structured data on a source like this makes some of the early dreams of a Semantic Web seem like a possible future. It won’t happen overnight, but the steps are being built that may lead to such a Semantic Web.
While Schema.org is a good place to find information about structured data such as Schema, when Google implements something that might cause rich results or lead to inclusion in an experimental search such as the Dataset Search, they often provide a writeup with many more details on a Google Developer page, such as their page on Dataset Schema.
If you are in a field that produces a lot of data that you want people to search through and share, setting up dataset schema on your website and submitting your site so that it can be included in the Dataset Search Beta would be a good idea.
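To give a rough idea of what dataset schema can look like, here is a minimal Python sketch that builds a schema.org Dataset description as JSON-LD, wrapped in the script tag that would go into a page. The function name, the dataset, the URLs, and the organization are all made up for illustration. The properties used (name, description, url, license, creator) are schema.org Dataset properties, but Google’s Developer page is the place to check which ones it currently requires or recommends.

```python
import json


def dataset_jsonld(name, description, url, license_url=None, creator_name=None):
    """Return a JSON-LD <script> tag describing a schema.org Dataset.

    Hypothetical helper for illustration; not an official Google tool.
    """
    data = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
    }
    # Optional properties are only added when a value is supplied.
    if license_url:
        data["license"] = license_url
    if creator_name:
        data["creator"] = {"@type": "Organization", "name": creator_name}
    body = json.dumps(data, indent=2)
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# All values below are placeholders, not a real dataset.
snippet = dataset_jsonld(
    name="Example River Temperature Readings",
    description="Hypothetical daily water temperature readings, 2010-2018.",
    url="https://example.org/datasets/river-temperature",
    license_url="https://creativecommons.org/licenses/by/4.0/",
    creator_name="Example Research Group",
)
print(snippet)
```

The printed block would be pasted into the HTML of the page that describes the dataset, with one such block per dataset page.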
Two of the authors of the paper for the Web Conference 2019 wrote an introductory post about the dataset search in 2017 titled Facilitating the discovery of public datasets. I like the last paragraph of that post:
Our ultimate goal is to help foster an ecosystem for publishing, consuming and discovering datasets. As such, this ecosystem would include data publishers, aggregators (in the form of large data repositories that provide additional value by cleaning and reconciling metadata), search engines that enable data discovery of the data, and, most important, data consumers
Here is an example of the results of a dataset search (Not limited to only scientific sites, but those are there.):
I referred to a couple of papers from Google that tell us more about how projects involving Table Search and Dataset Search at Google work. I mentioned Custom Search Engines at the start of this post, and would really like to see a similar paper that might tell us more about the use of those at Google.
It’s nice to see things like these case studies on the use of structured data on sites.
Are you using Structured Data on your pages? It is likely that the search engines will be paying more attention to such data.
Added 5/2/2019 at 1:46 pm (pdt): Just noticed a post from the Official Google webmasters blog that looked worth adding to the end of this post: Monitoring structured data with Search Console
Last updated: May 23, 2019.