Building Knowledge On the Web
A patent granted to Google about Structured Data recently told us about how JSON-LD could be used to answer queries about books. I wrote about that patent in a post titled Google Patent on Structured Data Focuses upon JSON-LD. I was interested in finding more about what Google might be doing with Structured Data, and I came across a presentation that looked like it was worth sharing. The presentation was called Leaving No Valuable Data Behind: the Crazy Ideas and the Business by Xin Luna Dong.
Building Knowledge, From Google To Amazon
An old resume of Xin Luna Dong tells us about how she was involved in Google’s Knowledge Vault project, and that she was the inventor of Knowledge-Based Trust. I wrote about the Knowledge Vault project in the post Good Bye Knowledge Graph, Hello Google Knowledge Vault? It was a project that appeared to be an effort to improve upon Google’s Knowledge Graph by finding ways to extract Data from the Web with more accuracy than the knowledge graph was achieving. Knowledge-based trust was a way of scoring websites based on how accurately they answered a set of facts. The paper behind that approach can be found at: Knowledge-Based Trust: Estimating the Trustworthiness of Web Sources. Note that this is a paper, and not a patent (I did search to see if there was a related patent, but didn’t find one.)
I came across Xin Luna Dong after seeing her “Leaving No Valuable Data Behind” presentation, which focuses upon the ideas that came out as she was improving Google’s Knowledge Graph in the creation of the Knowledge Vault and Knowledge-Based Trust, and the business ideas that flew out of those. I also noticed that she now works for Amazon after doing those interesting things at Google. Her focus is still on building knowledge on the Web but now appears to be building a Product Knowledge Graph at Amazon (December 2017 Presentation). It’s interesting seeing how she distinguishes between a knowledge graph and a product graph. It is also interesting seeing the challenges she describes that come with building a product Knowledge Graph. Part of the answer appears to be using (1) Structured Data Sources and entity resolution to find information to use in a product graph (Such as comparing Amazon Movies and IMDB.com information.) She also points out the use of (2) Semi-Structured Data, such as DOM extraction from web pages, and using information from Product profiles from Amazon.
Building Knowledge Take Aways
Xin Luna Dong was one of the leaders at Google in projects such as the Knowledge Vault (which explored extracting correct factual information from the Web) for a follow-up to the Knowledge Graph, and a process such as Knowledge-Based Trust, which worked on setting a metric to determine which pages on the Web tended to be the most accurate when it comes to sharing factual information.
Seeing presentations from her about those projects and the business decisions behind them at Google is interesting, and seeing a presentation from her which describes the differences between a knowledge graph and a product graph at her new employer, Amazon is also interesting because it shows us how information collected by Amazon can be used to inform that product graph using sources such as Amazon Movies and Amazon Product profiles.
On Xin Luna Dong’s Home page are a number of tutorials and papers and presentations she has worked on. She was a teaching assistant to Alon Halevy at the University of Washington before he joined Google and became the head of Structured Data there. Her papers on Knowledge extraction, Knowledge fusion, Knowledge mining, and Data Integration all look interesting. Given the growth of things such as the use of Structured Data on Web Sites, and the Extension of Schema Vocabulary into new industries, and the battles between personal assistants such as Amazon’s Alexa and the Google Now Assistant, it is worth following along with the information that Xin Luna Dong is sharing about what she has done at Google, and what she will be doing at Amazon.
There is a great slide in the “No Valuable Data Left Behind” presentation, which shows that popular sites aren’t always accurate sites, which I had to share:
Building knowledge on the Web appears to be an important part of its future. How accurate is the knowledge you are getting? Where are you getting it from?