Thoughts on Knowledge Graphs, the semantic nature of language, and the two main types of word ambiguity.
Depending on the dictionary you use to look it up, the word “word” can have 13 meanings or more, but no matter the final count (or why one dictionary would differ from another…), what is certain is that the total isn’t 1. It’d be really hard for us to understand what people are saying if we took words at face value, as nothing more than words. A word carries meaning, and that meaning changes with the context in which the word appears. Ultimately, meaning is what matters; words are just a vehicle.
Linguists, NLP (Natural Language Processing) practitioners, developers, and even search engine users are all somewhat aware of the concept of “word ambiguity” (polysemy). You look for “beds”, and in your search results you do find beds (the ones we sleep in), but also flowerbeds and other types of beds. Cases like this one are only mildly annoying, because the most frequent meaning of the word you’re using is very likely also the one you’re interested in… but what if it’s the other way around? What if you’re looking for a “house” in the sense of a family dynasty instead of a building? What if you’re searching with a word that has two or three very common meanings instead of just one (like, for instance, “light”)? It goes without saying that no technology can effectively support an activity involving content, language, documents, or communication without moving beyond words. This journey to happy document-processing land can only be completed if we are able to truly understand what the words in a document are trying to convey.
There’s a second type of ambiguity, one that stems from missing information, but before getting into it I need to expand a little more on the first type. If I only say one word, it is impossible to determine what I’m talking about, but I can offer context for that word by placing it in a sentence (“house, as in ‘I live in a lovely house’”, “house, as in ‘I studied the history of the house of Tudor’”). Another way, which doesn’t require thinking of an actual example, is to offer a synonym chain: “house…apartment”, “house…family”. A synonym chain, which sometimes requires more than one additional word to resolve an ambiguity, is a common device when semantics is applied to NLP and Computational Linguistics (the field concerned with technologies that process language, documents, and communication). Semantics, that is, looking at words as concepts instead of just words, leads to Natural Language Understanding (NLU) when integrated with NLP. NLU’s goal is to analyze text and extract concrete, clear information, not simply to treat a document as a sequence of characters.
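To make the synonym-chain idea concrete, here is a minimal Python sketch. It assumes a tiny, hand-written sense inventory: the `SENSES` table and the `senses_of` and `resolve` functions are hypothetical, loosely modeled on WordNet-style synsets rather than taken from any real system. Each sense is a set of interchangeable words, and resolving a chain means keeping only the senses that every word in the chain shares.

```python
# Hypothetical toy sense inventory: each sense is a set of interchangeable words,
# loosely in the spirit of a WordNet synset. Not a real lexical resource.
SENSES: dict[str, set[str]] = {
    "house/building": {"house", "home", "apartment", "dwelling"},
    "house/family": {"house", "family", "dynasty", "lineage"},
    "family/taxonomy": {"family", "taxon"},  # as in "the cat family"
    "light/illumination": {"light", "illumination", "lighting"},
    "light/not-heavy": {"light", "lightweight"},
}


def senses_of(word: str) -> set[str]:
    """Return every sense whose word set contains `word`."""
    return {sense for sense, words in SENSES.items() if word in words}


def resolve(chain: list[str]) -> set[str]:
    """Keep only the senses of the first word that every other word in the chain shares."""
    candidates = senses_of(chain[0])
    for word in chain[1:]:
        candidates &= senses_of(word)  # intersect: a sense survives only if this word is in it too
    return candidates


print(sorted(resolve(["house"])))       # ['house/building', 'house/family']
print(resolve(["house", "apartment"]))  # {'house/building'}
print(resolve(["house", "family"]))     # {'house/family'}
```

A chain that still leaves two or more senses, like a bare “house”, simply needs another word; a chain that leaves none (for example “house…villa”, since “villa” is not in this toy inventory) means the inventory has no sense covering that combination.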
The second main type of ambiguity I mentioned above has to do with the fact that every concept carries information that goes beyond the concept itself. For instance, a dog is not just a dog (something with very specific features like four legs, a tail, etc.); a dog is also a mammal, an animal, a living being, and so on. Each of those other things a dog also is comes with features that imply a lot more (e.g., a dog is a mammal, therefore it doesn’t lay eggs). By the same token, a house/apartment is a man-made object, while a house/family is an abstract concept indicating a group of people related to each other. Knowing a concept’s semantic ancestors (its hyperonym chain, or superordination), and the features it inherits from them, is necessary to understand content, because real documents will rarely explain that the word “house” refers to an apartment. We’ll simply read “I live in a house”, and we are expected to know which meaning of “house” is the one you can live in. Why is this valuable? Because when you search for different types of buildings, you want to be able to just write “buildings” and find every document that talks about houses, villas, castles, etc. And if you’re interested in animals, you can’t possibly type the name of every animal; you just want to search for “animals”. Ultimately, it’s valuable because real-world content is full of implications that are never spelled out.
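The hyperonym chain, feature inheritance, and the “search for buildings, find castles” behavior can be sketched in a few lines of Python. This is a minimal illustration assuming a hypothetical toy taxonomy in which each concept has a single direct parent; `HYPERONYM`, `FEATURES`, `ancestors`, `features`, and `search` are illustrative names, not an existing library, and real taxonomies can give a concept more than one parent.

```python
# Hypothetical toy taxonomy: each concept maps to its single direct hyperonym (parent).
HYPERONYM = {
    "dog": "mammal",
    "mammal": "animal",
    "animal": "living being",
    "house/building": "building",
    "villa": "building",
    "castle": "building",
    "building": "man-made object",
    "house/family": "group of related people",
}

# Hypothetical features attached directly to a concept; descendants inherit them.
FEATURES = {
    "dog": {"has four legs", "has a tail"},
    "mammal": {"does not lay eggs"},
    "man-made object": {"built by people"},
}


def ancestors(concept: str) -> list[str]:
    """Walk up the hyperonym chain, nearest parent first."""
    chain = []
    while concept in HYPERONYM:
        concept = HYPERONYM[concept]
        chain.append(concept)
    return chain


def features(concept: str) -> set[str]:
    """Own features plus everything inherited along the hyperonym chain."""
    result = set(FEATURES.get(concept, set()))  # copy so FEATURES is never mutated
    for ancestor in ancestors(concept):
        result |= FEATURES.get(ancestor, set())
    return result


def search(query: str, docs: dict[str, str]) -> list[str]:
    """Return ids of docs whose concept is the query itself or any of its descendants."""
    return [doc_id for doc_id, concept in docs.items()
            if concept == query or query in ancestors(concept)]


docs = {"d1": "house/building", "d2": "castle", "d3": "house/family", "d4": "dog"}
print(search("building", docs))   # ['d1', 'd2']
print(sorted(features("dog")))    # ['does not lay eggs', 'has a tail', 'has four legs']
```

Note that “d3” is not returned for “building”: once a document’s “house” is resolved to the family sense, its ancestors lead somewhere else entirely.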
These relations and features belonging to different concepts, together with the words used to represent those concepts in language, are what make up a Knowledge Graph. This modern-day repository, halfway between a dictionary and a taxonomy of reality, is what AI (Artificial Intelligence) technology uses to understand documents at a deeper, more human-like level than the simpler alternative of treating words merely as words. This detailed level of understanding is what unlocks new experiences and advanced forms of automation that were once hard to reach.
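One common way to store such a graph is as a list of (subject, relation, object) triples, the same shape RDF uses. The sketch below assumes a hypothetical, hand-built set of triples in which words and concepts are separate nodes: a `lexicalizes` edge links a word to a concept it can express, while `is_a` and `has_feature` connect concepts to their parents and features. The relation names and the `KnowledgeGraph` class are illustrative only, not any product’s actual schema.

```python
from collections import defaultdict

# Hypothetical triples: words ("word:...") and concepts ("concept:...") are distinct nodes.
TRIPLES = [
    ("word:house", "lexicalizes", "concept:dwelling"),
    ("word:house", "lexicalizes", "concept:dynasty"),
    ("word:apartment", "lexicalizes", "concept:dwelling"),
    ("concept:dwelling", "is_a", "concept:building"),
    ("concept:building", "is_a", "concept:man-made object"),
    ("concept:dynasty", "is_a", "concept:group of related people"),
    ("concept:dynasty", "has_feature", "members are related"),
]


class KnowledgeGraph:
    def __init__(self, triples):
        # Index outgoing edges by (subject, relation) for quick lookups.
        self.out = defaultdict(list)
        for subject, relation, obj in triples:
            self.out[(subject, relation)].append(obj)

    def objects(self, subject, relation):
        """All objects reachable from `subject` through one `relation` edge."""
        return list(self.out.get((subject, relation), []))

    def meanings(self, word):
        """Map each concept the word can express to its is_a chain (nearest parent first)."""
        result = {}
        for concept in self.objects(f"word:{word}", "lexicalizes"):
            chain, node = [], concept
            while parents := self.objects(node, "is_a"):
                node = parents[0]  # toy assumption: at most one parent per concept
                chain.append(node)
            result[concept] = chain
        return result


kg = KnowledgeGraph(TRIPLES)
for concept, chain in kg.meanings("house").items():
    print(concept, "->", " -> ".join(chain))
# concept:dwelling -> concept:building -> concept:man-made object
# concept:dynasty -> concept:group of related people
```

Keeping words and concepts as distinct nodes is what lets “house” and “apartment” point to the same concept while “house” also points to a completely different one.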