A quick word on Hybrid AI in Natural Language Processing: the new approach to Machine Learning applied to Text Analysis

27 October 2021

Today, any solution aimed at processing unstructured data (i.e., language, in most cases text) is based on one of two main approaches: Machine Learning or Symbolic. Both can be delivered in multiple ways (different algorithms in the case of ML, from shallow linguistics to semantic technology in the case of Symbolic), but not much has been done so far in the realm of hybrid approaches. Choosing one over the other always involves a compromise between advantages and drawbacks (higher accuracy from Symbolic, more flexibility from ML). Hybrid AI (or Hybrid NL), by contrast, is a revolutionary path to solving linguistic challenges: it leverages the best of both worlds and, ultimately, can help your NLP practices graduate to NLU (Natural Language Understanding). I won’t spend time explaining how ML or Symbolic work, since there’s already a ton of literature on that; I’ll focus this page on Hybrid instead.



To frame this conversation in a practical fashion, we must look at two aspects: development and workflow (i.e., production). At the development stage, going hybrid means that a Symbolic solution supports the creation of a Machine Learning model, either to reduce the effort involved or to enhance the model’s quality. At the production stage, on the other hand, the workflow can be supported by both ML and Symbolic to deliver a more precise outcome. In a project where the Machine Learning piece is the pivot of the solution, the first type of integration places Symbolic at the top (before a Machine Learning model is even created), and the second at the bottom (curating or enhancing the final output). Naturally, both kinds of hybrid integration can be present at the same time in a linguistic project.

An example of how Hybrid is implemented at the development stage: let’s say we need to classify documents about different sports, labeling each one with a tag describing the sport it discusses. Unless we have millions of pre-tagged documents to train our Machine Learning algorithm, simply looking at the words in a document will likely deliver a poorly performing model, since only a small percentage of words are exclusive to one sport or another. A Symbolic solution can be placed in front of the training phase, pre-selecting and pre-tagging documents to make things clearer and easier for the ML algorithm, which will ultimately produce a more accurate model. Such a Symbolic solution can often take very little effort if the right technology is chosen (e.g., semantics based on a solid knowledge graph). In other words, at this stage Symbolic technology can add a sizable amount of information to every document the ML algorithm analyzes for training purposes (topics, concepts, main actors, locations, companies, sentiment, relations, …), so the algorithm won’t have to rely on words or other non-linguistic features alone.
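To make the development-stage idea concrete, here is a minimal Python sketch under loudly stated assumptions: a tiny hand-written dictionary (`CONCEPT_MAP`, hypothetical) stands in for a real knowledge graph, and a toy multinomial Naive Bayes classifier with Laplace smoothing plays the ML part. The symbolic step appends a concept feature (e.g. `sport:basketball`) to each document before training, so a word never seen in training can still point to the right sport through its concept. All names and data are illustrative, not any product’s API.

```python
import math
from collections import Counter, defaultdict

# Hypothetical stand-in for a knowledge graph: maps surface terms to sport concepts.
CONCEPT_MAP = {
    "dunk": "sport:basketball",
    "rebound": "sport:basketball",
    "quarterback": "sport:american_football",
    "touchdown": "sport:american_football",
    "wicket": "sport:cricket",
    "innings": "sport:cricket",
}

TRAIN_DOCS = [
    "what a dunk in the final quarter",
    "the quarterback threw a touchdown pass",
    "he lost his wicket early",
]
TRAIN_LABELS = ["basketball", "american_football", "cricket"]


def features(text, enrich):
    """Word tokens, plus (if enrich) one concept feature per token the knowledge base knows."""
    tokens = text.lower().split()
    if not enrich:
        return tokens
    return tokens + [CONCEPT_MAP[t] for t in tokens if t in CONCEPT_MAP]


def train_nb(docs, labels, enrich):
    """Collect the counts a multinomial Naive Bayes model needs."""
    class_counts = Counter(labels)
    feature_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        feats = features(doc, enrich)
        feature_counts[label].update(feats)
        vocab.update(feats)
    return class_counts, feature_counts, vocab, enrich


def predict_nb(model, text):
    """Return the label with the highest log-probability, using Laplace (add-one) smoothing."""
    class_counts, feature_counts, vocab, enrich = model
    n_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        total = sum(feature_counts[label].values())
        score = math.log(n / n_docs)
        for f in features(text, enrich):
            score += math.log((feature_counts[label][f] + 1) / (total + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


query = "a great rebound under the basket"
print(predict_nb(train_nb(TRAIN_DOCS, TRAIN_LABELS, enrich=False), query))  # american_football
print(predict_nb(train_nb(TRAIN_DOCS, TRAIN_LABELS, enrich=True), query))   # basketball
```

With this tiny training set, the word-only model picks `american_football`, because the only overlapping words ("a" and "the") slightly favour the shorter training document. The enriched model picks `basketball`, because "rebound" maps to the same concept as "dunk". A real symbolic engine would of course handle lemmas, multiword expressions and ambiguity, which this dictionary lookup does not.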


Example of a relation extracted from the sentence: “Heuking Kühn Lüer Wojtek, with a team led by Dr. John Smith and Dr. Frank Brown, advised Volkswagen GmbH on the acquisition of Porsche.” This information can become part of the feature engineering in an ML algorithm.
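As a rough illustration of how such a relation could feed feature engineering, here is a minimal Python sketch. It assumes the symbolic engine returns each relation as a simple record; the `Relation` fields, the predicate names (`advise`, `acquire`) and the entity types are hypothetical and are not the format shown in the original figure. Each relation is turned into string features that are added to the document’s bag of words.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """A relation as a symbolic engine might return it (field names are hypothetical)."""
    predicate: str
    subject: str
    subject_type: str
    obj: str
    obj_type: str


def relation_features(rel):
    """Turn one relation into string features an ML model can count."""
    return [
        f"rel={rel.predicate}",
        f"rel={rel.predicate}|subj_type={rel.subject_type}",
        f"rel={rel.predicate}|obj_type={rel.obj_type}",
        f"subj={rel.subject}",
        f"obj={rel.obj}",
    ]


def document_features(text, relations):
    """Bag of words plus the features derived from every extracted relation."""
    bag = Counter(text.lower().split())
    for rel in relations:
        bag.update(relation_features(rel))
    return bag


sentence = (
    "Heuking Kühn Lüer Wojtek, with a team led by Dr. John Smith and Dr. Frank Brown, "
    "advised Volkswagen GmbH on the acquisition of Porsche."
)
# Illustrative relations only; a real engine's output will look different.
relations = [
    Relation("advise", "Heuking Kühn Lüer Wojtek", "ORGANIZATION", "Volkswagen GmbH", "ORGANIZATION"),
    Relation("acquire", "Volkswagen GmbH", "ORGANIZATION", "Porsche", "ORGANIZATION"),
]
feats = document_features(sentence, relations)
print(feats["rel=advise|obj_type=ORGANIZATION"], feats["subj=Volkswagen GmbH"])  # 1 1
```

Combining the predicate with entity types (rather than only with surface names) is a design choice meant to let the model generalize across documents that mention different companies in the same kind of relation.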


An example of Hybrid at the production stage: if we were to put together a solution that extracts the names of professional players and personalities from documents and tags each of them with the sport they are connected to, we could use an ML model to quickly determine the sport discussed in each document and then a Symbolic solution to extract the names. An even more common application of Hybrid AI in a production workflow is a Symbolic approach used to refine the output of an ML model that has high Recall but only moderate Precision (more on this here). In this case, the Symbolic piece doesn’t need to be uber-accurate, because it relies on the ML model having already removed a lot of noise; at the same time, the ML model doesn’t have to be extremely precise, because it relies on the Symbolic portion at the end of the production pipeline to filter out mistakes. This example shows how flexible such a system can be: it lets us find the right balance between the two approaches and make them work in concert.
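The recall-then-precision pattern can be sketched in a few lines of Python, under clearly stated assumptions. The capitalization heuristic in `ml_candidates` is not an ML model; it is a hypothetical stand-in that merely mimics the high-recall, moderate-precision behaviour described above. The symbolic filter uses an equally hypothetical knowledge base and one lenient suffix rule, and every person, team and venue name is fictional.

```python
import re

# Hypothetical knowledge base and rule; all names are fictional.
KNOWLEDGE_BASE = {
    "Alex Carter": "PERSON",
    "Springfield Hawks": "TEAM",
    "Riverton Bears": "TEAM",
}
VENUE_SUFFIXES = {"Arena", "Center", "Park", "Stadium"}

TEXT = ("Alex Carter and Sam Rivera combined for 60 points as the Springfield Hawks "
        "beat the Riverton Bears at Lakeside Arena.")
GOLD = {"Alex Carter", "Sam Rivera"}


def ml_candidates(text):
    """Stand-in for a high-recall ML extractor: every run of capitalized tokens."""
    spans, current = [], []
    for tok in re.findall(r"[A-Za-z0-9]+", text):
        if tok[0].isupper():
            current.append(tok)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:
        spans.append(" ".join(current))
    return spans


def symbolic_filter(candidates):
    """Keep known people, drop known non-people, apply one lenient rule to unknown spans."""
    kept = []
    for span in candidates:
        entity_type = KNOWLEDGE_BASE.get(span)
        if entity_type is not None:
            if entity_type == "PERSON":
                kept.append(span)
        elif span.split()[-1] not in VENUE_SUFFIXES:
            kept.append(span)
    return kept


def precision_recall(predicted, gold):
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


raw = ml_candidates(TEXT)
refined = symbolic_filter(raw)
print(precision_recall(raw, GOLD))      # (0.4, 1.0)
print(precision_recall(refined, GOLD))  # (1.0, 1.0)
```

Note that "Sam Rivera" is not in the knowledge base at all: the filter keeps it simply because no rule rejects it. That is the point made above, where the symbolic layer only needs to catch the upstream model’s typical mistakes, not to recognize every name on its own.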

Architecturally speaking, it’s worth noting that Hybrid at the development stage keeps ML and Symbolic separate (the ML algorithm trusts the quality of its training data without being entirely aware of who or what produced it), while Hybrid at the production stage creates a longer pipeline in which both Symbolic and ML contribute to the final result. This should be taken into account whenever an application is demanding in terms of speed and processing time. On this note, energy consumption and processing time would deserve a chapter too long for this article; suffice it to say that they don’t necessarily increase in a Hybrid application. As with most things, it depends on the specific challenge at hand, but ML models have become extremely heavy nowadays, and, more often than not, adding Symbolic to the workflow allows for a lighter model; consequently, the whole hybrid pipeline ends up being less computing-intensive than an equivalent based solely on a much beefier ML solution.

This is meant only as an introductory article on Hybrid AI, so I won’t dive into its many other facets and advantages (which I will likely address one by one in future articles), but I’d be remiss if I didn’t briefly mention a few. Besides reducing effort and increasing accuracy, a Hybrid system can respond quickly to change: if a problem is encountered, if language changes, or if a new player or term appears overnight, the Symbolic portion of the pipeline can be updated quickly and with laser-focused precision, while a pure ML system has to be retrained (with all the uncertainty in work and time that comes with that course of action). Even if we limited this conversation to value in the strictest sense, this advantage has a measurable impact, not only on effort but on responsiveness as well (responsiveness is paramount in Customer Care use cases, and slow reactions mean missed opportunities in the Financial Services industry). Beyond ROI, a Hybrid system that enables a lighter ML model comes with another perk: a lower environmental impact. Energy consumption has become a serious topic in the world of Machine Learning, and when Symbolic becomes part of a linguistic solution, it brings knowledge that the ML model no longer has to learn.
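A minimal Python sketch of that responsiveness, assuming the symbolic layer is a simple rule table: `SymbolicTagger` is a hypothetical, illustrative class, not a real library, and "Jordan Vale" is a fictional player. Reacting to a new name is a one-line data update; in a pure ML system, the equivalent change would mean collecting labeled examples and retraining.

```python
import re


class SymbolicTagger:
    """Toy rule-based tagger: whole-word term matches mapped to tags (hypothetical)."""

    def __init__(self, rules):
        self.rules = {term.lower(): tag for term, tag in rules.items()}

    def add_rule(self, term, tag):
        # Updating the symbolic layer is a data change: no retraining involved.
        self.rules[term.lower()] = tag

    def tag(self, text):
        """Return the sorted set of tags whose term appears as a whole word or phrase."""
        lowered = text.lower()
        return sorted({
            tag for term, tag in self.rules.items()
            if re.search(r"\b" + re.escape(term) + r"\b", lowered)
        })


tagger = SymbolicTagger({"quarterback": "american_football", "wicket": "cricket"})
news = "Jordan Vale signed a two-year deal after a stellar rookie season"
print(tagger.tag(news))  # []
tagger.add_rule("Jordan Vale", "basketball")  # a new (fictional) player appears overnight
print(tagger.tag(news))  # ['basketball']
```

Keeping the rules as plain data is what makes the update targeted: nothing else in the pipeline changes, and the effect of the new rule is easy to inspect.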

