LLMs March 13, 2023

ChatGPT, Bard and Advanced Language Models at Owlin

It’s been impossible to miss: AI has gone mainstream! For a few weeks now, OpenAI’s ChatGPT has been all the rage on Twitter, LinkedIn, and in the news. And with Microsoft now acquiring a 49% stake in OpenAI and Google launching its AI-powered chatbot Bard, the AI wars in Big Tech have officially begun! The seemingly human-like responses of these chatbots have captured imaginations around the world, disrupting journalism, content creation, education, web search, and more.

The language models that power them, though, have already been around for quite a while. And at Owlin, you can find them all over our machine learning stack! We use them for translation, finding entities in text, measuring document similarity at scale, writing event summaries from scratch, and much more. But let’s first unpack why we need all these capabilities.

Why language models are essential for Owlin’s risk insights

At Owlin, we analyze the global news and other alternative data for large players in the financial sector with a focus on third-party risk management, strategic insights, KYC, and ESG. Our clients need to know when one of their suppliers, clients, investments, or merchants is at risk of going bankrupt, is involved in litigation, is about to be acquired, or shows one of the many other important signals that we track. To that end, we analyze hundreds of thousands of news articles from all over the world in many different languages, every single day. This is way too much for human analysts to process, so we use natural language processing (NLP) and various other types of machine learning to extract all relevant data points from these news articles. Large Language Models (LLMs), such as those powering ChatGPT, play a vital role in our NLP pipeline.

How Owlin uses language models in its machine learning stack

In essence, language models are huge neural networks that have been trained on massive amounts of text from the internet, books, publications, and other sources. They can be trained without labeled data by randomly removing parts of the input text and training the model to fill in the blanks. This results in models that have an understanding of syntax and semantics, both of which are crucial for understanding natural language. And they keep getting bigger and better!
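To make the "fill in the blanks" idea concrete, here is a minimal sketch of how training examples could be built by masking random words. It only prepares the data and does not train a model. The `[MASK]` placeholder, the `mask_tokens` helper, and the masking rate are illustrative assumptions, not a description of how any particular model (or Owlin's stack) does it.

```python
import random

MASK = "[MASK]"  # assumed placeholder token, chosen for illustration


def mask_tokens(tokens, mask_prob=0.15, seed=0):
    """Hide a random subset of tokens.

    Returns the masked sequence plus a dict mapping each hidden
    position to the original token the model would have to predict.
    """
    rng = random.Random(seed)  # seeded so the example is repeatable
    masked, targets = [], {}
    for position, token in enumerate(tokens):
        if rng.random() < mask_prob:
            masked.append(MASK)
            targets[position] = token
        else:
            masked.append(token)
    return masked, targets


tokens = "the supplier filed for bankruptcy last week".split()
masked, targets = mask_tokens(tokens, mask_prob=0.3, seed=1)
print(masked)   # input shown to the model
print(targets)  # what the model is trained to recover
```

During training, the model only sees `masked` and is scored on how well it recovers the entries in `targets`. No human labeling is needed, because the text itself supplies the answers.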

What makes language models revolutionary?

What makes language models so revolutionary compared to older methods is their ability to combine very deep neural network architectures with incredible amounts of training data and a relatively recent machine learning innovation: the attention mechanism. Through attention, the key ingredient of so-called transformer architectures, language models learn which words in a text form relevant combinations, enabling them to learn complex patterns and relationships. This is how they outperform earlier neural network architectures such as LSTMs, as well as the more rule-based approaches that have been around for decades.
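For readers who like to see the mechanics, the core of attention fits in a few lines. The sketch below implements plain scaled dot-product attention in pure Python on toy two-dimensional word vectors. The vectors and function names are made up for illustration. Real transformers add learned projections, multiple attention heads, and many stacked layers on top of this.

```python
import math


def softmax(scores):
    """Turn raw scores into positive weights that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention(queries, keys, values):
    """Scaled dot-product attention over lists of equal-length vectors.

    Returns (outputs, weights): one output vector and one row of
    attention weights per query.
    """
    dim = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # How strongly does this query match each key?
        scores = [
            sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(dim)
            for k in keys
        ]
        weights = softmax(scores)
        # Output is the weighted average of the value vectors.
        output = [
            sum(w * v[j] for w, v in zip(weights, values))
            for j in range(len(values[0]))
        ]
        outputs.append(output)
        all_weights.append(weights)
    return outputs, all_weights


# Toy "word vectors" (hypothetical values, not from a trained model).
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(words, words, words)  # self-attention
for row in weights:
    print([round(w, 3) for w in row])
```

Each row of `weights` shows how much one word "looks at" every other word. In a trained model these weights are what lets the network pick out which words belong together.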

Balancing performance, explainability, and adaptability at Owlin

At Owlin, we use a combination of the best available open-source language models, as well as a few that we fine-tune ourselves. Language models are a great fit when you want best-in-class performance for common tasks that do not change a lot over time, such as translation, entity recognition, and generating document identity fingerprints. And if you want to generate high-quality, human-readable text, generative language models (like those that power ChatGPT) are the only game in town.

Rule-based algorithms still have their place in certain NLP applications, however, because the increased performance of language models comes at a cost. Language models are expensive to train, contain hidden biases, and are hard to adapt incrementally. It’s also often hard to explain why a language model produces a certain outcome (the black box problem). Because we operate in a regulated industry, explainability and adaptability are key requirements for some of our core capabilities, such as finding and interpreting risk signals in the worldwide news. However, we still would like to leverage the power of language models to improve the quality of these systems. How does one go about balancing these priorities?

Hybrid systems: combining language models and rule-based algorithms

This is where we at Owlin get creative by coming up with hybrid systems. For instance, we can use a language model to suggest new risk triggers to look for in the news. Or we can use a language model to approve or reject the output of a rule-based system, which keeps the composite system much more maintainable and explainable than a language model on its own. This is where we as data scientists and machine learning engineers get to create unique, best-in-class systems to power our product.
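The second pattern, a language model approving or rejecting rule-based output, could be sketched roughly as follows. Everything here is hypothetical: the `RULES` patterns, the `Signal` type, and especially `stub_verifier`, which is a simple stand-in for a real language model call. It is meant to show the shape of such a pipeline, not Owlin's actual implementation.

```python
import re
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Signal:
    trigger: str   # name of the rule that fired
    sentence: str  # sentence in which it fired


# Hypothetical rule set: trigger name -> pattern. Illustrative only.
RULES = {
    "bankruptcy": re.compile(r"\b(bankrupt(cy)?|insolven(t|cy))\b", re.I),
    "litigation": re.compile(r"\b(lawsuit|sued|litigation)\b", re.I),
}


def rule_based_signals(text: str) -> List[Signal]:
    """Explainable first stage: every hit traces back to a named rule."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return [
        Signal(name, sentence)
        for sentence in sentences
        for name, pattern in RULES.items()
        if pattern.search(sentence)
    ]


def hybrid_signals(text: str, verifier: Callable[[Signal], bool]) -> List[Signal]:
    """Second stage: keep only the candidates the verifier approves."""
    return [s for s in rule_based_signals(text) if verifier(s)]


def stub_verifier(signal: Signal) -> bool:
    """Stand-in for a language model: rejects denied or negated mentions."""
    return not re.search(r"\b(denies|denied|no|not)\b", signal.sentence, re.I)


text = (
    "Acme Corp filed for bankruptcy on Monday. "
    "Beta Ltd denied rumours of insolvency."
)
candidates = rule_based_signals(text)
approved = hybrid_signals(text, stub_verifier)
print(len(candidates), "candidates,", len(approved), "approved")
```

The appeal of this split is that every approved signal can still be traced to a specific rule, while the verifier filters out the false positives that simple pattern matching tends to produce.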

What’s next? The future of language models and AI at Owlin

So with the AI space developing this rapidly, what is next? One possible next step is the integration of generative language models with other intelligent systems, for instance by teaching them how to reason over facts and relationships from knowledge graphs. This would make the models less dependent on their training data as their only source of knowledge and make it easier to put them to use in settings where up-to-date information is key. Rest assured that both OpenAI and Google are working on this as we speak. You can count on us at Owlin to follow all the latest developments and keep our innovations going strong to ensure the quality of the insights we serve!

Owlin: risks evolve—stay ahead with real-time intelligence

Stop chasing risk—start anticipating it with Owlin.


Judith Landstra - de Graaf
