Floris Hermsen on the long, winding road to data & AI maturity
At Owlin, we couldn’t do what we are doing without the great work of all the different people working with us. In this interview series, The people of Owlin, we ask them about their daily work, background, and where they see Owlin going in the future. This month: Floris Hermsen, our Head of Data Science, tells us about the role of the data science team and the goals they are working towards.
What does a data scientist do at Owlin?
“As data scientists, our main responsibility is maintaining and enhancing Owlin’s machine learning and data analytics stack. Natural Language Processing (NLP) is the cornerstone of our offering, as it helps us analyze worldwide news and other text sources to find actionable insights for our clients. Our models perform a wide variety of typical NLP tasks, from translation, content classification, sentiment analysis and entity extraction to content deduplication (collating near-duplicate news articles and other information), and much more! Our analytics layer follows the NLP pipeline: we detect which relevant topics are trending around the entities we track and rank them in various ways through comparative analysis. A lot of effort also goes into maintaining our advanced multilingual search capabilities.
A less obvious but equally important responsibility of the data science team, I believe, is supporting other teams within the organization in making their processes more efficient and effective through smart use of data. This ranges from helping monitor critical system components to using anomaly detection to predict which data sources are experiencing issues.”
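To make the anomaly detection idea concrete, here is a minimal sketch of how a data source with issues could be flagged from its daily article counts. It does not describe Owlin's actual implementation: the function names, the sample counts and the threshold of 3.5 are all assumptions chosen for illustration, and the technique shown is a robust z-score based on the median and the median absolute deviation (MAD).

```python
from statistics import median


def robust_z_scores(counts):
    """Return one robust z-score per value, using the median and the MAD."""
    med = median(counts)
    mad = median(abs(c - med) for c in counts)
    if mad == 0:
        # No measurable spread: this sketch simply reports no deviation.
        return [0.0 for _ in counts]
    # 0.6745 rescales the MAD so scores are comparable to standard z-scores.
    return [0.6745 * (c - med) / mad for c in counts]


def flag_anomalies(counts, threshold=3.5):
    """Return the indices of values whose robust z-score exceeds the threshold."""
    scores = robust_z_scores(counts)
    return [i for i, z in enumerate(scores) if abs(z) > threshold]


# Hypothetical daily article counts for one source; the last day collapses.
daily_counts = [120, 118, 125, 122, 119, 121, 3]
anomalous_days = flag_anomalies(daily_counts)
```

With these hypothetical counts, only the final day is flagged, which is the kind of signal that could prompt someone to check whether the source is still being collected correctly.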
Where does the primary focus lie for you and your team at this moment?
“We’re currently at a stage where it becomes harder and harder to simply drop in a machine learning model to perform a certain task better or more efficiently. For instance, the complex parsing engines curated by our analysts look for many different signals in the news, with an ever-changing and growing signal taxonomy. Replacing such a system outright with new AI solutions comes with many complications, such as a lack of adaptability and explainability, both of which are important for our clients and our own day-to-day operations. Such a solution also comes with highly custom, niche and fluid input data requirements. This poses quite the challenge!
For us, the answer lies in creating hybrid systems that let machine learning models and human curators work together, improving each other’s inputs and outputs in a closed, human-in-the-loop data ecosystem. The real challenge here revolves around putting a system in place that can effectively and quickly generate the right training data for machine learning models. In my opinion, this is actually a harder challenge than choosing the right model architectures. This is in line with the broader shift in the data science field from model-centric towards data-centric AI: models are just one part of the equation, and getting the right training data is just as important, if not more so. If you get this challenge right, the system takes over the role of the data scientist in terms of generating new models, and becomes a self-service solution for domain experts. This greatly improves the flexibility of the models you run in production and dramatically shortens the time to market for new ideas.
At this point you are no longer just developing and deploying models to a machine learning pipeline, but you are engaged in outright process and organizational transformation. You need to think about how data is stored in reliable ways, how people interact with the systems, which data contracts need to exist between different parts of the organization and how to design the metadata system that needs to orchestrate all of this.
Before machine learning can enter the scene, the processes it needs to replace or augment must be mature enough, as must the part of the organization it touches. There are many different models out there that can help frame this question (just Google “data maturity” or “AI maturity”). But what these models all have in common is that you need robust and reliable data collection, good data accessibility, a solid & reproducible data analysis strategy and an organization that understands both the value of getting there and the effort required.”
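One common way to generate training data in a human-in-the-loop setup like the one described above is uncertainty sampling: the model's least confident predictions are routed to human curators, and their labels feed the next round of training. The sketch below is an assumption about how such a step could look, not a description of Owlin's system; the function name, the document IDs, the probabilities and the review budget are all hypothetical.

```python
def select_for_review(predictions, budget=2):
    """Pick the documents the model is least sure about.

    predictions maps a document ID to the model's probability that the
    document belongs to the positive class. Probabilities close to 0.5
    indicate the most uncertainty, so those documents are ranked first.
    """
    ranked = sorted(predictions, key=lambda doc_id: abs(predictions[doc_id] - 0.5))
    return ranked[:budget]


# Hypothetical model outputs for four unlabeled articles.
predictions = {"a": 0.97, "b": 0.52, "c": 0.08, "d": 0.41}

# These documents would go to a human curator for labeling.
review_queue = select_for_review(predictions)
```

In this example the queue contains documents "b" and "d", the two whose probabilities lie closest to 0.5. Spending curator time on those cases, rather than on documents the model already classifies confidently, is one way to get more useful training data per label.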
How are you working towards this as Head of the Data Science Team?
“Working as a data scientist at Owlin, you have to consider that the systems you are working on are actually being used by customers and are continuously being updated. You can almost compare it to upgrading an airplane mid-flight.
Therefore, you have to take small, incremental steps towards the desired state. It’s extremely complex to create an entirely new system and keep the plane in the air at the same time. Consequently, we might first try to improve one of the buttons in the cockpit, or the trolleys used in the passenger cabin. The trick is to make the incremental changes that slowly but surely lead to the desired new system state. This comes with the added advantages of slowly evolving existing workflows, which leads to easier and better adoption, as well as more predictable project timelines.
Of course, we can’t do this by ourselves. We need the rest of the company, and these initiatives need to be aligned at the strategic level. This can be challenging because it requires concepts and skills that are sometimes new for me, our team and the company as a whole. Transformation and innovation are never easy! But hey, I like a challenge and therefore enjoy the process a lot.”
Thank you, Floris!