Automatic Speech Recognition, or ASR, is the process of converting human speech into readable text using machine learning and artificial intelligence (AI). Over the last decade, the technology has exploded, with ASR systems appearing in everyday apps like TikTok and Instagram for real-time captioning, Spotify for podcast transcriptions, Zoom for conference transcriptions, and more!
We sat down with Esther van den Berg, Natural Language Processing Engineer at Amberscript, to learn more about how ASR works at Amberscript and how it creates accurate transcripts and subtitles.
Hi Esther, can you tell us a little about yourself?
Well, I’m Dutch, I’m educated in linguistics, and then I learned how to program. I steadily got more education in language technology in the Netherlands and Germany. When it was time to look for a job, I knew I wanted the sense that what I was working on and building was contributing to the lives of everyday people.
Amberscript is my first full-time experience, and I work on using ASR as part of our product to create transcripts and subtitles.
What is an average day at Amberscript like for you?
I sit within the Engineering team. We develop software in an agile environment. That means we have both a longer-term strategy for how we want to improve the product and short-term goals to make sure the product in its current state works well for our customers, with new features and improvements continuously being delivered. I mainly work on developing our in-house language models and on post-processing, which is where you apply language-specific rules to make the text produced by the ASR engine even better.
Can you explain more about language models?
A language model is the part of an ASR engine that recognises patterns in which words are likely to follow other words. You can improve language models by training on more data, or on more recent data. We use them to make transcripts as accurate as possible. We also try to generate text that will be easy and quick for transcribers to improve when a customer uses our Manual services.
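To make the idea concrete, here is a minimal sketch of the kind of pattern a language model captures. This is an illustrative toy bigram model, not Amberscript's actual engine (production ASR language models are far larger and typically neural): it counts which words follow which in a training corpus, then scores candidate sentences, which is how an ASR engine can prefer a likely word sequence over an unlikely one with similar sounds.

```python
from collections import defaultdict

def train_bigram_model(sentences):
    """Count word-to-next-word transitions and convert them to probabilities."""
    counts = defaultdict(lambda: defaultdict(int))
    for sentence in sentences:
        # <s> and </s> mark sentence start and end
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, curr in zip(words, words[1:]):
            counts[prev][curr] += 1
    # Normalise counts into conditional probabilities P(curr | prev)
    model = {}
    for prev, following in counts.items():
        total = sum(following.values())
        model[prev] = {w: c / total for w, c in following.items()}
    return model

def sentence_probability(model, sentence):
    """Multiply bigram probabilities; unseen transitions score zero."""
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, curr in zip(words, words[1:]):
        prob *= model.get(prev, {}).get(curr, 0.0)
    return prob

# Tiny hypothetical training corpus
corpus = [
    "the meeting starts at nine",
    "the meeting ends at ten",
]
model = train_bigram_model(corpus)
print(sentence_probability(model, "the meeting starts at nine"))  # a probability > 0
print(sentence_probability(model, "nine the at starts meeting"))  # 0.0, never seen
```

An ASR engine uses scores like these to rerank the acoustic model's candidate transcriptions, so that fluent word orders win over jumbled ones that sound alike.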
Is there a difference between automatic speech recognition and natural language processing?
ASR is a kind of natural language processing. Natural language processing is any text mining or processing where the input to the computer is language, either text or speech. ASR is specifically the processing of speech to generate text, which is why you can also call ASR technology “Speech-to-Text” technology.
What caught your eye about the technology when you were applying for Amberscript?
What’s eye-catching about tech roles at Amberscript is how concrete, tangible and valuable the product is. It’s easy to understand why subtitles are important and how they help all kinds of people. So, that’s very nice when you’re building and doing development.
And then the other thing that caught my eye is that Amberscript is a company where natural language processing is integral to the product and to the success of the customer. So that makes it interesting for people with an ASR or natural language processing background. That’s very motivating!