8 Jun 2021

Dutch punctuation model

Grammatical features such as gerunds, prepositions, and basic grammar rules play an important role in most languages. Have you ever considered that punctuation plays a critical part as well?
Punctuation matters in language. Small, sometimes barely noticeable marks, placed in the right spots, show where a sentence ends and what it means.
In this post, we take a closer look at the AI punctuation model we have developed for our Dutch speech recognition model.

The role of punctuation in language

Punctuation is an integral part of written text and helps in making text intelligible and coherent. The absence of punctuation hampers readability and can make texts incomprehensible. Furthermore, punctuation marks reduce ambiguity. Consider this example where a comma can completely alter the meaning of a sentence:

“Most of the time travellers worry about their luggage”

vs

“Most of the time, travellers worry about their luggage”

Missing punctuation can also lead to awkward sentences, as in this classic example:

“I find inspiration in cooking my family and my dog”

Punctuation in speech-to-text

Speech-to-text systems should therefore include punctuation in the transcripts they produce. Typical automatic speech recognition (ASR) systems, however, do not output punctuation marks, because punctuation marks have no spoken form. Furthermore, the generated transcript consists only of lowercase words, which makes it even harder to read. A properly punctuated transcript also helps with the automatic generation of subtitles for videos.

This problem can be solved by incorporating a separate punctuation model that can automatically add punctuation to the output from an ASR model. It can be cast as a natural language processing (NLP) problem where the goal is to predict the punctuation mark (or the lack thereof) for every word in a transcript.
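
As a minimal sketch of this framing (not Amberscript's actual code), the snippet below turns a punctuated sentence into one label per word, which is the shape of training data a per-word classifier would need. The function name `text_to_labels`, the label `O` for "no punctuation", and the set of marks are assumptions made for illustration.

```python
# Marks treated as labels in this sketch (an illustrative choice).
PUNCTUATION = {",", ".", "?", "!", ":", ";"}
NO_PUNCT = "O"  # hypothetical label meaning "no punctuation after this word"


def text_to_labels(text):
    """Turn punctuated text into (lowercase word, label) pairs."""
    pairs = []
    for token in text.split():
        label = NO_PUNCT
        # A trailing punctuation mark becomes the label of the word it follows.
        if token[-1] in PUNCTUATION:
            label = token[-1]
            token = token[:-1]
        if token:  # skip tokens that consisted of a punctuation mark only
            pairs.append((token.lower(), label))
    return pairs


pairs = text_to_labels("Most of the time, travellers worry about their luggage.")
words = [word for word, _ in pairs]    # what an ASR model would output
labels = [label for _, label in pairs]  # what the punctuation model must predict
```

Here `words` is the unpunctuated, lowercase input, and `labels` holds a comma for "time", a period for "luggage", and `O` for every other word.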

Language models

Deep learning has witnessed tremendous progress in the last few years, fuelled by the increase in computational power. The field of NLP was taken by storm by the introduction of BERT in 2018. Developed by Google AI, BERT is a large language model based on the transformer architecture. It was touted as NLP’s ImageNet moment, referring to how ImageNet steered progress in representation learning from images in the field of computer vision. BERT is a marked improvement over earlier language representation models such as GloVe embeddings, and contextual representations such as ELMo.

For an intuitive explanation of how BERT works, refer to this excellent blog post by Jay Alammar. In simple terms, BERT is trained on raw text in a self-supervised manner, i.e., without human annotations. Specifically, it is trained on two tasks: masked language modeling and next sentence prediction. At the end of this training, the model is said to be “pre-trained”, and its word and sentence representations capture the semantics of language. A pre-trained BERT can then be fine-tuned on a downstream NLP task. When it was published, BERT produced state-of-the-art results after fine-tuning on a range of NLP tasks, including natural language inference (NLI) and question answering.
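
To make the masked language modeling task more concrete, here is a simplified sketch of how training inputs could be corrupted. It follows the proportions reported in the BERT paper (15% of tokens are selected; of those, 80% are replaced by `[MASK]`, 10% by a random token, and 10% are left unchanged). It works on whole words instead of WordPiece tokens, and the function name `mask_tokens` and the toy vocabulary are assumptions for illustration.

```python
import random

MASK = "[MASK]"


def mask_tokens(tokens, vocab, mask_prob=0.15, seed=0):
    """Return (inputs, targets) for masked language modeling.

    targets[i] is the original token if position i was selected for
    prediction, and None otherwise.
    """
    rng = random.Random(seed)
    inputs, targets = [], []
    for token in tokens:
        if rng.random() < mask_prob:
            targets.append(token)  # the model must recover this token
            roll = rng.random()
            if roll < 0.8:
                inputs.append(MASK)               # 80%: replace with [MASK]
            elif roll < 0.9:
                inputs.append(rng.choice(vocab))  # 10%: random token
            else:
                inputs.append(token)              # 10%: keep unchanged
        else:
            targets.append(None)  # not part of the training loss
            inputs.append(token)
    return inputs, targets


tokens = "de groendienst is weer bezig het park mooi te maken".split()
vocab = sorted(set(tokens))  # toy vocabulary built from the sentence itself
inputs, targets = mask_tokens(tokens, vocab)
```

No human annotation is involved: the targets come from the text itself, which is what makes the training self-supervised.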

Punctuation model at Amberscript

At Amberscript, we develop custom ASR models, one of them for Dutch. As noted before, the transcripts produced by the model lack any punctuation marks. Currently, there are no open-source punctuation models available that are specific to the Dutch language. Therefore, we developed a punctuation model based on BERT that automatically adds the following punctuation marks: question mark, period, exclamation mark, comma, colon, and semicolon. Other punctuation marks that occur in pairs, such as quotation marks and parentheses, are much more difficult to determine based solely on the text.
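
Once a model has predicted one label per word, the predictions still have to be turned back into readable text. The sketch below is an assumption about how that step could look: `apply_labels` and the `O` label are hypothetical names, and capitalizing the first word after a sentence-ending mark is an illustrative choice, since this post does not say which component handles it. The example words come from the demo transcript in this post, and the labels are written by hand in place of real model output.

```python
SENTENCE_END = {".", "?", "!"}


def apply_labels(words, labels, no_punct="O"):
    """Attach each predicted mark to its word and capitalize sentence starts."""
    if len(words) != len(labels):
        raise ValueError("need exactly one label per word")
    out = []
    capitalize_next = True  # the first word starts a sentence
    for word, label in zip(words, labels):
        if capitalize_next:
            word = word[:1].upper() + word[1:]
        if label != no_punct:
            word += label
        # Only sentence-ending marks trigger capitalization of the next word.
        capitalize_next = label in SENTENCE_END
        out.append(word)
    return " ".join(out)


words = ("alle monteurs zijn weer bezig "
         "de groendienst is weer bezig het park mooi te maken").split()
# Hand-written labels standing in for model predictions.
labels = ["O"] * 4 + ["."] + ["O"] * 9 + ["."]
restored = apply_labels(words, labels)
```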

Pipeline

The entire ASR pipeline thus consists of three main components: the ASR model, which produces lower-cased text; a post-processing module, which capitalizes named entities (names of people, places, etc.) and performs number denormalization, spelling correction, and similar clean-up; and finally a punctuation model, which adds the required punctuation marks.

Infographic explaining Amberscript’s 5-step punctuation process – from audio to finished transcript.
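
The order of the three components described above can be sketched as a simple chain of functions. Everything here is a stand-in: `build_pipeline` and the `fake_*` functions are hypothetical names, the toy sentence and entity list are made up for illustration, and no real ASR or punctuation model is involved.

```python
from typing import Callable, List

Words = List[str]


def build_pipeline(asr: Callable[[bytes], Words],
                   post_process: Callable[[Words], Words],
                   punctuate: Callable[[Words], str]) -> Callable[[bytes], str]:
    """Chain the three stages in the order described in the text."""
    def transcribe(audio: bytes) -> str:
        words = asr(audio)           # 1. lower-cased words, no punctuation
        words = post_process(words)  # 2. named entities, numbers, spelling
        return punctuate(words)      # 3. punctuation marks
    return transcribe


# Toy stand-ins so the sketch runs without any model.
def fake_asr(audio: bytes) -> Words:
    return "ik woon in amsterdam".split()


def fake_post_process(words: Words) -> Words:
    names = {"amsterdam": "Amsterdam"}  # tiny illustrative entity list
    return [names.get(word, word) for word in words]


def fake_punctuate(words: Words) -> str:
    text = " ".join(words) + "."      # pretend the model predicted one period
    return text[0].upper() + text[1:]  # capitalize the sentence start


transcribe = build_pipeline(fake_asr, fake_post_process, fake_punctuate)
result = transcribe(b"")
```

Keeping each stage behind a plain function boundary means any one of them can be replaced, for example by a real model, without touching the other two.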

Demo

To show the punctuation model in action, we can take this example output from the ASR model:

nog een laatste een likje verf zodat de attracties er piekfijn uitzien hier is alles bijna klaar om weer open te kunnen je merkt dat het nu weer begint te kriebelen eigenlijk bij ons alle monteurs zijn weer bezig de groendienst is weer bezig het park mooi te maken de schoonmaakdienst is alles weer aan het schoonmaken dus we zijn er echt gereed een maken om straks weer de poorten te openen

The result of applying post-processing and the punctuation model is as follows:

Nog een laatste: een likje verf, zodat de attracties er piekfijn uitzien. Hier is alles bijna klaar om weer open te kunnen. Je merkt dat het nu weer begint te kriebelen eigenlijk bij ons. Alle monteurs zijn weer bezig. De groendienst is weer bezig het park mooi te maken. De schoonmaakdienst is alles weer aan het schoonmaken, dus we zijn er echt gereed een maken om straks weer de poorten te openen.

Notice that the output from the ASR model is difficult to read, whereas the final transcript after adding punctuation marks is more natural.

Punctuation included in transcripts from Amberscript

If you’re looking for a clean, accurate transcript with proper punctuation, try Amberscript’s automatic transcription service.
We offer fast, accurate, and affordable transcription options that can improve your workflows.
If you need the most accurate transcript possible, try Amberscript’s manual transcription. Our language experts are native speakers and produce highly accurate texts in clean read (text edited for readability) or verbatim (all words typed exactly as said).
