Advanced Speech-to-Text with unmatched accuracy, customized to your audio. Deploy in the cloud or on-premise.

Use AmberScript's speech-to-text API to transcribe audio from interviews, meetings, podcasts, phone calls and all other types of recordings. Customize it to your audio and use case for higher accuracy. Run your engine behind our secure, fast and affordable API or deploy it on your own servers.

Try the API

How to use our API?

Automatic Speech Recognition (ASR) is a branch of Artificial Intelligence and Machine Learning that serves countless business purposes. Do you need to create a large number of transcripts or subtitles? Do you want to index your video archive? Do you want to get access to unused media assets? Or do you want to gain insights into your recorded (phone) conversations?
Then it is worthwhile to automate the workflow by integrating AmberScript’s speech-to-text API into your systems.
Our API is simple. It transfers audio or video files to our ASR server and returns the transcript in the format you choose.

Supported Formats:
  • XML / JSON: Include information such as start and end time per word, confidence scores, question indications, punctuation (...)
  • .doc / .txt: Can be exported with or without timestamps and speaker changes
  • SRT / VTT / EBU-STL: Ideal for creating automated subtitles. The appearance of the subtitles can be configured individually

You're in good company. Our customers include:

  • Warner Bros.
  • Amazon
  • German government
  • University of Amsterdam
  • Dutch government
  • HU Berlin

Customer-specific engines for highest accuracy

At AmberScript we have a team of talented speech scientists. We are experts in developing customer-specific language models for distinctive use cases. We do so by creating a dataset or by exploiting existing datasets to develop language models that are tailored to the language of your organization.

This customization includes:

  • Accents
  • Acoustic environment
  • Adaptation of the vocabulary to recognize product names, special terms, abbreviations
  • Adaptation to domain-specific language such as law, healthcare, physics, tech or other domains

Why develop a specific language model?

Language is a complex structure that underpins communication. Recorded language is even more complex, because the audio quality, the way people speak, the language of the speakers, the use of domain-specific vocabulary and many other factors influence the transcription quality. It is therefore challenging for language scientists to develop a general-purpose language model that recognizes the jargon used in politics, archeology, and social media at the same time.

Creating language models suited to a specific context reduces that complexity by eliminating factors that are not relevant to your organization. The speech recognition engine can be optimized for particular recording settings, speaking habits, vocabulary etc. Are you only recording high-quality audio for media production or political speeches? Then your language model doesn’t need to be optimized for phone calls, and vice versa.

How are customer-specific language models created?

Data gathering

Together with your organization, we use existing data and, if necessary, create new datasets. Based on this specific dataset, our speech scientists develop a highly specialized language model that runs behind our speech-to-text API.

Creation of the acoustic model

Acoustics is an important factor in ASR (Automatic Speech Recognition). For example, indoor phone calls have completely different audio properties from outdoor political speeches. Finding the right fit between the sound environment of your organization and the acoustic model is another way to vastly improve the accuracy of transcription.

Creation of a linguistic model

The linguistic model includes jargon that is frequently used in your organization. By adding context-specific terms to the linguistic structure, the speech recognition engine is able to recognize words outside of everyday vocabulary.

Implementation into your workflow & creation of a feedback-loop

With the help of machine learning, we are able to continually improve language models. Via our API, we can integrate our automatic speech recognition software into your own systems. A feedback loop can be implemented to update your language model frequently and improve accuracy even further.

Add Custom Vocabulary

Easily boost accuracy for keywords or phrases that are important, or add thousands of custom words to the vocabulary, to fine-tune the recognition for your specific needs.

Build on top of AmberScript’s API

We developed our API to enable developers around the world to build amazing things on top of our core technology. By adding our speech-to-text API to your stack, you can easily equip your applications with speech-to-text capabilities. Using AmberScript's technology, you can transcribe and analyze audio and video files stored on any server. The possibilities for ASR are endless.

Key Features

Optimized for readability:

You can choose the output format of your transcription based on your needs and preferences. Do you need a document that is easy to read? AmberScript adds punctuation and automatic formatting so that you get as much out of the text as possible.

Timestamps on every word:

For many purposes, timestamps are crucial. AmberScript’s speech-to-text API delivers timestamps on every word. If you want to create subtitles, these timestamps allow you to display the words with more precision than any human could.
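
As an illustration of how word-level timestamps can drive subtitles, here is a minimal Python sketch that groups timed words into SRT cues. The `Word` structure, the `words_to_srt` helper and the seven-words-per-cue rule are assumptions made for this example; they are not AmberScript's actual output schema or subtitle logic.

```python
from dataclasses import dataclass


@dataclass
class Word:
    """Hypothetical word record: text plus start/end time in seconds."""
    text: str
    start: float
    end: float


def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words, max_words=7):
    """Group consecutive words into cues of at most `max_words` words.

    Each cue starts at its first word's start time and ends at its
    last word's end time.
    """
    cues = []
    for i in range(0, len(words), max_words):
        chunk = words[i:i + max_words]
        index = len(cues) + 1
        start = format_timestamp(chunk[0].start)
        end = format_timestamp(chunk[-1].end)
        text = " ".join(w.text for w in chunk)
        cues.append(f"{index}\n{start} --> {end}\n{text}\n")
    # A blank line separates consecutive cues in an SRT file.
    return "\n".join(cues)


if __name__ == "__main__":
    demo = [Word("Hello", 0.0, 0.4), Word("world", 0.5, 1.0)]
    print(words_to_srt(demo))
```

A production subtitle generator would typically also break cues on pauses, punctuation and line length rather than on a fixed word count.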

Speaker distinction:

AmberScript developed a technologically complex feature that allows distinguishing between multiple speakers. All export formats include speaker distinction so that you can identify:

  • Who is speaking and when?
  • How long are they speaking?
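
Both questions can be answered from speaker-labelled segments with a few lines of code. The sketch below assumes each segment is a dictionary with `speaker`, `start` and `end` keys (times in seconds); these field names are hypothetical and chosen only for illustration, not taken from the actual export format.

```python
from collections import defaultdict


def talk_time_per_speaker(segments):
    """Sum the duration of all segments for each speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["speaker"]] += seg["end"] - seg["start"]
    return dict(totals)


if __name__ == "__main__":
    # Hypothetical segments, as they might be derived from an export.
    segments = [
        {"speaker": "spk1", "start": 0.0, "end": 4.0},
        {"speaker": "spk2", "start": 4.0, "end": 6.5},
        {"speaker": "spk1", "start": 6.5, "end": 8.0},
    ]
    for speaker, seconds in talk_time_per_speaker(segments).items():
        print(f"{speaker}: {seconds:.1f} s")
```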

Supports a variety of use cases:

Customer interviews, qualitative research, broadcasting material: these are some of the use cases covered by our existing ASR models.

In case you want to reach the next level of accuracy, it is also possible to develop a specific language model that is tailored to the unique circumstances of your organization.

Channel separation:

Via our API it is possible to transcribe individual audio or video channels. Do you need to transcribe isolated recordings from your last media production or phone conversation? Send us the audio channel that captured the highest quality and we’ll return an accurate transcript.
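
If your recording is a multi-channel file, you may want to isolate one channel before sending it. The following sketch uses only Python's standard `wave` module to extract a single channel from an uncompressed PCM WAV file. The function name `extract_channel` and the file names are assumptions for this example; this is a local preprocessing step, not part of the AmberScript API.

```python
import wave


def extract_channel(src_path, dst_path, channel=0):
    """Write one channel of a PCM WAV file to a new mono WAV file."""
    with wave.open(src_path, "rb") as src:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()  # bytes per sample
        frame_rate = src.getframerate()
        frames = src.readframes(src.getnframes())

    if not 0 <= channel < n_channels:
        raise ValueError(f"channel must be between 0 and {n_channels - 1}")

    # Samples are interleaved: each frame holds one sample per channel.
    frame_size = n_channels * sample_width
    offset = channel * sample_width
    mono = b"".join(
        frames[i + offset:i + offset + sample_width]
        for i in range(0, len(frames), frame_size)
    )

    with wave.open(dst_path, "wb") as dst:
        dst.setnchannels(1)
        dst.setsampwidth(sample_width)
        dst.setframerate(frame_rate)
        dst.writeframes(mono)


# Example usage (hypothetical file names):
# extract_channel("call_stereo.wav", "call_agent.wav", channel=0)
```

For compressed formats, a dedicated audio tool would be needed to split the channels first.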

Stream your transcriptions:

AmberScript offers live transcription. Connect your audio or video stream using our secure connections and receive your transcriptions in real time.

Human-supported Automatic Speech Recognition

At AmberScript we believe that the best results come from the interplay between artificial intelligence and human capabilities. Do you need 100% accurate transcripts, where automatic speech recognition alone is simply not enough? For those scenarios, we have a large pool of qualified transcribers who will review and adjust your transcript to ensure the highest possible accuracy.

Contact us to learn more!

More reasons to choose AmberScript

Private and Secure

We believe in privacy. We never store, copy or share the audio data you send to our secure API and it will never leave our continent. Your audio data is deleted from our servers immediately after our algorithm transcribes it.

Custom Models

Add thousands of custom terms to the vocabulary or create a model specific to your use case (accents, sound environment, language used) to boost accuracy.

Supports All Audio Formats

The API accepts virtually any audio format, even the lossy, low-bitrate audio commonly found in phone recordings. No need to worry about sample rates, bit rates, encodings or other tricky signal processing characteristics.

Request a demo