API & custom models

Most accurate 
speech-to-text API

  • Custom ASR models tailored to your needs
  • Easy to integrate with your software, at prices up to 10x lower than uploading files yourself
  • Specialized APIs for phone calls, texts perfected by humans, and real-time audio or video

Request a quote
Request API key
Loved by over a million customers

Company Webcast, Landtag MV, Kaltura, Amazon

What we do
Integrate speech recognition capabilities into your software by using our API. You can connect to generic models or collaborate with us to create a customized speech recognition model for your specific use case.

API Integration and costs

Do you want to gain insights into your phone conversations? Do you want to subtitle videos at scale? Or do you want to index your video archive? We deliver the most accurate solution.

You can easily automate workflows and transcribe large quantities of audio and video by using our speech-to-text API. Our API is quite simple. It transfers audio or video files to our ASR server and returns the transcript in the desired format. 
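As a rough illustration of that flow, the sketch below prepares (but does not send) an upload request. The endpoint URL, the `apiKey` and `format` query parameters, and the `build_upload_request` helper are hypothetical placeholders chosen for this example, not the documented API; the actual names are provided together with your API key.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint, for illustration only (not a real service URL).
UPLOAD_URL = "https://api.example.com/v1/transcripts"


def build_upload_request(media: bytes, api_key: str, output_format: str = "json") -> Request:
    """Prepare a POST that would send raw media bytes to an ASR server.

    The query parameter names are assumptions made for this sketch.
    """
    query = urlencode({"apiKey": api_key, "format": output_format})
    return Request(
        url=f"{UPLOAD_URL}?{query}",
        data=media,
        method="POST",
        headers={"Content-Type": "application/octet-stream"},
    )


# Build the request object only; nothing is sent over the network here.
request = build_upload_request(b"\x00\x01", api_key="YOUR_API_KEY", output_format="srt")
print(request.get_method(), request.full_url)
```

Passing the prepared object to `urllib.request.urlopen` would perform the actual transfer; the response would then carry the transcript in the requested format.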

Prices for our automatic speech recognition API are up to 10x lower than when you upload your audio and video yourself. Our team will contact you to explain our pricing structure. Testing our API is free.

Request API key
Integration
How it works

API Integration

Our API is available in more than 80 languages. We support dual-channel audio, automatic punctuation and casing, speaker labels, timestamps, and all audio/video file formats.

Please contact us for our specialized APIs for phone calls, texts perfected by humans, and real-time audio or video.

Contact us
Fast

Quick turnaround for all your files

Accurate

Enabling an accurate flow of audio-to-data

Secure

GDPR-compliant security and safety

Supported Formats

We make audio accessible

XML / JSON

Includes information such as per-word start and end times, confidence scores, question indications, punctuation (…)
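To show how per-word timing and confidence data might be used, here is a minimal sketch that flags words below a confidence threshold for manual review. The JSON layout (`words`, `text`, `start`, `end`, `confidence`) is an assumed shape for illustration; the real export schema may differ.

```python
import json

# Assumed example payload; field names are hypothetical.
SAMPLE = """
{"words": [
  {"text": "Hello",  "start": 0.00, "end": 0.42, "confidence": 0.98},
  {"text": "Amber",  "start": 0.50, "end": 0.91, "confidence": 0.61},
  {"text": "script", "start": 0.91, "end": 1.30, "confidence": 0.94}
]}
"""


def low_confidence_words(raw: str, threshold: float = 0.8) -> list[tuple[str, float, float]]:
    """Return (text, start, end) for every word scored below the threshold."""
    words = json.loads(raw)["words"]
    return [(w["text"], w["start"], w["end"]) for w in words if w["confidence"] < threshold]


flagged = low_confidence_words(SAMPLE)
print(flagged)  # [('Amber', 0.5, 0.91)]
```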

.doc / .txt

Can be exported with or without timestamps and speaker changes

SRT / VTT / EBU-STL

Ideal for creating automated subtitles. The appearance of the subtitles can be configured individually
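For readers who want to see what a subtitle export involves, the sketch below groups timed words into SRT cues. The input shape (a list of `(text, start_seconds, end_seconds)` tuples) and the fixed `max_words` grouping rule are assumptions for this example, not a description of how the service formats its subtitles.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02}:{minutes:02}:{secs:02},{millis:03}"


def words_to_srt(words: list[tuple[str, float, float]], max_words: int = 7) -> str:
    """Group timed words into numbered SRT cues of at most max_words each."""
    cues = []
    for index in range(0, len(words), max_words):
        chunk = words[index:index + max_words]
        start, end = chunk[0][1], chunk[-1][2]
        text = " ".join(word for word, _, _ in chunk)
        number = index // max_words + 1
        cues.append(f"{number}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n")
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(cues)


print(words_to_srt([("Hello", 0.0, 0.4), ("world", 0.5, 1.25)]))
```

A production subtitle export would also split cues on pauses, line length, and reading speed; the fixed word count here only keeps the example short.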

Enable audio-to-data to flow accurately

Integrate the speech-to-text API with ease

Our matrix

We outperform

Recognized by thousands of developers, startups, and top companies as outperforming our competition

| Features | Google Video | Google Default | AWS Transcribe | Amberscript |
|---|---|---|---|---|
| Accuracy | Good | Poor | Okay | Great |
| Accuracy updates every | 6-12 months | 6-12 months | 6-12 months | 6 weeks |
| Price | $2.19/hr | $1.44/hr | $1.44/hr | $0.50 to $9/hr |
| Time to integrate | 3-4 days | 3-4 days | 3-4 days | 1-2 hours |
| Language support | 35+ | 35+ | 9 | 84 |
| Speaker distinction | Yes | Yes | Yes | Yes |
| Word timecodes | Yes | Yes | Yes | Yes |
| Confidence scores | Yes | Yes | Yes | Yes |
| Punctuation/casing | Yes | Yes | Yes | Yes |
| Real-time support | Yes | Yes | Yes | Yes |
| Custom models | No | No | No | Yes |
| All formats accepted | No | No | No | Yes |
| Transcribe data from | GCP Buckets only | GCP Buckets only | S3 Buckets only | Anywhere |
| Keyword extraction | No | No | No | Yes |
| Export as SRT/VTT/EBU-STL | No | No | No | Yes |
| Human-perfected option | No | No | No | Yes |
| Server location | USA | USA | USA | Western Europe |
| Data privacy deletion | No | No | No | Yes |
| Free 24/7 support | No | No | No | Yes |

Notes:

  • Accuracy: Independent tests in the media (see news section) have found Amberscript to have the highest accuracy of the three. Please use our Word Error Rate measuring tool to compare for yourself.
  • Price: Amberscript prices vary with the customization required and usage per month.
  • Language support: Amberscript supports Arabic, Bulgarian, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Italian, Japanese, Korean, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Swedish and more.
  • Word timecodes: All words include the timestamps of when they were said.
  • Confidence scores: Confidence scores indicate the algorithm’s (…)
  • Real-time support: Amberscript’s engines can be integrated with your software to transcribe or subtitle in real time. Please contact us to learn more.
  • Custom models: Please contact us to discuss the possibilities of custom models for the highest possible accuracy.
  • All formats accepted: Amberscript natively supports MP3, MP4, WAV, M4A, M4V, MOV, WMA, AAC, OPUS, FLAC and MPG and can enable more file formats on request.
  • Keyword extraction: The Amberscript API can provide you with the main keywords of every file.
  • Export as SRT/VTT/EBU-STL: The Amberscript API can be used for subtitles by delivering files in SRT, VTT or EBU-STL, including advanced subtitle formatting.
  • Human-perfected option: Our transcribers will perfect the texts from the ASR to more than 99% accuracy. Prices differ per language.
  • Server location: Amberscript servers are located in Western Europe and none of your data will leave the EU.
  • Data privacy deletion: Amberscript has GDPR-level security and privacy and deletes your data immediately after processing.
  • Free 24/7 support: We are always ready to help when you need us.

Compared by relative strength
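
If you want to run your own accuracy comparison, Word Error Rate (WER) is the standard metric: the number of word substitutions, deletions, and insertions needed to turn the ASR output into a reference transcript, divided by the number of words in the reference. The function below is a minimal generic sketch of that calculation, not Amberscript's measuring tool; lowercasing and whitespace splitting are simplifying assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between the texts, divided by reference length."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


print(word_error_rate("hello world", "hello word"))  # 0.5
```

Running the same reference transcript against the output of each engine gives directly comparable scores; lower is better, and values above 1.0 are possible when the output contains many inserted words.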

How it works

Customized speech recognition models

We combine the world’s latest knowledge in technology, language, and science to develop customer-specific language models for distinctive use cases. We do so by exploiting existing datasets or by creating a new dataset from scratch. Our goal is to create language models that are fully tailored to the language use of your organization.

Customization

Customization improves speech recognition for:

  • Different accents
  • Acoustic environments
  • Adaptation of the vocabulary to recognize product names, special terms, and abbreviations
  • Adaptation to domain-specific language such as politics, healthcare, physics, tech, or other domains

Request a demo
Are you interested in customized speech recognition models?

  • Highest possible accuracy
  • Recognizes the critical words and nuances in how your users speak
  • Product names, campaign names, and other specific terminology 
  • Tailored to accents, phone speech, and other factors that influence audio quality
Request a demo

Meet our happy customers

HVA (Amsterdam University of Applied Sciences) – Read case study.

Our research group conducts a lot of interviews. Previously, we worked with our own pool of transcribers.
I’m glad that our interviews are now all transcribed in one place; it saves a lot of time in arranging everything.

L. Van den Berg – Lecturer-researcher at the Hogeschool van Amsterdam
Our app is now available!