API & custom models

The most accurate speech-to-text API

  • Custom ASR models tailored to your needs
  • Easy to integrate with your software
  • Specialized APIs for phone calls, texts perfected by humans, and real-time audio or video
Request a quote
See API docs
What we do
Integrate speech recognition technology into your software with our audio-to-text API. You can connect to generic models or collaborate with us to create a customized speech recognition model for your specific use case.

Speech-to-text API

Streamline workflows and drive productivity

Machine-made captions by Amberscript
  • Easy to integrate with your software
  • Prices up to 10x lower than self-upload
  • Available in more than 80 languages
  • Automate workflows and accurately transcribe large quantities of audio and video with ease

Custom ASR models

Build with the world's most accurate ASR model

  • Get the highest possible accuracy for different accents
  • Tailored to accents, phone speech, and other factors that influence audio quality
  • Adaptation of the vocabulary to recognize product names, special terms, and abbreviations
  • Adaptation to domain-specific languages such as politics, healthcare, physics, tech, or other domains
Why Amberscript AI is the most accurate ASR in the world

We outperform

| Features | Google Video | Google Default | AWS Transcribe | Amberscript |
| Accuracy | good | poor | okay | great |
| Accuracy updates every | 6-12 months | 6-12 months | 6-12 months | 6 weeks |
| Price | $2.19/HR | $1.44/HR | $1.44/HR | $0.50 to $9/HR |
| Time to integrate | 3-4 days | 3-4 days | 3-4 days | 1-2 hours |
| Language support | 35+ | 35+ | 9 | 84 |
| Speaker distinction | yes | yes | yes | yes |
| Word timecodes | yes | yes | yes | yes |
| Confidence scores | yes | yes | yes | yes |
| Punctuation/casing | yes | yes | yes | yes |
| Real-time support | yes | yes | yes | yes |
| Custom models | no | no | no | yes |
| All formats accepted | no | no | no | yes |
| Transcribe data from | GCP Buckets only | GCP Buckets only | S3 Buckets only | Anywhere |
| Keyword extraction | no | no | no | yes |
| Export as SRT/VTT/EBU-STL | no | no | no | yes |
| Human perfected option | no | no | no | yes |
| Server location | USA | USA | USA | Western Europe |
| Data privacy deletion | no | no | no | yes |
| Free 24/7 support | no | no | no | yes |

  • Accuracy: Independent tests in the media (see news section) have found Amberscript to have the highest accuracy of the three. Please use our Word Error Rate measuring tool to compare for yourself.
  • Price: Amberscript prices vary with the customization required and usage per month.
  • Language support: Amberscript supports Arabic, Bulgarian, Catalan, Danish, Dutch, English, Finnish, French, German, Greek, Hindi, Hungarian, Italian, Japanese, Korean, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Spanish, Turkish, Swedish and more.
  • Word timecodes: All words include the timestamps of when they were said.
  • Confidence scores: Confidence scores indicate the algorithm's …
  • Real-time support: Amberscript's engines can be integrated with your software to transcribe or subtitle in real time. Please contact us to learn more.
  • Custom models: Please contact us to discuss the possibilities of custom models for the highest accuracy possible.
  • All formats accepted: Amberscript natively supports MP3, MP4, WAV, M4A, M4V, MOV, WMA, AAC, OPUS, FLAC and MPG and can enable more file formats on request.
  • Keyword extraction: The Amberscript API can provide you with the main keywords of every file.
  • Export as SRT/VTT/EBU-STL: The Amberscript API can be used for subtitles by receiving the files in SRT, VTT or EBU-STL, including advanced subtitle formatting.
  • Human perfected option: Our transcribers will perfect the texts from the ASR to more than 99% accuracy. Prices differ per language.
  • Server location: Amberscript servers are located in Western Europe and none of your data will leave the EU.
  • Data privacy deletion: Amberscript has GDPR-level security and privacy and deletes your data immediately after processing.
  • Free 24/7 support: We are always ready to help whenever you need it.

Compared by relative strength
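
The accuracy comparison above refers to Word Error Rate (WER). As a rough illustration of what that metric measures, here is a minimal sketch of the standard WER formula: word-level edit distance (substitutions, deletions, insertions) divided by the number of words in the reference transcript. This is not Amberscript's measuring tool; the function name and the lowercase/whitespace normalization are assumptions made for the example.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return WER = (substitutions + deletions + insertions) / reference word count.

    Assumption for this sketch: both texts are normalized by lowercasing and
    splitting on whitespace. Real tools may normalize punctuation differently.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words processed
    # so far and the first j hypothesis words (rolling-row Levenshtein).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)
```

For example, comparing the reference "a b c d" with the ASR output "a b d" gives one deletion over four reference words, so a WER of 0.25. Lower is better, and a WER above 1.0 is possible when the output contains many inserted words.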

Why Amberscript AI

Ease of implementation

Set up and see results in no time. Our easy-to-use API is designed by developers for developers.

Best accuracy

We deliver a standard of speech-to-text accuracy greater than any other voice recognition software out there.

Enterprise-grade security

You’re in safe hands. Amberscript is GDPR compliant and ISO27001 & ISO9001 certified.

Speech-to-text API integration and costs

We deliver the most accurate speech-to-text software

Do you want to gain insights into your phone conversations? Do you want to subtitle videos at scale? Or do you want to index your video archive?

You can easily automate workflows and save time on your transcription process by using our speech-to-text API. Our API is quite simple. It transfers video or audio files to our ASR server and returns the transcript in the desired format.
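
As a rough sketch of that flow (send a file, ask for a transcript in a given format), the snippet below builds an HTTP upload request with Python's standard library. Everything specific here is a placeholder and not the actual Amberscript API: the base URL, the `/transcripts` path, the `language` and `format` query parameters, and the bearer-token header are all assumptions for illustration. Please consult the API docs for the real endpoints and authentication.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import Request


def build_transcription_request(
    base_url: str,
    api_key: str,
    audio_path: str,
    language: str,
    export_format: str,
) -> Request:
    """Build (but do not send) an upload request for one audio/video file.

    Hypothetical: the path, query parameter names, and auth scheme are
    placeholders invented for this sketch, not documented API details.
    """
    query = urlencode({"language": language, "format": export_format})
    return Request(
        url=f"{base_url}/transcripts?{query}",
        data=Path(audio_path).read_bytes(),  # raw file bytes as the request body
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/octet-stream",
        },
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` would send it. Building the request separately from sending it keeps the example testable without network access.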

Prices for our automatic speech recognition API are up to 10x lower than for uploading your audio and video yourself. Our team will contact you to explain our pricing structure. Testing our API is free.

Request a quote
Integration
How it works

Speech-to-text API Integration

Our API is available in more than 80 languages. We support dual-channel audio, automatic punctuation and casing, speaker labels, timestamps, and all audio/video file formats.

Please contact us for our specialized APIs for phone calls, texts perfected by humans, and real-time audio or video.

See API docs
How it works

Customized speech recognition models

We combine the world’s latest knowledge in technology, language, and science to develop customer-specific language models for distinctive use cases. We do so by exploiting existing datasets or by creating a new dataset from scratch. Our goal is to create language models that are fully tailored to the language use of your organization.

Get a customized offer

Request a quote for Speech-to-Text API


Use cases and application

Transcribe calls and meetings

A speech-to-text API helps you create transcripts of calls and meetings, giving you better insight into your conversations.

Voice assistance

Assistive voice technology converts spoken words and voice commands into text using speech-to-text APIs such as Amberscript's.

Language learning

Modern language learning apps also benefit from using speech-to-text technology to recognize what users are saying in multiple languages.

Audio/video file documentation

Speech-to-text software also helps you sort and categorize large audio or video archives.

Accessibility solutions

For services that increase accessibility for people with hearing difficulties, speech-to-text software can help recognize voice commands accurately.

Creating subtitles

In subtitling and content creation, a speech-to-text API helps you create transcriptions faster, which helps content reach a wider audience.

Supported Formats

We make audio accessible

XML / JSON

Includes information such as start and end time per word, confidence scores, question indications, punctuation (…)
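
To illustrate how per-word timing and confidence data can be used, here is a minimal sketch that flags low-confidence words for manual review. The JSON layout (`words`, `text`, `start`, `end`, `confidence`) and the 0.8 threshold are assumptions made for this example; the actual export schema may use different field names.

```python
import json

# Hypothetical sample export; field names are assumed for illustration only.
SAMPLE_TRANSCRIPT = """
{"words": [
  {"text": "Welcome", "start": 0.0, "end": 0.4, "confidence": 0.97},
  {"text": "to", "start": 0.4, "end": 0.5, "confidence": 0.99},
  {"text": "Amberscript", "start": 0.5, "end": 1.2, "confidence": 0.62}
]}
"""


def low_confidence_words(transcript_json: str, threshold: float = 0.8):
    """Return (text, start, end) for every word scored below the threshold."""
    data = json.loads(transcript_json)
    return [
        (word["text"], word["start"], word["end"])
        for word in data["words"]
        if word["confidence"] < threshold
    ]
```

Calling `low_confidence_words(SAMPLE_TRANSCRIPT)` on the sample returns only the "Amberscript" entry with its start and end times, which a reviewer could use to jump straight to that point in the audio.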

.doc / .txt:

Possible to export with or without timestamps and speaker changes

.SRT / VTT / EBU-STL:

Ideal for creating automated subtitles. The appearance settings for the subtitles can be configured individually.
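
For readers unfamiliar with the SRT format, the sketch below shows how word-level timings map onto SRT cues: a cue number, a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range, the text, and a blank line between cues. The input shape (a list of `(text, start_seconds, end_seconds)` tuples) and the `max_words` setting are assumptions for this example and do not describe how Amberscript builds its subtitle exports.

```python
def srt_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def words_to_srt(words, max_words: int = 7) -> str:
    """Group (text, start, end) word tuples into numbered SRT cues.

    Assumption: a fixed number of words per cue. Production subtitling
    usually also considers line length, pauses, and reading speed.
    """
    cues = []
    for number, offset in enumerate(range(0, len(words), max_words), start=1):
        chunk = words[offset:offset + max_words]
        start = srt_timestamp(chunk[0][1])   # start time of the first word
        end = srt_timestamp(chunk[-1][2])    # end time of the last word
        text = " ".join(word[0] for word in chunk)
        cues.append(f"{number}\n{start} --> {end}\n{text}\n")
    return "\n".join(cues)
```

A shorter `max_words` value yields more, shorter cues, which is one simple example of an appearance setting that can be tuned per use case.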

Enable audio-to-data to flow accurately

Integrate the speech-to-text API with ease