Is Deepgram worth it for a small startup building a voice assistant?

Yes, if you need real-time STT/TTS and a unified Voice Agent API. The free $200 credit lets you prototype. Pay-as-you-go per-minute pricing scales with usage. For simple batch transcription, you might find cheaper alternatives.

Does Deepgram integrate with Twilio?

Yes, Deepgram integrates with Twilio via its WebSocket and REST APIs. You can use Deepgram's STT/TTS in Twilio Voice applications for real-time transcription and speech. Documentation shows how to set up a voice agent with Twilio.
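Twilio Media Streams deliver call audio as base64-encoded 8 kHz μ-law frames inside JSON `media` events. The sketch below shows the glue on both sides. It assumes that Deepgram's streaming endpoint is `wss://api.deepgram.com/v1/listen` and that it accepts `encoding=mulaw` and `sample_rate=8000` as query parameters, so check these against the current docs. The helper names are hypothetical, and the code opens no network connection.

```python
import base64
import json
from typing import Optional
from urllib.parse import urlencode

# Assumed streaming endpoint and parameter names; verify in Deepgram's docs.
DEEPGRAM_WSS = "wss://api.deepgram.com/v1/listen"


def deepgram_stream_url(model: str = "nova-3") -> str:
    """Build a streaming URL that matches Twilio's 8 kHz mono mu-law audio."""
    params = {
        "model": model,
        "encoding": "mulaw",  # Twilio Media Streams carry G.711 mu-law
        "sample_rate": 8000,  # telephony sample rate
        "channels": 1,
    }
    return f"{DEEPGRAM_WSS}?{urlencode(params)}"


def twilio_audio_chunk(message: str) -> Optional[bytes]:
    """Hypothetical helper: return raw mu-law bytes from a Twilio 'media' event.

    Other events ('connected', 'start', 'mark', 'stop') carry no audio,
    so they return None.
    """
    event = json.loads(message)
    if event.get("event") != "media":
        return None
    return base64.b64decode(event["media"]["payload"])


if __name__ == "__main__":
    frame = json.dumps({
        "event": "media",
        "streamSid": "MZ-example",  # placeholder stream id
        "media": {"track": "inbound", "payload": base64.b64encode(b"\xff\x7f").decode()},
    })
    print(deepgram_stream_url())
    print(twilio_audio_chunk(frame))
```

In a live bridge, you would send each chunk as a binary frame over a WebSocket opened to that URL with an `Authorization: Token <API key>` header. The sketch leaves out the WebSocket client so it has no dependencies.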

How does Deepgram compare to AssemblyAI?

Both offer STT and TTS APIs, but Deepgram provides a unified Voice Agent API that combines STT, TTS, and LLM orchestration in a single endpoint, whereas AssemblyAI requires stitching separate components together. Deepgram generally has lower latency with models like Nova-3. Per-minute pricing is similar.

What's the cheapest Deepgram tier?

The Pay As You Go tier is free to start with a $200 credit. After the credit, you pay per minute (e.g., Nova-3 streaming $0.0048/min). No monthly minimums. The Growth plan costs $4K+/year but saves up to 20% over pay-as-you-go.

What are Deepgram's biggest limitations?

Accuracy can vary by accent or domain compared with some competitors. The free tier is limited to a $200 credit; there is no perpetual free tier. Concurrency caps on Pay As You Go (STT up to 50 REST / 150 WSS connections) mean heavier workloads must upgrade to Growth. Self-hosting and custom models are Enterprise-only.

Can Deepgram replace Google Cloud Speech-to-Text?

Yes, for many use cases. Deepgram offers lower latency and a unified Voice Agent API, while Google Cloud has a broader ecosystem. For real-time voice agents, Deepgram's Flux models and built-in turn detection may be better. For existing GCP users, migration requires API changes.

How long does Deepgram take to set up?

You can start using Deepgram in minutes: sign up, get an API key, and use the API Playground for testing. For integration into your app, most developers complete it within hours. Self-hosting or custom models (Enterprise) can take weeks.

Is Deepgram good for real-time voice agents?

Yes, Deepgram excels at real-time voice agents with its Flux models (English and Multilingual) that include built-in turn detection and interruption handling. The unified Voice Agent API reduces latency by combining STT, TTS, and LLM orchestration in one endpoint.

How do I migrate from AssemblyAI to Deepgram?

You can replace AssemblyAI's API calls with Deepgram's equivalent endpoints. For STT, use Nova-3 or Flux; for TTS, use Aura-2. If you used AssemblyAI's separate components, Deepgram's unified Voice Agent API can simplify your stack. Migrate your audio file processing pipeline accordingly.

Deepgram

Freemium

Real-time STT, TTS, and Voice Agent APIs for developers.

By Tanmay Verma, Founder · Last verified 06 Jul 2026

6.3k views

Added 4/3/2026

95/100 · Safe Bet


In short

Deepgram: real-time STT, TTS, and Voice Agent APIs for developers. Best for developers building real-time voice agents and conversational AI, contact centers needing live transcription and agent assist, and healthcare providers automating medical transcription at scale. Free to start with a $200 credit; paid plans (Growth) start at $4K+/year.

Compared with: Deepgram vs AssemblyAI, Deepgram vs Whisper


Editorial Verdict

Best for

Developers building real-time voice agents and conversational AI
Contact centers needing live transcription and agent assist
Healthcare providers automating medical transcription at scale
Global apps requiring multilingual speech recognition in one API
Platforms embedding Voice AI into their products via the partner program

Not ideal for

Simple transcription-only use cases (calling Nova-3 directly is cheaper)
Teams needing an out-of-the-box UI (no built-in frontend)
Low-budget hobbyists (the free tier is limited to a $200 credit)
On-premise-only deployments without self-hosting expertise
Use cases requiring real-time video analysis (no video support)

Deepgram's unified Voice Agent API and real-time Flux models make it a top pick for developers building conversational AI. If you only need simple batch transcription, calling Nova-3 directly is cheaper than going through the Voice Agent API. It suits teams that value a single API over stitching components together.

Skip Deepgram if you need a turnkey UI or are only doing batch transcription on a tight budget; the API-first design and per-minute costs may not fit.

Compare with: Deepgram vs AssemblyAI, Deepgram vs ElevenLabs, Deepgram vs Whisper Memos

Last verified: July 2026

What's new in Deepgram

Checked 2 days ago

Across the latest 6 updates: 1 feature update and 5 changelog entries.

Feature · Changelog · 7 days ago · Newest

llms.txt documentation index available at root level

Added /llms.txt documentation index for AI agent discoverability. Append /llms.txt or .md to any page for markdown versions.

Changelog entries (no further details listed)

July 1, 2026 · 8 days ago
June 30, 2026 · 9 days ago
June 17, 2026 · 22 days ago
June 15, 2026 · 24 days ago
June 11, 2026 · 28 days ago

Viability Score

95/100

Safe Bet

How likely is Deepgram to still be operational in 12 months? Based on 4 signals — momentum (how recently it shipped), wrapper dependency, revenue model, and web presence.

momentum: 100
funding runway: not shown
website health: not shown
wrapper dependency: 100

Last calculated: July 2026


Key Features

Real-time speech-to-text with Flux and Nova-3 models
Text-to-speech with Aura-2 and Aura-1 voices
Unified Voice Agent API (STT+TTS+LLM orchestration)
Flux Multilingual: conversational STT in 10 languages with built-in turn detection and interruption handling
Batch transcription for pre-recorded audio
Self-hosted deployment option
Audio Intelligence API for emotion, sentiment, topic detection
Custom model training for domain-specific accuracy
Speaker diarization and sentiment analysis
WebSocket and REST API integration
Profanity filtering across models
Automatic language detection (Nova-3 Multilingual)
Keyterm Prompting for domain-specific jargon accuracy
Redaction of PII like social security numbers and credit cards
Smart Formatting for punctuation, casing, dates, currency
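Several of these features are enabled per request through query parameters. The sketch below assembles a pre-recorded transcription URL. The endpoint and the parameter names (`smart_format`, `diarize`, `profanity_filter`, `keyterm`, `redact`) reflect our reading of Deepgram's public API reference and should be checked against the current docs before use. `build_listen_url` is a hypothetical helper.

```python
from typing import List, Optional
from urllib.parse import urlencode

LISTEN_URL = "https://api.deepgram.com/v1/listen"  # assumed REST endpoint


def build_listen_url(
    model: str = "nova-3",
    keyterms: Optional[List[str]] = None,
    redact: Optional[List[str]] = None,
    diarize: bool = False,
    smart_format: bool = True,
    profanity_filter: bool = False,
) -> str:
    """Assemble a /v1/listen URL with feature flags as query parameters."""
    params = [("model", model)]
    if smart_format:
        params.append(("smart_format", "true"))
    if diarize:
        params.append(("diarize", "true"))
    if profanity_filter:
        params.append(("profanity_filter", "true"))
    # Repeated keys: one keyterm / redact entry per value.
    for term in keyterms or []:
        params.append(("keyterm", term))
    for entity in redact or []:
        params.append(("redact", entity))
    return f"{LISTEN_URL}?{urlencode(params)}"


if __name__ == "__main__":
    print(build_listen_url(keyterms=["metformin"], redact=["pci", "ssn"], diarize=True))
```

On Pay As You Go, Redaction and Keyterm Prompting are billed as add-ons, so each extra flag can change the per-minute cost.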

About Deepgram

Freemium · Advanced · API available

Deepgram is a Voice AI platform that provides real-time and batch APIs for speech-to-text (STT), text-to-speech (TTS), and voice agents. Developers and product teams use Deepgram to build conversational AI, contact center analytics, medical transcription, and voice-enabled applications. Key capabilities include Flux Multilingual (10 languages in a single model with built-in turn detection), Nova-3 for high-accuracy transcription, Aura-2 for TTS, and a unified Voice Agent API that combines STT, TTS, and LLM orchestration into one endpoint. You can deploy on cloud or self-hosted. Compared to alternatives like AssemblyAI or Google Cloud Speech-to-Text, Deepgram offers lower latency and a unified API that reduces integration complexity.

Behind the Verdict

Deepgram is for builders who want one API to handle the full voice interaction loop: STT, TTS, and LLM orchestration. Compared with assembling separate providers, the Voice Agent API cuts integration work dramatically. Flux Multilingual, launched in May 2026, is a standout. It handles 10 languages in a single streaming model with automatic turn detection and interruption handling, which is rare in the market.

Pricing is usage-based and transparent. The Pay As You Go tier includes a $200 free credit. Growth saves up to 20% through annual pre-paid credits. Enterprise adds custom SLAs and self-hosting. We'd pick Deepgram for real-time voice agents, contact center analytics, and multilingual transcription at scale.

Where it bites: there's no built-in frontend UI, so you must build your own interface. The free credit is limited, and streaming concurrency caps may pinch high-volume use cases without the Growth plan. Custom model training requires a sales conversation. Against AssemblyAI, Deepgram's unified API and lower latency give it an edge for interactive voice, but AssemblyAI's docs and SDKs are more beginner-friendly. For batch-only transcription, Nova-3 is competitive, though billing is strictly per minute. In practice, the Voice Agent API is where the real value lies. If you only need STT, skip it and call Nova-3 directly.


Real-world workflow fit

Concrete scenarios for the personas Deepgram actually fits — and what changes day-one when you adopt it.

Developer building a voice agent

You want to create a multilingual customer support voice bot with natural turn-taking.

Outcome: Use Deepgram's Voice Agent API with Flux Multilingual. You get STT, TTS, and LLM orchestration in one endpoint. The built-in turn detection handles interruptions naturally. You can be live within hours.

Contact center analyst

You need to transcribe 10,000 hours of recorded calls per year and analyze sentiment.

Outcome: Use Nova-3 batch transcription (monolingual, $0.0048/min) plus the Audio Intelligence API. Deepgram's batch processing, with diarization and sentiment analysis, gives you full transcripts and insights at this volume.
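The cost arithmetic for this scenario is straightforward: convert hours to minutes, then multiply by the per-minute rate. The sketch below uses the $0.0048/min Nova-3 figure quoted on this page. It leaves out Audio Intelligence charges because this page lists no rate for them.

```python
HOURS_PER_YEAR = 10_000
NOVA3_RATE_PER_MIN = 0.0048  # USD per minute, rate quoted on this page


def annual_transcription_cost(hours: float, rate_per_min: float) -> float:
    """Convert audio hours to minutes and price them at a flat per-minute rate."""
    minutes = hours * 60
    return round(minutes * rate_per_min, 2)


if __name__ == "__main__":
    cost = annual_transcription_cost(HOURS_PER_YEAR, NOVA3_RATE_PER_MIN)
    print(f"{HOURS_PER_YEAR:,} h -> ${cost:,.2f}/year")
```

At 600,000 minutes this comes to about $2,880 a year, which is below the $4K+ Growth pre-payment.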

Platform partner embedding Voice AI

You want to add voice features to your SaaS product for multiple customers.

Outcome: Join the Deepgram Partner Program for API integration. Use the unified Voice Agent API to add STT/TTS without stitching components. Your customers get a seamless voice interface.

Use Cases

Build real-time voice agents for customer support with natural turn-taking
Transcribe live meetings with speaker labels using Nova-3
Analyze call center recordings for sentiment and compliance
Generate captions for video content with low latency
Create multilingual voice assistants with Flux conversational STT

Models Under the Hood

Flux · Nova-3 · Aura-2 · Aura-1

as of 2026-07-05

Limitations

Accuracy can vary by accent and domain compared to some competitors.
Free tier limited to $200 credit; no perpetual free tier.
Concurrency limits on lower tiers: STT up to 50 REST, 150 WSS on Pay-as-you-go, up to 225 WSS on Growth.
Self-hosted and custom models require Enterprise plan.
API-first design has a learning curve for non-developers.

as of 2026-06-30
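Pay As You Go caps concurrent streams at 150 WebSocket STT connections (per the figures above). A client that fans out many calls should therefore queue work rather than open unlimited sockets. The sketch below does this with `asyncio.Semaphore`. `transcribe_stream` is a hypothetical stand-in for a real streaming session, and the cap is a parameter you set yourself, not a value read from Deepgram.

```python
import asyncio
from typing import Dict, List, Tuple

PAYG_WSS_LIMIT = 150  # STT WebSocket cap quoted on this page


async def transcribe_stream(call_id: int, stats: Dict[str, int]) -> str:
    """Hypothetical placeholder for one streaming STT session."""
    stats["active"] += 1
    stats["peak"] = max(stats["peak"], stats["active"])
    await asyncio.sleep(0.01)  # simulate time spent streaming audio
    stats["active"] -= 1
    return f"transcript-{call_id}"


async def run_calls(n_calls: int, limit: int = PAYG_WSS_LIMIT) -> Tuple[List[str], int]:
    """Run n_calls sessions without ever exceeding `limit` at once."""
    sem = asyncio.Semaphore(limit)
    stats = {"active": 0, "peak": 0}

    async def guarded(call_id: int) -> str:
        async with sem:  # wait for a free slot before "connecting"
            return await transcribe_stream(call_id, stats)

    results = await asyncio.gather(*(guarded(i) for i in range(n_calls)))
    return list(results), stats["peak"]


if __name__ == "__main__":
    transcripts, peak = asyncio.run(run_calls(400))
    print(len(transcripts), peak)
```

Queuing on the client side turns a hard connection error into extra latency. This is usually the better failure mode while you decide whether Growth's higher caps are worth it.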

12-month cost

Project the real annual outlay, including the implied monthly cost when only an annual tier is published.

Plan: Free
Annual total: Free (over 12 months)
Effective monthly: Free (billed monthly)

Vendor list price only. Add-on usage, seat overages, and contract minimums are surfaced under Hidden costs & gotchas.

Plans compared

For each published Deepgram tier: who it actually fits, and what it adds vs. the previous tier. Cross-reference the cost calculator above for projected annual outlay.

Pay As You Go

$0/mo ($200 free credit)

Ideal for

Developers and startups exploring Voice AI with no upfront commitment

What this tier adds

Free $200 credit then pay-as-you-go; no minimums, no expiration

Growth

$4K+/year pre-paid credits

Ideal for

Growing applications with predictable usage that want to save up to 20%

What this tier adds

Pre-paid $4K+/year credits with higher concurrency (STT up to 225 WSS, TTS up to 60, Voice Agent up to 60)

Enterprise

Contact Sales

Ideal for

Large businesses needing custom concurrency, self-hosting, or custom models

What this tier adds

Contact Sales for custom models, self-hosted deployment, and dedicated support with custom SLAs

Integrations

Amazon Connect
Twilio
Asterisk
Python SDK
Node.js SDK
Go SDK
.NET SDK
Java SDK
REST API
WebSocket
Pipecat
LiveKit
Google Dialogflow CX
Genesys
AudioCodes

Hidden costs & gotchas

What the public pricing page doesn't put in bold. Captured from pricing-page footnotes, contract terms, and recurring complaints.

Exceeding the Pay As You Go concurrency limits (STT: 50 REST/150 WSS) requires moving to Growth, which starts at $4K+/year pre-paid, or to Enterprise, which is priced through sales.
Custom model training and self-hosted deployment are only available on Enterprise plan; pricing is not public.
Voice Agent API usage is billed per minute of WebSocket connection time, including idle time—costs can accumulate during pauses.
Add-ons like Redaction and Keyterm Prompting cost extra ($0.002/min and $0.0013/min respectively on Pay As You Go).
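The sketch below models both gotchas, assuming that add-on rates simply stack on top of the base transcription rate and that Voice Agent sessions are billed for connected time rather than speech time. It uses the Nova-3, Redaction, and Keyterm rates quoted on this page. This page lists no Voice Agent per-minute rate, so `agent_rate` is a hypothetical placeholder. Per-minute rounding rules are not modeled.

```python
NOVA3 = 0.0048      # USD/min, base rate quoted on this page
REDACTION = 0.002   # USD/min add-on (Pay As You Go)
KEYTERM = 0.0013    # USD/min add-on (Pay As You Go)


def effective_stt_rate(redaction: bool = False, keyterm: bool = False) -> float:
    """Per-minute STT rate once optional add-ons are stacked on the base."""
    rate = NOVA3
    if redaction:
        rate += REDACTION
    if keyterm:
        rate += KEYTERM
    return round(rate, 6)


def voice_agent_session_cost(connected_s: float, agent_rate: float) -> float:
    """Bill on WebSocket connection time, idle pauses included.

    agent_rate is a hypothetical per-minute price; check the pricing page.
    """
    return round(connected_s / 60 * agent_rate, 4)


if __name__ == "__main__":
    print(effective_stt_rate(redaction=True, keyterm=True))
    # A 10-minute session with only 4 minutes of speech is billed for all 10.
    print(voice_agent_session_cost(600, agent_rate=0.05))
```

With both add-ons enabled, the per-minute rate rises from $0.0048 to $0.0081, roughly 69% more. Closing idle agent sockets promptly is the main lever on the second cost.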

Where the pricing makes sense

The company stage and team size where Deepgram's pricing actually pencils out — and where peers do it cheaper.

Deepgram's Pay As You Go tier with a $200 free credit suits small teams and startups. Growth saves up to 20% for growing apps ($4K+/year). Enterprise offers custom concurrency and self-hosting for large deployments. Compared to AssemblyAI and Google Cloud STT, Deepgram's unified API and lower per-minute rates (e.g., Nova-3 at $0.0048/min streaming) can be cheaper at scale.
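Whether Growth pencils out depends on how much Pay As You Go usage the pre-paid credit replaces. The back-of-the-envelope sketch below rests on two assumptions. First, Growth is a $4,000 annual pre-payment that buys usage at a flat 20% discount. Second, any unused credit is forfeited. This page only says "$4K+" and "up to 20%", so treat both as placeholders.

```python
GROWTH_MIN_PREPAY = 4_000.0  # USD/year, the "$4K+" floor on this page
GROWTH_DISCOUNT = 0.20       # assumed flat; the page says "up to 20%"


def growth_annual_cost(payg_spend: float,
                       prepay: float = GROWTH_MIN_PREPAY,
                       discount: float = GROWTH_DISCOUNT) -> float:
    """Growth cost for usage that would cost payg_spend on Pay As You Go.

    You pay at least the pre-payment, and more once discounted usage exceeds it.
    """
    return max(prepay, payg_spend * (1 - discount))


def cheaper_plan(payg_spend: float) -> str:
    """Pick the cheaper tier for a given annual Pay As You Go spend."""
    if growth_annual_cost(payg_spend) < payg_spend:
        return "Growth"
    return "Pay As You Go"


if __name__ == "__main__":
    for spend in (2_880, 4_000, 6_000):
        print(spend, cheaper_plan(spend), growth_annual_cost(spend))
```

Under these assumptions, break-even sits at the pre-payment itself: about $4,000 of annual Pay As You Go spend, whatever the discount. Below that, unused credit outweighs the discount. Above it, Growth wins.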

Setup time & first value

How long it actually takes to get something useful out of Deepgram — broken out by persona, not the marketing-page minute.

Developers can start with Deepgram's API in minutes: sign up, get a free API key, and use the API Playground. For production, expect a few hours to integrate the Voice Agent API. Self-hosting or custom models (Enterprise) may take weeks.

Switching to or from Deepgram

How to bring data in from common predecessors and how to get it back out — written for the switcher, not the buyer.

Migrating in

→From AssemblyAI: Replace your AssemblyAI endpoint with Deepgram's unified Voice Agent API for STT+TTS+LLM—no separate orchestration needed.
→From Google Cloud Speech-to-Text: Switch to Deepgram's Nova-3 for lower latency and simpler billing—use the REST/WebSocket API.
→From Rev.ai: Migrate your batch transcription pipeline to Deepgram's Nova-3 batch endpoint with diarization.
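For these batch migrations, the core change is usually the request itself. The sketch below uses Python's standard library to prepare, but not send, a pre-recorded Nova-3 request with diarization. It is based on our understanding of Deepgram's public REST API: a POST to `/v1/listen` with a `Token` authorization header and a JSON body pointing at a hosted audio file. The audio URL and the `DEEPGRAM_API_KEY` environment variable name are placeholders.

```python
import json
import os
import urllib.request
from urllib.parse import urlencode

LISTEN_URL = "https://api.deepgram.com/v1/listen"  # assumed REST endpoint


def prepare_batch_request(audio_url: str, api_key: str) -> urllib.request.Request:
    """Build a POST that transcribes a hosted file with Nova-3 and diarization."""
    query = urlencode({"model": "nova-3", "diarize": "true", "smart_format": "true"})
    body = json.dumps({"url": audio_url}).encode("utf-8")
    return urllib.request.Request(
        f"{LISTEN_URL}?{query}",
        data=body,
        headers={
            "Authorization": f"Token {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


if __name__ == "__main__":
    req = prepare_batch_request(
        "https://example.com/call-recording.wav",        # placeholder audio URL
        os.environ.get("DEEPGRAM_API_KEY", "YOUR_KEY"),  # placeholder env var
    )
    print(req.get_method(), req.full_url)
    # To send it: urllib.request.urlopen(req), then json.load the response.
```

Keeping request construction separate from sending makes it easy to unit-test the migration before it touches your quota.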

Migrating out

↗To AssemblyAI: Export transcript data via Deepgram API, then import into AssemblyAI's API.
↗To Google Cloud Speech-to-Text: Replace Deepgram calls with Google's API for STT/TTS; note higher latency.
↗To Whisper (self-hosted): Use Deepgram's Whisper Cloud endpoint to compare Whisper output before leaving, then run open-source Whisper on your own infrastructure, fine-tuning custom models if needed.


Frequently Asked Questions

Tools that pair well with Deepgram

Common stack mates teams adopt alongside Deepgram, with the specific reason each pairing earns its keep.

AssemblyAI

Speech-to-text and voice agent APIs for building production-ready voice AI.

ElevenLabs

Ultra-realistic AI voice generation, cloning, and conversational agents platform.

Whisper Memos

AI voice recorder for iPhone & Apple Watch with email summaries and intelligent routing via Agents.

Featured Head-to-Head Comparisons

AssemblyAI vs Deepgram

Deepgram vs Whisper

Alternatives to Deepgram


AssemblyAI

Speech-to-text and voice agent APIs for building production-ready voice AI.

Freemium

ElevenLabs

Ultra-realistic AI voice generation, cloning, and conversational agents platform.

Freemium

Whisper Memos

AI voice recorder for iPhone & Apple Watch with email summaries and intelligent routing via Agents.

Paid
