Disclosure: This post contains affiliate links, including a link to Vapi. If you click through and sign up for a paid plan, I may earn a commission at no extra cost to you. I only recommend platforms I have personally evaluated. Full affiliate disclosure here.

SIP Explainer

How Voice AI handles DTMF: the complete guide

Priyanka

Senior Voice AI PM · April 14, 2026 · 10 min read · 1,900 words

SIP telephony Voice AI DTMF

The short answer

DTMF - the tones generated when a caller presses keys on their phone - can be transmitted in three different ways over a SIP connection, and Voice AI systems need to be configured for the right one. The wrong DTMF mode means keypad presses are silently ignored - the caller hears nothing, the AI receives nothing, and the interaction fails without a visible error. This guide covers all three modes, how to configure them, and how to diagnose and fix DTMF failures in production.

DTMF is one of the most overlooked configuration details in Voice AI deployments. It rarely comes up in platform demos because demos are voice-only. It only becomes visible when a caller tries to press a key - to enter a PIN, navigate a menu, or confirm a choice - and nothing happens. No error. No feedback. The call just continues as if the keypress never occurred.

I have debugged DTMF failures on three separate enterprise deployments. In every case the root cause was the same: the SIP trunk and the Voice AI platform were configured for different DTMF transmission modes and neither side flagged the mismatch during the call setup. This guide is what I wish I had read before the first of those three incidents.

Three DTMF transmission modes · No error messages on mode mismatch · RFC 2833: the mode to use in 2026

What DTMF is and why Voice AI needs to handle it

DTMF stands for Dual-Tone Multi-Frequency. When you press a key on a telephone keypad, your phone generates a unique combination of two audio tones - one from a low-frequency group and one from a high-frequency group. The combination identifies which key was pressed. Pressing 1 generates 697 Hz + 1209 Hz. Pressing # generates 941 Hz + 1477 Hz. These dual-tone signals travel over the phone line and can be detected by any system listening for them.
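The frequency grid described above is small enough to write out in full. Here is a minimal Python sketch of the standard DTMF keypad layout; the function name dtmf_frequencies is my own and not part of any platform API.

```python
from __future__ import annotations

# Standard DTMF grid: each row shares a low-group tone, each column a high-group tone.
LOW_HZ = (697, 770, 852, 941)
HIGH_HZ = (1209, 1336, 1477, 1633)
KEYPAD = (
    "123A",
    "456B",
    "789C",
    "*0#D",
)


def dtmf_frequencies(key: str) -> tuple[int, int]:
    """Return the (low, high) tone pair in Hz for one keypad symbol."""
    if len(key) == 1:
        for row, symbols in enumerate(KEYPAD):
            col = symbols.find(key)
            if col != -1:
                return LOW_HZ[row], HIGH_HZ[col]
    raise ValueError(f"not a DTMF key: {key!r}")


if __name__ == "__main__":
    for key in "15#":
        print(key, dtmf_frequencies(key))
```

The fourth column (A, B, C, D at 1633 Hz) does not appear on consumer handsets, but it is part of the standard grid and shows up again in the RFC 2833 event codes.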

Voice AI systems need DTMF support for several common use cases. PIN entry and account verification - where the caller keys in their date of birth or account number rather than speaking it - relies entirely on DTMF. Legacy IVR menu navigation, where pressing 1 means "confirm" and pressing 2 means "cancel", uses DTMF. Payment card entry, where sensitive data is collected via keypad to avoid speech recognition capturing card numbers, is DTMF-only. Any Voice AI deployment that needs to collect structured numerical input or integrate with existing IVR infrastructure will encounter DTMF.

The three DTMF transmission modes - and which to use

When DTMF travels over a SIP connection, it can be encoded and transmitted in three different ways. Understanding which mode your SIP trunk uses and which mode your Voice AI platform expects is the entire DTMF configuration problem.

Mode 1 - In-band DTMF (the legacy approach)

In-band DTMF transmits the actual audio tones - the dual-frequency sounds - as part of the regular voice audio stream. When a caller presses 5, the 770 Hz + 1336 Hz tone combination is embedded directly in the RTP audio packets alongside their voice.

The problem: Modern audio codecs - particularly G.729 and Opus - use lossy compression that can distort or destroy the precise frequency combinations that DTMF relies on. A 770 Hz tone that gets slightly shifted or attenuated by codec compression may not be recognised at the receiving end. In-band DTMF is unreliable with compressed codecs and is the legacy approach that causes the most silent failures in modern deployments.
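To make the fragility concrete, here is a rough sketch of how an in-band detector can work. It uses the Goertzel algorithm, a common choice for DTMF detection, though I am not claiming any particular platform implements it this way. The function names, the 50 ms tone length, and the min_ratio threshold are all illustrative assumptions.

```python
import math

LOW_HZ = (697, 770, 852, 941)
HIGH_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq_hz, sample_rate=8000):
    """Relative signal power at freq_hz, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / sample_rate)
    s_prev, s_prev2 = 0.0, 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, sample_rate=8000, min_ratio=10.0):
    """Return the detected key, or None if no tone pair clearly stands out."""

    def strongest(freqs):
        powers = [goertzel_power(samples, f, sample_rate) for f in freqs]
        best = max(range(len(freqs)), key=powers.__getitem__)
        rest = sum(powers) - powers[best]
        # The winner must dominate the other candidates in its group.
        return best if powers[best] > min_ratio * max(rest, 1e-12) else None

    row, col = strongest(LOW_HZ), strongest(HIGH_HZ)
    if row is None or col is None:
        return None
    return KEYPAD[row][col]


def make_tone(low_hz, high_hz, duration_s=0.05, sample_rate=8000):
    """Synthesise a clean dual tone, standing in for an uncompressed key press."""
    n = int(duration_s * sample_rate)
    return [
        0.5 * math.sin(2 * math.pi * low_hz * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high_hz * i / sample_rate)
        for i in range(n)
    ]


if __name__ == "__main__":
    print(detect_key(make_tone(770, 1336)))  # clean tone for key 5
    print(detect_key([0.0] * 400))           # silence
```

The detector only succeeds when one frequency in each group clearly dominates. Anything that smears or shifts that energy, which is what a lossy codec can do, pushes the result towards "no key detected" rather than towards an error.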

Mode 2 - RFC 2833 / RFC 4733 out-of-band DTMF (the recommended standard)

RFC 2833 (since obsoleted by RFC 4733, which defines the same mechanism) transmits DTMF events as separate RTP packets, completely independent of the voice audio stream. When a caller presses a key, a dedicated RTP event packet is generated with its own dynamic payload type (101 by convention, though the actual number is whatever the SDP negotiates) that signals "this is a DTMF event, not audio."

Why it works: Because DTMF events travel in their own packets and never pass through the audio codec, they arrive intact whichever codec is negotiated. RFC 2833 is the industry standard and the mode supported by virtually every modern SIP trunk provider. This is the mode you should configure for every Voice AI deployment in 2026.
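For reference, the telephone-event payload defined in RFC 4733 is only four bytes: an event code, an end bit, a reserved bit, a 6-bit volume, and a 16-bit duration in timestamp units. A minimal parsing sketch follows; the DtmfEvent type and function name are my own, and stripping the RTP header is assumed to have happened already.

```python
import struct
from typing import NamedTuple

# Event codes 0-15 map to the sixteen DTMF symbols.
EVENT_TO_KEY = "0123456789*#ABCD"


class DtmfEvent(NamedTuple):
    key: str
    end: bool              # True on the packet(s) marking key release
    volume_dbm0: int       # tone level, 0 to -63 dBm0
    duration_samples: int  # in RTP timestamp units (8000 per second at 8 kHz)


def parse_telephone_event(payload: bytes) -> DtmfEvent:
    """Decode the 4-byte telephone-event payload of an RTP packet."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(EVENT_TO_KEY):
        raise ValueError(f"event {event} is not a DTMF digit")
    return DtmfEvent(
        key=EVENT_TO_KEY[event],
        end=bool(flags & 0x80),       # E bit
        volume_dbm0=-(flags & 0x3F),  # lower six bits, sign dropped on the wire
        duration_samples=duration,
    )


if __name__ == "__main__":
    # Key 5, end bit set, -10 dBm0, 800 timestamp units (100 ms at 8 kHz).
    print(parse_telephone_event(bytes([5, 0x8A, 0x03, 0x20])))
```

Nothing in this payload depends on the audio codec, which is the whole point of the mode.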

Mode 3 - SIP INFO out-of-band DTMF (the enterprise legacy)

SIP INFO transmits DTMF events as SIP signalling messages rather than RTP packets. When a caller presses a key, a SIP INFO message is sent through the signalling channel (not the media channel) containing the key value.

When you encounter it: SIP INFO is used by some legacy enterprise PBX systems and older contact centre platforms. If your client has a Cisco or Avaya PBX from 2010–2015, it may send SIP INFO DTMF. Your Voice AI platform needs to be configured to receive it. The mismatch between a platform expecting RFC 2833 and a PBX sending SIP INFO is a common cause of DTMF failures in enterprise integrations.
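If you do end up handling SIP INFO yourself, the message body is plain text. The sketch below assumes the commonly seen application/dtmf-relay body with Signal and Duration fields; other body formats exist, so treat the field names as an assumption to confirm against a trace from your own PBX. The function name is hypothetical.

```python
from typing import Optional, Tuple

VALID_KEYS = "0123456789*#ABCD"


def parse_dtmf_relay(body: str) -> Tuple[str, Optional[int]]:
    """Extract (key, duration_ms) from an application/dtmf-relay style body.

    Assumes lines of the form 'Signal=5' and 'Duration=160'.
    Duration is returned as None when absent or not numeric.
    """
    fields = {}
    for line in body.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            fields[name.strip().lower()] = value.strip()

    signal = fields.get("signal", "").upper()
    if len(signal) != 1 or signal not in VALID_KEYS:
        raise ValueError(f"missing or invalid Signal field: {signal!r}")

    duration = fields.get("duration", "")
    return signal, int(duration) if duration.isdigit() else None


if __name__ == "__main__":
    print(parse_dtmf_relay("Signal=5\r\nDuration=160\r\n"))
```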

How DTMF mode is negotiated during call setup

When a SIP call is set up, the two endpoints exchange SDP (Session Description Protocol) messages that describe the media parameters they support - including DTMF. RFC 2833 DTMF support is declared in the SDP as a telephone-event entry in the payload type list:

a=rtpmap:101 telephone-event/8000

a=fmtp:101 0-16

This tells the receiving side: "I will send DTMF events using RTP payload type 101, for event codes 0 through 16." Events 0 to 15 are the sixteen DTMF symbols (digits 0-9, *, #, and A, B, C, D); event 16 is hook flash. If your Voice AI platform's SDP offer includes these lines and your SIP trunk's SDP answer also lists telephone-event, RFC 2833 DTMF will work correctly. If the SDP does not include this declaration, the trunk may fall back to in-band DTMF - which will fail silently with compressed codecs.
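When you have a SIP trace in hand, this check is easy to automate. The sketch below scans an SDP body for telephone-event payload types that are both mapped with a=rtpmap and listed on the m=audio line, then compares offer and answer. The function names and the returned labels are my own, and the parsing is deliberately simplified (single audio stream, no multi-rate handling).

```python
from __future__ import annotations

import re

_TELEPHONE_EVENT = re.compile(r"^a=rtpmap:(\d+)\s+telephone-event/\d+", re.IGNORECASE)


def telephone_event_payload_types(sdp: str) -> set[int]:
    """Payload types that are on the m=audio line AND mapped to telephone-event."""
    offered: set[int] = set()
    mapped: set[int] = set()
    for raw in sdp.splitlines():
        line = raw.strip()
        if line.startswith("m=audio "):
            # m=audio <port> <proto> <fmt> <fmt> ...
            offered.update(int(f) for f in line.split()[3:] if f.isdigit())
        else:
            match = _TELEPHONE_EVENT.match(line)
            if match:
                mapped.add(int(match.group(1)))
    return offered & mapped


def dtmf_mode_hint(offer: str, answer: str) -> str:
    """'rfc2833' only if both sides declare telephone-event, else a warning label."""
    if telephone_event_payload_types(offer) and telephone_event_payload_types(answer):
        return "rfc2833"
    return "no-telephone-event"


if __name__ == "__main__":
    offer = (
        "v=0\r\nm=audio 49170 RTP/AVP 0 101\r\n"
        "a=rtpmap:0 PCMU/8000\r\na=rtpmap:101 telephone-event/8000\r\n"
        "a=fmtp:101 0-16\r\n"
    )
    answer = "v=0\r\nm=audio 30000 RTP/AVP 18\r\na=rtpmap:18 G729/8000\r\n"
    print(dtmf_mode_hint(offer, offer))   # both sides declare telephone-event
    print(dtmf_mode_hint(offer, answer))  # answer dropped it
```

A "no-telephone-event" result does not tell you what the far end will do instead. It only tells you that RFC 2833 was not agreed, which is your cue to ask the carrier.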

The most common misconfiguration

The Voice AI platform is configured for RFC 2833 but the SIP trunk is a legacy carrier that defaults to in-band. The SDP is exchanged, the call connects, and keypad presses produce no response. There is no error message. The SIP 200 OK was clean. The call is live. But DTMF events from the caller never reach the Voice AI application. Always verify DTMF mode compatibility with your carrier before go-live, not during UAT when a non-technical tester might not think to test keypad input.

What DTMF failure looks like in a real deployment

From my experience

On one financial services deployment, the Voice AI agent was configured to collect a 6-digit PIN for account verification - the caller would speak to the AI, which would then say "please enter your PIN using your keypad." Everything tested cleanly in the development environment using softphone clients, which sent RFC 2833 by default.

In production, callers on mobile networks were pressing their PINs and hearing silence. The AI was waiting for PIN input that never arrived. After 10 seconds of silence it repeated the prompt. Callers pressed again. More silence. Most callers gave up after two attempts. We saw a 34% abandonment rate on the PIN entry step in the first two days of go-live, which triggered a client escalation on day three.

The diagnosis: The carrier handling mobile-originated calls was sending in-band DTMF. The G.729 codec negotiated on that route was compressing the tones beyond recognition. The fix was to add G.711 (PCMU) as the preferred codec for inbound calls from that carrier and to explicitly declare RFC 2833 support in our SDP configuration. DTMF detection went from 0% to 99.2% on the affected route within 24 hours of the fix. The lesson: always test DTMF from a real mobile phone on the real carrier, not from a softphone on your office network.

How to configure DTMF correctly in your Voice AI platform

The exact configuration steps vary by platform, but the principles are consistent across Vapi, Retell AI, Bland AI, and any custom SIP implementation.

Step 1 - Confirm your SIP trunk's DTMF mode

Before touching platform settings, confirm what DTMF mode your carrier uses. For Twilio: RFC 2833 is the default and recommended setting - verify this in your SIP trunk's Voice settings under "DTMF Type." For Plivo: RFC 2833 is default. For Vonage: supports both RFC 2833 and SIP INFO - verify which is active on your account. For legacy enterprise PBX: check the PBX documentation or call the vendor's support line - many Cisco and Avaya systems default to SIP INFO.

Step 2 - Set matching DTMF mode in your Voice AI platform

In Vapi: navigate to your phone number settings → SIP Configuration → DTMF Mode → select RFC 2833. Ensure payload type 101 is declared. In your Twilio SIP trunk: Voice → General → DTMF → select RFC 2833. The two settings must match. If your PBX uses SIP INFO and your platform supports it, configure both sides for SIP INFO - consistency is more important than which mode you choose.
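Because neither side reports a mismatch, I find it useful to record each leg's DTMF mode in one place and compare them before go-live. This is a small sketch of that idea; the DtmfMode enum, the leg names, and the function are all hypothetical and not tied to any platform's configuration API.

```python
from __future__ import annotations

from enum import Enum


class DtmfMode(Enum):
    IN_BAND = "in-band"
    RFC2833 = "rfc2833"
    SIP_INFO = "sip-info"


def find_mode_mismatches(legs: dict[str, DtmfMode]) -> list[str]:
    """Return one warning per pair of call legs whose DTMF modes differ."""
    names = sorted(legs)
    warnings = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            if legs[first] is not legs[second]:
                warnings.append(
                    f"{first} uses {legs[first].value} "
                    f"but {second} uses {legs[second].value}"
                )
    return warnings


if __name__ == "__main__":
    legs = {"trunk": DtmfMode.IN_BAND, "platform": DtmfMode.RFC2833}
    for warning in find_mode_mismatches(legs):
        print(warning)
```

The values you feed in still have to come from the carrier and platform settings you verified by hand; the check only makes the comparison explicit.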

Step 3 - Use G.711 (PCMU) codec on routes that use DTMF

Even with RFC 2833 configured, using G.729 or Opus as the audio codec introduces risk if the carrier has any legacy systems in the routing chain that fall back to in-band DTMF. For deployments where DTMF is critical - PIN entry, payment card collection, IVR navigation - set G.711 PCMU as the preferred (and if possible only) audio codec. G.711 uses simple companding at 64 kbit/s rather than the model-based compression of G.729, so in-band DTMF tones pass through largely intact even in fallback scenarios.
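On the wire, codec preference is simply the order of payload types on the SDP m=audio line, with the most preferred first. Most platforms expose this as a setting, so you would rarely rewrite SDP by hand; the sketch below only illustrates what "prefer PCMU" means in the SDP. It relies on PCMU having static payload type 0. The function name is my own.

```python
PCMU_PAYLOAD_TYPE = "0"  # static RTP payload type for G.711 mu-law


def prefer_pcmu(sdp: str) -> str:
    """Move PCMU to the front of every m=audio format list, if it is offered."""
    out = []
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            parts = line.split()
            head, formats = parts[:3], parts[3:]  # m=audio, port, proto | formats
            if PCMU_PAYLOAD_TYPE in formats:
                formats = [PCMU_PAYLOAD_TYPE] + [
                    f for f in formats if f != PCMU_PAYLOAD_TYPE
                ]
            line = " ".join(head + formats)
        out.append(line)
    return "\r\n".join(out) + "\r\n"


if __name__ == "__main__":
    # 18 is G.729, 0 is PCMU, 101 is the telephone-event payload type.
    print(prefer_pcmu("m=audio 49170 RTP/AVP 18 0 101\r\n"), end="")
```

The telephone-event payload type stays in the list untouched, so RFC 2833 remains on offer alongside the reordered codecs.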

Step 4 - Handle DTMF in your system prompt and call flow

DTMF events received by your Voice AI platform are passed to the AI layer as structured input - typically as a function call result or a special event type depending on your platform. Your system prompt needs to handle this explicitly: define what the AI should do when it receives a DTMF sequence, how long to wait for input, how many digits to collect before proceeding, and what to do if the input is incomplete or invalid. An AI that receives DTMF events but has no instructions for how to handle them will behave unpredictably.
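The rules in that paragraph (how many digits, how long to wait, what counts as invalid) are easier to get right in deterministic code than in a prompt alone. Here is a minimal collector sketch. The class, the status strings, and the choice of * to clear and # to submit are my own assumptions, and how key events actually reach your code depends on your platform.

```python
class DigitCollector:
    """Collects a fixed-length digit sequence with an inactivity timeout."""

    def __init__(self, expected_digits: int = 6, timeout_s: float = 10.0):
        self.expected_digits = expected_digits
        self.timeout_s = timeout_s
        self.digits = ""
        self.deadline = None  # set by start()

    def start(self, now: float) -> None:
        """Begin (or restart) collection; call this when the prompt finishes."""
        self.digits = ""
        self.deadline = now + self.timeout_s

    def on_key(self, key: str, now: float) -> str:
        """Feed one key press. Returns 'collecting', 'complete', or 'invalid'."""
        self.deadline = now + self.timeout_s  # any key press resets the timer
        if key == "*":  # assumed convention: star clears the entry
            self.digits = ""
            return "collecting"
        if key == "#":  # assumed convention: hash submits what was entered
            return "complete" if len(self.digits) == self.expected_digits else "invalid"
        too_long = len(self.digits) >= self.expected_digits
        if len(key) != 1 or key not in "0123456789" or too_long:
            return "invalid"
        self.digits += key
        return "complete" if len(self.digits) == self.expected_digits else "collecting"

    def timed_out(self, now: float) -> bool:
        """True once timeout_s has passed since start() or the last key press."""
        return self.deadline is not None and now >= self.deadline


if __name__ == "__main__":
    collector = DigitCollector(expected_digits=4)
    collector.start(now=0.0)
    for second, key in enumerate("1234", start=1):
        print(key, collector.on_key(key, now=float(second)))
```

Passing the clock in as now keeps the logic testable without real waiting. On 'invalid' or a timeout, the call flow would reprompt or escalate, as described above.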

DTMF testing checklist before go-live

Test every item before any go-live involving keypad input

☐ Test from a real mobile phone on the production carrier - not from a softphone or desktop client

☐ Test all 12 keys: 0-9, *, # - not just the numbers your application uses

☐ Test rapid key entry - 6 digits entered quickly should all be captured

☐ Test DTMF while the AI is speaking - barge-in with keypad input should work

☐ Test invalid input - too few digits, extra digits, or an unexpected * or # should trigger a graceful reprompt

☐ Test timeout - if no input is received within 10 seconds, the AI should reprompt or escalate

☐ Verify DTMF mode match in SDP by capturing a SIP trace and checking the telephone-event payload

☐ Test on at least two different mobile carriers - DTMF behaviour can vary by network operator
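The rapid-entry item deserves a note if you process RFC 2833 packets yourself. Under RFC 4733 a single key press produces several packets that share one RTP timestamp, and the final packet is normally sent three times. A receiver that counts packets instead of events will see phantom repeats, while one that deduplicates by key value alone will lose a genuinely repeated digit. The sketch below deduplicates by timestamp. The tuple layout and function name are my own, and it ignores very long presses that RFC 4733 splits into segments.

```python
EVENT_TO_KEY = "0123456789*#ABCD"  # event codes 0-15


def keys_from_packets(packets):
    """Collapse telephone-event packets into one key per press.

    packets: iterable of (rtp_timestamp, event_code, end_bit) in arrival order.
    All packets belonging to one press carry the same RTP timestamp.
    """
    keys = []
    last_timestamp = None
    for timestamp, event, _end in packets:
        if event >= len(EVENT_TO_KEY):
            continue  # not a DTMF digit (for example hook flash)
        if timestamp != last_timestamp:
            keys.append(EVENT_TO_KEY[event])
            last_timestamp = timestamp
    return "".join(keys)


if __name__ == "__main__":
    packets = [
        (1000, 5, False), (1000, 5, False),
        (1000, 5, True), (1000, 5, True), (1000, 5, True),  # end sent three times
        (2600, 5, False), (2600, 5, True),                  # second press of 5
        (4200, 11, True),                                   # a short press of #
    ]
    print(keys_from_packets(packets))
```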

"DTMF failures are the silent killers of Voice AI deployments. The call succeeds. The AI works. The SIP connection is clean. But the caller pressing keys gets nothing back - and they just give up."

- What I now say in every pre-go-live briefing where keypad input is involved

Quick reference: DTMF mode by carrier and platform

Provider	Default mode	Notes
Twilio	RFC 2833	Configurable - verify in SIP trunk Voice settings
Plivo	RFC 2833	Default - generally reliable on US/UK routes
Vonage	RFC 2833 / SIP INFO	Configurable - confirm with account team
Cisco PBX (legacy)	SIP INFO	May need explicit RFC 2833 configuration in dial-peer
Avaya PBX (legacy)	In-band or SIP INFO	Highly version-dependent - test on real hardware
Vapi	RFC 2833	Configurable per phone number in SIP settings
Retell AI / Bland AI	RFC 2833	Check platform docs for SIP INFO support status

The two-minute DTMF check that saves days of debugging

Before any Voice AI deployment where keypad input is required, do this: call your test number from a real mobile phone on the production carrier, wait for the AI to ask for keypad input, and press 1. If the AI responds, DTMF is working. If the AI repeats the prompt or goes silent, you most likely have a DTMF mode mismatch to fix before go-live.

That two-minute test, run from the right device on the right carrier, would have caught every DTMF failure I have encountered in production. It is now the first thing on my DTMF testing checklist for every deployment - and it should be on yours too.

Platform with configurable DTMF support

Vapi - Voice AI Platform

RFC 2833 DTMF support · Configurable per phone number · SIP trace access · Bring-your-own SIP trunk · Free tier

Vapi's per-phone-number DTMF configuration means you can set different DTMF modes for different SIP trunks within the same account - useful when your deployment connects to both a modern carrier (RFC 2833) and a legacy PBX (SIP INFO). The SIP trace access in Vapi's dashboard is what made it possible to diagnose the PIN entry failure described above - being able to see the telephone-event payload type in the SDP was the key diagnostic step.

Try Vapi free (affiliate link)

More plain-English SIP telephony guides

I publish every week on Voice AI and SIP telephony from real enterprise deployments. Get in touch if you are working through a DTMF issue and need a second opinion.
