How Voice AI handles DTMF: the complete guide
DTMF - the tones generated when a caller presses keys on their phone - can be transmitted in three different ways over a SIP connection, and Voice AI systems need to be configured for the right one. The wrong DTMF mode means keypad presses are silently ignored - the caller hears nothing, the AI receives nothing, and the interaction fails without a visible error. This guide covers all three modes, how to configure them, and how to diagnose and fix DTMF failures in production.
DTMF is one of the most overlooked configuration details in Voice AI deployments. It rarely comes up in platform demos because demos are voice-only. It only becomes visible when a caller tries to press a key - to enter a PIN, navigate a menu, or confirm a choice - and nothing happens. No error. No feedback. The call just continues as if the keypress never occurred.
I have debugged DTMF failures on three separate enterprise deployments. In every case the root cause was the same: the SIP trunk and the Voice AI platform were configured for different DTMF transmission modes and neither side flagged the mismatch during the call setup. This guide is what I wish I had read before the first of those three incidents.
What DTMF is and why Voice AI needs to handle it
DTMF stands for Dual-Tone Multi-Frequency. When you press a key on a telephone keypad, your phone generates a unique combination of two audio tones - one from a low-frequency group and one from a high-frequency group. The combination identifies which key was pressed. Pressing 1 generates 697 Hz + 1209 Hz. Pressing # generates 941 Hz + 1477 Hz. These dual-tone signals travel over the phone line and can be detected by any system listening for them.
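The row-and-column structure of the keypad is easy to see in code. Here is a minimal sketch of the standard DTMF frequency grid, plus a helper that synthesises the two-tone signal for a key. The function names, the 8000 Hz sample rate, and the 100 ms default duration are my own choices for illustration, not anything a particular platform mandates.

```python
import math

# Standard DTMF grid: each key is one low-group (row) tone plus one high-group (column) tone.
LOW_FREQS = (697, 770, 852, 941)
HIGH_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key: str) -> tuple:
    """Return (low_hz, high_hz) for a single keypad character."""
    if len(key) != 1:
        raise ValueError("expected exactly one key")
    for row, keys in enumerate(KEYPAD):
        col = keys.find(key)
        if col != -1:
            return LOW_FREQS[row], HIGH_FREQS[col]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration_ms: int = 100, sample_rate: int = 8000) -> list:
    """Synthesise the dual-tone signal as floats in the range [-1.0, 1.0]."""
    low, high = dtmf_frequencies(key)
    count = sample_rate * duration_ms // 1000
    return [
        0.5 * math.sin(2 * math.pi * low * n / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * n / sample_rate)
        for n in range(count)
    ]
```

Calling `dtmf_frequencies("1")` gives the 697 Hz and 1209 Hz pair mentioned above.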
Voice AI systems need DTMF support for several common use cases. PIN entry and account verification - where the caller keys in their date of birth or account number rather than speaking it - relies entirely on DTMF. Legacy IVR menu navigation, where pressing 1 means "confirm" and pressing 2 means "cancel", uses DTMF. Payment card entry, where sensitive data is collected via keypad to avoid speech recognition capturing card numbers, is DTMF-only. Any Voice AI deployment that needs to collect structured numerical input or integrate with existing IVR infrastructure will encounter DTMF.
The three DTMF transmission modes - and which to use
When DTMF travels over a SIP connection, it can be encoded and transmitted in three different ways. Understanding which mode your SIP trunk uses and which mode your Voice AI platform expects is the entire DTMF configuration problem.
In-band DTMF transmits the actual audio tones - the dual-frequency sounds - as part of the regular voice audio stream. When a caller presses 5, the 770 Hz + 1336 Hz tone combination is embedded directly in the RTP audio packets alongside their voice.
The problem: Modern audio codecs - particularly G.729 and Opus - use lossy compression that can distort or destroy the precise frequency combinations that DTMF relies on. A 770 Hz tone that gets slightly shifted or attenuated by codec compression may not be recognised at the receiving end. In-band DTMF is unreliable with compressed codecs and is the legacy approach that causes the most silent failures in modern deployments.
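To see why in-band detection is so sensitive to codec distortion, it helps to look at how a receiver typically finds the tones. A common approach is the Goertzel algorithm, which measures signal power at each of the eight DTMF frequencies. The sketch below is illustrative only: the `min_ratio` threshold and the function names are assumptions of mine, and production detectors add further checks (twist, duration, harmonics) that are omitted here.

```python
import math

LOW_FREQS = (697, 770, 852, 941)
HIGH_FREQS = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def goertzel_power(samples, freq, sample_rate):
    """Relative signal power at one frequency, via the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq / sample_rate)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev ** 2 + s_prev2 ** 2 - coeff * s_prev * s_prev2


def detect_key(samples, sample_rate=8000, min_ratio=5.0):
    """Return the detected key, or None if no clear tone pair is present."""
    low_p = [goertzel_power(samples, f, sample_rate) for f in LOW_FREQS]
    high_p = [goertzel_power(samples, f, sample_rate) for f in HIGH_FREQS]
    row = max(range(4), key=lambda i: low_p[i])
    col = max(range(4), key=lambda i: high_p[i])
    # The strongest tone in each group must clearly dominate the runner-up.
    for powers, best in ((low_p, row), (high_p, col)):
        runner_up = max(p for i, p in enumerate(powers) if i != best)
        if powers[best] < min_ratio * max(runner_up, 1e-12):
            return None
    return KEYPAD[row][col]


def make_tone(f1, f2, count=400, sample_rate=8000):
    """Helper for experiments: a clean two-tone signal (50 ms at 8 kHz by default)."""
    return [
        0.5 * math.sin(2 * math.pi * f1 * n / sample_rate)
        + 0.5 * math.sin(2 * math.pi * f2 * n / sample_rate)
        for n in range(count)
    ]
```

A clean 770 Hz + 1336 Hz tone is detected as 5. Once a codec smears or attenuates either tone enough that it no longer dominates its group, a detector like this returns nothing, which is exactly the silent failure described above.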
RFC 2833 (since obsoleted by RFC 4733, although the older name is still the one most providers use) transmits DTMF events as separate RTP packets, completely independent of the voice audio stream. When a caller presses a key, a dedicated RTP event packet is generated with its own dynamic payload type (conventionally 101) that signals "this is a DTMF event, not audio."
Why it works: Because DTMF events travel in their own packets and are not subject to audio codec compression, they arrive intact whichever audio codec is negotiated. RFC 2833 is the industry standard and the mode supported by virtually every modern SIP trunk provider. This is the mode you should configure for every Voice AI deployment in 2026.
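For a sense of what travels in those event packets, here is a minimal sketch that decodes the four-byte telephone-event payload defined in RFC 2833 / RFC 4733: an event code, an end-of-event flag, a volume, and a duration in RTP timestamp units. The `DtmfEvent` class and function name are hypothetical, and the sketch assumes the usual 8000 Hz clock rate and that the RTP header has already been stripped.

```python
import struct
from dataclasses import dataclass

# Event codes 0-15 map to the sixteen DTMF keys.
EVENT_TO_KEY = "0123456789*#ABCD"


@dataclass
class DtmfEvent:
    key: str
    end: bool            # True on the packet(s) marking the end of the keypress
    volume_dbm0: int     # power level, expressed as a value <= 0
    duration_ms: float


def parse_telephone_event(payload: bytes, clock_rate: int = 8000) -> DtmfEvent:
    """Decode the 4-byte telephone-event payload (RTP header already removed)."""
    if len(payload) < 4:
        raise ValueError("telephone-event payload must be at least 4 bytes")
    event, flags, duration = struct.unpack("!BBH", payload[:4])
    if event >= len(EVENT_TO_KEY):
        raise ValueError(f"event {event} is not a DTMF key")
    return DtmfEvent(
        key=EVENT_TO_KEY[event],
        end=bool(flags & 0x80),          # E bit
        volume_dbm0=-(flags & 0x3F),     # low 6 bits, sign implied
        duration_ms=duration * 1000 / clock_rate,
    )
```

For example, the bytes `05 8A 03 20` decode to key 5, end flag set, volume -10 dBm0, and a duration of 800 timestamp units (100 ms at 8000 Hz).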
SIP INFO transmits DTMF events as SIP signalling messages rather than RTP packets. When a caller presses a key, a SIP INFO message is sent through the signalling channel (not the media channel) containing the key value.
When you encounter it: SIP INFO is used by some legacy enterprise PBX systems and older contact centre platforms. If your client has a Cisco or Avaya PBX from 2010–2015, it may send SIP INFO DTMF. Your Voice AI platform needs to be configured to receive it. The mismatch between a platform expecting RFC 2833 and a PBX sending SIP INFO is a common cause of DTMF failures in enterprise integrations.
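SIP INFO bodies are not standardised as tightly as RFC 2833 events. One widely seen convention is a body with Content-Type `application/dtmf-relay` carrying `Signal=` and `Duration=` lines. The sketch below parses that format only; treat the format itself as an assumption to confirm against your PBX's traces, and the function name as hypothetical.

```python
from typing import Optional, Tuple

VALID_KEYS = "0123456789*#ABCD"


def parse_dtmf_relay(body: str) -> Tuple[str, Optional[int]]:
    """Parse an application/dtmf-relay body into (key, duration_ms or None)."""
    fields = {}
    for line in body.splitlines():
        name, sep, value = line.partition("=")
        if sep:
            fields[name.strip().lower()] = value.strip()
    signal = fields.get("signal", "").upper()
    if len(signal) != 1 or signal not in VALID_KEYS:
        raise ValueError(f"missing or invalid Signal in body: {body!r}")
    duration = fields.get("duration", "")
    return signal, int(duration) if duration.isdigit() else None
```

A body of `Signal=5` followed by `Duration=160` yields the key 5 with a 160 ms duration. If the traces show a different Content-Type or body layout, the parser needs adapting rather than the PBX.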
How DTMF mode is negotiated during call setup
When a SIP call is set up, the two endpoints exchange SDP (Session Description Protocol) messages that describe the media parameters they support - including DTMF. RFC 2833 DTMF support is declared in the SDP as a telephone-event entry in the payload type list:
a=rtpmap:101 telephone-event/8000
a=fmtp:101 0-16
This tells the receiving side: "I will send DTMF events using RTP payload type 101, for events 0 through 16." Events 0 through 15 cover the sixteen DTMF keys (0-9, *, #, and A, B, C, D); event 16 is hook flash. If your Voice AI platform's SDP offer includes this declaration and your SIP trunk's SDP answer acknowledges it, RFC 2833 DTMF will work correctly. If the SDP does not include this declaration, the trunk may fall back to in-band DTMF - which will fail silently with compressed codecs.
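When you are reading a SIP trace, the quickest check is whether both the offer and the answer actually declare telephone-event. A minimal sketch of that check follows; the function names are hypothetical, and it only inspects `a=rtpmap` lines rather than validating the full SDP.

```python
import re

RTPMAP_TELEPHONE_EVENT = re.compile(
    r"^a=rtpmap:(\d+)\s+telephone-event/(\d+)", re.IGNORECASE
)


def telephone_event_types(sdp: str) -> dict:
    """Map each telephone-event payload type in the SDP to its clock rate."""
    found = {}
    for line in sdp.splitlines():
        match = RTPMAP_TELEPHONE_EVENT.match(line.strip())
        if match:
            found[int(match.group(1))] = int(match.group(2))
    return found


def both_sides_declare_rfc2833(offer_sdp: str, answer_sdp: str) -> bool:
    """True only if offer and answer each declare telephone-event."""
    return bool(telephone_event_types(offer_sdp)) and bool(
        telephone_event_types(answer_sdp)
    )
```

If this returns False for a call where keypresses went missing, the SDP is the first place to look.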
What DTMF failure looks like in a real deployment
On one financial services deployment, the Voice AI agent was configured to collect a 6-digit PIN for account verification - the caller would speak to the AI, which would then say "please enter your PIN using your keypad." Everything tested cleanly in the development environment using softphone clients, which sent RFC 2833 by default.
In production, callers on mobile networks were pressing their PINs and hearing silence. The AI was waiting for PIN input that never arrived. After 10 seconds of silence it repeated the prompt. Callers pressed again. More silence. Most callers gave up after two attempts. We saw a 34% abandonment rate on the PIN entry step in the first two days of go-live, which triggered a client escalation on day three.
The diagnosis: The carrier handling mobile-originated calls was sending in-band DTMF. The G.729 codec negotiated on that route was compressing the tones beyond recognition. The fix was to add G.711 (PCMU) as the preferred codec for inbound calls from that carrier and to explicitly declare RFC 2833 support in our SDP configuration. DTMF detection went from 0% to 99.2% on the affected route within 24 hours of the fix. The lesson: always test DTMF from a real mobile phone on the real carrier, not from a softphone on your office network.
How to configure DTMF correctly in your Voice AI platform
The exact configuration steps vary by platform, but the principles are consistent across Vapi, Retell AI, Bland AI, and any custom SIP implementation.
Before touching platform settings, confirm what DTMF mode your carrier uses. For Twilio: RFC 2833 is the default and recommended setting - verify this in your SIP trunk's Voice settings under "DTMF Type." For Plivo: RFC 2833 is default. For Vonage: supports both RFC 2833 and SIP INFO - verify which is active on your account. For legacy enterprise PBX: check the PBX documentation or call the vendor's support line - many Cisco and Avaya systems default to SIP INFO.
In Vapi: navigate to your phone number settings → SIP Configuration → DTMF Mode → select RFC 2833. Ensure payload type 101 is declared. In your Twilio SIP trunk: Voice → General → DTMF → select RFC 2833. The two settings must match. If your PBX uses SIP INFO and your platform supports it, configure both sides for SIP INFO - consistency is more important than which mode you choose.
Even with RFC 2833 configured, using G.729 or Opus as the audio codec introduces risk if the carrier has any legacy systems in the routing chain that fall back to in-band DTMF. For deployments where DTMF is critical - PIN entry, payment card collection, IVR navigation - set G.711 PCMU as the preferred (and if possible only) audio codec. G.711 is a waveform codec that uses simple companding rather than speech-model compression, so it preserves in-band DTMF tones even in fallback scenarios.
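Codec preference in SDP is expressed by the order of payload types on the `m=audio` line, with the most preferred first. In practice you would set this in your platform or trunk settings rather than rewrite SDP by hand, but a small sketch makes the mechanism concrete. The function name is hypothetical; payload type 0 is the static RTP assignment for PCMU and 18 is G.729.

```python
def prefer_payload_type(sdp: str, preferred_pt: str = "0") -> str:
    """Move preferred_pt to the front of the m=audio format list, if present."""
    out = []
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            parts = line.split()
            # m=audio <port> <proto> <fmt> <fmt> ...
            header, formats = parts[:3], parts[3:]
            if preferred_pt in formats:
                formats = [preferred_pt] + [f for f in formats if f != preferred_pt]
            line = " ".join(header + formats)
        out.append(line)
    return "\r\n".join(out) + "\r\n"
```

Given `m=audio 49170 RTP/AVP 18 0 101`, the result lists `0 18 101`, so PCMU is offered ahead of G.729 while the telephone-event payload type stays in place.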
DTMF events received by your Voice AI platform are passed to the AI layer as structured input - typically as a function call result or a special event type depending on your platform. Your system prompt needs to handle this explicitly: define what the AI should do when it receives a DTMF sequence, how long to wait for input, how many digits to collect before proceeding, and what to do if the input is incomplete or invalid. An AI that receives DTMF events but has no instructions for how to handle them will behave unpredictably.
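The collection rules described above (how many digits, how long to wait, what ends the input) can be sketched as a small state machine that sits between the DTMF events and the AI layer. Everything here is hypothetical: the class name, the `#` terminator, and the 5-second inter-digit timeout are illustrative defaults, not any platform's API. Timestamps are passed in explicitly so the logic is easy to test.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DigitCollector:
    expected_digits: int
    terminator: str = "#"
    inter_digit_timeout_s: float = 5.0
    digits: str = ""
    last_event_at: Optional[float] = None

    def on_key(self, key: str, now: float) -> Optional[str]:
        """Feed one keypress; return the collected digits when input is complete."""
        stale = (
            self.last_event_at is not None
            and now - self.last_event_at > self.inter_digit_timeout_s
        )
        if stale:
            self.digits = ""  # caller paused too long: discard the partial entry
        self.last_event_at = now
        if key == self.terminator:
            return self._finish()
        if key.isdigit():
            self.digits += key
            if len(self.digits) == self.expected_digits:
                return self._finish()
        return None  # still collecting (non-digit keys such as * are ignored)

    def _finish(self) -> str:
        result, self.digits = self.digits, ""
        self.last_event_at = None
        return result

    def is_complete(self, result: str) -> bool:
        """True if a finished entry has the expected length."""
        return len(result) == self.expected_digits
```

A caller who ends early with the terminator produces a short result, which `is_complete` flags so the AI can re-prompt instead of guessing.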
DTMF testing checklist before go-live
"DTMF failures are the silent killers of Voice AI deployments. The call succeeds. The AI works. The SIP connection is clean. But the caller pressing keys gets nothing back - and they just give up."
- What I now say in every pre-go-live briefing where keypad input is involved
Quick reference: DTMF mode by carrier and platform
| Provider | Default mode | Notes |
|---|---|---|
| Twilio | RFC 2833 | Configurable - verify in SIP trunk Voice settings |
| Plivo | RFC 2833 | Default - generally reliable on US/UK routes |
| Vonage | RFC 2833 / SIP INFO | Configurable - confirm with account team |
| Cisco PBX (legacy) | SIP INFO | May need explicit RFC 2833 configuration in dial-peer |
| Avaya PBX (legacy) | In-band or SIP INFO | Highly version-dependent - test on real hardware |
| Vapi | RFC 2833 | Configurable per phone number in SIP settings |
| Retell AI / Bland AI | RFC 2833 | Check platform docs for SIP INFO support status |
The two-minute DTMF check that saves days of debugging
Before any Voice AI deployment where keypad input is required, do this: call your test number from a real mobile phone on the production carrier, wait for the AI to ask for keypad input, and press 1. If the AI responds, DTMF is working. If the AI repeats the prompt or goes silent, you have a DTMF mode mismatch to fix before go-live.
That two-minute test, run from the right device on the right carrier, would have caught every DTMF failure I have encountered in production. It is now the first thing on my DTMF testing checklist for every deployment - and it should be on yours too.
More plain-English SIP telephony guides
I publish every week on Voice AI and SIP telephony from real enterprise deployments. Get in touch if you are working through a DTMF issue and need a second opinion.