How to scope a Voice AI project in 5 steps

Disclosure: This post contains affiliate links, including a link to Vapi. If you click through and sign up for a paid plan, I may earn a commission at no extra cost to you. I only recommend platforms I have personally evaluated. Full affiliate disclosure here.

Priyanka
Senior Voice AI PM  ·  April 17, 2026  ·  10 min read  ·  1,950 words
The short answer

Scoping a Voice AI project correctly in week one prevents three to six weeks of rework later. The five steps are: define the exact call type and outcome, quantify the volume and cost baseline, map the integration dependencies, define success criteria before choosing a platform, and scope the escalation design before scoping the AI. Most Voice AI projects that fail do so because at least one of these five steps was skipped, abbreviated, or done in the wrong order.

Voice AI projects have a scoping problem. The technology is compelling enough that stakeholders want to move fast - and "fast" usually means skipping the scoping work that determines whether the project will actually succeed. The demo looked good. The platform has an API. How hard can it be?

The answer, consistently, is: harder than the demo suggests - and far harder than it needs to be when the scoping is skipped. This guide is the five-step scoping framework I use on every Voice AI project. It is not methodology for its own sake - it is the specific sequence of questions that, answered correctly, determines everything from platform choice to go-live timeline.

5 scoping steps - in this order
Week 3 - when skipped scoping surfaces as problems
1 question that changes every other answer

Step 1 - Define the exact call type and the single outcome it needs to achieve

The most common scoping mistake is defining the use case too broadly. "Automate our customer service calls" is not a scoping statement - it is a business ambition. A scoping statement describes one specific call type with one specific outcome. "Automate inbound calls where the caller wants to reschedule an existing appointment, with the outcome that the appointment is rescheduled in the booking system before the call ends" is a scoping statement.

The reason precision matters at this stage: every other decision in the project flows from this definition. The STT model you choose depends on the domain vocabulary of this specific call type. The LLM prompt design depends on the single outcome the call needs to achieve. The integration dependencies depend on which systems the AI needs to touch to complete that outcome. A broad definition produces an architecture that is half-designed for five different use cases and fully designed for none of them.

The question to answer in Step 1
Complete this sentence: "The AI will handle calls where the caller wants to [specific caller intent], and the call is successful when [specific measurable outcome] has been achieved before the call ends." If you cannot complete this sentence with one specific intent and one specific outcome, your scope is too broad. Narrow it until you can.

Step 2 - Quantify the volume and establish the cost baseline

Before choosing a platform, before designing a conversation flow, before writing a single prompt - you need to know two numbers: how many calls of this specific type happen per month, and what each one currently costs to handle.

Volume determines whether the economics of Voice AI make sense for this use case. A use case with 50 calls per month will almost never justify a Voice AI deployment - the development and maintenance cost exceeds the savings from automation. A use case with 5,000 calls per month almost always justifies it, assuming reasonable call complexity. The threshold varies by organisation but 500–1,000 calls per month is typically where the economics start to work.

Cost baseline establishes what you are replacing. If a human-handled call costs £4.80 and the AI-handled equivalent costs £1.20, you have a clear saving to model. If you do not know the current cost per call, you cannot build a business case and you cannot measure success after go-live. The cost baseline is also your negotiating anchor when procurement asks for ROI projections.

How to calculate current cost per call
Fully loaded hourly cost (salary + employer NI + benefits) ÷ 60 = cost per minute
Average handle time for this specific call type, in minutes
Cost per call = cost per minute × average handle time × (1 + overhead rate)
Example: a £28,000/year agent costs ~£18/hr fully loaded = £0.30/min. An 8-minute average handle time gives £2.40/call. Adding 40% overhead (management, facilities, tooling) gives ~£3.36/call true cost.
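The arithmetic above can be sketched as a small calculator. A minimal sketch in Python - the £18/hr rate, 8-minute handle time, and 40% overhead are the illustrative figures from the example, not universal constants:

```python
def cost_per_call(hourly_rate_gbp: float, aht_minutes: float,
                  overhead_rate: float = 0.40) -> float:
    """Fully loaded cost of one human-handled call.

    hourly_rate_gbp: fully loaded agent hourly cost (salary + NI + benefits)
    aht_minutes:     average handle time for this specific call type
    overhead_rate:   management, facilities and tooling uplift (40% here)
    """
    cost_per_minute = hourly_rate_gbp / 60
    return cost_per_minute * aht_minutes * (1 + overhead_rate)


def monthly_saving(volume: int, human_cost: float, ai_cost: float) -> float:
    """Projected monthly saving if the AI handles every call of this type."""
    return volume * (human_cost - ai_cost)


# The worked example: £18/hr, 8-minute handle time, 40% overhead
human = cost_per_call(18, 8)
print(round(human, 2))                               # ~3.36
print(round(monthly_saving(1000, human, 1.20), 2))   # saving at 1,000 calls/month
```

Plug in your own AI cost per call (platform fees plus STT/LLM/TTS usage) to get the saving figure the business case needs.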

Step 3 - Map every integration dependency before writing a prompt

Voice AI projects fail in integration, not in conversation design. The LLM will handle the conversation. The API connections to the systems the AI needs to complete the outcome - the CRM, the booking system, the order management platform, the knowledge base - are where projects stall, run over budget, and miss go-live dates.

In Step 3, map every system the AI must interact with during a call. For each system, answer four questions: does it have an API that supports real-time calls (not batch processing)? What is the authentication mechanism? What is the API's typical response latency under production load? Who owns the API and what is their process for granting access?

The fourth question - API ownership and access process - is frequently the longest-duration item on the entire project plan. In large organisations, getting API access through internal security review can take four to eight weeks. Discovering this in week three, when the conversation design is already complete, extends the project by two months. The fix costs nothing: ask the question in week one.

The integration question most PMs ask too late
"What is the API response time under peak load?" - not average load, peak load. An API that averages 200ms under normal conditions can spike to 2,000ms during peak periods. That 1,800ms spike adds directly to your Voice AI turn latency during exactly the moments when call volume is highest. Test API performance under load in staging before you commit to a go-live date.
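One lightweight way to get that P95 number is to time repeated calls against the real endpoint in staging. A minimal sketch in Python - `call_api` is a placeholder for whatever client call hits your booking system or CRM:

```python
import statistics
import time


def p95_latency_ms(call_api, samples: int = 50) -> float:
    """Time `call_api` (a zero-argument callable hitting the real endpoint)
    `samples` times and return the 95th-percentile latency in milliseconds.

    Run this during a realistic peak-load window, not a quiet one - the
    point is to catch the spikes that average-load testing hides.
    """
    timings_ms = []
    for _ in range(samples):
        start = time.perf_counter()
        call_api()
        timings_ms.append((time.perf_counter() - start) * 1000)
    # quantiles(n=100) yields 99 cut points; index 94 is the 95th percentile
    return statistics.quantiles(timings_ms, n=100)[94]
```

Compare the result against your turn-latency budget: anything the API spends above a couple of hundred milliseconds comes straight out of the time available for STT, the LLM, and TTS on that turn.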

Step 4 - Define success criteria before choosing a platform

Most teams choose a Voice AI platform first, then define success criteria that the chosen platform can meet. This is backwards. Success criteria should drive platform selection, not follow it.

Define four success metrics before you open a platform website. First, the containment rate: what percentage of calls must the AI resolve without human escalation? Second, the latency ceiling: what is the maximum acceptable turn latency before callers perceive the AI as broken? Third, the STT accuracy floor: what is the minimum word accuracy on domain-specific vocabulary before transcription errors break the call flow? Fourth, the CSAT target: what caller satisfaction score defines success?

Once these four numbers are defined, platform selection becomes a matching exercise - not a preference exercise. A platform that cannot consistently deliver sub-600ms P95 latency does not meet your latency ceiling regardless of how impressive the demo sounds. A platform whose default STT model achieves 89% accuracy on your domain vocabulary does not meet your accuracy floor even if it is 94% accurate on generic speech.
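That matching exercise can be made mechanical. A sketch in Python - the threshold values are illustrative placeholders, not recommendations; substitute the numbers your own scoping produced:

```python
# Illustrative success criteria - replace with the numbers your scoping produced
CRITERIA = {
    "containment_rate": 0.70,   # min share of calls resolved without escalation
    "p95_latency_ms": 600,      # max acceptable turn latency
    "stt_accuracy": 0.95,       # min word accuracy on domain vocabulary
    "csat": 4.2,                # min caller satisfaction score
}


def failed_criteria(platform: dict) -> list[str]:
    """Return the criteria a measured platform fails; an empty list qualifies it."""
    failures = []
    if platform["containment_rate"] < CRITERIA["containment_rate"]:
        failures.append("containment rate below target")
    if platform["p95_latency_ms"] > CRITERIA["p95_latency_ms"]:
        failures.append("P95 turn latency above ceiling")
    if platform["stt_accuracy"] < CRITERIA["stt_accuracy"]:
        failures.append("STT accuracy below floor on domain vocabulary")
    if platform["csat"] < CRITERIA["csat"]:
        failures.append("CSAT below target")
    return failures


# A platform measuring 89% on domain vocabulary fails, regardless of demo polish
print(failed_criteria({"containment_rate": 0.75, "p95_latency_ms": 550,
                       "stt_accuracy": 0.89, "csat": 4.4}))
```

The discipline is in the inputs: each platform's numbers must come from your own measurements on your vocabulary and your network, not from the vendor's marketing page.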

Step 5 - Scope the escalation design before scoping the AI

From my experience

The most consistent predictor of a smooth Voice AI go-live is not the quality of the AI conversation design. It is the quality of the escalation design. On every project where go-live was delayed or where a client escalation occurred in the first two weeks of production, the root cause traced back to an escalation path that had not been fully designed or tested.

What I do now: I scope the escalation design in the same week as the use case definition - Step 5 runs in parallel with Step 1, not after Step 4. The escalation design covers: what triggers a transfer to a human agent, how context is passed to the receiving agent, what happens when no human is available, and what the caller experience is at every failure point. Until these questions are answered, the AI conversation design is incomplete - because the AI's behaviour at the boundary of its capability defines the caller's experience more than its behaviour within scope.

Escalation design covers five scenarios that must each have a documented answer before go-live. What happens when the AI cannot understand the caller after two attempts? What happens when the caller explicitly asks for a human agent? What happens when the API the AI needs is unavailable? What happens when the call arrives outside business hours? What happens when the caller is distressed or says something the AI is not equipped to handle?

Each scenario needs a specific, tested response - not a general principle. "Transfer to a human agent" is not a sufficient answer if there is no human agent available at 11pm, or if the SIP transfer to the contact centre queue has not been tested, or if the agent receiving the transfer has no context about the conversation that just occurred.
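One way to keep those five scenarios honest is to treat the escalation plan as data and check coverage programmatically. A sketch in Python - the scenario names and the `tested` flag are my own convention, not a standard:

```python
ESCALATION_SCENARIOS = [
    "not_understood_after_two_attempts",
    "caller_requests_human",
    "required_api_unavailable",
    "out_of_hours",
    "caller_distressed_or_out_of_scope",
]


def escalation_gaps(plan: dict) -> list[str]:
    """Scenarios that lack a documented response or have not been tested.

    `plan` maps scenario name -> {"response": str, "tested": bool}.
    An empty return list is a go-live precondition.
    """
    return [
        scenario for scenario in ESCALATION_SCENARIOS
        if scenario not in plan or not plan[scenario].get("tested", False)
    ]


plan = {
    "caller_requests_human": {"response": "warm SIP transfer with call summary",
                              "tested": True},
    "out_of_hours": {"response": "offer callback booking", "tested": False},
}
print(escalation_gaps(plan))  # four scenarios still block go-live
```

A documented-but-untested response counts as a gap by design: "transfer to a human" only becomes an answer once the transfer has been exercised end to end.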

The one-page scoping document - fill this in before kick-off

Voice AI project scoping template
Use case definition
The AI will handle calls where the caller wants to [intent]. Success = [outcome] achieved before call ends.
Volume and cost baseline
Monthly call volume: ___  ·  Current cost per call: £___  ·  Target AI cost per call: £___  ·  Projected monthly saving: £___
Integration dependencies
Systems the AI must access: [list]  ·  API availability confirmed: Y/N  ·  Access approval timeline: ___ weeks  ·  Peak API latency: ___ms
Success criteria
Containment rate target: ___%  ·  Max turn latency (P95): ___ms  ·  STT accuracy floor: ___%  ·  CSAT target: ___
Escalation design
Escalation triggers: [list]  ·  Transfer destination: ___  ·  Out-of-hours path: ___  ·  Context passing method: ___  ·  All scenarios tested: Y/N
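If you want the one-pager in machine-checkable form, a dataclass works. A sketch in Python - the field names mirror the template above but the naming is my own:

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class VoiceAIScope:
    # Use case definition
    caller_intent: Optional[str] = None
    success_outcome: Optional[str] = None
    # Volume and cost baseline
    monthly_volume: Optional[int] = None
    current_cost_per_call_gbp: Optional[float] = None
    # Integration dependencies
    systems: Optional[list] = None
    peak_api_latency_ms: Optional[int] = None
    # Success criteria
    containment_target: Optional[float] = None
    p95_latency_ceiling_ms: Optional[int] = None
    stt_accuracy_floor: Optional[float] = None
    csat_target: Optional[float] = None
    # Escalation design
    escalation_triggers: Optional[list] = None
    all_escalations_tested: Optional[bool] = None

    def missing(self) -> list[str]:
        """Fields still blank - anything listed here means scoping is not done."""
        return [f.name for f in fields(self) if getattr(self, f.name) is None]


scope = VoiceAIScope(caller_intent="reschedule an existing appointment",
                     success_outcome="appointment rescheduled in booking system")
print(scope.missing())  # everything except the use case is still open
```

An empty `missing()` list is a cheap kick-off gate: if any field is still `None`, the project is not ready to build.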

"Voice AI projects do not fail because the AI could not handle the conversation. They fail because the escalation was not designed, the integration latency was not measured, or the success criteria were not defined before the platform was chosen."

- The pattern I see in every Voice AI project post-mortem
Platform for validating your scope before committing
Vapi - Voice AI Platform
Free tier for scoping validation  ·  Function calling for API integration testing  ·  Per-turn latency logs  ·  STT provider swapping  ·  Bring-your-own SIP
Vapi's free tier is useful specifically at the scoping stage - before any commercial commitment - because it lets you test API integration latency against your real systems, validate STT accuracy on domain vocabulary, and measure actual turn latency on your network. Use it to validate Steps 2, 3, and 4 of the scoping framework before presenting your platform recommendation to stakeholders.
Try Vapi free (affiliate link)

The week you spend scoping saves the month you lose rebuilding

A properly scoped Voice AI project takes approximately two weeks of discovery work before any technical build begins. During those two weeks you define the use case precisely, establish the cost baseline, map the integration dependencies, set measurable success criteria, and design the escalation paths. By the end of that fortnight, every significant decision in the project is already made - the technical build is execution, not design.

An improperly scoped Voice AI project skips those two weeks and starts building immediately. By week three, the integration dependency that needed eight weeks of internal approval has been discovered. By week five, the success criteria that should have guided platform selection are being retrofitted onto a platform that cannot quite meet them. By week eight, the escalation design that was supposed to be straightforward requires a clinical protocol sign-off that nobody anticipated. The two weeks saved at the start cost six weeks at the end - and the project that was supposed to go live in ten weeks goes live in sixteen.

Scoping a Voice AI project right now?

I write every week on Voice AI from real deployments. Get in touch if you want a second opinion on your scope or a review of your success criteria before kick-off.


Priyanka
Senior Voice AI PM  ·  Voice AI Insider
I scope and manage Voice AI projects from discovery to go-live. The five steps in this post are the framework I use on every project - built from watching what goes wrong when any one of them is skipped.
