How to scope a Voice AI project in 5 steps
Scoping a Voice AI project correctly in week one prevents three to six weeks of rework later. The five steps are: define the exact call type and outcome, quantify the volume and cost baseline, map the integration dependencies, define success criteria before choosing a platform, and scope the escalation design before scoping the AI. Most Voice AI projects that fail do so because at least one of these five steps was skipped, abbreviated, or done in the wrong order.
Voice AI projects have a scoping problem. The technology is compelling enough that stakeholders want to move fast - and "fast" usually means skipping the scoping work that determines whether the project will actually succeed. The demo looked good. The platform has an API. How hard can it be?
The answer, consistently, is: harder than the demo suggests, and far harder still when the scoping is skipped upfront. This guide is the five-step scoping framework I use on every Voice AI project. It is not a methodology for its own sake - it is the specific sequence of questions that, when answered correctly, determines everything from platform choice to go-live timeline.
Step 1 - Define the exact call type and the single outcome it needs to achieve
The most common scoping mistake is defining the use case too broadly. "Automate our customer service calls" is not a scoping statement - it is a business ambition. A scoping statement describes one specific call type with one specific outcome. "Automate inbound calls where the caller wants to reschedule an existing appointment, with the outcome that the appointment is rescheduled in the booking system before the call ends" is a scoping statement.
The reason precision matters at this stage: every other decision in the project flows from this definition. The STT model you choose depends on the domain vocabulary of this specific call type. The LLM prompt design depends on the single outcome the call needs to achieve. The integration dependencies depend on which systems the AI needs to touch to complete that outcome. A broad definition produces an architecture that is half-designed for five different use cases and fully designed for none of them.
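One way to force this precision is to capture the scoping statement as structured data rather than prose: a statement that cannot fill every field is not yet scoped. A minimal sketch - the field names are illustrative, not a standard:

```python
from dataclasses import dataclass, fields

@dataclass
class ScopingStatement:
    call_type: str         # one specific call type, e.g. "inbound call"
    caller_intent: str     # what the caller wants when they dial in
    single_outcome: str    # the one outcome that defines success for the call
    system_of_record: str  # where the outcome must be written before hang-up

def is_scoped(s: ScopingStatement) -> bool:
    """A statement is scoped only when every field is concretely filled in."""
    return all(getattr(s, f.name).strip() for f in fields(s))

# The rescheduling example from this section passes; "automate our
# customer service calls" cannot fill the outcome or system fields.
reschedule = ScopingStatement(
    call_type="inbound call",
    caller_intent="reschedule an existing appointment",
    single_outcome="appointment rescheduled before the call ends",
    system_of_record="booking system",
)
```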
Step 2 - Quantify the volume and establish the cost baseline
Before choosing a platform, before designing a conversation flow, before writing a single prompt - you need to know two numbers: how many calls of this specific type happen per month, and what each one currently costs to handle.
Volume determines whether the economics of Voice AI make sense for this use case. A use case with 50 calls per month will almost never justify a Voice AI deployment - the development and maintenance cost exceeds the savings from automation. A use case with 5,000 calls per month almost always justifies it, assuming reasonable call complexity. The threshold varies by organisation but 500–1,000 calls per month is typically where the economics start to work.
Cost baseline establishes what you are replacing. If a human-handled call costs £4.80 and the AI-handled equivalent costs £1.20, you have a clear saving to model. If you do not know the current cost per call, you cannot build a business case and you cannot measure success after go-live. The cost baseline is also your negotiating anchor when procurement asks for ROI projections.
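The economics above reduce to simple arithmetic. A sketch of the breakeven calculation, using the per-call figures from this section; the one-off development cost is an illustrative assumption, not a benchmark:

```python
def monthly_saving(calls_per_month: int, human_cost: float, ai_cost: float) -> float:
    """Saving per month from automating this one call type."""
    return calls_per_month * (human_cost - ai_cost)

def breakeven_months(dev_cost: float, calls_per_month: int,
                     human_cost: float = 4.80, ai_cost: float = 1.20) -> float:
    """Months until the one-off build cost is recovered by per-call savings."""
    saving = monthly_saving(calls_per_month, human_cost, ai_cost)
    return float("inf") if saving <= 0 else dev_cost / saving

# 50 calls/month saves £180/month: an assumed £20,000 build takes over
# nine years to pay back. 5,000 calls/month saves £18,000/month: the
# same build pays back in just over a month.
```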
Step 3 - Map every integration dependency before writing a prompt
Voice AI projects fail in integration, not in conversation design. The LLM will handle the conversation. The API connections to the systems the AI needs to complete the outcome - the CRM, the booking system, the order management platform, the knowledge base - are where projects stall, run over budget, and miss go-live dates.
In Step 3, map every system the AI must interact with during a call. For each system, answer four questions: does it have an API that supports real-time calls (not batch processing)? What is the authentication mechanism? What is the API's typical response latency under production load? Who owns the API and what is their process for granting access?
The fourth question - API ownership and access process - is frequently the longest-duration item on the entire project plan. In large organisations, getting API access through internal security review can take four to eight weeks. Discovering this in week three, when the conversation design is already complete, extends the project by two months. Discover it in week one by asking the ownership question before anything else is built.
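The four questions per system can be tracked as a simple register, so the long-lead access items surface in week one rather than week three. A sketch; the system names, latency budget, and lead-time figures are illustrative:

```python
from dataclasses import dataclass

@dataclass
class Integration:
    system: str
    realtime_api: bool     # supports real-time calls, not batch processing
    auth: str              # authentication mechanism
    p95_latency_ms: int    # measured under production-like load
    access_lead_weeks: int # owner's process for granting access

def blockers(deps: list, latency_budget_ms: int = 600) -> list:
    """Flag anything that threatens the call flow or the project timeline."""
    issues = []
    for d in deps:
        if not d.realtime_api:
            issues.append(f"{d.system}: no real-time API")
        if d.p95_latency_ms > latency_budget_ms:
            issues.append(f"{d.system}: P95 latency {d.p95_latency_ms}ms over budget")
        if d.access_lead_weeks >= 4:
            issues.append(f"{d.system}: access takes {d.access_lead_weeks} weeks - request now")
    return issues

deps = [
    Integration("CRM", True, "OAuth2", 180, 2),
    Integration("Booking system", True, "API key", 950, 6),
]
```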
Step 4 - Define success criteria before choosing a platform
Most teams choose a Voice AI platform first, then define success criteria that the chosen platform can meet. This is backwards. Success criteria should drive platform selection, not follow it.
Define four success metrics before you open a platform website. First, the containment rate: what percentage of calls must the AI resolve without human escalation? Second, the latency ceiling: what is the maximum acceptable turn latency before callers perceive the AI as broken? Third, the STT accuracy floor: what is the minimum word accuracy on domain-specific vocabulary before transcription errors break the call flow? Fourth, the CSAT target: what caller satisfaction score defines success?
Once these four numbers are defined, platform selection becomes a matching exercise - not a preference exercise. A platform that cannot consistently deliver sub-600ms P95 latency does not meet your latency ceiling regardless of how impressive the demo sounds. A platform whose default STT model achieves 89% accuracy on your domain vocabulary does not meet your accuracy floor even if it is 94% accurate on generic speech.
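The matching exercise can be made mechanical: write the four numbers down first, then score each candidate platform against them. A sketch using the thresholds mentioned in this section; the containment and CSAT values are example targets you would set yourself:

```python
# Success criteria defined before looking at any platform website.
criteria = {
    "containment_rate_min": 0.70,  # share of calls resolved without escalation
    "p95_latency_ms_max": 600,     # turn latency ceiling
    "stt_accuracy_min": 0.94,      # on domain vocabulary, not generic speech
    "csat_min": 4.2,               # caller satisfaction target
}

def failed_criteria(platform: dict, criteria: dict) -> list:
    """Return the criteria a candidate fails; an empty list means a match."""
    failures = []
    if platform["containment_rate"] < criteria["containment_rate_min"]:
        failures.append("containment rate")
    if platform["p95_latency_ms"] > criteria["p95_latency_ms_max"]:
        failures.append("latency ceiling")
    if platform["stt_accuracy_domain"] < criteria["stt_accuracy_min"]:
        failures.append("STT accuracy floor")
    if platform["csat"] < criteria["csat_min"]:
        failures.append("CSAT target")
    return failures

# The 89%-on-domain-vocabulary platform from this section fails the
# accuracy floor, however impressive the demo sounds.
candidate = {"containment_rate": 0.75, "p95_latency_ms": 540,
             "stt_accuracy_domain": 0.89, "csat": 4.4}
```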
Step 5 - Scope the escalation design before scoping the AI
The most consistent predictor of a smooth Voice AI go-live is not the quality of the AI conversation design. It is the quality of the escalation design. On every project where go-live was delayed or where a client escalation occurred in the first two weeks of production, the root cause traced back to an escalation path that had not been fully designed or tested.
What I do now: I scope the escalation design in the same week as the use case definition - Step 5 runs in parallel with Step 1, not after Step 4. The escalation design covers: what triggers a transfer to a human agent, how context is passed to the receiving agent, what happens when no human is available, and what the caller experience is at every failure point. Until these questions are answered, the AI conversation design is incomplete - because the AI's behaviour at the boundary of its capability defines the caller's experience more than its behaviour within scope.
Escalation design covers five scenarios that must each have a documented answer before go-live. What happens when the AI cannot understand the caller after two attempts? What happens when the caller explicitly asks for a human agent? What happens when the API the AI needs is unavailable? What happens when the call arrives outside business hours? What happens when the caller is distressed or says something the AI is not equipped to handle?
Each scenario needs a specific, tested response - not a general principle. "Transfer to a human agent" is not a sufficient answer if there is no human agent available at 11pm, or if the SIP transfer to the contact centre queue has not been tested, or if the agent receiving the transfer has no context about the conversation that just occurred.
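A useful test of whether the escalation design is complete is whether it can be written down as a decision table with no gaps - including the 11pm case. A sketch; the routing targets and business hours are illustrative assumptions, not a recommended configuration:

```python
from datetime import time

# Each scenario maps to a specific, tested response - not a general principle.
ESCALATION = {
    "not_understood_twice": "warm transfer to agent queue with partial transcript",
    "human_requested":      "immediate transfer; pass caller intent and context",
    "api_unavailable":      "apologise, capture callback details, raise ops alert",
    "caller_distressed":    "stop automation, priority transfer to trained agent",
}

def escalation_response(scenario: str, now: time,
                        open_from: time = time(8, 0),
                        open_until: time = time(20, 0)) -> str:
    """Out of hours there is no transfer target, so every path needs an answer."""
    if not (open_from <= now < open_until):
        return "out-of-hours message; offer callback booking for next business day"
    return ESCALATION.get(scenario, "default: transfer with full call context")
```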
The one-page scoping document - fill this in before kick-off
Before kick-off, every line of this one-pager should have a concrete answer drawn from the five steps above: the call type and its single outcome; the monthly volume and current cost per human-handled call; every system the AI touches, with its API, authentication, P95 latency, and access lead time; the four success criteria (containment rate, latency ceiling, STT accuracy floor, CSAT target); and a tested answer for each of the five escalation scenarios.
"Voice AI projects do not fail because the AI could not handle the conversation. They fail because the escalation was not designed, the integration latency was not measured, or the success criteria were not defined before the platform was chosen."
- The pattern I see in every Voice AI project post-mortem
The week you spend scoping saves the month you lose rebuilding
A properly scoped Voice AI project takes approximately two weeks of discovery work before any technical build begins. During those two weeks you define the use case precisely, establish the cost baseline, map the integration dependencies, set measurable success criteria, and design the escalation paths. By the end of that fortnight, every significant decision in the project is already made - the technical build is execution, not design.
An improperly scoped Voice AI project skips those two weeks and starts building immediately. By week three, the integration dependency that needed eight weeks of internal approval has been discovered. By week five, the success criteria that should have guided platform selection are being retrofitted onto a platform that cannot quite meet them. By week eight, the escalation design that was supposed to be straightforward requires a clinical protocol sign-off that nobody anticipated. The two weeks saved at the start cost six weeks at the end - and the project that was supposed to go live in ten weeks goes live in sixteen.
Scoping a Voice AI project right now?
I write every week on Voice AI from real deployments. Get in touch if you want a second opinion on your scope or a review of your success criteria before kick-off.