SCAI - AI Voice Calling Agent

The Problem

Businesses needed automated, natural-sounding voice interactions to handle high-volume customer calls across multiple languages. Traditional IVR systems felt robotic and couldn't handle nuanced conversations in Hindi and Mexican Spanish, leading to poor customer satisfaction and high abandonment rates.

The challenge was to build a real-time conversational AI pipeline that transcribes speech to text, identifies the caller's intent, generates a response, and converts it back to natural-sounding speech, all within a 2-second latency window.

System Architecture

Voice Input → STT Service → Conversational AI → TTS Engine → Voice Output
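
The flow above can be sketched as three stages run in sequence, each timed against the 2-second budget. This is only an illustration: `transcribe`, `generate_reply`, and `synthesize` are hypothetical stand-ins for the real vendor calls, and `TurnResult` and `handle_turn` are names invented for this example.

```python
import time
from dataclasses import dataclass, field

LATENCY_BUDGET_S = 2.0  # end-to-end target from the problem statement


@dataclass
class TurnResult:
    reply_audio: bytes
    timings: dict = field(default_factory=dict)  # seconds spent per stage

    @property
    def total_s(self) -> float:
        return sum(self.timings.values())

    @property
    def within_budget(self) -> bool:
        return self.total_s < LATENCY_BUDGET_S


# Hypothetical stand-ins: a real pipeline would call external speech
# and language services here instead of transforming strings.
def transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def generate_reply(text: str) -> str:
    return f"You said: {text}"


def synthesize(text: str) -> bytes:
    return text.encode("utf-8")


def handle_turn(audio: bytes) -> TurnResult:
    """Run STT -> conversational AI -> TTS, recording time per stage."""
    timings = {}
    value = audio
    stages = (("stt", transcribe), ("ai", generate_reply), ("tts", synthesize))
    for name, stage in stages:
        start = time.perf_counter()
        value = stage(value)
        timings[name] = time.perf_counter() - start
    return TurnResult(reply_audio=value, timings=timings)
```

Recording per-stage timings rather than a single total makes it easier to see which stage is consuming the budget when a turn runs slow.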

My Contributions

Backend API Development

Built Flask microservices powering the voice pipeline
Designed RESTful APIs for call management and analytics
Achieved sub-2-second end-to-end response latency

AI Pipeline Integration

Owned end-to-end STT, TTS, and conversational AI workflows
Integrated AWS, Google, and ElevenLabs speech services
Achieved ~90-95% accuracy across Hindi and Mexican Spanish
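
One way to keep several speech vendors behind a single interface is a registry keyed by language. The sketch below assumes that design; the source does not state how vendors were selected. The vendor names, language tags, and the `synthesize_for` helper are all hypothetical.

```python
from typing import Callable, Dict

Synthesizer = Callable[[str], bytes]


def _fake_vendor(name: str) -> Synthesizer:
    """Build a placeholder engine; a real one would call a vendor SDK."""
    def synthesize(text: str) -> bytes:
        return f"[{name}] {text}".encode("utf-8")
    return synthesize


# Assumed mapping: which engine serves which language is illustrative only.
TTS_BY_LANGUAGE: Dict[str, Synthesizer] = {
    "hi-IN": _fake_vendor("vendor-a"),
    "es-MX": _fake_vendor("vendor-b"),
}


def synthesize_for(language: str, text: str) -> bytes:
    """Route text to the engine configured for the given language."""
    try:
        engine = TTS_BY_LANGUAGE[language]
    except KeyError:
        raise ValueError(f"no TTS engine configured for {language!r}") from None
    return engine(text)
```

With this layout, swapping the engine for one language is a one-line change to the registry, and callers never import a vendor SDK directly.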

Infrastructure & Deployment

Deployed on AWS EC2 with Docker containerization
Configured Nginx for load balancing and SSL termination
Diagnosed and resolved live production issues

Key Metrics

Under 2 seconds end-to-end latency
~90-95% accuracy
Languages supported: Hindi and Mexican Spanish

Tech Stack Deep Dive

Python & Flask

Core microservice framework for all API endpoints and business logic

AWS (EC2, S3)

Cloud infrastructure for compute, storage, and speech services

Docker

Containerized deployments for consistent environments across dev and prod

AI/ML Services

STT, TTS, and conversational AI via AWS, Google, and ElevenLabs APIs
