SCAI
AI-Powered Voice Calling Agent
The Problem
Businesses needed automated, natural-sounding voice interactions to handle high-volume customer calls across multiple languages. Traditional IVR systems felt robotic and couldn't handle nuanced conversations in Hindi and Mexican Spanish, leading to poor customer satisfaction and high abandonment rates.
The challenge was to build a real-time conversational AI pipeline that could transcribe speech to text, understand intent, generate an intelligent response, and convert that response back into natural-sounding speech, all within a 2-second latency window.
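The four stages and the 2-second window can be pictured as a single timed function. This is a minimal sketch, not the production code: `run_turn`, `TurnResult`, and the stage callables are hypothetical names, and the real stages would call external speech and language services rather than the stubs shown here.

```python
import time
from dataclasses import dataclass, field
from typing import Any, Callable, Dict

LATENCY_BUDGET_S = 2.0  # the 2-second window described in the text


@dataclass
class TurnResult:
    """Output of one conversational turn plus per-stage timings (hypothetical type)."""
    reply_audio: bytes
    timings: Dict[str, float] = field(default_factory=dict)

    @property
    def total_s(self) -> float:
        return sum(self.timings.values())

    @property
    def within_budget(self) -> bool:
        return self.total_s <= LATENCY_BUDGET_S


def run_turn(
    audio: bytes,
    stt: Callable[[bytes], str],
    understand: Callable[[str], Any],
    respond: Callable[[Any], str],
    tts: Callable[[str], bytes],
) -> TurnResult:
    """Run STT -> intent -> response -> TTS, recording how long each stage takes."""
    timings: Dict[str, float] = {}

    def timed(name: str, fn: Callable[[Any], Any], arg: Any) -> Any:
        start = time.perf_counter()
        out = fn(arg)
        timings[name] = time.perf_counter() - start
        return out

    text = timed("stt", stt, audio)
    intent = timed("intent", understand, text)
    reply = timed("respond", respond, intent)
    speech = timed("tts", tts, reply)
    return TurnResult(reply_audio=speech, timings=timings)


if __name__ == "__main__":
    # Stub stages stand in for the real services (assumption for illustration).
    result = run_turn(
        b"\x00\x01",
        stt=lambda audio: "hola, quiero revisar mi pedido",
        understand=lambda text: {"intent": "order_status", "text": text},
        respond=lambda intent: "Claro, ¿cuál es su número de pedido?",
        tts=lambda reply: reply.encode("utf-8"),
    )
    print(result.within_budget, result.timings)
```

Recording a timing per stage, rather than only the total, makes it easier to see which service is consuming the budget when a turn runs long.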
System Architecture
My Contributions
Backend API Development
- Built Flask microservices powering the voice pipeline
- Designed RESTful APIs for call management and analytics
- Achieved sub-2-second end-to-end response latency
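As an illustration of what a call-management API in Flask might look like, here is a small sketch. The routes (`/calls`, `/calls/<call_id>`), the payload fields, the language codes, and the in-memory `CALLS` store are all assumptions made for the example; the actual endpoints and storage are not described in this write-up.

```python
import uuid

from flask import Flask, jsonify, request

app = Flask(__name__)

# In-memory store for illustration only; a real service would use a database.
CALLS = {}

# Hypothetical language codes for the two languages mentioned in this project.
SUPPORTED_LANGUAGES = {"hi-IN", "es-MX"}


@app.route("/calls", methods=["POST"])
def create_call():
    """Register a new call and return its record."""
    payload = request.get_json(silent=True) or {}
    language = payload.get("language")
    if language not in SUPPORTED_LANGUAGES:
        return jsonify(error="unsupported language"), 400
    call_id = uuid.uuid4().hex
    CALLS[call_id] = {"id": call_id, "language": language, "status": "queued"}
    return jsonify(CALLS[call_id]), 201


@app.route("/calls/<call_id>", methods=["GET"])
def get_call(call_id):
    """Look up a call by id."""
    call = CALLS.get(call_id)
    if call is None:
        return jsonify(error="call not found"), 404
    return jsonify(call)


if __name__ == "__main__":
    app.run(port=5000)
```

Flask's built-in test client (`app.test_client()`) can exercise both routes without starting a server.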
AI Pipeline Integration
- Owned the end-to-end speech-to-text (STT), text-to-speech (TTS), and conversational AI workflows
- Integrated AWS, Google, and ElevenLabs speech services
- Achieved ~90-95% accuracy across Hindi and Mexican Spanish
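The write-up does not say how accuracy was measured. One common way to score transcription is word error rate (WER), with word accuracy taken as `1 - WER`. The sketch below assumes that metric purely for illustration; `word_error_rate` is a hypothetical helper, not part of the project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance between two transcripts, divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    wer = word_error_rate(
        "quiero revisar mi pedido",
        "quiero revisar el pedido",
    )
    print(f"WER: {wer:.2f}, word accuracy: {1 - wer:.2f}")
```

Under this metric, one wrong word in a four-word utterance gives a WER of 0.25, so a 90-95% accuracy figure would correspond to roughly one error every 10 to 20 words.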
Infrastructure & Deployment
- Deployed on AWS EC2 with Docker containerization
- Configured Nginx for load balancing and SSL termination
- Diagnosed and resolved live production issues
Key Metrics
Tech Stack Deep Dive
Python & Flask
Core microservice framework for all API endpoints and business logic
AWS (EC2, S3)
Cloud infrastructure for compute, storage, and speech services
Docker
Containerized deployments for consistent environments across dev and prod
AI/ML Services
STT, TTS, and conversational AI via AWS, Google, and ElevenLabs APIs