One-page case study
RAG-powered Podcast Generator
An AI podcast engine that turns PDFs into multi-host audio with transcripts. Flask and Ollama handle transcript generation, FAISS handles retrieval, and MARS5-TTS synthesizes the voices on Apple M4 Metal. Live as the "PDF to Podcast" tool in the AI Lab.
Proof Points
Multi-voice narration pipeline
RAG + OCR workflow
One-job concurrency on M4 GPU
Public web UI proxied to private Mac Mini backend
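The one-job concurrency point can be illustrated with a minimal sketch. It assumes a single-process backend and uses a non-blocking lock so that a second request is rejected while a job holds the GPU. The names `SingleJobGate`, `try_start`, and `finish` are hypothetical and not taken from the project.

```python
import threading


class SingleJobGate:
    """Allow at most one generation job at a time (hypothetical helper)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._current = None  # id of the job holding the GPU, if any

    def try_start(self, job_id):
        # Non-blocking acquire: return False immediately if a job is running.
        if not self._lock.acquire(blocking=False):
            return False
        self._current = job_id
        return True

    def finish(self, job_id):
        # Only the job that holds the gate may release it.
        if self._current != job_id:
            raise ValueError(f"job {job_id!r} does not hold the gate")
        self._current = None
        self._lock.release()

    @property
    def busy(self):
        return self._lock.locked()
```

An HTTP handler could call `try_start` and answer with a "busy" status (for example 429) when it returns `False`, then call `finish` once the audio is written.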
Challenges
- Extracting clean text from PDFs
- Voice assignment to characters
- Bounding generation time on a single-GPU host
- Safely exposing a Mac Mini service to a public web tool
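As an illustration of the voice-assignment challenge, here is a minimal sketch that assumes the generated transcript uses one `Speaker: text` line per turn. The voice ids and the `assign_voices` helper are hypothetical; the case study does not describe the actual transcript format or voice set.

```python
import re

# Hypothetical voice ids; the real voice set is not listed in the case study.
VOICES = ["voice_a", "voice_b", "voice_c"]

# Assumed line format: "Speaker: text" (speaker name up to 40 characters).
SPEAKER_LINE = re.compile(r"^\s*([^:\n]{1,40}):\s*(.+)$")


def assign_voices(transcript, voices=VOICES):
    """Return (speaker, voice, text) tuples, one per dialogue line."""
    mapping = {}
    segments = []
    for line in transcript.splitlines():
        match = SPEAKER_LINE.match(line)
        if not match:
            continue  # skip blank lines and anything without "Speaker: text"
        speaker, text = match.group(1).strip(), match.group(2).strip()
        if speaker not in mapping:
            # First appearance: take the next voice, wrapping around if needed.
            mapping[speaker] = voices[len(mapping) % len(voices)]
        segments.append((speaker, mapping[speaker], text))
    return segments
```

Keeping the mapping stable per speaker means a character keeps the same voice for the whole episode, even when there are more characters than voices.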
Learnings
- Text-to-speech synthesis on Metal
- Prompt engineering for dialogue
- Single-tenant job queueing across HTTP
- CORS + Tailscale-fronted private services
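The CORS learning can be sketched as an origin allow-list. This is a minimal example under stated assumptions: the allowed origin, the method and header lists, and the `cors_headers` helper are all hypothetical, and the project's actual configuration is not described in this case study.

```python
# Hypothetical origin; the real site URL is not given in the case study.
ALLOWED_ORIGINS = {"https://example.com"}


def cors_headers(origin, allowed=ALLOWED_ORIGINS):
    """Return CORS response headers for an allowed origin, else an empty dict."""
    if origin not in allowed:
        return {}  # no CORS headers: the browser blocks the cross-origin read
    return {
        # Echo the specific origin rather than "*" so only listed sites pass.
        "Access-Control-Allow-Origin": origin,
        "Access-Control-Allow-Methods": "GET, POST, OPTIONS",
        "Access-Control-Allow-Headers": "Content-Type",
        # Tell caches that the response depends on the Origin request header.
        "Vary": "Origin",
    }
```

CORS only governs what browsers may read across origins; it is not access control on its own, which is why it is paired here with a private network layer.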
Stack
Python, Flask, Ollama, FAISS, MARS5-TTS, PyMuPDF, pydub, Next.js