Voice & Speech
Speech-to-text, text-to-speech, voice cloning, voice agents, and audio AI. From open-source toolkits like Whisper and SpeechBrain to production voice agent platforms.

Librosa
Python library for audio and music analysis, useful for AI voice processing.

OpenSMILE
Open-source toolkit for extracting audio features, commonly used for emotion recognition.

Pyannote
Python library for speaker diarization in AI audio processing pipelines.

Pyannote Audio
Python library for speaker diarization and audio analysis.

Whisper
Open-source speech recognition model by OpenAI.

SpeechBrain
PyTorch-based speech toolkit.

DeepSpeech
Mozilla's open-source speech-to-text engine, built on a model based on Baidu's Deep Speech research and implemented with TensorFlow.

Kaldi
Open-source speech recognition toolkit written in C++.

ElevenLabs
AI voice generation and cloning platform.

Vapi
Provider-agnostic voice AI orchestration platform. Build, test, and deploy advanced voice agents across 14+ STT/TTS/LLM providers.

Retell AI
Enterprise voice agent platform with sub-600ms response times. Build, deploy, and manage human-sounding voice agents at scale.

Bland AI
High-volume outbound voice agents with built-in telephony infrastructure. Automate calls without sacrificing quality.

Inworld TTS
#1-ranked text-to-speech with human-like expression and sub-200ms realtime latency. Voice cloning and multilingual.

Mistral Voxtral
Mistral's open-weight frontier speech-understanding models (24B and 3B). Apache 2.0 licensed; long-context, multilingual, function-calling.

Deepgram Voice Agent
Unified conversational voice AI API combining STT, LLM orchestration, and TTS at $4.50/hr. 54.2% lower WER on noisy audio.

15 tools in this category