CSM (Conversational Speech Model) - AI Audio Models Tool

Overview

CSM (Conversational Speech Model) is a speech generation model from SesameAILabs, built for interactive conversational speech synthesis. It takes text and audio inputs and generates RVQ (residual vector quantization) audio codes: a Llama backbone processes the inputs, and a specialized audio decoder produces the Mimi audio codes.
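
To make the term "RVQ audio codes" concrete, here is a minimal sketch of residual vector quantization in plain Python. It is not CSM's or Mimi's implementation. The codebooks, their sizes, and the function names (`rvq_encode`, `rvq_decode`, `TOY_CODEBOOKS`) are toy assumptions chosen for illustration. The idea it shows is general: each stage picks the nearest codebook entry for whatever the previous stages left unexplained, so a vector becomes a short list of integer codes.

```python
from typing import List, Sequence

Vector = List[float]


def nearest(codebook: Sequence[Vector], x: Vector) -> int:
    """Index of the codebook entry closest to x (squared Euclidean distance)."""
    return min(
        range(len(codebook)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(codebook[i], x)),
    )


def rvq_encode(codebooks: Sequence[Sequence[Vector]], x: Vector) -> List[int]:
    """Quantize x stage by stage; each stage encodes the remaining residual."""
    residual = list(x)
    codes = []
    for codebook in codebooks:
        idx = nearest(codebook, residual)
        codes.append(idx)
        # Subtract the chosen entry so the next stage only sees the leftover error.
        residual = [r - c for r, c in zip(residual, codebook[idx])]
    return codes


def rvq_decode(codebooks: Sequence[Sequence[Vector]], codes: Sequence[int]) -> Vector:
    """Reconstruct an approximation by summing the selected entry of every stage."""
    out = [0.0] * len(codebooks[0][0])
    for codebook, idx in zip(codebooks, codes):
        out = [o + c for o, c in zip(out, codebook[idx])]
    return out


# Toy codebooks (assumption): a coarse stage followed by a finer stage.
TOY_CODEBOOKS = [
    [[0.0, 0.0], [1.0, 1.0], [-1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [-0.25, -0.25]],
]

if __name__ == "__main__":
    codes = rvq_encode(TOY_CODEBOOKS, [1.2, 1.0])
    print(codes, rvq_decode(TOY_CODEBOOKS, codes))
```

In a real neural codec the codebooks are learned and applied per audio frame, so a clip becomes a grid of codes (frames by codebook levels) rather than a single short list.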

Key Features

Generates RVQ audio codes from text and audio inputs
Uses a Llama backbone for language processing
Specialized audio decoder produces Mimi audio codes
Designed for interactive conversational speech synthesis
Source code and artifacts hosted on GitHub

Ideal Use Cases

Build conversational voice agents
Synthesize speech from combined text and audio prompts
Research and prototype speech generation techniques
Integrate conversational speech into applications and services

Getting Started

Visit the project's GitHub repository
Read the repository README and documentation
Clone the repository to your environment
Install the project's dependencies
Prepare text and optional audio prompts
Run the model to generate RVQ audio codes
Use the audio decoder to obtain Mimi audio codes
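
The "prepare text and optional audio prompts" step above can be sketched as a small data structure. This is only an illustration of how one might organize conversational context before handing it to the model. `PromptSegment`, `build_request`, and the field names are hypothetical and are not taken from the project's repository, so check the README for the actual interface.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class PromptSegment:
    """One earlier turn of the conversation (hypothetical structure)."""
    speaker: int                      # integer speaker id for this turn
    text: str                         # transcript of the turn
    audio_path: Optional[str] = None  # optional recording of the turn


def build_request(context: List[PromptSegment], text: str, speaker: int) -> Dict:
    """Bundle the new text with prior turns into a single request dictionary."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text to synthesize must not be empty")
    return {
        "speaker": speaker,
        "text": cleaned,
        "context": [
            {"speaker": s.speaker, "text": s.text, "audio_path": s.audio_path}
            for s in context
        ],
    }


if __name__ == "__main__":
    history = [
        PromptSegment(speaker=0, text="Hi, how can I help?", audio_path="turn0.wav"),
        PromptSegment(speaker=1, text="What's the weather like?"),
    ]
    print(build_request(history, "  It looks sunny today.  ", speaker=0))
```

Keeping the audio optional per turn mirrors the page's description of text plus optional audio prompts: a text-only turn still contributes context, and turns with recordings add the voice and prosody the model can condition on.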

Key Information

Category: Audio Models
Type: AI Audio Models Tool

Related Tools

OpenVoice

WhisperX

Parler-TTS

SpeechBrain

Whisper Large

Retrieval-based Voice Conversion WebUI