CSM (Conversational Speech Model) is a speech generation model developed by Sesame AI that generates RVQ (residual vector quantization) audio codes from text and audio inputs. It pairs a Llama backbone with a smaller audio decoder that produces the audio codes used for realistic speech synthesis. The model has been fine-tuned for interactive voice demos and is hosted on platforms like Hugging Face for testing. Setup is flexible, and the model runs on CUDA-enabled GPUs for efficient execution.
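To make the term "RVQ audio codes" concrete, here is a minimal sketch of residual vector quantization in general. It is not CSM's actual codec: the two tiny codebooks, the 2-dimensional vectors, and the function names `rvq_encode` and `rvq_decode` are all hypothetical and chosen only for illustration. Each stage picks the nearest codebook entry to what is left over from the previous stage, so a frame ends up represented by one integer code per stage.

```python
from typing import List, Sequence

Vector = Sequence[float]
Codebook = Sequence[Vector]


def nearest_index(codebook: Codebook, vector: Vector) -> int:
    """Return the index of the codebook entry closest to `vector` (squared L2)."""
    best_index = 0
    best_distance = float("inf")
    for index, entry in enumerate(codebook):
        distance = sum((e - v) ** 2 for e, v in zip(entry, vector))
        if distance < best_distance:
            best_index, best_distance = index, distance
    return best_index


def rvq_encode(codebooks: Sequence[Codebook], vector: Vector) -> List[int]:
    """Quantize `vector` stage by stage, each stage coding the previous residual."""
    residual = list(vector)
    codes = []
    for codebook in codebooks:
        index = nearest_index(codebook, residual)
        codes.append(index)
        # Subtract the chosen entry; the next stage refines what remains.
        residual = [r - e for r, e in zip(residual, codebook[index])]
    return codes


def rvq_decode(codebooks: Sequence[Codebook], codes: Sequence[int]) -> List[float]:
    """Reconstruct an approximation by summing the selected entry of every stage."""
    output = [0.0] * len(codebooks[0][0])
    for codebook, index in zip(codebooks, codes):
        output = [o + e for o, e in zip(output, codebook[index])]
    return output


# Toy codebooks (illustrative assumption): a coarse stage and a fine stage.
TOY_CODEBOOKS = [
    [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 0.0], [0.25, 0.0], [0.0, 0.25], [0.25, 0.25]],
]

if __name__ == "__main__":
    codes = rvq_encode(TOY_CODEBOOKS, [1.2, 0.3])
    print(codes, rvq_decode(TOY_CODEBOOKS, codes))
```

In a real neural audio codec the codebooks are learned and the vectors are encoder outputs for short audio frames; the toy version only shows why the output is a small stack of integer codes per frame.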
Features
- Generates high-quality speech from text and audio inputs.
- Uses a Llama backbone with an optimized audio decoder.
- Fine-tuned for interactive voice applications.
- Hosted models are available on platforms like Hugging Face for easy access and testing.
- Compatible with CUDA-enabled GPUs for fast performance.
- Easy to integrate and test using example scripts.
- Requires Python 3.10 and audio processing tools such as ffmpeg.
- Customizable for various conversational contexts.
- Released under the Apache-2.0 license for open-source use.
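The prerequisites named in the list above (Python 3.10, ffmpeg, a CUDA-enabled GPU) can be checked before trying the model. The following is a rough sketch, not part of the project: the function name `check_environment` and the report keys are hypothetical, and using PyTorch to detect CUDA is an assumption, since the listing does not say which framework the model runs on.

```python
import shutil
import sys
from typing import Dict, Optional, Union


def check_environment() -> Dict[str, Union[str, bool, None]]:
    """Report on the prerequisites mentioned in the feature list."""
    report: Dict[str, Union[str, bool, None]] = {}

    # The listing states Python 3.10; other versions are simply flagged here.
    report["python_version"] = f"{sys.version_info.major}.{sys.version_info.minor}"
    report["python_is_3_10"] = sys.version_info[:2] == (3, 10)

    # ffmpeg is expected to be discoverable on PATH.
    report["ffmpeg_on_path"] = shutil.which("ffmpeg") is not None

    # Assumption: PyTorch is the framework used to reach the GPU.
    cuda_available: Optional[bool]
    try:
        import torch  # optional dependency for this check

        cuda_available = bool(torch.cuda.is_available())
    except ImportError:
        cuda_available = None  # unknown: PyTorch is not installed
    report["cuda_available"] = cuda_available

    return report


if __name__ == "__main__":
    for name, value in check_environment().items():
        print(f"{name}: {value}")
```

A `cuda_available` value of `None` means the check could not be run, which is different from `False` (PyTorch is installed but sees no usable GPU).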
License
Apache License V2.0