DragonianVoice is a C++ inference library that unifies multiple text-to-speech (TTS), singing voice conversion (SVC), and singing voice synthesis (SVS) models under a single ONNX-based framework. It is designed as a reusable native library rather than a full UI product, with bindings for C, C++, and C# so it can be embedded into other applications or engines. The project supports a wide range of model families: TTS models such as Tacotron2, VITS, EmotionalVITS, BERTVits2, and GPT-SoVITS; SVC systems such as SoVitsSvc (v2/v3/v4), RVC, DiffSvc, DiffusionSvc, FishDiffusion, and ReflowSvc; SVS systems such as DiffSinger; and related pitch/feature extractors such as FCPE and RMVPE. Inference is accelerated through ONNX Runtime and other backends, and the documentation notes how different execution providers, such as CUDA or DirectML, affect operator support and numerical stability. Recent versions integrate with fish-speech via a dedicated fish-speech.cpp subproject built on ggml.
Features
- C++ ONNX-based inference library supporting many TTS, SVC, and SVS model families in one place
- Bindings and templates for C, C++, and C# integration into external applications and engines
- Supports models like Tacotron2, VITS, EmotionalVITS, BERTVits2, GPT-SoVITS, SoVitsSvc, RVC, DiffusionSvc, FishDiffusion, ReflowSvc, and DiffSinger
- Integrated fish-speech.cpp subproject using ggml for efficient speech synthesis and conversion workflows
- Detailed guidance on execution providers (CUDA, DirectML, etc.) and on pitfalls such as unsupported operators or diffusion step limits
- Clear user agreement and FAQs covering licensing, non-commercial constraints, and best practices for safe, legal usage
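
Because operator support differs between execution providers, a host application will usually want to fall back to the CPU when a preferred provider is missing. The following is a minimal sketch of that selection logic, not DragonianVoice's own API: `pick_execution_provider` is a hypothetical helper, and the provider strings are the names ONNX Runtime uses for its CUDA, DirectML, and CPU providers. In a real application the `available` list would typically come from ONNX Runtime's `Ort::GetAvailableProviders()`.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical helper (not part of DragonianVoice or ONNX Runtime).
// Returns the first entry of `preferred` that also appears in `available`.
// Falls back to "CPUExecutionProvider" when no preferred provider is present.
std::string pick_execution_provider(const std::vector<std::string>& available,
                                    const std::vector<std::string>& preferred) {
    for (const auto& name : preferred) {
        if (std::find(available.begin(), available.end(), name) != available.end()) {
            return name;
        }
    }
    return "CPUExecutionProvider";
}
```

Called with a preference list of `{"CUDAExecutionProvider", "DmlExecutionProvider"}` on a machine that only exposes DirectML and CPU, the helper selects `DmlExecutionProvider`. The order of `preferred` encodes the caller's priority, so a model that is known to hit unsupported operators on one provider can simply omit it from the list.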