feat: Integrate IndexTTS2 model and update related schemas and frontend components

2026-03-12 13:30:53 +08:00
parent e5b5a16364
commit 8aec4f6f44
151 changed files with 40077 additions and 85 deletions
--- a/README.md
+++ b/README.md
@@ -13,7 +13,8 @@
 - Custom Voice: Predefined speaker voices
 - Voice Design: Create voices from natural language descriptions
 - Voice Cloning: Clone voices from uploaded audio
- Audiobook Generation: Upload EPUB files and generate multi-character audiobooks with LLM-powered character extraction and voice assignment
+- **IndexTTS2**: High-quality voice cloning with emotion control (happy, angry, sad, afraid, surprised, etc.) powered by [IndexTTS2](https://github.com/iszhanjiawei/indexTTS2)
+- Audiobook Generation: Upload EPUB files and generate multi-character audiobooks with LLM-powered character extraction and voice assignment; supports IndexTTS2 per character
 - Dual Backend Support: Switch between local model and Aliyun TTS API
 - Multi-language Support: English, 简体中文, 繁體中文, 日本語, 한국어
 - JWT auth, async tasks, voice cache, dark mode
@@ -148,6 +149,25 @@ hf download Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice --local-dir ./Qwen3-TTS-12Hz-0.
 hf download Qwen/Qwen3-TTS-12Hz-0.6B-Base --local-dir ./Qwen3-TTS-12Hz-0.6B-Base
 ```

+**IndexTTS2 Model (optional, for emotion-controlled voice cloning)**
+
+IndexTTS2 is optional; download these files only if you plan to use it. From the same `Qwen/` directory, run:
+
+```bash
+# Only the required files — no need to download the full repository
+hf download IndexTeam/IndexTTS-2 \
+  bpe.model config.yaml feat1.pt feat2.pt gpt.pth s2mel.pth wav2vec2bert_stats.pt \
+  --local-dir ./IndexTTS2
+```
+
+Then install the `indextts` package in editable mode (`--no-deps` tells pip not to install the package's dependencies):
+```bash
+git clone https://github.com/iszhanjiawei/indexTTS2.git
+cd indexTTS2
+pip install -e . --no-deps
+cd ..
+```
+
 **Final directory structure:**

 Docker deployment (`docker/models/`):
@@ -169,7 +189,15 @@ Qwen3-TTS-webUI/
        ├── Qwen3-TTS-Tokenizer-12Hz/
        ├── Qwen3-TTS-12Hz-1.7B-CustomVoice/
        ├── Qwen3-TTS-12Hz-1.7B-VoiceDesign/
-        └── Qwen3-TTS-12Hz-1.7B-Base/
+        ├── Qwen3-TTS-12Hz-1.7B-Base/
+        └── IndexTTS2/          ← optional, for IndexTTS2 feature
+            ├── bpe.model
+            ├── config.yaml
+            ├── feat1.pt
+            ├── feat2.pt
+            ├── gpt.pth
+            ├── s2mel.pth
+            └── wav2vec2bert_stats.pt
 ```

 ### 3. Backend Setup