feat: Integrate IndexTTS2 model and update related schemas and frontend components
README.md (32 lines changed)
@@ -13,7 +13,8 @@
 - Custom Voice: Predefined speaker voices
 - Voice Design: Create voices from natural language descriptions
 - Voice Cloning: Clone voices from uploaded audio
-- Audiobook Generation: Upload EPUB files and generate multi-character audiobooks with LLM-powered character extraction and voice assignment
+- **IndexTTS2**: High-quality voice cloning with emotion control (happy, angry, sad, fear, surprise, etc.) powered by [IndexTTS2](https://github.com/iszhanjiawei/indexTTS2)
+- Audiobook Generation: Upload EPUB files and generate multi-character audiobooks with LLM-powered character extraction and voice assignment; supports IndexTTS2 per character
 - Dual Backend Support: Switch between local model and Aliyun TTS API
 - Multi-language Support: English, 简体中文, 繁體中文, 日本語, 한국어
 - JWT auth, async tasks, voice cache, dark mode
@@ -148,6 +149,25 @@ hf download Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice --local-dir ./Qwen3-TTS-12Hz-0.
 hf download Qwen/Qwen3-TTS-12Hz-0.6B-Base --local-dir ./Qwen3-TTS-12Hz-0.6B-Base
 ```
 
+**IndexTTS2 Model (optional, for emotion-controlled voice cloning)**
+
+IndexTTS2 is an optional feature. Only download these files if you want to use it. Navigate to the same `Qwen/` directory and run:
+
+```bash
+# Only the required files — no need to download the full repository
+hf download IndexTeam/IndexTTS-2 \
+  bpe.model config.yaml feat1.pt feat2.pt gpt.pth s2mel.pth wav2vec2bert_stats.pt \
+  --local-dir ./IndexTTS2
+```
+
+Then install the indextts package:
+```bash
+git clone https://github.com/iszhanjiawei/indexTTS2.git
+cd indexTTS2
+pip install -e . --no-deps
+cd ..
+```
+
 **Final directory structure:**
 
 Docker deployment (`docker/models/`):
@@ -169,7 +189,15 @@ Qwen3-TTS-webUI/
 ├── Qwen3-TTS-Tokenizer-12Hz/
 ├── Qwen3-TTS-12Hz-1.7B-CustomVoice/
 ├── Qwen3-TTS-12Hz-1.7B-VoiceDesign/
-└── Qwen3-TTS-12Hz-1.7B-Base/
+├── Qwen3-TTS-12Hz-1.7B-Base/
+└── IndexTTS2/ ← optional, for IndexTTS2 feature
+    ├── bpe.model
+    ├── config.yaml
+    ├── feat1.pt
+    ├── feat2.pt
+    ├── gpt.pth
+    ├── s2mel.pth
+    └── wav2vec2bert_stats.pt
 ```
 
 ### 3. Backend Setup
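Review note: the seven file names in the `hf download` command match the seven entries listed under `IndexTTS2/` in the directory tree. A minimal sketch of a pre-flight check that reports which of those files are missing from a model directory is shown here. The function name `missing_indextts2_files` and the constant `REQUIRED_FILES` are hypothetical and not part of the repository; only the file names come from the README change.

```python
from pathlib import Path

# File names copied from the `hf download IndexTeam/IndexTTS-2` command in the README.
REQUIRED_FILES = (
    "bpe.model",
    "config.yaml",
    "feat1.pt",
    "feat2.pt",
    "gpt.pth",
    "s2mel.pth",
    "wav2vec2bert_stats.pt",
)


def missing_indextts2_files(model_dir):
    """Return the required file names absent from model_dir, in README order.

    Hypothetical helper for illustration; the project may validate its
    model directory differently (or not at all).
    """
    root = Path(model_dir)
    return [name for name in REQUIRED_FILES if not (root / name).is_file()]
```

Calling `missing_indextts2_files("./IndexTTS2")` from the directory where the download was run returns an empty list when all seven files are in place. The path is an assumption and should be adjusted to the actual deployment layout.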