diff --git a/README.md b/README.md
index bfd64df..dadf9a5 100644
--- a/README.md
+++ b/README.md
@@ -1,6 +1,8 @@
 # Qwen3-TTS WebUI

-A text-to-speech web application based on Qwen3-TTS, supporting custom voice, voice design, and voice cloning.
+**Unofficial** text-to-speech web application based on Qwen3-TTS, supporting custom voice, voice design, and voice cloning with an intuitive interface.
+
+> This is an unofficial project. For the official Qwen3-TTS repository, please visit [QwenLM/Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS).

 [中文文档](./README.zh.md)

@@ -48,41 +50,173 @@ A text-to-speech web application based on Qwen3-TTS, supporting custom voice, vo

 ## Tech Stack

-Backend: FastAPI + SQLAlchemy + PyTorch + JWT
-Frontend: React 19 + TypeScript + Vite + Tailwind + Shadcn/ui
+**Backend**: FastAPI + SQLAlchemy + PyTorch + JWT
+- Direct PyTorch inference with Qwen3-TTS models
+- Async task processing with batch optimization
+- Local model support + Aliyun API integration

-## Quick Start
+**Frontend**: React 19 + TypeScript + Vite + Tailwind + Shadcn/ui

-### Backend
+## Installation
+
+### Prerequisites
+
+- Python 3.9+, with a CUDA-capable GPU for local model inference
+- Node.js 18+ (for frontend)
+- Git
+
+### 1. Clone Repository
+
+```bash
+git clone https://github.com/yourusername/Qwen3-TTS-webUI.git
+cd Qwen3-TTS-webUI
+```
+
+### 2. Download Models
+
+**Important**: Models are **NOT** automatically downloaded. You need to manually download them first.
+
+For more details, visit the official repository: [Qwen3-TTS Models](https://github.com/QwenLM/Qwen3-TTS)
+
+Navigate to the backend directory:
+```bash
+cd qwen3-tts-backend
+mkdir -p Qwen && cd Qwen
+```
+
+**Option 1: Download through ModelScope (Recommended for users in Mainland China)**
+
+```bash
+pip install -U modelscope
+
+modelscope download --model Qwen/Qwen3-TTS-Tokenizer-12Hz --local_dir ./Qwen3-TTS-Tokenizer-12Hz
+modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local_dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice
+modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --local_dir ./Qwen3-TTS-12Hz-1.7B-VoiceDesign
+modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base --local_dir ./Qwen3-TTS-12Hz-1.7B-Base
+```
+
+Optional 0.6B models (smaller, faster):
+```bash
+modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice --local_dir ./Qwen3-TTS-12Hz-0.6B-CustomVoice
+modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-Base --local_dir ./Qwen3-TTS-12Hz-0.6B-Base
+```
+
+**Option 2: Download through Hugging Face**
+
+```bash
+pip install -U "huggingface_hub[cli]"
+
+huggingface-cli download Qwen/Qwen3-TTS-Tokenizer-12Hz --local-dir ./Qwen3-TTS-Tokenizer-12Hz
+huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local-dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice
+huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --local-dir ./Qwen3-TTS-12Hz-1.7B-VoiceDesign
+huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-Base --local-dir ./Qwen3-TTS-12Hz-1.7B-Base
+```
+
+Optional 0.6B models (smaller, faster):
+```bash
+huggingface-cli download Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice --local-dir ./Qwen3-TTS-12Hz-0.6B-CustomVoice
+huggingface-cli download Qwen/Qwen3-TTS-12Hz-0.6B-Base --local-dir ./Qwen3-TTS-12Hz-0.6B-Base
+```
+
+**Final directory structure:**
+```
+Qwen3-TTS-webUI/
+├── qwen3-tts-backend/
+│   └── Qwen/
+│       ├── Qwen3-TTS-Tokenizer-12Hz/
+│       ├── Qwen3-TTS-12Hz-1.7B-CustomVoice/
+│       ├── Qwen3-TTS-12Hz-1.7B-VoiceDesign/
+│       └── Qwen3-TTS-12Hz-1.7B-Base/
+```
+
+### 3. Backend Setup

 ```bash
 cd qwen3-tts-backend
+
+# Create virtual environment
 python -m venv venv
-source venv/bin/activate
+source venv/bin/activate  # On Windows: venv\Scripts\activate
+
+# Install dependencies
 pip install -r requirements.txt
+
+# Install Qwen3-TTS
+pip install qwen-tts
+
+# Create configuration file
 cp .env.example .env
-# Edit .env to configure MODEL_BASE_PATH and DEFAULT_BACKEND
-# For local model: Ensure MODEL_BASE_PATH points to Qwen model directory
-# For Aliyun: Set DEFAULT_BACKEND=aliyun and configure API key in web settings
-uvicorn main:app --host 0.0.0.0 --port 8000 --reload
+
+# Edit .env file
+# For local model: Set MODEL_BASE_PATH=./Qwen
+# For Aliyun API only: Set DEFAULT_BACKEND=aliyun
+nano .env  # or use your preferred editor
 ```

-### Frontend
+**Important Backend Configuration** (`.env`):
+```env
+MODEL_DEVICE=cuda:0  # Use GPU (or cpu for CPU-only)
+MODEL_BASE_PATH=./Qwen  # Path to your downloaded models
+DEFAULT_BACKEND=local  # Use 'local' for local models, 'aliyun' for API
+DATABASE_URL=sqlite:///./qwen_tts.db
+SECRET_KEY=your-secret-key-here  # Change this!
+```
+
+Start the backend server:
+```bash
+# Using uvicorn directly
+uvicorn main:app --host 0.0.0.0 --port 8000 --reload
+
+# Or using conda (if you prefer)
+conda run -n qwen3-tts uvicorn main:app --host 0.0.0.0 --port 8000 --reload
+```
+
+Verify that the backend is running:
+```bash
+curl http://127.0.0.1:8000/health
+```
+
+### 4. Frontend Setup

 ```bash
 cd qwen3-tts-frontend
+
+# Install dependencies
 npm install
+
+# Create configuration file
 cp .env.example .env
-# Edit .env to configure VITE_API_URL
+
+# Set the backend URL (this overwrites the .env copied above)
+echo "VITE_API_URL=http://localhost:8000" > .env
+
+# Start development server
 npm run dev
 ```

-Visit `http://localhost:5173`
+### 5. Access the Application

-**First Time Setup**: On first run, a default superuser account will be automatically created:
+Open your browser and visit: `http://localhost:5173`
+
+**Default Credentials**:
 - Username: `admin`
 - Password: `admin123456`
-- **IMPORTANT**: Please change the password immediately after first login for security!
+- **IMPORTANT**: Change the password immediately after first login!
+
+### Production Build
+
+For production deployment:
+
+```bash
+# Backend: run under gunicorn with uvicorn workers (FastAPI is an ASGI app)
+cd qwen3-tts-backend
+gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000
+
+# Frontend: Build static files
+cd qwen3-tts-frontend
+npm run build
+# Serve the 'dist' folder with nginx or another web server
+```

 ## Configuration

@@ -164,6 +298,10 @@ All TTS endpoints support an optional `backend` parameter to specify the TTS bac
 - `backend: "aliyun"` - Use Aliyun TTS API
 - If not specified, uses the user's default backend setting

+## Acknowledgments
+
+This project is built upon the excellent work of the official [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS) repository by the Qwen Team at Alibaba Cloud. Special thanks to the Qwen Team for open-sourcing such a powerful text-to-speech model.
+
 ## License

 Apache-2.0 license
diff --git a/README.zh.md b/README.zh.md
index 7868575..ee19bf5 100644
--- a/README.zh.md
+++ b/README.zh.md
@@ -1,6 +1,8 @@
 # Qwen3-TTS WebUI

-A text-to-speech web application based on Qwen3-TTS, supporting custom voice, voice design, and voice cloning.
+**Unofficial** text-to-speech web application based on Qwen3-TTS, supporting custom voice, voice design, and voice cloning, with an intuitive web interface.
+
+> This is an unofficial project. For the official Qwen3-TTS repository, see [QwenLM/Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS).

 [English Documentation](./README.md)

@@ -48,41 +50,173 @@

 ## Tech Stack

-Backend: FastAPI + SQLAlchemy + PyTorch + JWT
-Frontend: React 19 + TypeScript + Vite + Tailwind + Shadcn/ui
+**Backend**: FastAPI + SQLAlchemy + PyTorch + JWT
+- Direct PyTorch inference with Qwen3-TTS models
+- Async task processing with batch optimization
+- Dual backends: local models + Aliyun API

-## Quick Start
+**Frontend**: React 19 + TypeScript + Vite + Tailwind + Shadcn/ui

-### Backend
+## Installation and Deployment
+
+### Requirements
+
+- Python 3.9+, with a CUDA-capable GPU for local model inference
+- Node.js 18+ (for the frontend)
+- Git
+
+### 1. Clone the Repository
+
+```bash
+git clone https://github.com/yourusername/Qwen3-TTS-webUI.git
+cd Qwen3-TTS-webUI
+```
+
+### 2. Download Models
+
+**Important**: Models are **not** downloaded automatically; you must download them manually.
+
+For details, see the official repository: [Qwen3-TTS models](https://github.com/QwenLM/Qwen3-TTS)
+
+Enter the backend directory:
+```bash
+cd qwen3-tts-backend
+mkdir -p Qwen && cd Qwen
+```
+
+**Option 1: Download via ModelScope (recommended for users in mainland China)**
+
+```bash
+pip install -U modelscope
+
+modelscope download --model Qwen/Qwen3-TTS-Tokenizer-12Hz --local_dir ./Qwen3-TTS-Tokenizer-12Hz
+modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local_dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice
+modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --local_dir ./Qwen3-TTS-12Hz-1.7B-VoiceDesign
+modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base --local_dir ./Qwen3-TTS-12Hz-1.7B-Base
+```
+
+Optional 0.6B models (smaller, faster):
+```bash
+modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice --local_dir ./Qwen3-TTS-12Hz-0.6B-CustomVoice
+modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-Base --local_dir ./Qwen3-TTS-12Hz-0.6B-Base
+```
+
+**Option 2: Download via Hugging Face**
+
+```bash
+pip install -U "huggingface_hub[cli]"
+
+huggingface-cli download Qwen/Qwen3-TTS-Tokenizer-12Hz --local-dir ./Qwen3-TTS-Tokenizer-12Hz
+huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local-dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice
+huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --local-dir ./Qwen3-TTS-12Hz-1.7B-VoiceDesign
+huggingface-cli download Qwen/Qwen3-TTS-12Hz-1.7B-Base --local-dir ./Qwen3-TTS-12Hz-1.7B-Base
+```
+
+Optional 0.6B models (smaller, faster):
+```bash
+huggingface-cli download Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice --local-dir ./Qwen3-TTS-12Hz-0.6B-CustomVoice
+huggingface-cli download Qwen/Qwen3-TTS-12Hz-0.6B-Base --local-dir ./Qwen3-TTS-12Hz-0.6B-Base
+```
+
+**Final directory structure:**
+```
+Qwen3-TTS-webUI/
+├── qwen3-tts-backend/
+│   └── Qwen/
+│       ├── Qwen3-TTS-Tokenizer-12Hz/
+│       ├── Qwen3-TTS-12Hz-1.7B-CustomVoice/
+│       ├── Qwen3-TTS-12Hz-1.7B-VoiceDesign/
+│       └── Qwen3-TTS-12Hz-1.7B-Base/
+```
+
+### 3. Backend Configuration

 ```bash
 cd qwen3-tts-backend
+
+# Create a virtual environment
 python -m venv venv
-source venv/bin/activate
+source venv/bin/activate  # Windows: venv\Scripts\activate
+
+# Install dependencies
 pip install -r requirements.txt
+
+# Install Qwen3-TTS
+pip install qwen-tts
+
+# Create the configuration file
 cp .env.example .env
-# Edit .env to configure MODEL_BASE_PATH and DEFAULT_BACKEND
-# Local model: make sure MODEL_BASE_PATH points to the Qwen model directory
-# Aliyun: set DEFAULT_BACKEND=aliyun and configure the API key on the web settings page
-uvicorn main:app --host 0.0.0.0 --port 8000 --reload
+
+# Edit the configuration file
+# Local model: set MODEL_BASE_PATH=./Qwen
+# Aliyun API only: set DEFAULT_BACKEND=aliyun
+nano .env  # or use another editor
 ```

-### Frontend
+**Important backend configuration** (`.env` file):
+```env
+MODEL_DEVICE=cuda:0  # Use the GPU (or cpu to run on the CPU)
+MODEL_BASE_PATH=./Qwen  # Path to the downloaded models
+DEFAULT_BACKEND=local  # 'local' for local models, 'aliyun' for the API
+DATABASE_URL=sqlite:///./qwen_tts.db
+SECRET_KEY=your-secret-key-here  # Change this!
+```
+
+Start the backend service:
+```bash
+# Start directly with uvicorn
+uvicorn main:app --host 0.0.0.0 --port 8000 --reload
+
+# Or use conda (if you prefer)
+conda run -n qwen3-tts uvicorn main:app --host 0.0.0.0 --port 8000 --reload
+```
+
+Verify that the backend is running:
+```bash
+curl http://127.0.0.1:8000/health
+```
+
+### 4. Frontend Configuration

 ```bash
 cd qwen3-tts-frontend
+
+# Install dependencies
 npm install
+
+# Create the configuration file
 cp .env.example .env
-# Edit .env to configure VITE_API_URL
+
+# Set the backend address (this overwrites the .env copied above)
+echo "VITE_API_URL=http://localhost:8000" > .env
+
+# Start the development server
 npm run dev
 ```

-Visit `http://localhost:5173`
+### 5. Access the Application

-**First run**: On first run, a superuser account is initialized automatically:
+Open in your browser: `http://localhost:5173`
+
+**Default account**:
 - Username: `admin`
 - Password: `admin123456`
-- **Important**: Changing the password immediately after login is strongly recommended!
+- **Important**: Change the password immediately after logging in!
+
+### Production Deployment
+
+For production:
+
+```bash
+# Backend: run under gunicorn with uvicorn workers (FastAPI is an ASGI app)
+cd qwen3-tts-backend
+gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000
+
+# Frontend: build static files
+cd qwen3-tts-frontend
+npm run build
+# Serve the 'dist' folder with nginx or another web server
+```

 ## Configuration

@@ -164,6 +298,10 @@ GET /jobs/{id}/download - Download the result
 - `backend: "aliyun"` - Use the Aliyun TTS API
 - If not specified, the user's default backend setting is used

+## Acknowledgments
+
+This project is built on the official [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS) repository open-sourced by the Qwen team at Alibaba Cloud. Special thanks to the Qwen team for open-sourcing such a powerful text-to-speech model.
+
 ## License

 Apache-2.0 license
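
Because the README states that models are not downloaded automatically, it may help to check the layout listed under "Final directory structure" before starting the backend. The following is a minimal sketch, not part of this change: `missing_models` and `REQUIRED_MODELS` are hypothetical names, and the assumption that the backend needs exactly these four directories under `MODEL_BASE_PATH` is inferred from the directory listing rather than from the backend code.

```python
from pathlib import Path

# Directory names copied from the "Final directory structure" listing.
# Assumption: these four are the ones required for the local backend.
REQUIRED_MODELS = [
    "Qwen3-TTS-Tokenizer-12Hz",
    "Qwen3-TTS-12Hz-1.7B-CustomVoice",
    "Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    "Qwen3-TTS-12Hz-1.7B-Base",
]


def missing_models(base_path, required=REQUIRED_MODELS):
    """Return the required model directories that are absent under base_path.

    The result preserves the order of `required`; an empty list means
    every expected directory exists.
    """
    base = Path(base_path)
    return [name for name in required if not (base / name).is_dir()]
```

Run from `qwen3-tts-backend`, `missing_models("./Qwen")` would return an empty list once all four downloads have completed, and otherwise the names still to be fetched.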