Compare commits
11 commits, d12c1223f9...main:

- 1ab7bdef1c
- 6d93025453
- 60489eab59
- 2fa9c1fcb6
- 777a7ec006
- a144540cbe
- a8d6195cdb
- b395cb0b98
- 2662b494c5
- 96b2eaf774
- d170ba3362
.gitignore (16 changed lines)

```diff
@@ -26,16 +26,16 @@ checkpoints/
 docker/models/
 docker/data/
 docker/.env
-qwen3-tts-frontend/node_modules/
-qwen3-tts-frontend/dist/
-qwen3-tts-frontend/.env
-qwen3-tts-frontend/.env.local
+frontend/node_modules/
+frontend/dist/
+frontend/.env
+frontend/.env.local
 CLAUDE.md
 样本.mp3
 aliyun.md
 /nginx.conf
 deploy.md
-qwen3-tts-backend/scripts
-qwen3-tts-backend/examples
-qwen3-tts-backend/qwen3-tts.service
-qwen3-tts-frontend/.env.production
+backend/scripts
+backend/examples
+backend/canto.service
+frontend/.env.production
```
README.zh.md (file deleted, 348 lines)

@@ -1,348 +0,0 @@
# Qwen3-TTS WebUI

> **⚠️ Note:** This project is largely AI-generated and currently unstable. Stable versions will be published in [Releases](../../releases).

An **unofficial** text-to-speech web application built on Qwen3-TTS, supporting custom voices, voice design, and voice cloning through an intuitive web interface.

> This is an unofficial project. For the official Qwen3-TTS repository, see [QwenLM/Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS).

[English Documentation](./README.md)
## Features

- Custom voices: predefined speaker voices
- Voice design: create voices from natural-language descriptions
- Voice cloning: clone a voice from uploaded audio
- **IndexTTS2**: high-quality voice cloning with emotion control (happy, angry, sad, afraid, surprised, etc.), powered by [IndexTTS2](https://github.com/iszhanjiawei/indexTTS2)
- Audiobook generation: upload an EPUB file, let an LLM extract the characters and assign voices automatically, then generate a multi-voice audiobook; IndexTTS2 can be enabled per character
- Dual backends: switch between local models and the Aliyun TTS API
- Multilingual UI: English, 简体中文, 繁體中文, 日本語, 한국어
- JWT authentication, async jobs, voice caching, dark mode
## Screenshots

### Desktop - light mode

[screenshot]

### Desktop - dark mode

[screenshot]

### Mobile

<table>
<tr>
<td width="50%"><img src="./images/mobile-lightmode-custom.png" alt="Mobile light mode" /></td>
<td width="50%"><img src="./images/mobile-settings.png" alt="Mobile settings" /></td>
</tr>
</table>

### Audiobook generation

[screenshot]

<table>
<tr>
<td width="50%"><img src="./images/audiobook-characters.png" alt="Audiobook character list" /></td>
<td width="50%"><img src="./images/audiobook-chapters.png" alt="Audiobook chapter list" /></td>
</tr>
</table>
## Tech stack

**Backend**: FastAPI + SQLAlchemy + PyTorch + JWT
- Direct PyTorch inference of the Qwen3-TTS models
- Async job processing with batch optimization
- Dual backends: local models + Aliyun API

**Frontend**: React 19 + TypeScript + Vite + Tailwind + Shadcn/ui
## Docker deployment

Prebuilt images are published on Docker Hub: [bdim404/qwen3-tts-backend](https://hub.docker.com/r/bdim404/qwen3-tts-backend), [bdim404/qwen3-tts-frontend](https://hub.docker.com/r/bdim404/qwen3-tts-frontend)

**Prerequisites**: Docker, Docker Compose, NVIDIA GPU + [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)

```bash
git clone https://github.com/bdim404/Qwen3-TTS-WebUI.git
cd Qwen3-TTS-WebUI

# Download the models into docker/models/ (see "Installation > Download models" below)
mkdir -p docker/models docker/data

# Configure
cp docker/.env.example docker/.env
# Edit docker/.env and set SECRET_KEY

cd docker

# Pull the prebuilt images
docker compose pull

# Start (CPU only)
docker compose up -d

# Start (GPU accelerated)
docker compose -f docker-compose.yml -f docker-compose.gpu.yml up -d
```

Open `http://localhost`. Default account: `admin` / `admin123456`
## Installation

### Requirements

- Python 3.9+ with CUDA support (for local model inference)
- Node.js 18+ (for the frontend)
- Git

### 1. Clone the repository

```bash
git clone https://github.com/bdim404/Qwen3-TTS-WebUI.git
cd Qwen3-TTS-WebUI
```
### 2. Download models

**Important**: the models are **not** downloaded automatically; you must download them manually.

For details, see the official repository: [Qwen3-TTS models](https://github.com/QwenLM/Qwen3-TTS)

Enter the model directory:
```bash
# Docker deployment
mkdir -p docker/models && cd docker/models

# Local deployment
cd qwen3-tts-backend && mkdir -p Qwen && cd Qwen
```

**Option 1: download via ModelScope (recommended for mainland China)**

```bash
pip install -U modelscope

modelscope download --model Qwen/Qwen3-TTS-Tokenizer-12Hz --local_dir ./Qwen3-TTS-Tokenizer-12Hz
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local_dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --local_dir ./Qwen3-TTS-12Hz-1.7B-VoiceDesign
modelscope download --model Qwen/Qwen3-TTS-12Hz-1.7B-Base --local_dir ./Qwen3-TTS-12Hz-1.7B-Base
```

Optional 0.6B models (smaller and faster):
```bash
modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice --local_dir ./Qwen3-TTS-12Hz-0.6B-CustomVoice
modelscope download --model Qwen/Qwen3-TTS-12Hz-0.6B-Base --local_dir ./Qwen3-TTS-12Hz-0.6B-Base
```

**Option 2: download via Hugging Face**

```bash
pip install -U "huggingface_hub[cli]"

hf download Qwen/Qwen3-TTS-Tokenizer-12Hz --local-dir ./Qwen3-TTS-Tokenizer-12Hz
hf download Qwen/Qwen3-TTS-12Hz-1.7B-CustomVoice --local-dir ./Qwen3-TTS-12Hz-1.7B-CustomVoice
hf download Qwen/Qwen3-TTS-12Hz-1.7B-VoiceDesign --local-dir ./Qwen3-TTS-12Hz-1.7B-VoiceDesign
hf download Qwen/Qwen3-TTS-12Hz-1.7B-Base --local-dir ./Qwen3-TTS-12Hz-1.7B-Base
```

Optional 0.6B models (smaller and faster):
```bash
hf download Qwen/Qwen3-TTS-12Hz-0.6B-CustomVoice --local-dir ./Qwen3-TTS-12Hz-0.6B-CustomVoice
hf download Qwen/Qwen3-TTS-12Hz-0.6B-Base --local-dir ./Qwen3-TTS-12Hz-0.6B-Base
```

**IndexTTS2 models (optional, for emotion-controlled voice cloning)**

IndexTTS2 is an optional feature. To use it, run the following in the same `Qwen/` directory:

```bash
# Download only the required files instead of the full repository
hf download IndexTeam/IndexTTS-2 \
  bpe.model config.yaml feat1.pt feat2.pt gpt.pth s2mel.pth wav2vec2bert_stats.pt \
  --local-dir ./IndexTTS2
```

Then install the indextts package:
```bash
git clone https://github.com/iszhanjiawei/indexTTS2.git
cd indexTTS2
pip install -e . --no-deps
cd ..
```

**Final directory layout:**

Docker deployment (`docker/models/`):
```
Qwen3-TTS-WebUI/
└── docker/
    └── models/
        ├── Qwen3-TTS-Tokenizer-12Hz/
        ├── Qwen3-TTS-12Hz-1.7B-CustomVoice/
        ├── Qwen3-TTS-12Hz-1.7B-VoiceDesign/
        └── Qwen3-TTS-12Hz-1.7B-Base/
```

Local deployment (`qwen3-tts-backend/Qwen/`):
```
Qwen3-TTS-WebUI/
└── qwen3-tts-backend/
    └── Qwen/
        ├── Qwen3-TTS-Tokenizer-12Hz/
        ├── Qwen3-TTS-12Hz-1.7B-CustomVoice/
        ├── Qwen3-TTS-12Hz-1.7B-VoiceDesign/
        ├── Qwen3-TTS-12Hz-1.7B-Base/
        └── IndexTTS2/   ← optional, for the IndexTTS2 feature
            ├── bpe.model
            ├── config.yaml
            ├── feat1.pt
            ├── feat2.pt
            ├── gpt.pth
            ├── s2mel.pth
            └── wav2vec2bert_stats.pt
```
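Before starting the backend, it can help to sanity-check that the required model folders exist. A minimal sketch — the directory names come from the tree above; the helper itself is not part of the project:

```python
# Report which of the required model folders are missing under a base directory.
from pathlib import Path

REQUIRED = [
    "Qwen3-TTS-Tokenizer-12Hz",
    "Qwen3-TTS-12Hz-1.7B-CustomVoice",
    "Qwen3-TTS-12Hz-1.7B-VoiceDesign",
    "Qwen3-TTS-12Hz-1.7B-Base",
]

def missing_models(base: str) -> list:
    root = Path(base)
    return [name for name in REQUIRED if not (root / name).is_dir()]
```

Running `missing_models("docker/models")` (or `"qwen3-tts-backend/Qwen"`) should return an empty list once all downloads have finished.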
### 3. Backend setup

```bash
cd qwen3-tts-backend

# Create a virtual environment
python -m venv venv
source venv/bin/activate  # Windows: venv\Scripts\activate

# Install dependencies
pip install -r requirements.txt

# Install Qwen3-TTS
pip install qwen-tts

# Create the config file
cp .env.example .env

# Edit the config file
# Local models: set MODEL_BASE_PATH=./Qwen
# Aliyun API only: set DEFAULT_BACKEND=aliyun
nano .env  # or any other editor
```

**Key backend settings** (`.env` file):
```env
MODEL_DEVICE=cuda:0              # use the GPU (or cpu for CPU)
MODEL_BASE_PATH=./Qwen           # path to the downloaded models
DEFAULT_BACKEND=local            # 'local' for local models, 'aliyun' for the API
DATABASE_URL=sqlite:///./qwen_tts.db
SECRET_KEY=your-secret-key-here  # change this!
```

Start the backend:
```bash
# Run directly with uvicorn
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

# Or with conda, if you prefer
conda run -n qwen3-tts uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

Verify that the backend is running:
```bash
curl http://127.0.0.1:8000/health
```
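`SECRET_KEY` should be a long random string, not a guessable word. One way to generate one using only Python's standard library (a sketch, not a project script):

```python
# Generate a URL-safe random string suitable for SECRET_KEY.
import secrets

def make_secret_key(nbytes: int = 48) -> str:
    # token_urlsafe yields roughly 1.3 characters per byte of entropy.
    return secrets.token_urlsafe(nbytes)

print(make_secret_key())
```

Paste the printed value into `.env` as `SECRET_KEY=...`.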
### 4. Frontend setup

```bash
cd qwen3-tts-frontend

# Install dependencies
npm install

# Create the config file
cp .env.example .env

# Start the dev server
npm run dev
```
### 5. Open the app

Open `http://localhost:5173` in your browser.

**Default account**:
- Username: `admin`
- Password: `admin123456`
- **Important**: change the password immediately after logging in!
### Production deployment

For production:

```bash
# Backend: run under gunicorn with ASGI workers (or a similar server)
cd qwen3-tts-backend
gunicorn main:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000

# Frontend: build the static files
cd qwen3-tts-frontend
npm run build
# Serve the 'dist' folder with nginx or another web server
```
## Configuration

### Backend configuration

Key settings in the backend `.env`:

```env
SECRET_KEY=your-secret-key
MODEL_DEVICE=cuda:0
MODEL_BASE_PATH=../Qwen
DATABASE_URL=sqlite:///./qwen_tts.db

DEFAULT_BACKEND=local

ALIYUN_REGION=beijing
ALIYUN_MODEL_FLASH=qwen3-tts-flash-realtime
ALIYUN_MODEL_VC=qwen3-tts-vc-realtime-2026-01-15
ALIYUN_MODEL_VD=qwen3-tts-vd-realtime-2026-01-15
```

**Backend options:**

- `DEFAULT_BACKEND`: the default TTS backend, either `local` or `aliyun`
  - **Local mode**: uses the local Qwen3-TTS models (requires `MODEL_BASE_PATH`)
  - **Aliyun mode**: uses the Aliyun TTS API (users configure an API key on the settings page)

**Aliyun configuration:**

- Users add their Aliyun API key on the settings page of the web UI
- API keys are encrypted before being stored in the database
- The superadmin controls whether the local model is enabled for all users
- To obtain an Aliyun API key, visit the [Aliyun console](https://dashscope.console.aliyun.com/)
## Usage

### Switching backends

1. Log in to the web UI
2. Open the settings page
3. Configure your preferred backend:
   - **Local model**: select "Local model" (requires the superadmin to have enabled local models)
   - **Aliyun API**: select "Aliyun" and add your API key
4. The selected backend is used by default for all TTS operations
5. A different backend can also be chosen per request via the API's `backend` parameter

### Managing the Aliyun API key

1. Find the "Aliyun API key" section on the settings page
2. Enter your Aliyun API key
3. Click "Update key" to save and verify it
4. The system validates the key before saving
5. The key can be removed at any time with the delete button
## Acknowledgements

This project is built on the official [Qwen3-TTS](https://github.com/QwenLM/Qwen3-TTS) repository, open-sourced by the Alibaba Cloud Qwen team. Many thanks to the Qwen team for open-sourcing such a powerful text-to-speech model.

## License

Apache-2.0 license
```diff
@@ -69,7 +69,7 @@ def _char_to_response(c, db: Session) -> AudiobookCharacterResponse:
     if vd:
         vd_name = vd.name
         meta = vd.meta_data or {}
-        vd_speaker = meta.get('speaker') or vd.aliyun_voice_id or vd.instruct or None
+        vd_speaker = meta.get('speaker') or vd.instruct or None
     return AudiobookCharacterResponse(
         id=c.id,
         project_id=c.project_id,
@@ -80,7 +80,7 @@ def _char_to_response(c, db: Session) -> AudiobookCharacterResponse:
         voice_design_id=c.voice_design_id,
         voice_design_name=vd_name,
         voice_design_speaker=vd_speaker,
-        use_indextts2=c.use_indextts2 or False,
     )
@@ -561,7 +561,7 @@ async def regenerate_character_preview_endpoint(
     from core.audiobook_service import generate_character_preview

     try:
-        await generate_character_preview(project_id, char_id, current_user, db)
+        await generate_character_preview(project_id, char_id, current_user, db, force_recreate=True)
         return {"message": "Preview generated successfully"}
     except ValueError as e:
         raise HTTPException(status_code=400, detail=str(e))
@@ -740,14 +740,18 @@ async def update_character(
         description=data.description,
         instruct=data.instruct,
         voice_design_id=data.voice_design_id,
-        use_indextts2=data.use_indextts2,
     )

-    if data.instruct is not None and char.voice_design_id:
+    if (data.instruct is not None or data.gender is not None) and char.voice_design_id:
         voice_design = crud.get_voice_design(db, char.voice_design_id, current_user.id)
+        logger.info(f"update_character: char_id={char_id}, voice_design_id={char.voice_design_id}, found={voice_design is not None}")
         if voice_design:
-            voice_design.instruct = data.instruct
+            if data.instruct is not None:
+                voice_design.instruct = data.instruct
+            voice_design.voice_cache_id = None
             db.commit()
+            logger.info(f"update_character: cleared voice_cache_id for design {voice_design.id}")

     return _char_to_response(char, db)
```
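The behavior the last hunk introduces — any `instruct` or `gender` change now clears the design's cached voice prompt — can be sketched as a pure function (a stand-in using a plain dict rather than the project's ORM model):

```python
from typing import Optional

def apply_character_update(design: dict, instruct: Optional[str], gender: Optional[str]) -> dict:
    """Sketch of the patched update path: a change to either field
    invalidates the cached voice prompt so it will be regenerated."""
    if instruct is not None or gender is not None:
        if instruct is not None:
            design["instruct"] = instruct
        design["voice_cache_id"] = None  # force prompt regeneration
    return design
```

Note that a gender-only change keeps the old `instruct` text but still drops the cache, which is exactly why the condition was widened from `data.instruct is not None` alone.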
```diff
@@ -1,5 +1,5 @@
 from datetime import timedelta
-from typing import Annotated
+from typing import Annotated, Optional
 from fastapi import APIRouter, Depends, HTTPException, status, Request
 from fastapi.security import OAuth2PasswordBearer, OAuth2PasswordRequestForm
 from sqlalchemy.orm import Session
@@ -14,26 +14,34 @@ from core.security import (
     decode_access_token
 )
 from db.database import get_db
-from db.crud import get_user_by_username, get_user_by_email, create_user, change_user_password, get_user_preferences, update_user_preferences, can_user_use_local_model, can_user_use_nsfw, get_system_setting
-from schemas.user import User, UserCreate, Token, PasswordChange, AliyunKeyVerifyResponse, UserPreferences, UserPreferencesResponse
+from db.crud import get_user_by_username, get_user_by_email, create_user, change_user_password, get_user_preferences, update_user_preferences, can_user_use_nsfw, get_system_setting
+from schemas.user import User, UserCreate, Token, PasswordChange, UserPreferences, UserPreferencesResponse
 from schemas.audiobook import LLMConfigResponse

 router = APIRouter(prefix="/auth", tags=["authentication"])

-oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/auth/token")
+oauth2_scheme = OAuth2PasswordBearer(tokenUrl="/auth/token", auto_error=not settings.DEV_MODE)

 limiter = Limiter(key_func=get_remote_address)

 async def get_current_user(
-    token: Annotated[str, Depends(oauth2_scheme)],
+    token: Annotated[Optional[str], Depends(oauth2_scheme)],
     db: Session = Depends(get_db)
 ) -> User:
+    if settings.DEV_MODE and not token:
+        user = get_user_by_username(db, username="admin")
+        if user:
+            return user
+
     credentials_exception = HTTPException(
         status_code=status.HTTP_401_UNAUTHORIZED,
         detail="Could not validate credentials",
         headers={"WWW-Authenticate": "Bearer"},
     )
+
+    if token is None:
+        raise credentials_exception
+
     username = decode_access_token(token)
     if username is None:
         raise credentials_exception
@@ -99,6 +107,16 @@ async def login(

     return {"access_token": access_token, "token_type": "bearer"}

+@router.get("/dev-token", response_model=Token)
+async def dev_token(db: Session = Depends(get_db)):
+    if not settings.DEV_MODE:
+        raise HTTPException(status_code=status.HTTP_403_FORBIDDEN, detail="Not available outside DEV_MODE")
+    user = get_user_by_username(db, username="admin")
+    if not user:
+        raise HTTPException(status_code=status.HTTP_404_NOT_FOUND, detail="Admin user not found")
+    access_token = create_access_token(data={"sub": user.username})
+    return {"access_token": access_token, "token_type": "bearer"}
+
 @router.get("/me", response_model=User)
 @limiter.limit("30/minute")
 async def get_current_user_info(
@@ -137,31 +155,6 @@ async def change_password(

     return user

-@router.get("/aliyun-key/verify", response_model=AliyunKeyVerifyResponse)
-@limiter.limit("10/minute")
-async def verify_aliyun_key(
-    request: Request,
-    current_user: Annotated[User, Depends(get_current_user)],
-    db: Session = Depends(get_db)
-):
-    from core.security import decrypt_api_key
-    from core.tts_service import AliyunTTSBackend
-
-    encrypted = get_system_setting(db, "aliyun_api_key")
-    if not encrypted:
-        return AliyunKeyVerifyResponse(valid=False, message="No Aliyun API key configured")
-
-    api_key = decrypt_api_key(encrypted)
-    if not api_key:
-        return AliyunKeyVerifyResponse(valid=False, message="Failed to decrypt API key")
-
-    aliyun_backend = AliyunTTSBackend(api_key=api_key, region=settings.ALIYUN_REGION)
-    health = await aliyun_backend.health_check()
-
-    if health.get("available", False):
-        return AliyunKeyVerifyResponse(valid=True, message="Aliyun API key is valid and working")
-    return AliyunKeyVerifyResponse(valid=False, message="Aliyun API key is not working.")
-
 @router.get("/preferences", response_model=UserPreferencesResponse)
 @limiter.limit("30/minute")
 async def get_preferences(
@@ -171,14 +164,10 @@ async def get_preferences(
 ):
     prefs = get_user_preferences(db, current_user.id)

-    available_backends = ["aliyun"]
-    if can_user_use_local_model(current_user):
-        available_backends.append("local")
-
     return {
-        "default_backend": prefs.get("default_backend", "aliyun"),
+        "default_backend": "local",
         "onboarding_completed": prefs.get("onboarding_completed", False),
-        "available_backends": available_backends
+        "available_backends": ["local"]
     }

 @router.put("/preferences")
@@ -189,13 +178,6 @@ async def update_preferences(
     current_user: Annotated[User, Depends(get_current_user)],
     db: Session = Depends(get_db)
 ):
-    if preferences.default_backend == "local":
-        if not can_user_use_local_model(current_user):
-            raise HTTPException(
-                status_code=status.HTTP_403_FORBIDDEN,
-                detail="Local model is not available. Please contact administrator."
-            )
-
     updated_user = update_user_preferences(
         db,
         current_user.id,
```
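The `DEV_MODE` fallback this diff adds to `get_current_user` reduces to the following control flow (a sketch with stand-in callables, not the project's actual helpers):

```python
from typing import Callable, Optional

def resolve_user(
    token: Optional[str],
    dev_mode: bool,
    lookup_admin: Callable[[], Optional[str]],
    decode: Callable[[str], Optional[str]],
) -> str:
    """Mirror of the patched get_current_user: in DEV_MODE a missing
    token falls back to the admin user instead of raising 401."""
    if dev_mode and not token:
        admin = lookup_admin()
        if admin:
            return admin
    if token is None:
        raise PermissionError("Could not validate credentials")
    username = decode(token)
    if username is None:
        raise PermissionError("Could not validate credentials")
    return username
```

The `auto_error=not settings.DEV_MODE` change is what makes the missing-token case reachable at all: with `auto_error=True`, FastAPI would reject the request before the dependency body runs.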
```diff
@@ -70,14 +70,7 @@ async def process_custom_voice_job(

     logger.info(f"Processing custom-voice job {job_id} with backend {backend_type}")

-    user_api_key = None
-    if backend_type == "aliyun":
-        from db.crud import get_system_setting
-        encrypted = get_system_setting(db, "aliyun_api_key")
-        if encrypted:
-            user_api_key = decrypt_api_key(encrypted)
-
-    backend = await TTSServiceFactory.get_backend(backend_type, user_api_key)
+    backend = await TTSServiceFactory.get_backend()

     audio_bytes, sample_rate = await backend.generate_custom_voice(request_data)

@@ -133,19 +126,9 @@ async def process_voice_design_job(

     logger.info(f"Processing voice-design job {job_id} with backend {backend_type}")

-    user_api_key = None
-    if backend_type == "aliyun":
-        from db.crud import get_system_setting
-        encrypted = get_system_setting(db, "aliyun_api_key")
-        if encrypted:
-            user_api_key = decrypt_api_key(encrypted)
-
-    backend = await TTSServiceFactory.get_backend(backend_type, user_api_key)
-
-    if backend_type == "aliyun" and saved_voice_id:
-        audio_bytes, sample_rate = await backend.generate_voice_design(request_data, saved_voice_id)
-    else:
-        audio_bytes, sample_rate = await backend.generate_voice_design(request_data)
+    backend = await TTSServiceFactory.get_backend()
+    audio_bytes, sample_rate = await backend.generate_voice_design(request_data)

     timestamp = datetime.utcnow().strftime("%Y%m%d_%H%M%S")
     filename = f"{user_id}_{job_id}_{timestamp}.wav"
@@ -200,14 +183,6 @@ async def process_voice_clone_job(

     logger.info(f"Processing voice-clone job {job_id} with backend {backend_type}")

-    from core.security import decrypt_api_key
-    user_api_key = None
-    if backend_type == "aliyun":
-        from db.crud import get_system_setting
-        encrypted = get_system_setting(db, "aliyun_api_key")
-        if encrypted:
-            user_api_key = decrypt_api_key(encrypted)
-
     with open(ref_audio_path, 'rb') as f:
         ref_audio_data = f.read()

@@ -233,7 +208,7 @@ async def process_voice_clone_job(
         ref_audio_data = f.read()
     ref_audio_hash = cache_manager.get_audio_hash(ref_audio_data)

-    if request_data.get('x_vector_only_mode', False) and backend_type == "local":
+    if request_data.get('x_vector_only_mode', False):
         x_vector = None
         cache_id = None

@@ -287,9 +262,9 @@ async def process_voice_clone_job(
         logger.info(f"Job {job_id} completed (x_vector_only_mode)")
         return

-    backend = await TTSServiceFactory.get_backend(backend_type, user_api_key)
+    backend = await TTSServiceFactory.get_backend()

-    if voice_design_id and backend_type == "local":
+    if voice_design_id:
         from db.crud import get_voice_design
         design = get_voice_design(db, voice_design_id, user_id)
         cached = await cache_manager.get_cache_by_id(design.voice_cache_id, db)
@@ -339,34 +314,20 @@ async def create_custom_voice_job(
     current_user: User = Depends(get_current_user),
     db: Session = Depends(get_db)
 ):
-    from core.security import decrypt_api_key
-    from db.crud import get_user_preferences, can_user_use_local_model
+    from db.crud import can_user_use_local_model

-    user_prefs = get_user_preferences(db, current_user.id)
-    preferred_backend = user_prefs.get("default_backend", "aliyun")
-
-    can_use_local = can_user_use_local_model(current_user)
-
-    backend_type = req_data.backend if hasattr(req_data, 'backend') and req_data.backend else preferred_backend
-
-    if backend_type == "local" and not can_use_local:
+    if not can_user_use_local_model(current_user):
         raise HTTPException(
             status_code=status.HTTP_403_FORBIDDEN,
             detail="Local model is not available. Please contact administrator."
         )

-    if backend_type == "aliyun":
-        from db.crud import get_system_setting
-        if not get_system_setting(db, "aliyun_api_key"):
-            raise HTTPException(
-                status_code=status.HTTP_400_BAD_REQUEST,
-                detail="Aliyun API key not configured. Please contact administrator."
-            )
+    backend_type = "local"

     try:
         validate_text_length(req_data.text)
         language = validate_language(req_data.language)
-        speaker = validate_speaker(req_data.speaker, backend_type)
+        speaker = validate_speaker(req_data.speaker)

         params = validate_generation_params({
             'max_new_tokens': req_data.max_new_tokens,
@@ -430,48 +391,24 @@ async def create_voice_design_job(
     current_user: User = Depends(get_current_user),
     db: Session = Depends(get_db)
 ):
-    from core.security import decrypt_api_key
-    from db.crud import get_user_preferences, can_user_use_local_model, get_voice_design, update_voice_design_usage
+    from db.crud import can_user_use_local_model, get_voice_design, update_voice_design_usage

-    user_prefs = get_user_preferences(db, current_user.id)
-    preferred_backend = user_prefs.get("default_backend", "aliyun")
-
-    can_use_local = can_user_use_local_model(current_user)
-
-    backend_type = req_data.backend if hasattr(req_data, 'backend') and req_data.backend else preferred_backend
-
-    saved_voice_id = None
+    if not can_user_use_local_model(current_user):
+        raise HTTPException(
+            status_code=status.HTTP_403_FORBIDDEN,
+            detail="Local model is not available. Please contact administrator."
+        )
+
+    backend_type = "local"

     if req_data.saved_design_id:
         saved_design = get_voice_design(db, req_data.saved_design_id, current_user.id)
         if not saved_design:
             raise HTTPException(status_code=404, detail="Saved voice design not found")

-        if saved_design.backend_type != backend_type:
-            raise HTTPException(
-                status_code=400,
-                detail=f"Saved design backend ({saved_design.backend_type}) doesn't match current backend ({backend_type})"
-            )
-
         req_data.instruct = saved_design.instruct
-        saved_voice_id = saved_design.aliyun_voice_id

         update_voice_design_usage(db, req_data.saved_design_id, current_user.id)

-    if backend_type == "local" and not can_use_local:
-        raise HTTPException(
-            status_code=status.HTTP_403_FORBIDDEN,
-            detail="Local model is not available. Please contact administrator."
-        )
-
-    if backend_type == "aliyun":
-        from db.crud import get_system_setting
-        if not get_system_setting(db, "aliyun_api_key"):
-            raise HTTPException(
-                status_code=status.HTTP_400_BAD_REQUEST,
-                detail="Aliyun API key not configured. Please contact administrator."
-            )
-
     try:
         validate_text_length(req_data.text)
         language = validate_language(req_data.language)
@@ -553,29 +490,15 @@ async def create_voice_clone_job(
     current_user: User = Depends(get_current_user),
     db: Session = Depends(get_db)
 ):
-    from core.security import decrypt_api_key
-    from db.crud import get_user_preferences, can_user_use_local_model, get_voice_design
+    from db.crud import can_user_use_local_model, get_voice_design

-    user_prefs = get_user_preferences(db, current_user.id)
-    preferred_backend = user_prefs.get("default_backend", "aliyun")
-
-    can_use_local = can_user_use_local_model(current_user)
-
-    backend_type = backend if backend else preferred_backend
-
-    if backend_type == "local" and not can_use_local:
+    if not can_user_use_local_model(current_user):
         raise HTTPException(
             status_code=status.HTTP_403_FORBIDDEN,
             detail="Local model is not available. Please contact administrator."
         )

-    if backend_type == "aliyun":
-        from db.crud import get_system_setting
-        if not get_system_setting(db, "aliyun_api_key"):
-            raise HTTPException(
-                status_code=status.HTTP_400_BAD_REQUEST,
-                detail="Aliyun API key not configured. Please contact administrator."
-            )
-
+    backend_type = "local"

     ref_audio_data = None
     ref_audio_hash = None
@@ -600,9 +523,6 @@ async def create_voice_clone_job(
         if not design:
             raise ValueError("Voice design not found")

-        if design.backend_type != backend_type:
-            raise ValueError(f"Voice design backend ({design.backend_type}) doesn't match request backend ({backend_type})")
-
         if not design.voice_cache_id:
             raise ValueError("Voice design has no prepared clone prompt. Please call /voice-designs/{id}/prepare-clone first")
```
|
raise ValueError("Voice design has no prepared clone prompt. Please call /voice-designs/{id}/prepare-clone first")
|
||||||
|
|
||||||
@@ -5,7 +5,6 @@ from slowapi import Limiter
 from slowapi.util import get_remote_address
 
 from api.auth import get_current_user
-from config import settings
 from core.security import get_password_hash
 from db.database import get_db
 from db.crud import (
@@ -17,7 +16,7 @@ from db.crud import (
     update_user,
     delete_user
 )
-from schemas.user import User, UserCreateByAdmin, UserUpdate, UserListResponse, AliyunKeyUpdate, AliyunKeyVerifyResponse
+from schemas.user import User, UserCreateByAdmin, UserUpdate, UserListResponse
 from schemas.audiobook import LLMConfigUpdate, LLMConfigResponse, NsfwSynopsisGenerationRequest, NsfwScriptGenerationRequest
 
 router = APIRouter(prefix="/users", tags=["users"])
@@ -181,63 +180,6 @@ async def delete_user_by_id(
     )
 
 
-@router.post("/system/aliyun-key")
-@limiter.limit("5/minute")
-async def set_system_aliyun_key(
-    request: Request,
-    key_data: AliyunKeyUpdate,
-    db: Session = Depends(get_db),
-    _: User = Depends(require_superuser)
-):
-    from core.security import encrypt_api_key
-    from core.tts_service import AliyunTTSBackend
-    from db.crud import set_system_setting
-
-    api_key = key_data.api_key.strip()
-    aliyun_backend = AliyunTTSBackend(api_key=api_key, region=settings.ALIYUN_REGION)
-    health = await aliyun_backend.health_check()
-    if not health.get("available", False):
-        raise HTTPException(status_code=status.HTTP_400_BAD_REQUEST, detail="Invalid Aliyun API key.")
-    set_system_setting(db, "aliyun_api_key", encrypt_api_key(api_key))
-    return {"message": "Aliyun API key updated"}
-
-
-@router.delete("/system/aliyun-key")
-@limiter.limit("5/minute")
-async def delete_system_aliyun_key(
-    request: Request,
-    db: Session = Depends(get_db),
-    _: User = Depends(require_superuser)
-):
-    from db.crud import delete_system_setting
-    delete_system_setting(db, "aliyun_api_key")
-    return {"message": "Aliyun API key deleted"}
-
-
-@router.get("/system/aliyun-key/verify", response_model=AliyunKeyVerifyResponse)
-@limiter.limit("10/minute")
-async def verify_system_aliyun_key(
-    request: Request,
-    db: Session = Depends(get_db),
-    _: User = Depends(require_superuser)
-):
-    from core.security import decrypt_api_key
-    from core.tts_service import AliyunTTSBackend
-    from db.crud import get_system_setting
-
-    encrypted = get_system_setting(db, "aliyun_api_key")
-    if not encrypted:
-        return AliyunKeyVerifyResponse(valid=False, message="No Aliyun API key configured")
-    api_key = decrypt_api_key(encrypted)
-    if not api_key:
-        return AliyunKeyVerifyResponse(valid=False, message="Failed to decrypt API key")
-    aliyun_backend = AliyunTTSBackend(api_key=api_key, region=settings.ALIYUN_REGION)
-    health = await aliyun_backend.health_check()
-    if health.get("available", False):
-        return AliyunKeyVerifyResponse(valid=True, message="Aliyun API key is valid and working")
-    return AliyunKeyVerifyResponse(valid=False, message="Aliyun API key is not working.")
-
-
 @router.put("/system/llm-config")
 @limiter.limit("10/minute")
 async def set_system_llm_config(
@@ -33,9 +33,7 @@ def to_voice_design_response(design) -> VoiceDesignResponse:
         id=design.id,
         user_id=design.user_id,
         name=design.name,
-        backend_type=design.backend_type,
         instruct=design.instruct,
-        aliyun_voice_id=design.aliyun_voice_id,
         meta_data=meta_data,
         preview_text=design.preview_text,
         ref_audio_path=design.ref_audio_path,
@@ -58,8 +56,6 @@ async def save_voice_design(
         user_id=current_user.id,
         name=data.name,
         instruct=data.instruct,
-        backend_type=data.backend_type,
-        aliyun_voice_id=data.aliyun_voice_id,
         meta_data=data.meta_data,
         preview_text=data.preview_text
     )
@@ -153,7 +149,6 @@ async def prepare_and_create_voice_design(
         user_id=current_user.id,
         name=data.name,
         instruct=data.instruct,
-        backend_type="local",
        meta_data=data.meta_data,
         preview_text=data.preview_text,
         voice_cache_id=cache_id,
@@ -200,12 +195,6 @@ async def prepare_voice_clone_prompt(
     if not design:
         raise HTTPException(status_code=404, detail="Voice design not found")
 
-    if design.backend_type != "local":
-        raise HTTPException(
-            status_code=400,
-            detail="Voice clone prompt preparation is only supported for local backend"
-        )
-
    if not can_user_use_local_model(current_user):
         raise HTTPException(
             status_code=403,
@@ -25,6 +25,7 @@ class Settings(BaseSettings):
     WORKERS: int = Field(default=1)
     LOG_LEVEL: str = Field(default="info")
     LOG_FILE: str = Field(default="./app.log")
+    DEV_MODE: bool = Field(default=False)
 
     RATE_LIMIT_PER_MINUTE: int = Field(default=50)
     RATE_LIMIT_PER_HOUR: int = Field(default=1000)
@@ -36,12 +37,6 @@ class Settings(BaseSettings):
     MAX_TEXT_LENGTH: int = Field(default=1000)
     MAX_AUDIO_SIZE_MB: int = Field(default=10)
 
-    ALIYUN_REGION: str = Field(default="beijing")
-
-    ALIYUN_MODEL_FLASH: str = Field(default="qwen3-tts-flash-realtime")
-    ALIYUN_MODEL_VC: str = Field(default="qwen3-tts-vc-realtime-2026-01-15")
-    ALIYUN_MODEL_VD: str = Field(default="qwen3-tts-vd-realtime-2026-01-15")
-
     DEFAULT_BACKEND: str = Field(default="local")
 
     AUDIOBOOK_PARSE_CONCURRENCY: int = Field(default=3)
@@ -60,7 +55,10 @@ class Settings(BaseSettings):
         return v
 
     def validate(self):
-        if self.SECRET_KEY == "your-secret-key-change-this-in-production":
+        if self.DEV_MODE:
+            import warnings
+            warnings.warn("DEV_MODE is enabled — authentication is bypassed. Do NOT use in production.")
+        elif self.SECRET_KEY == "your-secret-key-change-this-in-production":
             raise ValueError("Insecure default SECRET_KEY is not allowed. Please set a strong SECRET_KEY in environment.")
 
         Path(self.CACHE_DIR).mkdir(parents=True, exist_ok=True)
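The `validate()` change above stops rejecting the default `SECRET_KEY` outright and instead warns when `DEV_MODE` is on. The branch can be exercised in isolation — a minimal sketch where `Cfg` is a hypothetical stand-in for the real pydantic `Settings` class:

```python
import warnings

DEFAULT_KEY = "your-secret-key-change-this-in-production"

class Cfg:
    """Toy stand-in mirroring the DEV_MODE / SECRET_KEY branch in Settings.validate()."""
    def __init__(self, secret_key: str, dev_mode: bool = False):
        self.SECRET_KEY = secret_key
        self.DEV_MODE = dev_mode

    def validate(self) -> None:
        if self.DEV_MODE:
            # Dev mode only warns instead of refusing to start
            warnings.warn("DEV_MODE is enabled; authentication is bypassed.")
        elif self.SECRET_KEY == DEFAULT_KEY:
            raise ValueError("Insecure default SECRET_KEY is not allowed.")

# The insecure default key now only raises when DEV_MODE is off:
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    Cfg(DEFAULT_KEY, dev_mode=True).validate()   # warns, does not raise
assert len(caught) == 1

try:
    Cfg(DEFAULT_KEY, dev_mode=False).validate()
    raised = False
except ValueError:
    raised = True
assert raised
```

Note the `elif`: a dev instance with the default key starts with a warning, while any non-dev instance still fails fast.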
@@ -335,7 +335,7 @@ async def generate_ai_script(project_id: int, user: User, db: Session) -> None:
     crud.delete_audiobook_segments(db, project_id)
     crud.delete_audiobook_characters(db, project_id)
 
-    backend_type = user.user_preferences.get("default_backend", "aliyun") if user.user_preferences else "aliyun"
+    backend_type = "local"
 
     for char_data in characters_data:
         name = char_data.get("name", "旁白")
@@ -449,7 +449,7 @@ async def generate_ai_script_chapters(project_id: int, user: User, db: Session)
         for c in db_characters
     ]
     char_map = {c.name: c for c in db_characters}
-    backend_type = user.user_preferences.get("default_backend", "aliyun") if user.user_preferences else "aliyun"
+    backend_type = "local"
 
     ps.append_line(key, f"[AI剧本] 开始生成 {num_chapters} 章大纲...\n")
     ps.append_line(key, "")
@@ -618,7 +618,7 @@ async def continue_ai_script_chapters(project_id: int, additional_chapters: int,
         for c in db_characters
     ]
     char_map = {c.name: c for c in db_characters}
-    backend_type = user.user_preferences.get("default_backend", "aliyun") if user.user_preferences else "aliyun"
+    backend_type = "local"
 
     existing_chapters = crud.list_audiobook_chapters(db, project_id)
     existing_chapters_data = [
@@ -839,7 +839,7 @@ async def analyze_project(project_id: int, user: User, db: Session, turbo: bool
     crud.delete_audiobook_segments(db, project_id)
     crud.delete_audiobook_characters(db, project_id)
 
-    backend_type = user.user_preferences.get("default_backend", "aliyun") if user.user_preferences else "aliyun"
+    backend_type = "local"
 
     for char_data in characters_data:
         name = char_data.get("name", "旁白")
@@ -1437,7 +1437,7 @@ async def process_all(project_id: int, user: User, db: Session) -> None:
     logger.info(f"process_all: project={project_id} complete")
 
 
-async def generate_character_preview(project_id: int, char_id: int, user: User, db: Session) -> None:
+async def generate_character_preview(project_id: int, char_id: int, user: User, db: Session, force_recreate: bool = False) -> None:
     """Generate a short audio preview for a specific character."""
     project = crud.get_audiobook_project(db, project_id, user.id)
     if not project:
@@ -1470,21 +1470,17 @@ async def generate_character_preview(project_id: int, char_id: int, user: User,
     preview_text = f"你好,我是{preview_name}{preview_desc}"
 
     from core.tts_service import TTSServiceFactory
-    from core.security import decrypt_api_key
 
-    backend_type = user.user_preferences.get("default_backend", "aliyun") if user.user_preferences else "aliyun"
-    user_api_key = None
-    if backend_type == "aliyun":
-        encrypted = crud.get_system_setting(db, "aliyun_api_key")
-        if encrypted:
-            user_api_key = decrypt_api_key(encrypted)
-        elif user.aliyun_api_key:
-            user_api_key = decrypt_api_key(user.aliyun_api_key)
-
-    backend = await TTSServiceFactory.get_backend(backend_type, user_api_key)
+    backend = await TTSServiceFactory.get_backend()
 
     try:
-        if backend_type == "local" and not design.voice_cache_id:
+        if force_recreate and design.voice_cache_id:
+            design.voice_cache_id = None
+            db.commit()
+            db.refresh(design)
+            logger.info(f"Cleared voice_cache_id for char {char_id} (force_recreate)")
+
+        if not design.voice_cache_id:
             logger.info(f"Local voice cache missing for char {char_id}. Bootstrapping now...")
             from core.model_manager import ModelManager
             from core.cache_manager import VoiceCacheManager
@@ -1524,73 +1520,46 @@ async def generate_character_preview(project_id: int, char_id: int, user: User,
             db.commit()
             logger.info(f"Bootstrapped local voice cache for preview: design_id={design.id}, cache_id={cache_id}")
 
-        if backend_type == "aliyun" and not design.aliyun_voice_id:
-            from core.tts_service import AliyunTTSBackend
-            if isinstance(backend, AliyunTTSBackend):
-                try:
-                    voice_id = await backend._create_voice_design(
-                        instruct=_get_gendered_instruct(char.gender, design.instruct),
-                        preview_text=preview_text,
-                    )
-                    design.aliyun_voice_id = voice_id
-                    db.commit()
-                    logger.info(f"Bootstrapped aliyun voice_id for preview: design_id={design.id}, voice_id={voice_id}")
-                except Exception as e:
-                    logger.warning(f"Failed to bootstrap aliyun voice_id for preview, falling back to instruct: {e}")
-
-        if backend_type == "aliyun":
-            if design.aliyun_voice_id:
-                audio_bytes, _ = await backend.generate_voice_design(
-                    {"text": preview_text, "language": "zh"},
-                    saved_voice_id=design.aliyun_voice_id
-                )
-            else:
-                audio_bytes, _ = await backend.generate_voice_design({
-                    "text": preview_text,
-                    "language": "zh",
-                    "instruct": _get_gendered_instruct(char.gender, design.instruct),
-                })
-        else:
-            if design.voice_cache_id:
-                from core.cache_manager import VoiceCacheManager
-                cache_manager = await VoiceCacheManager.get_instance()
-                cache_result = await cache_manager.get_cache_by_id(design.voice_cache_id, db)
-                x_vector = cache_result['data'] if cache_result else None
-                if x_vector:
-                    audio_bytes, _ = await backend.generate_voice_clone(
-                        {
-                            "text": preview_text,
-                            "language": "Auto",
-                            "max_new_tokens": 512,
-                            "temperature": 0.3,
-                            "top_k": 10,
-                            "top_p": 0.9,
-                            "repetition_penalty": 1.05,
-                        },
-                        x_vector=x_vector
-                    )
-                else:
-                    audio_bytes, _ = await backend.generate_voice_design({
-                        "text": preview_text,
-                        "language": "Auto",
-                        "instruct": _get_gendered_instruct(char.gender, design.instruct),
-                        "max_new_tokens": 512,
-                        "temperature": 0.3,
-                        "top_k": 10,
-                        "top_p": 0.9,
-                        "repetition_penalty": 1.05,
-                    })
+        if design.voice_cache_id:
+            from core.cache_manager import VoiceCacheManager
+            cache_manager = await VoiceCacheManager.get_instance()
+            cache_result = await cache_manager.get_cache_by_id(design.voice_cache_id, db)
+            x_vector = cache_result['data'] if cache_result else None
+            if x_vector:
+                audio_bytes, _ = await backend.generate_voice_clone(
+                    {
+                        "text": preview_text,
+                        "language": "Auto",
+                        "max_new_tokens": 512,
+                        "temperature": 0.3,
+                        "top_k": 10,
+                        "top_p": 0.9,
+                        "repetition_penalty": 1.05,
+                    },
+                    x_vector=x_vector
+                )
+            else:
+                audio_bytes, _ = await backend.generate_voice_design({
+                    "text": preview_text,
+                    "language": "Auto",
+                    "instruct": _get_gendered_instruct(char.gender, design.instruct),
+                    "max_new_tokens": 512,
+                    "temperature": 0.3,
+                    "top_k": 10,
+                    "top_p": 0.9,
+                    "repetition_penalty": 1.05,
+                })
+        else:
+            audio_bytes, _ = await backend.generate_voice_design({
+                "text": preview_text,
+                "language": "Auto",
+                "instruct": design.instruct,
+                "max_new_tokens": 512,
+                "temperature": 0.3,
+                "top_k": 10,
+                "top_p": 0.9,
+                "repetition_penalty": 1.05,
+            })
 
         with open(audio_path, "wb") as f:
             f.write(audio_bytes)
@@ -1672,7 +1641,7 @@ async def generate_ai_script_nsfw(project_id: int, user: User, db: Session) -> N
     crud.delete_audiobook_segments(db, project_id)
     crud.delete_audiobook_characters(db, project_id)
 
-    backend_type = user.user_preferences.get("default_backend", "aliyun") if user.user_preferences else "aliyun"
+    backend_type = "local"
 
     for char_data in characters_data:
         name = char_data.get("name", "旁白")
@@ -321,26 +321,17 @@ class LLMService:
 
     @staticmethod
     def _emotion_limits(violence_level: int, eroticism_level: int) -> tuple[str, str]:
-        v = violence_level / 10
-        e = eroticism_level / 10
-        female_happy = round(0.20 + 0.45 * e, 2)
-        angry = round(0.15 + 0.65 * v, 2)
-        sad = round(0.10 + 0.40 * v, 2)
-        fear = round(0.10 + 0.60 * v, 2)
-        hate = round(0.35 + 0.25 * max(v, e), 2)
-        low = round(0.35 + 0.45 * e, 2)
-        surprise= round(0.10 + 0.35 * max(v, e), 2)
-        limits = (
-            f"愤怒={angry}、悲伤={sad}、恐惧={fear}、厌恶={hate}、低沉={low}、惊讶={surprise}、"
-            f"开心:男性角色上限=0.20,女性角色上限={female_happy}"
-        )
         guidance_parts = []
-        if violence_level >= 4:
-            guidance_parts.append(f"暴力程度{violence_level}/10,台词中的愤怒、恐惧、悲伤情绪必须强烈外露,不得克制")
-        if eroticism_level >= 4:
-            guidance_parts.append(f"色情程度{eroticism_level}/10,女性台词中的开心、低沉、挑逗情绪应充分表达")
+        if violence_level >= 7:
+            guidance_parts.append(f"暴力程度{violence_level}/10,激烈场景的愤怒/恐惧/悲伤强度应用7-10级,不得克制")
+        elif violence_level >= 4:
+            guidance_parts.append(f"暴力程度{violence_level}/10,台词中的愤怒/恐惧/悲伤情绪可用4-7级")
+        if eroticism_level >= 7:
+            guidance_parts.append(f"色情程度{eroticism_level}/10,女性台词中的开心/低沉情绪应用7-10级充分表达")
+        elif eroticism_level >= 4:
+            guidance_parts.append(f"色情程度{eroticism_level}/10,女性台词中的开心/低沉情绪可用4-7级")
         guidance = ";".join(guidance_parts)
-        return limits, guidance
+        return "", guidance
 
     async def generate_chapter_script(
         self,
@@ -383,11 +374,9 @@ class LLMService:
             " 【角色名】\"对话内容\"(情感词:强度)\n\n"
             "情感标注规则:\n"
             "- 情感词可选:开心、愤怒、悲伤、恐惧、厌恶、低沉、惊讶\n"
-            "- 单一情感:(情感词:强度),如(开心:0.5)、(悲伤:0.3)\n"
-            "- 混合情感:(情感1:比重+情感2:比重),如(开心:0.6+悲伤:0.2)、(愤怒:0.3+恐惧:0.4)\n"
-            "- 混合情感时每个情感的比重独立设定,反映各自对情绪的贡献\n"
-            f"- 各情感比重上限(严格不超过):{limits_str}\n"
-            "- 鼓励使用低值(0.05–0.10)表达微弱、内敛或一闪而过的情绪,无需非强即无\n"
+            "- 每行只允许标注一个情感词,格式:(情感词:强度级别),强度为1–10的整数,10最强\n"
+            "- 示例:(开心:6)、(悲伤:3)、(愤怒:8)\n"
+            "- 鼓励使用低值(1–3)表达微弱、内敛或一闪而过的情绪,无需非强即无\n"
            "- 确实没有任何情绪色彩时可省略整个括号\n"
             + char_personality_str
             + narrator_rule
@@ -468,18 +457,15 @@ class LLMService:
             "所有非对话的叙述文字归属于旁白角色。\n"
             "同时根据语境为每个片段判断是否有明显情绪,有则在 emo_text 中标注,无则留空。\n"
             "可选情绪词:开心、愤怒、悲伤、恐惧、厌恶、低沉、惊讶。\n"
-            "emo_text 格式规则:\n"
-            " 单一情感:直接填情感词,用 emo_alpha 设置强度,如 emo_text=\"开心\", emo_alpha=0.3\n"
-            " 混合情感:用 情感词:比重 格式拼接,emo_alpha 设为 1.0,如 emo_text=\"开心:0.6+悲伤:0.2\", emo_alpha=1.0\n"
-            "各情感比重上限(严格不超过):开心=0.20、愤怒=0.15、悲伤=0.1、恐惧=0.1、厌恶=0.35、低沉=0.35、惊讶=0.10。\n"
-            "鼓励用低值(0.05–0.10)表达微弱或内敛的情绪,不要非强即无;完全无情绪色彩时 emo_text 置空。\n"
+            "emo_text 只允许单一情感词;emo_alpha 为1–10的整数表示强度(10最强);完全无情绪色彩时 emo_text 置空,emo_alpha 为 0。\n"
+            "鼓励用低值(1–3)表达微弱或内敛的情绪,不要非强即无。\n"
             + personality_str
             + "同一角色的连续台词,情绪应尽量保持一致或仅有微弱变化,避免相邻片段间情绪跳跃。\n"
             "只输出JSON数组,不要有其他文字,格式如下:\n"
             '[{"character": "旁白", "text": "叙述文字", "emo_text": "", "emo_alpha": 0}, '
-            '{"character": "角色名", "text": "淡淡的问候", "emo_text": "开心", "emo_alpha": 0.08}, '
-            '{"character": "角色名", "text": "激动的欢呼", "emo_text": "开心", "emo_alpha": 0.18}, '
-            '{"character": "角色名", "text": "含泪的笑", "emo_text": "开心:0.12+悲伤:0.08", "emo_alpha": 1.0}]'
+            '{"character": "角色名", "text": "淡淡的问候", "emo_text": "开心", "emo_alpha": 3}, '
+            '{"character": "角色名", "text": "激动的欢呼", "emo_text": "开心", "emo_alpha": 8}, '
+            '{"character": "角色名", "text": "愤怒的质问", "emo_text": "愤怒", "emo_alpha": 7}]'
         )
         user_message = f"请解析以下章节文本:\n\n{chapter_text}"
         result = await self.stream_chat_json(system_prompt, user_message, on_token, max_tokens=16384, usage_callback=usage_callback)
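The prompt rules above replace fractional emotion weights with a single emotion word plus an integer 1-10 intensity, e.g. a script line ending in `(开心:6)`. A minimal parser for that annotation — the `parse_emotion_tag` helper and its regex are illustrative, not part of the repo, and assume the tag uses fullwidth or ASCII parentheses and colon as in the prompt examples:

```python
import re

# Matches a trailing "(情感词:强度)" tag, e.g. 【角色名】"台词"(开心:6)
# Accepts both fullwidth (()::) and ASCII punctuation.
_TAG_RE = re.compile(r"[((](开心|愤怒|悲伤|恐惧|厌恶|低沉|惊讶)[::](\d{1,2})[))]\s*$")

def parse_emotion_tag(line: str):
    """Return (emo_text, emo_alpha) for a script line, or ("", 0) when untagged."""
    m = _TAG_RE.search(line)
    if not m:
        return "", 0
    emo, level = m.group(1), int(m.group(2))
    if not 1 <= level <= 10:  # the prompt mandates an integer in 1-10
        return "", 0
    return emo, level

assert parse_emotion_tag('【角色名】"激动的欢呼"(开心:8)') == ("开心", 8)
assert parse_emotion_tag('【旁白】叙述文字') == ("", 0)
```

Lines without a tag (or with an out-of-range level) fall back to the neutral `("", 0)` pair, matching the "emo_alpha 为 0" convention in the JSON prompt.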
286
backend/core/tts_service.py
Normal file
286
backend/core/tts_service.py
Normal file
@@ -0,0 +1,286 @@
|
|||||||
|
import asyncio
|
||||||
|
import functools
|
||||||
|
import logging
|
||||||
|
from abc import ABC, abstractmethod
|
||||||
|
from typing import Tuple, Optional
|
||||||
|
|
||||||
|
logger = logging.getLogger(__name__)
|
||||||
|
|
||||||
|
|
||||||
|
class TTSBackend(ABC):
|
||||||
|
@abstractmethod
|
||||||
|
async def generate_custom_voice(self, params: dict) -> Tuple[bytes, int]:
|
||||||
|
pass
|
||||||
|
|
||||||
|
@abstractmethod
|
||||||
|
async def generate_voice_design(self, params: dict) -> Tuple[bytes, int]:
|
||||||
|
pass
|
||||||
|
|
||||||
|
@abstractmethod
|
||||||
|
async def generate_voice_clone(self, params: dict, ref_audio_bytes: bytes) -> Tuple[bytes, int]:
|
||||||
|
pass
|
||||||
|
|
||||||
|
@abstractmethod
|
||||||
|
async def health_check(self) -> dict:
|
||||||
|
pass
|
||||||
|
|
||||||
|
|
||||||
|
class LocalTTSBackend(TTSBackend):
|
||||||
|
def __init__(self):
|
||||||
|
self.model_manager = None
|
||||||
|
# Add a lock to prevent concurrent VRAM contention and CUDA errors on local GPU models
|
||||||
|
self._gpu_lock = asyncio.Lock()
|
||||||
|
|
||||||
|
async def initialize(self):
|
||||||
|
from core.model_manager import ModelManager
|
||||||
|
self.model_manager = await ModelManager.get_instance()
|
||||||
|
|
||||||
|
async def generate_custom_voice(self, params: dict) -> Tuple[bytes, int]:
|
||||||
|
await self.model_manager.load_model("custom-voice")
|
||||||
|
_, tts = await self.model_manager.get_current_model()
|
||||||
|
|
||||||
|
loop = asyncio.get_event_loop()
|
||||||
|
async with self._gpu_lock:
|
||||||
|
result = await loop.run_in_executor(
|
||||||
|
None,
|
||||||
|
functools.partial(
|
||||||
|
tts.generate_custom_voice,
|
||||||
|
text=params['text'],
|
||||||
|
language=params['language'],
|
||||||
|
speaker=params['speaker'],
|
||||||
|
instruct=params.get('instruct', ''),
|
||||||
|
max_new_tokens=params['max_new_tokens'],
|
||||||
|
temperature=params['temperature'],
|
||||||
|
top_k=params['top_k'],
|
||||||
|
top_p=params['top_p'],
|
||||||
|
repetition_penalty=params['repetition_penalty'],
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
wavs, sample_rate = result if isinstance(result, tuple) else (result, 24000)
|
||||||
|
audio_data = wavs[0] if isinstance(wavs, list) else wavs
|
||||||
|
return self._numpy_to_bytes(audio_data), sample_rate
|
||||||
|
|
||||||
|
async def generate_voice_design(self, params: dict) -> Tuple[bytes, int]:
|
||||||
|
await self.model_manager.load_model("voice-design")
|
||||||
|
_, tts = await self.model_manager.get_current_model()
|
||||||
|
|
||||||
|
loop = asyncio.get_event_loop()
|
||||||
|
async with self._gpu_lock:
|
||||||
|
result = await loop.run_in_executor(
|
||||||
|
None,
|
||||||
|
functools.partial(
|
||||||
|
tts.generate_voice_design,
|
||||||
|
text=params['text'],
|
||||||
|
language=params['language'],
|
||||||
|
instruct=params['instruct'],
|
||||||
|
max_new_tokens=params['max_new_tokens'],
|
||||||
|
temperature=params['temperature'],
|
||||||
|
top_k=params['top_k'],
|
||||||
|
top_p=params['top_p'],
|
||||||
|
repetition_penalty=params['repetition_penalty'],
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
wavs, sample_rate = result if isinstance(result, tuple) else (result, 24000)
|
||||||
|
audio_data = wavs[0] if isinstance(wavs, list) else wavs
|
||||||
|
return self._numpy_to_bytes(audio_data), sample_rate
|
||||||
|
|
||||||
|
async def generate_voice_clone(self, params: dict, ref_audio_bytes: bytes = None, x_vector=None) -> Tuple[bytes, int]:
|
||||||
|
from utils.audio import process_ref_audio
|
||||||
|
|
||||||
|
await self.model_manager.load_model("base")
|
||||||
|
_, tts = await self.model_manager.get_current_model()
|
||||||
|
|
||||||
|
loop = asyncio.get_event_loop()
|
||||||
|
|
||||||
|
async with self._gpu_lock:
|
||||||
|
if x_vector is None:
|
||||||
|
if ref_audio_bytes is None:
|
||||||
|
raise ValueError("Either ref_audio_bytes or x_vector must be provided")
|
||||||
|
|
||||||
|
ref_audio_array, ref_sr = process_ref_audio(ref_audio_bytes)
|
||||||
|
|
||||||
|
x_vector = await loop.run_in_executor(
|
||||||
|
None,
|
||||||
|
functools.partial(
|
||||||
|
tts.create_voice_clone_prompt,
|
||||||
|
ref_audio=(ref_audio_array, ref_sr),
|
||||||
|
ref_text=params.get('ref_text', ''),
|
||||||
|
x_vector_only_mode=False,
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
wavs, sample_rate = await loop.run_in_executor(
|
||||||
|
None,
|
||||||
|
functools.partial(
|
||||||
|
tts.generate_voice_clone,
|
||||||
|
text=params['text'],
|
||||||
|
language=params['language'],
|
||||||
|
voice_clone_prompt=x_vector,
|
||||||
|
max_new_tokens=params['max_new_tokens'],
|
||||||
|
temperature=params['temperature'],
|
||||||
|
top_k=params['top_k'],
|
||||||
|
top_p=params['top_p'],
|
||||||
|
repetition_penalty=params['repetition_penalty'],
|
||||||
|
)
|
||||||
|
)
|
||||||
|
|
||||||
|
import numpy as np
|
||||||
|
audio_data = wavs[0] if isinstance(wavs, list) else wavs
|
||||||
|
if isinstance(audio_data, list):
|
||||||
|
audio_data = np.array(audio_data)
|
||||||
|
return self._numpy_to_bytes(audio_data), sample_rate
|
||||||
|
|
||||||
|
async def health_check(self) -> dict:
|
||||||
|
return {
|
||||||
|
"available": self.model_manager is not None,
|
||||||
|
"current_model": self.model_manager.current_model_name if self.model_manager else None
|
||||||
|
}
|
||||||
|
|
||||||
|
@staticmethod
|
||||||
|
def _numpy_to_bytes(audio_array) -> bytes:
|
||||||
|
import numpy as np
|
||||||
|
import io
|
||||||
|
import wave
|
||||||
|
|
||||||
|
if isinstance(audio_array, list):
|
||||||
|
audio_array = np.array(audio_array)
|
||||||
|
|
||||||
|
audio_array = np.clip(audio_array, -1.0, 1.0)
|
||||||
|
audio_int16 = (audio_array * 32767).astype(np.int16)
|
||||||
|
|
||||||
|
buffer = io.BytesIO()
|
||||||
|
with wave.open(buffer, 'wb') as wav_file:
|
||||||
|
wav_file.setnchannels(1)
|
||||||
|
wav_file.setsampwidth(2)
|
||||||
|
wav_file.setframerate(24000)
|
||||||
|
wav_file.writeframes(audio_int16.tobytes())
|
||||||
|
|
||||||
|
buffer.seek(0)
|
||||||
|
return buffer.read()
|
||||||
|
|
||||||
|
|
||||||
|
class IndexTTS2Backend:
    _gpu_lock = asyncio.Lock()

    # Level 10 = these raw weights. Scale linearly: level N → N/10 * max
    EMO_LEVEL_MAX: dict[str, float] = {
        "开心": 0.75, "happy": 0.75,
        "愤怒": 0.08, "angry": 0.08,
        "悲伤": 0.90, "sad": 0.90,
        "恐惧": 0.10, "fear": 0.10,
        "厌恶": 0.50, "hate": 0.50,
        "低沉": 0.35, "low": 0.35,
        "惊讶": 0.35, "surprise": 0.35,
    }

    # Emotion keyword → index mapping
    # Order: [happy, angry, sad, fear, hate, low, surprise, neutral]
    _EMO_KEYWORDS = [
        ['喜', '开心', '快乐', '高兴', '欢乐', '愉快', 'happy', '热情', '兴奋', '愉悦', '激动'],
        ['怒', '愤怒', '生气', '恼', 'angry', '气愤', '愤慨'],
        ['哀', '悲伤', '难过', '忧郁', '伤心', '悲', 'sad', '感慨', '沉重', '沉痛', '哭'],
        ['惧', '恐惧', '害怕', '恐', 'fear', '担心', '紧张'],
        ['厌恶', '厌', 'hate', '讨厌', '反感'],
        ['低落', '沮丧', '消沉', 'low', '抑郁', '颓废'],
        ['惊喜', '惊讶', '意外', 'surprise', '惊', '吃惊', '震惊'],
    ]

    @staticmethod
    def _emo_text_to_vector(emo_text: str) -> Optional[list]:
        tokens = [t.strip() for t in emo_text.split('+') if t.strip()]
        matched = []
        for tok in tokens:
            if ':' in tok:
                name_part, w_str = tok.rsplit(':', 1)
                try:
                    weight: Optional[float] = float(w_str)
                except ValueError:
                    weight = None
            else:
                name_part = tok
                weight = None
            name_lower = name_part.lower().strip()
            for idx, words in enumerate(IndexTTS2Backend._EMO_KEYWORDS):
                for word in words:
                    if word in name_lower:
                        matched.append((idx, weight))
                        break
        if not matched:
            return None
        vec = [0.0] * 8
        has_explicit = any(w is not None for _, w in matched)
        if has_explicit:
            for idx, w in matched:
                vec[idx] = w if w is not None else 0.5
        else:
            score = 0.8 if len(matched) == 1 else 0.5
            for idx, _ in matched:
                vec[idx] = 0.2 if idx == 1 else score
        return vec

    async def generate(
        self,
        text: str,
        spk_audio_prompt: str,
        output_path: str,
        emo_text: str = None,
        emo_alpha: float = 0.6,
    ) -> bytes:
        from core.model_manager import IndexTTS2ModelManager
        manager = await IndexTTS2ModelManager.get_instance()
        tts = await manager.get_model()
        loop = asyncio.get_event_loop()

        emo_vector = None
        if emo_text and len(emo_text.strip()) > 0:
            resolved_emo_text = emo_text
            resolved_emo_alpha = emo_alpha
            if emo_alpha is not None and emo_alpha > 1:
                level = min(10, max(1, round(emo_alpha)))
                name = emo_text.strip()
                max_val = self.EMO_LEVEL_MAX.get(name)
                if max_val is None:
                    name_lower = name.lower()
                    for key, val in self.EMO_LEVEL_MAX.items():
                        if key in name_lower or name_lower in key:
                            max_val = val
                            break
                if max_val is None:
                    max_val = 0.20
                weight = round(level / 10 * max_val, 4)
                resolved_emo_text = f"{name}:{weight}"
                resolved_emo_alpha = 1.0
            raw_vector = self._emo_text_to_vector(resolved_emo_text)
            if raw_vector is not None:
                emo_vector = [v * resolved_emo_alpha for v in raw_vector]
            logger.info(f"IndexTTS2 emo_text={repr(emo_text)} emo_alpha={emo_alpha} → resolved={repr(resolved_emo_text)} emo_vector={emo_vector}")

        async with IndexTTS2Backend._gpu_lock:
            await loop.run_in_executor(
                None,
                functools.partial(
                    tts.infer,
                    spk_audio_prompt=spk_audio_prompt,
                    text=text,
                    output_path=output_path,
                    emo_vector=emo_vector,
                    emo_alpha=1.0,
                )
            )
        with open(output_path, 'rb') as f:
            return f.read()


class TTSServiceFactory:
    _local_backend: Optional[LocalTTSBackend] = None

    @classmethod
    async def get_backend(cls, backend_type: str = None, user_api_key: Optional[str] = None) -> TTSBackend:
        if cls._local_backend is None:
            cls._local_backend = LocalTTSBackend()
            await cls._local_backend.initialize()
        return cls._local_backend
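When `emo_alpha` is greater than 1, `generate` reinterprets it as a 1-10 intensity level and rescales it against the per-emotion maximum. A minimal sketch of that arithmetic (it omits the substring-matching fallback used for fuzzy emotion names, and keeps only the English keys):

```python
# Raw weights at level 10, copied from EMO_LEVEL_MAX above (English keys only).
EMO_LEVEL_MAX = {
    "happy": 0.75, "angry": 0.08, "sad": 0.90,
    "fear": 0.10, "hate": 0.50, "low": 0.35, "surprise": 0.35,
}

def level_to_weight(name: str, emo_alpha: float, default_max: float = 0.20) -> float:
    """Interpret emo_alpha > 1 as a 1-10 intensity level and scale it into a raw weight.

    Level 10 yields the per-emotion maximum; lower levels scale linearly,
    mirroring `weight = round(level / 10 * max_val, 4)` in `generate`.
    Unknown emotion names fall back to a conservative default maximum.
    """
    level = min(10, max(1, round(emo_alpha)))          # clamp to the 1-10 range
    max_val = EMO_LEVEL_MAX.get(name.lower().strip(), default_max)
    return round(level / 10 * max_val, 4)
```

So `level_to_weight("sad", 7)` gives 7/10 of sad's 0.90 ceiling, i.e. 0.63, which `generate` then feeds back through `_emo_text_to_vector` as an explicit `name:weight` token.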
@@ -114,21 +114,6 @@ def change_user_password(
     db.refresh(user)
     return user
 
-def update_user_aliyun_key(
-    db: Session,
-    user_id: int,
-    encrypted_api_key: Optional[str]
-) -> Optional[User]:
-    user = get_user_by_id(db, user_id)
-    if not user:
-        return None
-
-    user.aliyun_api_key = encrypted_api_key
-    user.updated_at = datetime.utcnow()
-    db.commit()
-    db.refresh(user)
-    return user
-
-
 def create_job(db: Session, user_id: int, job_type: str, input_data: Dict[str, Any]) -> Job:
     job = Job(
         user_id=user_id,
@@ -244,8 +229,11 @@ def delete_cache_entry(db: Session, cache_id: int, user_id: int) -> bool:
 def get_user_preferences(db: Session, user_id: int) -> dict:
     user = get_user_by_id(db, user_id)
     if not user or not user.user_preferences:
-        return {"default_backend": "aliyun", "onboarding_completed": False}
-    return user.user_preferences
+        return {"default_backend": "local", "onboarding_completed": False}
+    prefs = dict(user.user_preferences)
+    if prefs.get("default_backend") == "aliyun":
+        prefs["default_backend"] = "local"
+    return prefs
 
 
 def update_user_preferences(db: Session, user_id: int, preferences: dict) -> Optional[User]:
     user = get_user_by_id(db, user_id)
@@ -276,7 +264,7 @@ def update_system_setting(db: Session, key: str, value: dict) -> SystemSettings:
     return setting
 
 
 def can_user_use_local_model(user: User) -> bool:
-    return user.is_superuser or user.can_use_local_model
+    return True
 
 
 def can_user_use_nsfw(user: User) -> bool:
     return user.is_superuser or user.can_use_nsfw
@@ -286,8 +274,6 @@ def create_voice_design(
     user_id: int,
     name: str,
     instruct: str,
-    backend_type: str,
-    aliyun_voice_id: Optional[str] = None,
     meta_data: Optional[Dict[str, Any]] = None,
     preview_text: Optional[str] = None,
     voice_cache_id: Optional[int] = None,
@@ -297,9 +283,7 @@ def create_voice_design(
     design = VoiceDesign(
         user_id=user_id,
         name=name,
-        backend_type=backend_type,
         instruct=instruct,
-        aliyun_voice_id=aliyun_voice_id,
         meta_data=meta_data,
         preview_text=preview_text,
         voice_cache_id=voice_cache_id,
@@ -331,8 +315,6 @@ def list_voice_designs(
         VoiceDesign.user_id == user_id,
         VoiceDesign.is_active == True
     )
-    if backend_type:
-        query = query.filter(VoiceDesign.backend_type == backend_type)
     return query.order_by(VoiceDesign.last_used.desc()).offset(skip).limit(limit).all()
 
 
 def count_voice_designs(
@@ -340,13 +322,10 @@ def count_voice_designs(
     user_id: int,
     backend_type: Optional[str] = None
 ) -> int:
-    query = db.query(VoiceDesign).filter(
+    return db.query(VoiceDesign).filter(
         VoiceDesign.user_id == user_id,
         VoiceDesign.is_active == True
-    )
-    if backend_type:
-        query = query.filter(VoiceDesign.backend_type == backend_type)
-    return query.count()
+    ).count()
 
 
 def delete_voice_design(db: Session, design_id: int, user_id: int) -> bool:
     design = get_voice_design(db, design_id, user_id)
@@ -609,7 +588,6 @@ def update_audiobook_character(
     description: Optional[str] = None,
     instruct: Optional[str] = None,
     voice_design_id: Optional[int] = None,
-    use_indextts2: Optional[bool] = None,
 ) -> Optional[AudiobookCharacter]:
     char = db.query(AudiobookCharacter).filter(AudiobookCharacter.id == char_id).first()
     if not char:
@@ -624,8 +602,6 @@ def update_audiobook_character(
         char.instruct = instruct
     if voice_design_id is not None:
         char.voice_design_id = voice_design_id
-    if use_indextts2 is not None:
-        char.use_indextts2 = use_indextts2
     db.commit()
     db.refresh(char)
     return char
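The `get_user_preferences` change above performs a lazy read-time migration: rows that still store the retired `aliyun` backend are rewritten to `local` without touching the database. The same normalization, extracted as a sketch (the function name is illustrative):

```python
from typing import Optional

def normalize_preferences(stored: Optional[dict]) -> dict:
    """Return preferences with the retired 'aliyun' backend rewritten to 'local'."""
    if not stored:
        return {"default_backend": "local", "onboarding_completed": False}
    prefs = dict(stored)  # copy so the ORM-tracked JSON dict is not mutated in place
    if prefs.get("default_backend") == "aliyun":
        prefs["default_backend"] = "local"
    return prefs
```

Copying with `dict(stored)` matters: mutating the JSON value returned by the ORM would mark the row dirty (or silently diverge from the database), whereas normalizing a copy leaves stored rows untouched until the user next saves preferences.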
@@ -34,13 +34,12 @@ class User(Base):
     hashed_password = Column(String(255), nullable=False)
     is_active = Column(Boolean, default=True, nullable=False)
     is_superuser = Column(Boolean, default=False, nullable=False)
-    aliyun_api_key = Column(Text, nullable=True)
     llm_api_key = Column(Text, nullable=True)
     llm_base_url = Column(String(500), nullable=True)
     llm_model = Column(String(200), nullable=True)
     can_use_local_model = Column(Boolean, default=False, nullable=False)
     can_use_nsfw = Column(Boolean, default=False, nullable=False)
-    user_preferences = Column(JSON, nullable=True, default=lambda: {"default_backend": "aliyun", "onboarding_completed": False})
+    user_preferences = Column(JSON, nullable=True, default=lambda: {"default_backend": "local", "onboarding_completed": False})
     created_at = Column(DateTime, default=datetime.utcnow, nullable=False)
     updated_at = Column(DateTime, default=datetime.utcnow, onupdate=datetime.utcnow, nullable=False)
 
@@ -105,9 +104,7 @@ class VoiceDesign(Base):
     id = Column(Integer, primary_key=True, index=True)
     user_id = Column(Integer, ForeignKey("users.id"), nullable=False, index=True)
     name = Column(String(100), nullable=False)
-    backend_type = Column(String(20), nullable=False, index=True)
     instruct = Column(Text, nullable=False)
-    aliyun_voice_id = Column(String(255), nullable=True)
     meta_data = Column(JSON, nullable=True)
     preview_text = Column(Text, nullable=True)
     ref_audio_path = Column(String(500), nullable=True)
@@ -121,7 +118,6 @@ class VoiceDesign(Base):
     user = relationship("User", back_populates="voice_designs")
 
     __table_args__ = (
-        Index('idx_user_backend', 'user_id', 'backend_type'),
         Index('idx_user_active', 'user_id', 'is_active'),
     )
 
@@ -176,8 +172,6 @@ class AudiobookCharacter(Base):
     description = Column(Text, nullable=True)
     instruct = Column(Text, nullable=True)
     voice_design_id = Column(Integer, ForeignKey("voice_designs.id"), nullable=True)
-    use_indextts2 = Column(Boolean, default=False, nullable=False)
 
     project = relationship("AudiobookProject", back_populates="characters")
     voice_design = relationship("VoiceDesign")
     segments = relationship("AudiobookSegment", back_populates="character")
@@ -1,4 +1,4 @@
-upstream qwen_tts_backend {
+upstream canto_backend {
     server 127.0.0.1:8000;
 }
 
@@ -13,7 +13,7 @@ server {
     proxy_send_timeout 300s;
 
     location / {
-        proxy_pass http://qwen_tts_backend;
+        proxy_pass http://canto_backend;
         proxy_set_header Host $host;
         proxy_set_header X-Real-IP $remote_addr;
         proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
@@ -34,20 +34,20 @@ server {
     }
 
     location /outputs/ {
-        alias /opt/qwen3-tts-backend/outputs/;
+        alias /opt/canto-backend/outputs/;
         autoindex off;
         add_header Cache-Control "public, max-age=3600";
         add_header Content-Disposition "attachment";
     }
 
     location /health {
-        proxy_pass http://qwen_tts_backend/health;
+        proxy_pass http://canto_backend/health;
         proxy_set_header Host $host;
         access_log off;
     }
 
     location /metrics {
-        proxy_pass http://qwen_tts_backend/metrics;
+        proxy_pass http://canto_backend/metrics;
         proxy_set_header Host $host;
         allow 127.0.0.1;
         deny all;
@@ -1,15 +1,15 @@
 [Unit]
-Description=Qwen3-TTS Backend API Service
+Description=Canto Backend API Service
 After=network.target
 
 [Service]
 Type=simple
 User=qwen-tts
 Group=qwen-tts
-WorkingDirectory=/opt/qwen3-tts-backend
-Environment="PATH=/opt/conda/envs/qwen3-tts/bin:/usr/local/bin:/usr/bin:/bin"
-EnvironmentFile=/opt/qwen3-tts-backend/.env
-ExecStart=/opt/conda/envs/qwen3-tts/bin/python main.py
+WorkingDirectory=/opt/canto-backend
+Environment="PATH=/opt/conda/envs/canto/bin:/usr/local/bin:/usr/bin:/bin"
+EnvironmentFile=/opt/canto-backend/.env
+ExecStart=/opt/conda/envs/canto/bin/python main.py
 Restart=on-failure
 RestartSec=10s
 StandardOutput=append:/var/log/qwen-tts/app.log
Some files were not shown because too many files have changed in this diff.