bdim/Canto



Qwen3-TTS WebUI

A text-to-speech web application based on Qwen3-TTS, supporting custom voice, voice design, and voice cloning.

中文文档 (Chinese documentation)

Features

Custom Voice: Predefined speaker voices
Voice Design: Create voices from natural language descriptions
Voice Cloning: Clone voices from uploaded audio
Dual Backend Support: Switch between local model and Aliyun TTS API
Multi-language Support: English, 简体中文, 繁體中文, 日本語, 한국어
Other: JWT authentication, asynchronous tasks, voice caching, dark mode

Interface Preview

Desktop - Light Mode

Desktop - Dark Mode

Desktop - Voice Design List

Desktop - Save Voice Design Dialog

Desktop - Voice Cloning

Mobile - Light & Dark Mode

Mobile - Settings & History

Tech Stack

Backend: FastAPI + SQLAlchemy + PyTorch + JWT
Frontend: React 19 + TypeScript + Vite + Tailwind + Shadcn/ui

Quick Start

Backend

cd qwen3-tts-backend
python -m venv venv
source venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
# Edit .env to configure MODEL_BASE_PATH and DEFAULT_BACKEND
# For local model: Ensure MODEL_BASE_PATH points to Qwen model directory
# For Aliyun: Set DEFAULT_BACKEND=aliyun and configure API key in web settings
uvicorn main:app --host 0.0.0.0 --port 8000 --reload

Frontend

cd qwen3-tts-frontend
npm install
cp .env.example .env
# Edit .env to configure VITE_API_URL
npm run dev

Visit http://localhost:5173

First Time Setup: On first run, a default superuser account will be automatically created:

Username: admin
Password: admin123456
IMPORTANT: Please change the password immediately after first login for security!

Configuration

Backend Configuration

Backend .env key settings:

SECRET_KEY=your-secret-key
MODEL_DEVICE=cuda:0
MODEL_BASE_PATH=../Qwen
DATABASE_URL=sqlite:///./qwen_tts.db

DEFAULT_BACKEND=local

ALIYUN_REGION=beijing
ALIYUN_MODEL_FLASH=qwen3-tts-flash-realtime
ALIYUN_MODEL_VC=qwen3-tts-vc-realtime-2026-01-15
ALIYUN_MODEL_VD=qwen3-tts-vd-realtime-2026-01-15

Backend Options:

DEFAULT_BACKEND: Default TTS backend, options: local or aliyun
Local Mode: Uses local Qwen3-TTS model (requires MODEL_BASE_PATH configuration)
Aliyun Mode: Uses Aliyun TTS API (requires users to configure their API keys in settings)
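
The dependency between DEFAULT_BACKEND and MODEL_BASE_PATH described above can be checked before starting the server. The following is a minimal sketch, not the project's own settings loader: parse_env and check_backend_settings are hypothetical helper names, and the parser only handles plain KEY=VALUE lines with # comments.

```python
from typing import Dict, List

# The two backend names documented in this README.
VALID_BACKENDS = ("local", "aliyun")


def parse_env(text: str) -> Dict[str, str]:
    """Parse plain KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values such as
        # sqlite:///./qwen_tts.db or URLs with query strings stay intact.
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_backend_settings(env: Dict[str, str]) -> List[str]:
    """Return a list of human-readable problems; an empty list means OK."""
    problems: List[str] = []
    backend = env.get("DEFAULT_BACKEND", "")
    if backend not in VALID_BACKENDS:
        problems.append("DEFAULT_BACKEND must be 'local' or 'aliyun'")
    # Local mode needs a model directory; Aliyun mode relies on per-user
    # API keys entered in the web settings, so nothing is required here.
    if backend == "local" and not env.get("MODEL_BASE_PATH"):
        problems.append("MODEL_BASE_PATH is required when DEFAULT_BACKEND=local")
    return problems


if __name__ == "__main__":
    sample = "DEFAULT_BACKEND=local\nMODEL_BASE_PATH=../Qwen\n"
    print(check_backend_settings(parse_env(sample)))
```

Returning a list of problems instead of raising on the first one lets a startup script report every misconfigured key at once.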

Aliyun Configuration:

Users need to add their Aliyun API keys in the web interface settings page
API keys are encrypted and stored securely in the database
The superuser can enable or disable local model access for all users
To obtain an Aliyun API key, visit the Aliyun Console

Frontend Configuration

Frontend .env:

VITE_API_URL=http://localhost:8000

Usage

Switching Between Backends

Log in to the web interface
Navigate to the Settings page
Configure your preferred backend:
- Local Model: Select "本地模型" (Local Model); the superuser must have enabled the local model
- Aliyun API: Select "阿里云" (Aliyun) and add your API key
The selected backend will be used for all TTS operations by default
You can also specify a different backend per request using the backend parameter in the API

Managing Aliyun API Key

On the Settings page, find the "阿里云 API 密钥" (Aliyun API Key) section
Enter your Aliyun API key
Click "更新密钥" (Update Key) to save and validate
The system will verify the key before saving
You can delete the key anytime using the delete button
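
The verify-before-save behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the project's code: update_api_key, delete_api_key, the verify callback, and the in-memory store are all hypothetical, and the encryption step the README mentions is omitted.

```python
from typing import Callable, Dict


def update_api_key(
    user_id: int,
    api_key: str,
    verify: Callable[[str], bool],
    store: Dict[int, str],
) -> bool:
    """Save the key only if verification succeeds.

    `verify` stands in for whatever call checks the key against the
    Aliyun service; `store` stands in for the database table.
    """
    key = api_key.strip()
    if not key or not verify(key):
        # A failed check leaves any previously saved key untouched.
        return False
    # ASSUMPTION: a real implementation would encrypt the key here
    # before writing it to the database.
    store[user_id] = key
    return True


def delete_api_key(user_id: int, store: Dict[int, str]) -> None:
    """Remove the user's key; a missing key is not an error."""
    store.pop(user_id, None)
```

Passing the verifier in as a callable keeps the ordering rule (verify first, then save) testable without network access.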

API

POST /auth/register          - Register
POST /auth/token             - Login
POST /tts/custom-voice       - Custom voice (supports backend parameter)
POST /tts/voice-design       - Voice design (supports backend parameter)
POST /tts/voice-clone        - Voice cloning (supports backend parameter)
GET  /jobs                   - Job list
GET  /jobs/{id}/download     - Download result
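
As a rough illustration of how a client might call these endpoints, the sketch below builds requests with the Python standard library only. Several details are assumptions not confirmed by this README: that /auth/token accepts form-encoded username and password fields (the usual FastAPI OAuth2 password-flow convention), that authenticated endpoints expect an Authorization: Bearer header, and the helper names themselves. The request bodies of the /tts/* endpoints are not documented here, so they are left out.

```python
import urllib.parse
import urllib.request

# Matches the VITE_API_URL value shown in this README.
BASE_URL = "http://localhost:8000"


def build_login_request(username: str, password: str,
                        base_url: str = BASE_URL) -> urllib.request.Request:
    # ASSUMPTION: form-encoded "username"/"password" fields.
    body = urllib.parse.urlencode(
        {"username": username, "password": password}
    ).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/auth/token",
        data=body,
        method="POST",
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


def build_jobs_request(token: str,
                       base_url: str = BASE_URL) -> urllib.request.Request:
    # ASSUMPTION: the JWT is sent as a bearer token.
    return urllib.request.Request(
        f"{base_url}/jobs",
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


def build_download_request(job_id, token: str,
                           base_url: str = BASE_URL) -> urllib.request.Request:
    # Percent-encode the id so it cannot alter the URL path.
    safe_id = urllib.parse.quote(str(job_id), safe="")
    return urllib.request.Request(
        f"{base_url}/jobs/{safe_id}/download",
        method="GET",
        headers={"Authorization": f"Bearer {token}"},
    )


def send(request: urllib.request.Request) -> bytes:
    """Send a prepared request and return the raw response body."""
    with urllib.request.urlopen(request) as response:
        return response.read()
```

Separating request construction from sending makes it possible to inspect the URL, method, and headers without a running backend.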

Backend Parameter:

All TTS endpoints support an optional backend parameter to specify the TTS backend:

backend: "local" - Use local Qwen3-TTS model
backend: "aliyun" - Use Aliyun TTS API
If not specified, uses the user's default backend setting
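
The precedence described above (per-request backend first, then the user's default) can be sketched as a small function. This is a hypothetical illustration rather than the backend's actual logic; resolve_backend and the local_enabled flag are assumed names, the latter reflecting the superuser switch for local model access mentioned under Configuration.

```python
from typing import Optional

VALID_BACKENDS = ("local", "aliyun")


def resolve_backend(requested: Optional[str], user_default: str,
                    local_enabled: bool = True) -> str:
    """Pick the backend for one TTS request.

    `requested` is the optional per-request `backend` parameter;
    `user_default` is the backend chosen on the Settings page.
    """
    backend = requested if requested is not None else user_default
    if backend not in VALID_BACKENDS:
        raise ValueError(f"unknown backend: {backend!r}")
    # ASSUMPTION: a disabled local model is rejected outright instead
    # of silently falling back to Aliyun.
    if backend == "local" and not local_enabled:
        raise PermissionError("local model access is disabled")
    return backend
```

For example, resolve_backend(None, "aliyun") yields "aliyun", while resolve_backend("local", "aliyun") yields "local" as long as local access is enabled.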

License

Apache-2.0 license
