gpt-sovits
Docker image for https://github.com/RVC-Boss/GPT-SoVITS
10K+
Check out our demo video here!
Unseen speakers few-shot fine-tuning demo:
https://github.com/RVC-Boss/GPT-SoVITS/assets/129054828/05bee1fa-bdd8-4d85-9350-80c060ab47fb
For users in China region, you can use AutoDL Cloud Docker to experience the full functionality online: https://www.codewithgpu.com/i/RVC-Boss/GPT-SoVITS/GPT-SoVITS-Official
Zero-shot TTS: Input a 5-second vocal sample and experience instant text-to-speech conversion.
Few-shot TTS: Fine-tune the model with just 1 minute of training data for improved voice similarity and realism.
Cross-lingual Support: Inference in languages different from the training dataset, currently supporting English, Japanese, and Chinese.
WebUI Tools: Integrated tools include voice accompaniment separation, automatic training set segmentation, Chinese ASR, and text labeling, assisting beginners in creating training datasets and GPT/SoVITS models.
If you are a Windows user (tested with win>=10) you can install directly via the prezip. Just download the prezip, unzip it and double-click go-webui.bat to start GPT-SoVITS-WebUI.
Note: numba==0.56.4 require py<3.11
conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
bash install.sh
pip install -r requirements.txt
conda install ffmpeg
sudo apt install ffmpeg
sudo apt install libsox-dev
conda install -c conda-forge 'ffmpeg<7'
brew install ffmpeg
Download and place ffmpeg.exe and ffprobe.exe in the GPT-SoVITS root.
Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models.
For UVR5 (Vocals/Accompaniment Separation & Reverberation Removal, additionally), download models from UVR5 Weights and place them in tools/uvr5/uvr5_weights.
Users in China region can download these two models by entering the links below and clicking "Download a copy"
For Chinese ASR (additionally), download models from Damo ASR Model, Damo VAD Model, and Damo Punc Model and place them in tools/damo_asr/models.
If you are a Mac user, make sure you meet the following conditions for training and inferencing with GPU:
xcode-select --installOther Macs can do inference with CPU only.
Then install by using the following commands:
conda create -n GPTSoVits python=3.9
conda activate GPTSoVits
pip install -r requirements.txt
pip uninstall torch torchaudio
pip3 install --pre torch torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu
docker compose -f "docker-compose.yaml" up -d
As above, modify the corresponding parameters based on your actual situation, then run the following command:
docker run --rm -it --gpus=all --env=is_half=False --volume=G:\GPT-SoVITS-DockerTest\output:/workspace/output --volume=G:\GPT-SoVITS-DockerTest\logs:/workspace/logs --volume=G:\GPT-SoVITS-DockerTest\SoVITS_weights:/workspace/SoVITS_weights --workdir=/workspace -p 9870:9870 -p 9871:9871 -p 9872:9872 -p 9873:9873 -p 9874:9874 --shm-size="16G" -d breakstring/gpt-sovits:xxxxx
The TTS annotation .list file format:
vocal_path|speaker_name|language|text
Language dictionary:
Example:
D:\GPT-SoVITS\xxx/xxx.wav|xxx|en|I like playing Genshin.
High Priority:
Features:
Special thanks to the following projects and contributors:
Content type
Image
Digest
sha256:3d57370ff…
Size
4.5 GB
Last updated
almost 2 years ago
docker pull breakstring/gpt-sovits:latest-elite