I recently got a new computer and could finally try out GPT-SoVITS (GSV) and RVC. After playing with them for a bit, the idea of hooking them up to an LLM came to mind.
The implementation is straightforward: the project already ships an official API, so we only need to run the LLM's output through it. (Of course, since TTS takes a certain amount of time, the text also has to be split into sentences and streamed back segment by segment.)
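The sentence splitting mentioned above can be sketched in a few lines; the punctuation set and chunking policy here are my own choices, not anything the API prescribes:

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split LLM output on Chinese/Western sentence-ending punctuation,
    keeping the punctuation attached to each sentence."""
    parts = re.split(r'(?<=[。!?!?;;])', text)
    return [p.strip() for p in parts if p.strip()]

print(split_sentences("你好!我是一个助手。有什么可以帮你?"))
# → ['你好!', '我是一个助手。', '有什么可以帮你?']
```

Each returned sentence can then be sent to the TTS endpoint individually, so audio for the first sentence starts playing while later ones are still being synthesized.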
### API
Taking v2 as an example: to use the GSV API, first run the api_v2.py file in the project root. Open CMD in the root directory and execute the following command.
```
runtime\python api_v2.py -a 127.0.0.1 -p 9880 -c GPT_SoVITS/configs/tts_infer.yaml
```
- `-a` – bind address, default `127.0.0.1`
- `-p` – bind port, default `9880`
- `-c` – path to the TTS config file, default `GPT_SoVITS/configs/tts_infer.yaml`

The official API supports both GET and POST:
GET:
```
http://127.0.0.1:9880/tts?text=先帝创业未半而中道崩殂,今天下三分,益州疲弊,此诚危急存亡之秋也。&text_lang=zh&ref_audio_path=archive_jingyuan_1.wav&prompt_lang=zh&prompt_text=我是「罗浮」云骑将军景元。不必拘谨,「将军」只是一时的身份,你称呼我景元便可&text_split_method=cut5&batch_size=1&media_type=wav&streaming_mode=true
```
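Typing that URL by hand gets unwieldy, so it helps to assemble the query string programmatically. A minimal sketch using only the standard library (the `build_tts_url` helper is my own; the server from api_v2.py is assumed to be listening on 127.0.0.1:9880):

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:9880"

def build_tts_url(**params) -> str:
    """URL-encode query parameters for the GET /tts endpoint."""
    return f"{BASE_URL}/tts?" + urlencode(params)

url = build_tts_url(
    text="先帝创业未半而中道崩殂。",
    text_lang="zh",
    ref_audio_path="archive_jingyuan_1.wav",
    prompt_lang="zh",
    prompt_text="我是「罗浮」云骑将军景元。",
    text_split_method="cut5",
    media_type="wav",
    streaming_mode="true",
)
# To actually fetch the audio (requires the server to be running):
# from urllib.request import urlopen
# with urlopen(url) as resp:
#     audio = resp.read()
print(url)
```

`urlencode` takes care of percent-encoding the Chinese text, which is easy to get wrong when pasting the raw URL into other tools.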
POST:
```
{
    "text": "",                   # str. (required) text to be synthesized
    "text_lang": "",              # str. (required) language of the text to be synthesized
    "ref_audio_path": "",         # str. (required) reference audio path
    "aux_ref_audio_paths": [],    # list. (optional) auxiliary reference audio paths for multi-speaker synthesis
    "prompt_text": "",            # str. (optional) prompt text for the reference audio
    "prompt_lang": "",            # str. (required) language of the prompt text for the reference audio
    "top_k": 5,                   # int. top k sampling
    "top_p": 1,                   # float. top p sampling
    "temperature": 1,             # float. temperature for sampling
    "text_split_method": "cut0",  # str. text split method, see text_segmentation_method.py for details.
    "batch_size": 1,              # int. batch size for inference
    "batch_threshold": 0.75,      # float. threshold for batch splitting.
    "split_bucket": True,         # bool. whether to split the batch into multiple buckets.
    "return_fragment": False,     # bool. step by step return the audio fragment.
    "speed_factor": 1.0,          # float. control the speed of the synthesized audio.
    "streaming_mode": False,      # bool. whether to return a streaming response.
    "seed": -1,                   # int. random seed for reproducibility.
    "parallel_infer": True,       # bool. whether to use parallel inference.
    "repetition_penalty": 1.35    # float. repetition penalty for T2S model.
}
```
Sample code (using the default configuration):
```python
import requests
import json

url = "http://127.0.0.1:9880/tts"
data = {
    "text": "我在这里,博士。",
    "text_lang": "zh",
    "ref_audio_path": "干员报到.wav",
    "aux_ref_audio_paths": [],
    "prompt_text": "星象学者,星极,以近卫干员身份任职,今后就由您差遣了,博士。",
    "prompt_lang": "zh",
    "top_k": 5,
    "top_p": 1,
    "temperature": 1,
    "text_split_method": "cut5",
    "batch_size": 1,
    "batch_threshold": 0.75,
    "split_bucket": True,
    "speed_factor": 1.0,
    "fragment_interval": 0.3,
    "seed": -1,
    "media_type": "wav",
    "streaming_mode": False,
    "parallel_infer": True,
    "repetition_penalty": 1.35
}
headers = {"Content-Type": "application/json"}
response = requests.post(url, data=json.dumps(data), headers=headers)
if response.status_code == 200:
    # Save the generated audio
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio generated successfully!")
else:
    print(f"Request failed, status code: {response.status_code}, error: {response.text}")
```
The result is as follows:
Even though the model has not been fine-tuned at all, the timbre is already quite close to the original voice. The one shortcoming is that the delivery is still emotionally a bit flat. We can add some auxiliary reference audio to improve this:
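Adding extra reference clips only requires filling in the `aux_ref_audio_paths` field of the request body. A sketch of the modified payload (the file names "参考1.wav" and "参考2.wav" are placeholders for your own clips):

```python
import json

data = {
    "text": "我在这里,博士。",
    "text_lang": "zh",
    "ref_audio_path": "干员报到.wav",
    "aux_ref_audio_paths": ["参考1.wav", "参考2.wav"],  # extra clips for multi-reference synthesis
    "prompt_text": "星象学者,星极,以近卫干员身份任职,今后就由您差遣了,博士。",
    "prompt_lang": "zh",
}
payload = json.dumps(data, ensure_ascii=False)
# requests.post("http://127.0.0.1:9880/tts",
#               data=payload.encode("utf-8"),
#               headers={"Content-Type": "application/json"})
print(len(data["aux_ref_audio_paths"]))  # → 2
```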
Beyond synthesis, the API also provides a few control commands.
### Control commands
endpoint: `/control`
command:
"restart": restart the service
"exit": shut down the service
GET:
```
http://127.0.0.1:9880/control?command=restart
```
POST:
```json
{
"command": "restart"
}
```
RESP: none
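For reference, here is how the control GET request can be built from Python. This only constructs the request without sending it (the `control_request` helper is my own naming):

```python
from urllib.request import Request

def control_request(command: str) -> Request:
    """Build a GET request for the /control endpoint (not sent here).
    "restart" reloads the service; "exit" shuts it down."""
    if command not in ("restart", "exit"):
        raise ValueError("command must be 'restart' or 'exit'")
    return Request(f"http://127.0.0.1:9880/control?command={command}")

req = control_request("restart")
print(req.full_url)  # → http://127.0.0.1:9880/control?command=restart
```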
There are also endpoints for switching the GPT and SoVITS models.
### Switching the GPT model
endpoint: `/set_gpt_weights`
GET:
```
http://127.0.0.1:9880/set_gpt_weights?weights_path=GPT_SoVITS/pretrained_models/s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt
```
RESP:
Success: returns "success", HTTP code 200
Failure: returns JSON containing the error message, HTTP code 400
### Switching the SoVITS model
endpoint: `/set_sovits_weights`
GET:
```
http://127.0.0.1:9880/set_sovits_weights?weights_path=GPT_SoVITS/pretrained_models/s2G488k.pth
```
RESP:
Success: returns "success", HTTP code 200
Failure: returns JSON containing the error message, HTTP code 400
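Both switch endpoints share the same shape, so one helper covers them. A sketch building the two GET URLs with the pretrained checkpoint paths from the examples above (`switch_model_url` is my own helper name; URLs are constructed but not sent):

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:9880"

def switch_model_url(endpoint: str, weights_path: str) -> str:
    """Build the GET URL for /set_gpt_weights or /set_sovits_weights."""
    return f"{BASE_URL}/{endpoint}?" + urlencode({"weights_path": weights_path})

gpt_url = switch_model_url(
    "set_gpt_weights",
    "GPT_SoVITS/pretrained_models/s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt",
)
sovits_url = switch_model_url(
    "set_sovits_weights",
    "GPT_SoVITS/pretrained_models/s2G488k.pth",
)
# On success the server replies "success" with HTTP 200.
print(gpt_url)
print(sovits_url)
```

Note that `urlencode` percent-encodes the slashes in the weights path; the server decodes them on arrival.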
Based on the official GPT-SoVITS documentation. Please credit the source when reposting.