I recently got a new computer and could finally try out GPT-SoVITS (GSV) and RVC. After playing with them for a bit, the idea of hooking them up to an LLM came to mind.
The implementation is straightforward: the project already ships an official API, so we only need to run the LLM's output through it. (Of course, since TTS takes a certain amount of time, the text also has to be split into sentences and streamed back segment by segment.)
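The sentence splitting mentioned above can be sketched in a few lines; the punctuation set and chunking policy here are my own choices, not anything the API prescribes:

```python
import re

def split_sentences(text: str) -> list[str]:
    """Split LLM output on Chinese/Western sentence-ending punctuation,
    keeping the punctuation attached to each sentence."""
    parts = re.split(r'(?<=[。!?!?;;])', text)
    return [p.strip() for p in parts if p.strip()]

print(split_sentences("你好!我是一个助手。有什么可以帮你?"))
# → ['你好!', '我是一个助手。', '有什么可以帮你?']
```

Each returned sentence can then be sent to the TTS endpoint individually, so audio for the first sentence starts playing while later ones are still being synthesized.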
### API
Taking v2 as an example: to use the GSV API, first run the api_v2.py file in the project root. Open CMD in the root directory and execute the following command.
```
runtime\python api_v2.py -a 127.0.0.1 -p 9880 -c GPT_SoVITS/configs/tts_infer.yaml
```
- `-a` – bind address, default `127.0.0.1`
- `-p` – bind port, default `9880`
- `-c` – path to the TTS config file, default `GPT_SoVITS/configs/tts_infer.yaml`

The official API supports both GET and POST:
GET:
```
http://127.0.0.1:9880/tts?text=先帝创业未半而中道崩殂,今天下三分,益州疲弊,此诚危急存亡之秋也。&text_lang=zh&ref_audio_path=archive_jingyuan_1.wav&prompt_lang=zh&prompt_text=我是「罗浮」云骑将军景元。不必拘谨,「将军」只是一时的身份,你称呼我景元便可&text_split_method=cut5&batch_size=1&media_type=wav&streaming_mode=true
```
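Typing that URL by hand gets unwieldy, so it helps to assemble the query string programmatically. A minimal sketch using only the standard library (the `build_tts_url` helper is my own; the server from api_v2.py is assumed to be listening on 127.0.0.1:9880):

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:9880"

def build_tts_url(**params) -> str:
    """URL-encode query parameters for the GET /tts endpoint."""
    return f"{BASE_URL}/tts?" + urlencode(params)

url = build_tts_url(
    text="先帝创业未半而中道崩殂。",
    text_lang="zh",
    ref_audio_path="archive_jingyuan_1.wav",
    prompt_lang="zh",
    prompt_text="我是「罗浮」云骑将军景元。",
    text_split_method="cut5",
    media_type="wav",
    streaming_mode="true",
)
# To actually fetch the audio (requires the server to be running):
# from urllib.request import urlopen
# with urlopen(url) as resp:
#     audio = resp.read()
print(url)
```

`urlencode` takes care of percent-encoding the Chinese text, which is easy to get wrong when pasting the raw URL into other tools.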
POST:
```
{
    "text": "",                   # str. (required) text to be synthesized
    "text_lang": "",              # str. (required) language of the text to be synthesized
    "ref_audio_path": "",         # str. (required) reference audio path
    "aux_ref_audio_paths": [],    # list. (optional) auxiliary reference audio paths for multi-speaker synthesis
    "prompt_text": "",            # str. (optional) prompt text for the reference audio
    "prompt_lang": "",            # str. (required) language of the prompt text for the reference audio
    "top_k": 5,                   # int. top k sampling
    "top_p": 1,                   # float. top p sampling
    "temperature": 1,             # float. temperature for sampling
    "text_split_method": "cut0",  # str. text split method, see text_segmentation_method.py for details.
    "batch_size": 1,              # int. batch size for inference
    "batch_threshold": 0.75,      # float. threshold for batch splitting.
    "split_bucket": True,         # bool. whether to split the batch into multiple buckets.
    "return_fragment": False,     # bool. step by step return the audio fragment.
    "speed_factor": 1.0,          # float. control the speed of the synthesized audio.
    "streaming_mode": False,      # bool. whether to return a streaming response.
    "seed": -1,                   # int. random seed for reproducibility.
    "parallel_infer": True,       # bool. whether to use parallel inference.
    "repetition_penalty": 1.35    # float. repetition penalty for T2S model.
}
```
Sample code (using the default configuration):
```python
import requests
import json

url = "http://127.0.0.1:9880/tts"
data = {
    "text": "我在这里,博士。",
    "text_lang": "zh",
    "ref_audio_path": "干员报到.wav",
    "aux_ref_audio_paths": [],
    "prompt_text": "星象学者,星极,以近卫干员身份任职,今后就由您差遣了,博士。",
    "prompt_lang": "zh",
    "top_k": 5,
    "top_p": 1,
    "temperature": 1,
    "text_split_method": "cut5",
    "batch_size": 1,
    "batch_threshold": 0.75,
    "split_bucket": True,
    "speed_factor": 1.0,
    "fragment_interval": 0.3,
    "seed": -1,
    "media_type": "wav",
    "streaming_mode": False,
    "parallel_infer": True,
    "repetition_penalty": 1.35
}
headers = {"Content-Type": "application/json"}
response = requests.post(url, data=json.dumps(data), headers=headers)
if response.status_code == 200:
    # Save the generated audio
    with open("output.wav", "wb") as f:
        f.write(response.content)
    print("Audio generated successfully!")
else:
    print(f"Request failed, status code: {response.status_code}, error: {response.text}")
```
The result is as follows:
Even though the model has not been fine-tuned at all, the timbre is already quite close to the original voice. The one shortcoming is that the delivery is still emotionally a bit flat. We can add some auxiliary reference audio to improve this:
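Adding extra reference clips only requires filling in the `aux_ref_audio_paths` field of the request body. A sketch of the modified payload (the file names "参考1.wav" and "参考2.wav" are placeholders for your own clips):

```python
import json

data = {
    "text": "我在这里,博士。",
    "text_lang": "zh",
    "ref_audio_path": "干员报到.wav",
    "aux_ref_audio_paths": ["参考1.wav", "参考2.wav"],  # extra clips for multi-reference synthesis
    "prompt_text": "星象学者,星极,以近卫干员身份任职,今后就由您差遣了,博士。",
    "prompt_lang": "zh",
}
payload = json.dumps(data, ensure_ascii=False)
# requests.post("http://127.0.0.1:9880/tts",
#               data=payload.encode("utf-8"),
#               headers={"Content-Type": "application/json"})
print(len(data["aux_ref_audio_paths"]))  # → 2
```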
Beyond synthesis, the API also provides a few control commands.
### Control commands
endpoint: `/control`
command:
"restart": restart the service
"exit": shut down the service
GET:
```
http://127.0.0.1:9880/control?command=restart
```
POST:
```json
{
"command": "restart"
}
```
RESP: none
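For reference, here is how the control GET request can be built from Python. This only constructs the request without sending it (the `control_request` helper is my own naming):

```python
from urllib.request import Request

def control_request(command: str) -> Request:
    """Build a GET request for the /control endpoint (not sent here).
    "restart" reloads the service; "exit" shuts it down."""
    if command not in ("restart", "exit"):
        raise ValueError("command must be 'restart' or 'exit'")
    return Request(f"http://127.0.0.1:9880/control?command={command}")

req = control_request("restart")
print(req.full_url)  # → http://127.0.0.1:9880/control?command=restart
```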
There are also endpoints for switching the GPT and SoVITS models.
### Switching the GPT model
endpoint: `/set_gpt_weights`
GET:
```
http://127.0.0.1:9880/set_gpt_weights?weights_path=GPT_SoVITS/pretrained_models/s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt
```
RESP:
Success: returns "success", HTTP code 200
Failure: returns JSON containing the error message, HTTP code 400
### Switching the SoVITS model
endpoint: `/set_sovits_weights`
GET:
```
http://127.0.0.1:9880/set_sovits_weights?weights_path=GPT_SoVITS/pretrained_models/s2G488k.pth
```
RESP:
Success: returns "success", HTTP code 200
Failure: returns JSON containing the error message, HTTP code 400
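Both switch endpoints share the same shape, so one helper covers them. A sketch building the two GET URLs with the pretrained checkpoint paths from the examples above (`switch_model_url` is my own helper name; URLs are constructed but not sent):

```python
from urllib.parse import urlencode

BASE_URL = "http://127.0.0.1:9880"

def switch_model_url(endpoint: str, weights_path: str) -> str:
    """Build the GET URL for /set_gpt_weights or /set_sovits_weights."""
    return f"{BASE_URL}/{endpoint}?" + urlencode({"weights_path": weights_path})

gpt_url = switch_model_url(
    "set_gpt_weights",
    "GPT_SoVITS/pretrained_models/s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt",
)
sovits_url = switch_model_url(
    "set_sovits_weights",
    "GPT_SoVITS/pretrained_models/s2G488k.pth",
)
# On success the server replies "success" with HTTP 200.
print(gpt_url)
print(sovits_url)
```

Note that `urlencode` percent-encodes the slashes in the weights path; the server decodes them on arrival.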
Based on the official GPT-SoVITS documentation. Please credit the source when reposting.