Study Log | Combining GPT-SoVITS with an LLM (1)

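I recently got a new computer, so I can finally try out GSV (GPT-SoVITS) and RVC. After playing around with them a bit, I had the idea of combining GSV with an LLM. (After all, who wouldn't want an AI to speak in the voice of their favorite virtual character?)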

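The approach is simple: the official project already ships an API, so all we need to do is run the LLM's output through it. (Since TTS takes time, though, the text also has to be split into sentences and streamed out segment by segment.)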
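A minimal sketch of the splitting step follows. It assumes the LLM's reply arrives as a stream of text chunks, which most chat APIs can provide. `split_sentences` is a hypothetical helper and not part of GPT-SoVITS. It buffers chunks and yields each sentence as soon as it is complete, so TTS can start on the first sentence while the LLM is still generating. The set of sentence-ending punctuation is my own assumption. The ASCII period is left out so that numbers such as 3.14 are not cut in half.

```python
import re
from typing import Iterable, Iterator

# Sentence-ending punctuation (Chinese and ASCII). This set is an assumption; tune it
# for your text. The ASCII "." is deliberately excluded so "3.14" is not split.
SENTENCE_END = re.compile(r"[。！？!?；;…\n]+")


def split_sentences(chunks: Iterable[str]) -> Iterator[str]:
    """Yield complete sentences from a stream of LLM text chunks."""
    buffer = ""
    for chunk in chunks:
        buffer += chunk
        while True:
            match = SENTENCE_END.search(buffer)
            # Cut only once something follows the punctuation run, so a run that
            # straddles two chunks (e.g. "！" then "？") stays in one sentence.
            if match is None or match.end() == len(buffer):
                break
            sentence = buffer[: match.end()].strip()
            buffer = buffer[match.end():]
            if sentence:  # skip fragments that were only whitespace
                yield sentence
    # Flush whatever is left when the stream ends
    tail = buffer.strip()
    if tail:
        yield tail
```

For example, the chunks `["你好", "，博士。今天", "天气不错！", "？走吧"]` come out as `你好，博士。`, `今天天气不错！？` and `走吧`. A terminator is acted on only once a later character has arrived. This keeps punctuation runs together, at the cost of one chunk of extra latency.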

API

Taking v2 as an example: to use the GSV API, first run api_v2.py from the project's root directory. Open CMD in the root directory and run the following command:

runtime\python api_v2.py -a 127.0.0.1 -p 9880 -c GPT_SoVITS/configs/tts_infer.yaml
    • -a – bind address, default "127.0.0.1"
    • -p – bind port, default 9880
    • -c – path to the TTS config file, default "GPT_SoVITS/configs/tts_infer.yaml"
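Loading the models takes a while after this command starts, so a script that sends requests right away may have its connection refused. Below is a minimal sketch that polls the address and port given by -a and -p until something accepts a TCP connection. `wait_for_server` is a hypothetical helper. It only proves that the port is open. Treating an open port as "models loaded" is an assumption about how api_v2.py starts up.

```python
import socket
import time


def wait_for_server(host: str = "127.0.0.1", port: int = 9880, timeout: float = 60.0) -> bool:
    """Poll host:port until a TCP connection succeeds; return False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # Connect and immediately close; we only care that someone is listening
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:  # refused, unreachable, or timed out: try again shortly
            time.sleep(0.5)
    return False
```

Typical use is `if not wait_for_server(): raise SystemExit("GSV API did not come up")` before the first TTS request.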

The official API supports both GET and POST requests.

GET:
```
http://127.0.0.1:9880/tts?text=先帝创业未半而中道崩殂,今天下三分,益州疲弊,此诚危急存亡之秋也。&text_lang=zh&ref_audio_path=archive_jingyuan_1.wav&prompt_lang=zh&prompt_text=我是「罗浮」云骑将军景元。不必拘谨,「将军」只是一时的身份,你称呼我景元便可&text_split_method=cut5&batch_size=1&media_type=wav&streaming_mode=true
```

POST (annotated field list; when sending real JSON, drop the `#` comments and write booleans as `true`/`false`):
```
{
    "text": "",                   # str.(required) text to be synthesized
    "text_lang": "",              # str.(required) language of the text to be synthesized
    "ref_audio_path": "",         # str.(required) reference audio path
    "aux_ref_audio_paths": [],    # list.(optional) auxiliary reference audio paths for multi-speaker synthesis
    "prompt_text": "",            # str.(optional) prompt text for the reference audio
    "prompt_lang": "",            # str.(required) language of the prompt text for the reference audio
    "top_k": 5,                   # int. top k sampling
    "top_p": 1,                   # float. top p sampling
    "temperature": 1,             # float. temperature for sampling
    "text_split_method": "cut0",  # str. text split method, see text_segmentation_method.py for details.
    "batch_size": 1,              # int. batch size for inference
    "batch_threshold": 0.75,      # float. threshold for batch splitting.
    "split_bucket": True,         # bool. whether to split the batch into multiple buckets.
    "return_fragment": False,     # bool. step by step return the audio fragment.
    "speed_factor": 1.0,          # float. control the speed of the synthesized audio.
    "streaming_mode": False,      # bool. whether to return a streaming response.
    "seed": -1,                   # int. random seed for reproducibility.
    "parallel_infer": True,       # bool. whether to use parallel inference.
    "repetition_penalty": 1.35    # float. repetition penalty for T2S model.
}
```
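The GET example above puts Chinese text and symbols such as 「」 directly into the query string. When you build such URLs in code, it is safer to let the standard library percent-encode each value. Below is a minimal sketch. `build_tts_get_url` is a hypothetical helper. Writing booleans as lowercase `true`/`false` simply mirrors the example URL. The helper is meant for scalar fields. List fields such as `aux_ref_audio_paths` are easier to send in a POST body, so it rejects them.

```python
from typing import Any, Dict
from urllib.parse import urlencode

TTS_URL = "http://127.0.0.1:9880/tts"  # default address/port of api_v2.py


def build_tts_get_url(params: Dict[str, Any], base_url: str = TTS_URL) -> str:
    """Return a /tts GET URL with every parameter value percent-encoded as UTF-8."""
    query = {}
    for key, value in params.items():
        if isinstance(value, (list, tuple)):
            raise TypeError(f"{key}: send list values in a POST body instead")
        # Booleans become "true"/"false", matching the example URL above
        query[key] = str(value).lower() if isinstance(value, bool) else value
    return f"{base_url}?{urlencode(query)}"
```

The resulting URL is pure ASCII, so it can be pasted into a browser or passed to any HTTP client unchanged.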
    

Example code (using the default configuration):

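```python
from typing import Optional

import requests

URL = "http://127.0.0.1:9880/tts"  # address and port passed to api_v2.py via -a / -p


def text_to_speech(text: str, output_path: str = "output.wav") -> Optional[bytes]:
    """Synthesize `text` with GPT-SoVITS, save it to `output_path`, and return the WAV bytes."""
    data = {
        "text": text,                     # text to synthesize
        "text_lang": "zh",
        "ref_audio_path": "干员报到.wav",  # reference clip, opened by the server
        "aux_ref_audio_paths": [],
        # Transcript of the reference clip
        # ("Astrologer 星极, serving as a Guard Operator. From now on I'm at your command, Doctor.")
        "prompt_text": "星象学者,星极,以近卫干员身份任职,今后就由您差遣了,博士。",
        "prompt_lang": "zh",
        "top_k": 5,
        "top_p": 1,
        "temperature": 1,
        "text_split_method": "cut5",
        "batch_size": 1,
        "batch_threshold": 0.75,
        "split_bucket": True,
        "speed_factor": 1.0,
        "fragment_interval": 0.3,
        "seed": -1,
        "media_type": "wav",
        "streaming_mode": False,
        "parallel_infer": True,
        "repetition_penalty": 1.35,
    }

    # json= serializes the body and sets Content-Type: application/json
    response = requests.post(URL, json=data, timeout=120)

    if response.status_code == 200:
        # Save the generated audio
        with open(output_path, "wb") as f:
            f.write(response.content)
        print("Audio generated successfully!")
        return response.content

    print(f"Request failed, status code: {response.status_code}, error: {response.text}")
    return None


if __name__ == "__main__":
    text_to_speech("我在这里,博士。")  # "I'm here, Doctor."
```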
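The example above synthesizes one piece of text and waits for the whole file. The segmented, streamed output described at the start of this post needs synthesis of the next sentence to overlap with playback of the current one. Below is a minimal sketch that uses a background thread and a bounded queue. `speak_stream` is hypothetical. Both callbacks are left as parameters because GPT-SoVITS provides neither:

- `synthesize`, for example a wrapper around the POST request above that returns the WAV bytes.
- `play`, whatever audio library you use.

```python
import queue
import threading
from typing import Callable, Iterable, List, Optional


def speak_stream(
    sentences: Iterable[str],
    synthesize: Callable[[str], bytes],
    play: Callable[[bytes], None],
    max_pending: int = 2,
) -> None:
    """Synthesize sentences in a worker thread while playing finished clips in order."""
    # Bounded queue: synthesis runs at most `max_pending` clips ahead of playback
    pending: "queue.Queue[Optional[bytes]]" = queue.Queue(maxsize=max_pending)
    errors: List[BaseException] = []

    def worker() -> None:
        try:
            for sentence in sentences:
                pending.put(synthesize(sentence))
        except Exception as exc:  # hand the error over to the calling thread
            errors.append(exc)
        finally:
            pending.put(None)  # sentinel: nothing more to play

    thread = threading.Thread(target=worker, daemon=True)
    thread.start()
    while (audio := pending.get()) is not None:
        play(audio)  # runs on the calling thread, in sentence order
    thread.join()
    if errors:
        raise errors[0]
```

The `sentences` argument can be any iterable, including a generator fed by the LLM stream. It is consumed only by the worker thread. The bounded queue keeps unplayed audio from piling up in memory when synthesis is faster than playback. This is client-side, sentence-level pipelining, separate from the server's own `streaming_mode` option.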
    

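The results:

(Audio: GSV output)
(Audio: original voice)

As you can hear, even without any fine-tuning the timbre is already fairly close to the original voice. The one shortcoming is that the delivery is still somewhat flat emotionally. We can improve this by adding more reference audio:

(Audio: reference audio "编入队伍.wav")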

Besides synthesis, the API also provides a few control commands:

### Command control

endpoint: `/control`

command:
"restart": restart the server
"exit": shut down the server

GET:
```
http://127.0.0.1:9880/control?command=restart
```
POST:
```json
{
    "command": "restart"
}
```

RESP: none
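Below is a minimal sketch for sending these commands from Python using only the standard library. `control_url` and `send_control` are hypothetical helpers. The documentation lists no response for either command, so I assume the server may close the connection while it restarts or exits. A dropped connection is therefore treated as the expected outcome. A refused connection, meaning the server is not running, still raises `URLError`.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:9880"  # must match the -a / -p used to start api_v2.py
COMMANDS = ("restart", "exit")      # the two commands documented for /control


def control_url(command: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL for the /control endpoint, rejecting undocumented commands."""
    if command not in COMMANDS:
        raise ValueError(f"unknown command {command!r}, expected one of {COMMANDS}")
    return f"{base_url}/control?{urlencode({'command': command})}"


def send_control(command: str, base_url: str = BASE_URL, timeout: float = 10.0) -> None:
    """Send a control command; tolerate the server dropping the connection."""
    try:
        with urlopen(control_url(command, base_url), timeout=timeout) as resp:
            resp.read()
    except ConnectionResetError:
        # Assumption: the process restarts/exits mid-request and closes the socket
        # without replying (http.client.RemoteDisconnected subclasses this error).
        pass
```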
    

as well as endpoints for switching the GPT and SoVITS models:

### Switch the GPT model

endpoint: `/set_gpt_weights`

GET:
```
http://127.0.0.1:9880/set_gpt_weights?weights_path=GPT_SoVITS/pretrained_models/s1bert25hz-2kh-longer-epoch=68e-step=50232.ckpt
```
RESP:
Success: returns "success", HTTP code 200
Failure: returns JSON containing the error message, HTTP code 400

### Switch the SoVITS model

endpoint: `/set_sovits_weights`

GET:
```
http://127.0.0.1:9880/set_sovits_weights?weights_path=GPT_SoVITS/pretrained_models/s2G488k.pth
```

RESP:
Success: returns "success", HTTP code 200
Failure: returns JSON containing the error message, HTTP code 400
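Switching weights can be scripted the same way. Below is a minimal sketch that assumes the responses documented above: "success" with HTTP 200, and a JSON error with HTTP 400. `weights_url` and `set_weights` are hypothetical helpers. `urlencode` escapes the `/` and `=` characters in checkpoint paths such as the GPT example above. The long default timeout is a guess, since loading a checkpoint can take a while.

```python
from urllib.error import HTTPError
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:9880"  # must match the -a / -p used to start api_v2.py
ENDPOINTS = {"gpt": "/set_gpt_weights", "sovits": "/set_sovits_weights"}


def weights_url(kind: str, weights_path: str, base_url: str = BASE_URL) -> str:
    """Build the GET URL that asks the server to load new GPT or SoVITS weights."""
    if kind not in ENDPOINTS:
        raise ValueError(f"kind must be one of {sorted(ENDPOINTS)}, got {kind!r}")
    return f"{base_url}{ENDPOINTS[kind]}?{urlencode({'weights_path': weights_path})}"


def set_weights(kind: str, weights_path: str, base_url: str = BASE_URL,
                timeout: float = 300.0) -> str:
    """Switch weights; return the response body, raise RuntimeError on HTTP 400."""
    try:
        with urlopen(weights_url(kind, weights_path, base_url), timeout=timeout) as resp:
            # The docs describe the success body as "success"; its exact format may vary
            return resp.read().decode("utf-8")
    except HTTPError as err:
        if err.code == 400:
            # Documented failure: a JSON body describing the error
            detail = err.read().decode("utf-8", errors="replace")
            raise RuntimeError(f"could not load {weights_path}: {detail}") from err
        raise
```

For example, `set_weights("sovits", "GPT_SoVITS/pretrained_models/s2G488k.pth")` issues the same request as the SoVITS GET example above.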
    

Based on the official GPT-SoVITS documentation. Please credit the source when reposting.
