大语言模型#

在底层大模型接入中,我们设计了开放的接口,支持对接多种大模型。同时对于接入模型的效果,我们有非常严格的把控与评审机制。对大模型能力上与ChatGPT对比,在准确率上需要满足85%以上的能力对齐。我们用更高的标准筛选模型,是期望在用户使用过程中,可以省去前面繁琐的测试评估环节。

多模型使用#

如果要使用不同的模型,请修改.env配置文件中的LLM MODEL参数以在模型之间切换。

注意:你可以从 .env.template 创建 .env 文件。只需使用如下命令:

cp .env.template .env
LLM_MODEL=vicuna-13b
MODEL_SERVER=http://127.0.0.1:8000

now we support models vicuna-13b, vicuna-7b, chatglm-6b, flan-t5-base, guanaco-33b-merged, falcon-40b, gorilla-7b, llama-2-7b, llama-2-13b.

如果你想使用其他模型,比如chatglm-6b, 仅仅需要修改.env 配置文件

LLM_MODEL=chatglm-6b

or chatglm2-6b, which is the second-generation version of the open-source bilingual (Chinese-English) chat model ChatGLM-6B.

LLM_MODEL=chatglm2-6b

用CPU运行模型#

我们也支持一些小模型,你可以通过CPU/MPS(M1、M2)运行, 模型下载gpt4all

将模型放在models路径, 修改.env 配置文件

LLM_MODEL=gptj-6b

DB-GPT提供了多模型适配器load adapter和chat adapter.load adapter通过继承BaseLLMAdapter类, 实现match和loader方法允许你适配不同的LLM.

vicuna llm load adapter

class VicunaLLMAdapater(BaseLLMAdaper):
    """Vicuna Adapter"""

    def match(self, model_path: str):
        return "vicuna" in model_path

    def loader(self, model_path: str, from_pretrained_kwagrs: dict):
        tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
        model = AutoModelForCausalLM.from_pretrained(
            model_path, low_cpu_mem_usage=True, **from_pretrained_kwagrs
        )
        return model, tokenizer

chatglm load adapter


class ChatGLMAdapater(BaseLLMAdaper):
    """LLM Adatpter for THUDM/chatglm-6b"""

    def match(self, model_path: str):
        return "chatglm" in model_path

    def loader(self, model_path: str, from_pretrained_kwargs: dict):
        tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)

        if DEVICE != "cuda":
            model = AutoModel.from_pretrained(
                model_path, trust_remote_code=True, **from_pretrained_kwargs
            ).float()
            return model, tokenizer
        else:
            model = (
                AutoModel.from_pretrained(
                    model_path, trust_remote_code=True, **from_pretrained_kwargs
                )
                .half()
                .cuda()
            )
            return model, tokenizer

chat adapter通过继承BaseChatAdpter允许你通过实现match和get_generate_stream_func方法允许你适配不同的LLM.

vicuna llm chat adapter

class VicunaChatAdapter(BaseChatAdpter):
 """Model chat Adapter for vicuna"""

    def match(self, model_path: str):
        return "vicuna" in model_path

    def get_generate_stream_func(self):
        return generate_stream

chatglm llm chat adapter

class ChatGLMChatAdapter(BaseChatAdpter):
    """Model chat Adapter for ChatGLM"""

    def match(self, model_path: str):
        return "chatglm" in model_path

    def get_generate_stream_func(self):
        from pilot.model.llm_out.chatglm_llm import chatglm_generate_stream

        return chatglm_generate_stream

如果你想集成自己的模型,只需要继承BaseLLMAdaper和BaseChatAdpter类,然后实现里面的方法即可

Multi Proxy LLMs#

1. Openai proxy#

If you haven’t deployed a private infrastructure for a large model, or if you want to use DB-GPT in a low-cost and high-efficiency way, you can also use OpenAI’s large model as your underlying model.

  • If your environment deploying DB-GPT has access to OpenAI, then modify the .env configuration file as below will work.

LLM_MODEL=proxyllm
MODEL_SERVER=127.0.0.1:8000
PROXY_API_KEY=sk-xxx
PROXY_SERVER_URL=https://api.openai.com/v1/chat/completions
  • If you can’t access OpenAI locally but have an OpenAI proxy service, you can configure as follows.

LLM_MODEL=proxyllm
MODEL_SERVER=127.0.0.1:8000
PROXY_API_KEY=sk-xxx
PROXY_SERVER_URL={your-openai-proxy-server/v1/chat/completions}