大语言模型#
在底层大模型接入中,我们设计了开放的接口,支持对接多种大模型。同时对于接入模型的效果,我们有非常严格的把控与评审机制。对大模型能力上与ChatGPT对比,在准确率上需要满足85%以上的能力对齐。我们用更高的标准筛选模型,是期望在用户使用过程中,可以省去前面繁琐的测试评估环节。
多模型使用#
如果要使用不同的模型,请修改.env配置文件中的LLM MODEL参数以在模型之间切换。
注意:你可以从 .env.template 创建 .env 文件。只需使用如下命令:
cp .env.template .env
LLM_MODEL=vicuna-13b
MODEL_SERVER=http://127.0.0.1:8000
now we support models vicuna-13b, vicuna-7b, chatglm-6b, flan-t5-base, guanaco-33b-merged, falcon-40b, gorilla-7b, llama-2-7b, llama-2-13b.
如果你想使用其他模型,比如chatglm-6b, 仅仅需要修改.env 配置文件
LLM_MODEL=chatglm-6b
or chatglm2-6b, which is the second-generation version of the open-source bilingual (Chinese-English) chat model ChatGLM-6B.
LLM_MODEL=chatglm2-6b
用CPU运行模型#
我们也支持一些小模型,你可以通过CPU/MPS(M1、M2)运行, 模型下载gpt4all
将模型放在models路径, 修改.env 配置文件
LLM_MODEL=gptj-6b
DB-GPT提供了多模型适配器load adapter和chat adapter.load adapter通过继承BaseLLMAdapter类, 实现match和loader方法允许你适配不同的LLM.
vicuna llm load adapter
class VicunaLLMAdapater(BaseLLMAdaper):
"""Vicuna Adapter"""
def match(self, model_path: str):
return "vicuna" in model_path
def loader(self, model_path: str, from_pretrained_kwagrs: dict):
tokenizer = AutoTokenizer.from_pretrained(model_path, use_fast=False)
model = AutoModelForCausalLM.from_pretrained(
model_path, low_cpu_mem_usage=True, **from_pretrained_kwagrs
)
return model, tokenizer
chatglm load adapter
class ChatGLMAdapater(BaseLLMAdaper):
"""LLM Adatpter for THUDM/chatglm-6b"""
def match(self, model_path: str):
return "chatglm" in model_path
def loader(self, model_path: str, from_pretrained_kwargs: dict):
tokenizer = AutoTokenizer.from_pretrained(model_path, trust_remote_code=True)
if DEVICE != "cuda":
model = AutoModel.from_pretrained(
model_path, trust_remote_code=True, **from_pretrained_kwargs
).float()
return model, tokenizer
else:
model = (
AutoModel.from_pretrained(
model_path, trust_remote_code=True, **from_pretrained_kwargs
)
.half()
.cuda()
)
return model, tokenizer
chat adapter通过继承BaseChatAdpter允许你通过实现match和get_generate_stream_func方法允许你适配不同的LLM.
vicuna llm chat adapter
class VicunaChatAdapter(BaseChatAdpter):
"""Model chat Adapter for vicuna"""
def match(self, model_path: str):
return "vicuna" in model_path
def get_generate_stream_func(self):
return generate_stream
chatglm llm chat adapter
class ChatGLMChatAdapter(BaseChatAdpter):
"""Model chat Adapter for ChatGLM"""
def match(self, model_path: str):
return "chatglm" in model_path
def get_generate_stream_func(self):
from pilot.model.llm_out.chatglm_llm import chatglm_generate_stream
return chatglm_generate_stream
如果你想集成自己的模型,只需要继承BaseLLMAdaper和BaseChatAdpter类,然后实现里面的方法即可
Multi Proxy LLMs#
1. Openai proxy#
If you haven’t deployed a private infrastructure for a large model, or if you want to use DB-GPT in a low-cost and high-efficiency way, you can also use OpenAI’s large model as your underlying model.
If your environment deploying DB-GPT has access to OpenAI, then modify the .env configuration file as below will work.
LLM_MODEL=proxyllm
MODEL_SERVER=127.0.0.1:8000
PROXY_API_KEY=sk-xxx
PROXY_SERVER_URL=https://api.openai.com/v1/chat/completions
If you can’t access OpenAI locally but have an OpenAI proxy service, you can configure as follows.
LLM_MODEL=proxyllm
MODEL_SERVER=127.0.0.1:8000
PROXY_API_KEY=sk-xxx
PROXY_SERVER_URL={your-openai-proxy-server/v1/chat/completions}