---
title: GLM3 API Usage Guide
createTime: 2026/04/02 20:26:36
---
## FastAPI docs
The API [documentation page](http://yuany3721.site:6017/docs) is generated automatically by FastAPI from the docstrings in the [API server code](https://github.com/THUDM/ChatGLM3/blob/main/openai_api_demo/api_server.py), which follows the [OpenAI API](https://platform.openai.com/docs/api-reference/chat) format.

API endpoints:

- `/health`: reports the server status; a 200 response means the API is running normally
- `/v1/chat/completions`: handles chat completion requests, with optional streaming output
- `/v1/embeddings`: handles embedding requests for a list of text inputs

More details from the code's docstring:
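The request bodies for these endpoints follow the OpenAI schema. A minimal sketch in Python of building them (the `chatglm3-6b` model name and field layout follow the OpenAI-style API; the base URL is a placeholder for your own deployment):

```python
# Minimal request-body builders for the ChatGLM3 OpenAI-style API.
# BASE_URL is a placeholder; point it at your own deployment.
BASE_URL = "http://localhost:8000"

def chat_payload(prompt: str, stream: bool = False, max_tokens: int = 512) -> dict:
    """Body for POST /v1/chat/completions (OpenAI chat-completion schema)."""
    return {
        "model": "chatglm3-6b",
        "messages": [{"role": "user", "content": prompt}],
        "stream": stream,      # True -> server-sent events, False -> one JSON reply
        "max_tokens": max_tokens,
    }

def embedding_payload(texts: list) -> dict:
    """Body for POST /v1/embeddings; the input is a list of strings."""
    return {"model": "chatglm3-6b", "input": texts}

# To send a request (e.g. with the `requests` package):
#   requests.get(f"{BASE_URL}/health")  # 200 when the server is up
#   requests.post(f"{BASE_URL}/v1/chat/completions", json=chat_payload("Hello")).json()
```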
> This script implements an API for the ChatGLM3-6B model,
> formatted similarly to OpenAI's API (https://platform.openai.com/docs/api-reference/chat).
> It's designed to be run as a web server using FastAPI and uvicorn,
> making the ChatGLM3-6B model accessible through OpenAI Client.
>
> Key Components and Features:
>
> - Model and Tokenizer Setup: Configures the model and tokenizer paths and loads them.
> - FastAPI Configuration: Sets up a FastAPI application with CORS middleware for handling cross-origin requests.
> - API Endpoints:
> - "/v1/models": Lists the available models, specifically ChatGLM3-6B.
> - "/v1/chat/completions": Processes chat completion requests with options for streaming and regular responses.
> - "/v1/embeddings": Processes Embedding request of a list of text inputs.
> - Token Limit Caution: In the OpenAI API, 'max_tokens' is equivalent to HuggingFace's 'max_new_tokens', not 'max_length'.
> For instance, setting 'max_tokens' to 8192 for a 6b model would result in an error due to the model's inability to output
> that many tokens after accounting for the history and prompt tokens.
> - Stream Handling and Custom Functions: Manages streaming responses and custom function calls within chat responses.
> - Pydantic Models: Defines structured models for requests and responses, enhancing API documentation and type safety.
> - Main Execution: Initializes the model and tokenizer, and starts the FastAPI app on the designated host and port.
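The `max_tokens` caveat above can be made concrete: since `max_tokens` counts only newly generated tokens (HuggingFace's `max_new_tokens`), the prompt and history must be subtracted from the context window before choosing it. A small sketch, where the 8192-token window is an assumption about the deployed checkpoint:

```python
CONTEXT_WINDOW = 8192  # assumed context length of the deployed ChatGLM3-6B checkpoint

def safe_max_tokens(prompt_tokens: int, requested: int) -> int:
    """Clamp a requested max_tokens (= HuggingFace max_new_tokens) so that
    prompt/history tokens plus generated tokens fit in the context window."""
    budget = CONTEXT_WINDOW - prompt_tokens
    if budget <= 0:
        raise ValueError("prompt and history already fill the context window")
    return min(requested, budget)
```

For example, with 1000 prompt tokens a request of `max_tokens=8192` would be clamped to 7192; sending 8192 as-is triggers the error described above.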
## API call example

See the [api-demo](https://github.com/THUDM/ChatGLM3/blob/main/openai_api_demo/openai_api_request.py) script and change `base_url` on line 15 to your target URL.
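When `stream` is set to true, the server replies with OpenAI-style server-sent events: one `data: {json}` line per text delta, terminated by `data: [DONE]`. A sketch of reassembling the streamed text, assuming the standard OpenAI streaming chunk layout (`choices[0].delta.content`):

```python
import json

def collect_stream(lines):
    """Join the assistant text deltas from an OpenAI-style SSE stream.

    Each event is a line 'data: {json}' whose choices[0].delta may carry
    a 'content' fragment; the stream ends with 'data: [DONE]'.
    """
    parts = []
    for line in lines:
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank lines
        payload = line[len("data: "):].strip()
        if payload == "[DONE]":
            break
        delta = json.loads(payload)["choices"][0].get("delta", {})
        parts.append(delta.get("content", ""))
    return "".join(parts)
```

With the `requests` package you would feed `resp.iter_lines(decode_unicode=True)` from a `stream=True` POST into this function.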