Add qianfan support for MCP servers (#17222)

Lin Manhui
2025-11-28 17:34:14 +08:00
committed by GitHub
parent 830b64ef5e
commit 108efb4893
6 changed files with 151 additions and 44 deletions

.gitignore

@@ -25,7 +25,7 @@ log/
build/
dist/
paddleocr.egg-info/
*.egg-info/
/deploy/android_demo/app/OpenCV/
/deploy/android_demo/app/PaddleLite/
/deploy/android_demo/app/.cxx/


@@ -18,12 +18,15 @@ This project provides a lightweight [Model Context Protocol (MCP)](https://model
- **Supported Working Modes**
- **Local Python Library**: Runs PaddleOCR pipelines directly on the local machine. This mode requires a suitable local environment and hardware, and is ideal for offline use or privacy-sensitive scenarios.
- **PaddleOCR Official Website Service**: Invokes services provided by the [PaddleOCR Official Website](https://aistudio.baidu.com/paddleocr?lang=en). This is suitable for quick testing, prototyping, or no-code scenarios.
- **Qianfan Platform Service**: Calls the cloud services provided by Baidu AI Cloud's Qianfan large model platform.
- **Self-hosted Service**: Invokes the user's self-hosted PaddleOCR services. This mode offers the advantages of service-based deployment and high flexibility. It is suitable for scenarios requiring customized service configurations, as well as those with strict data privacy requirements. **Currently, only the basic serving solution is supported.**
## Examples:
The following showcases creative use cases built with the PaddleOCR MCP server combined with other tools:
### Demo 1
In Claude for Desktop, extract handwritten content from images and save to note-taking software Notion. The PaddleOCR MCP server extracts text, formulas and other information from images while preserving document structure.
<div align="center">
<img width="65%" src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/paddleocr/mcp_demo/note_to_notion.gif" alt="note_to_notion">
@@ -35,6 +38,7 @@ In Claude for Desktop, extract handwritten content from images and save to note-
---
### Demo 2
In VSCode, convert handwritten ideas or pseudocode, with one click, into runnable Python scripts that comply with project coding standards, and upload them to GitHub repositories. The PaddleOCR MCP server extracts the handwritten code from images for subsequent processing.
<div align="center">
@@ -46,15 +50,18 @@ In VSCode, convert handwritten ideas or pseudocode into runnable Python scripts
---
### Demo 3
In Claude for Desktop, convert PDF documents or images containing complex tables, formulas, handwritten text and other content into locally editable files.
#### Demo 3.1
Convert complex PDF documents with tables and watermarks to editable doc/Word format:
<div align="center">
<img width="70%" img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/paddleocr/mcp_demo/pdf_to_file.gif" alt="pdf_to_file">
</div>
#### Demo 3.2
Convert images containing formulas and tables to editable csv/Excel format:
<div align="center">
<img width="70%" img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/00136903a4d0b5f11bd978cb0ef5d3c44f3aa5e9/images/paddleocr/mcp_demo/table_to_excel1.png" alt="table_to_excel1">
@@ -83,15 +90,11 @@ Convert images containing formulas and tables to editable csv/Excel format:
This section explains how to install the `paddleocr-mcp` library via pip.
- For the local Python library mode, you need to install both `paddleocr-mcp` and the PaddlePaddle framework along with PaddleOCR, as per the [PaddleOCR installation documentation](../installation.en.md).
- For the PaddleOCR official website service or the self-hosted service modes, if used within MCP hosts like Claude for Desktop, the server can also be run without installation via tools like `uvx`. See [2. Using with Claude for Desktop](#2-using-with-claude-for-desktop) for details.
For the local Python library mode, you may optionally choose convenience extras (helpful for reducing manual dependency steps):
- `paddleocr-mcp[local]`: Includes PaddleOCR (but NOT the PaddlePaddle framework itself).
- `paddleocr-mcp[local-cpu]`: Includes PaddleOCR AND the CPU version of the PaddlePaddle framework.
It is still recommended to use an isolated virtual environment to avoid conflicts.
- For the local Python library mode, in addition to installing `paddleocr-mcp`, you also need to install the PaddlePaddle framework and PaddleOCR by referring to the [PaddleOCR installation guide](../installation.en.md).
- For the local Python library mode, you may also consider installing the corresponding optional dependencies:
- `paddleocr-mcp[local]`: includes PaddleOCR (without the PaddlePaddle framework).
- `paddleocr-mcp[local-cpu]`: based on `local`, additionally includes the CPU version of the PaddlePaddle framework.
- PaddleOCR also supports running the server without installation through methods like `uvx`. For details, please refer to the instructions in [2. Using with Claude for Desktop](#2-using-with-claude-for-desktop).
To install `paddleocr-mcp` using pip:
@@ -99,14 +102,13 @@ To install `paddleocr-mcp` using pip:
# Install from PyPI
pip install -U paddleocr-mcp
# Or install from source
# git clone https://github.com/PaddlePaddle/PaddleOCR.git
# pip install -e mcp_server
# Install from source
git clone https://github.com/PaddlePaddle/PaddleOCR.git
pip install -e mcp_server
# Install with optional extras (choose ONE of the following if you prefer convenience installs)
# Install PaddleOCR together with the MCP server (framework not included):
pip install "paddleocr-mcp[local]"
# Install PaddleOCR and CPU PaddlePaddle framework together:
pip install "paddleocr-mcp[local-cpu]"
```
@@ -262,7 +264,37 @@ Configuration example:
- Do not expose your access token.
#### Mode 3: Self-hosted Service
#### Mode 3: Qianfan Platform Service
1. Install `paddleocr-mcp`.
2. Obtain an API key by referring to the [Qianfan Platform Official Documentation](https://cloud.baidu.com/doc/qianfan-api/s/ym9chdsy5).
3. Modify the `claude_desktop_config.json` file according to the configuration example below. Set `PADDLEOCR_MCP_QIANFAN_API_KEY` to your Qianfan platform API key.
4. Restart the MCP host.
Configuration example:
```json
{
"mcpServers": {
"paddleocr-ocr": {
"command": "paddleocr_mcp",
"args": [],
"env": {
"PADDLEOCR_MCP_PIPELINE": "PaddleOCR-VL",
"PADDLEOCR_MCP_PPOCR_SOURCE": "qianfan",
"PADDLEOCR_MCP_SERVER_URL": "https://qianfan.baidubce.com/v2/ocr",
"PADDLEOCR_MCP_QIANFAN_API_KEY": "<your-api-key>"
}
}
}
}
```
**Note**:
- The Qianfan platform service currently only supports PaddleOCR-VL.
#### Mode 4: Self-hosted Service
1. In the environment where you need to run the PaddleOCR inference server, run the inference server as per the [PaddleOCR serving documentation](./serving.en.md).
2. Install `paddleocr-mcp` where the MCP server will run.
@@ -294,7 +326,7 @@ Configuration example:
### 2.4 Using `uvx`
Currently, for the PaddleOCR official website and self-hosted modes, and (for CPU inference) the local mode, starting the MCP server via `uvx` is also supported. With this approach, manual installation of `paddleocr-mcp` is not required. The main steps are as follows:
PaddleOCR also supports starting the MCP server via `uvx`. With this approach, manual installation of `paddleocr-mcp` is not required. The main steps are as follows:
1. Install [uv](https://docs.astral.sh/uv/#installation).
2. Modify `claude_desktop_config.json`. Examples:
@@ -321,7 +353,7 @@ Currently, for the PaddleOCR official website and self-hosted modes, and (for CP
}
```
Local mode (CPU, using the `local-cpu` extra):
Local mode (inference on CPUs, using the `local-cpu` extra):
```json
{
@@ -342,7 +374,9 @@ Currently, for the PaddleOCR official website and self-hosted modes, and (for CP
}
```
Because a different startup method is used (`uvx` pulls and runs the package on-demand), only `command` and `args` differ from earlier examples; available environment variables and CLI arguments remain identical.
For information on local mode dependencies, performance tuning, and pipeline configuration, please refer to the [Mode 1: Local Python Library](#mode-1-local-python-library) section.
Due to the use of a different startup method, the `command` and `args` settings in the configuration file differ from those in the previously described approaches. However, the command-line arguments and environment variables supported by the MCP server (such as `PADDLEOCR_MCP_SERVER_URL`) can still be set in the same way.
## 3. Running the Server
@@ -376,9 +410,9 @@ You can control the MCP server via environment variables or CLI arguments.
| Environment Variable | CLI Argument | Type | Description | Options | Default |
| ------------------------------------- | ------------------------- | ------ | --------------------------------------------------------------------- | ---------------------------------------- | ------------- |
| `PADDLEOCR_MCP_PIPELINE` | `--pipeline` | `str` | Pipeline to run. | `"OCR"`, `"PP-StructureV3"`, `"PaddleOCR-VL"` | `"OCR"` |
| `PADDLEOCR_MCP_PPOCR_SOURCE` | `--ppocr_source` | `str` | Source of PaddleOCR capabilities. | `"local"` (local Python library), `"aistudio"` (PaddleOCR official website service), `"self_hosted"` (self-hosted service) | `"local"` |
| `PADDLEOCR_MCP_SERVER_URL` | `--server_url` | `str` | Base URL for the underlying service (`aistudio` or `self_hosted` mode only). | - | `None` |
| `PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN` | `--aistudio_access_token` | `str` | AI Studio access token (`aistudio` mode only). | - | `None` |
| `PADDLEOCR_MCP_PPOCR_SOURCE` | `--ppocr_source` | `str` | Source of PaddleOCR capabilities. | `"local"` (local Python library), `"aistudio"` (PaddleOCR official website service), `"qianfan"` (Qianfan platform service), `"self_hosted"` (self-hosted service) | `"local"` |
| `PADDLEOCR_MCP_SERVER_URL` | `--server_url` | `str` | Base URL for the underlying service (required for `aistudio`, `qianfan`, or `self_hosted` modes). | - | `None` |
| `PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN` | `--aistudio_access_token` | `str` | AI Studio access token (required for `aistudio` mode). | - | `None` |
| `PADDLEOCR_MCP_TIMEOUT` | `--timeout` | `int` | Read timeout for the underlying requests (seconds). | - | `60` |
| `PADDLEOCR_MCP_DEVICE` | `--device` | `str` | Device for inference (`local` mode only). | - | `None` |
| `PADDLEOCR_MCP_PIPELINE_CONFIG` | `--pipeline_config` | `str` | Path to pipeline config file (`local` mode only). | - | `None` |
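The table rows map one-to-one onto CLI flags, so a Qianfan-mode launch can also be scripted. Below is a minimal sketch (a hypothetical helper script, not part of the package) that reuses the values from the Qianfan configuration example above:

```python
import os
import subprocess

# Launch the MCP server in Qianfan mode with the same values as the
# claude_desktop_config.json example. The placeholder API key must be
# replaced with a real one; the call blocks while the server runs.
env = {
    **os.environ,
    "PADDLEOCR_MCP_PIPELINE": "PaddleOCR-VL",
    "PADDLEOCR_MCP_PPOCR_SOURCE": "qianfan",
    "PADDLEOCR_MCP_SERVER_URL": "https://qianfan.baidubce.com/v2/ocr",
    "PADDLEOCR_MCP_QIANFAN_API_KEY": "<your-api-key>",
}
subprocess.run(["paddleocr_mcp"], env=env, check=True)
```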


@@ -18,9 +18,11 @@ comments: true
- **Supported Working Modes**
- **Local Python Library**: Runs PaddleOCR pipelines directly on the local machine. This mode places certain demands on the local environment and machine performance, and is suited to offline use and scenarios with strict data privacy requirements.
- **PaddleOCR Official Website Service**: Invokes the cloud service provided by the [PaddleOCR Official Website](https://aistudio.baidu.com/paddleocr). This mode is suitable for quickly trying out features and validating solutions, and also for zero-code development scenarios.
- **Qianfan Platform Service**: Calls the cloud services provided by Baidu AI Cloud's Qianfan large model platform.
- **Self-hosted Service**: Invokes the user's self-hosted PaddleOCR services. This mode offers the advantages of service-based deployment and high flexibility, and is suitable for scenarios requiring customized service configurations as well as those with strict data privacy requirements. **Currently, only the basic serving solution is supported.**
## Examples:
The following showcases creative use cases built with the PaddleOCR MCP server combined with other tools:
### Demo 1
@@ -46,15 +48,18 @@ comments: true
---
### Demo 3
In Claude for Desktop, convert PDF documents or images containing complex tables, formulas, handwritten text, and other content into locally editable files.
#### Demo 3.1
Convert complex PDF documents containing tables and watermarks into editable doc/Word format:
<div align="center">
<img width="70%" img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/main/images/paddleocr/mcp_demo/pdf_to_file.gif" alt="pdf_to_file">
</div>
#### Demo 3.2
Convert images containing formulas and tables into editable csv/Excel format:
<div align="center">
<img width="70%" img src="https://raw.githubusercontent.com/cuicheng01/PaddleX_doc_images/00136903a4d0b5f11bd978cb0ef5d3c44f3aa5e9/images/paddleocr/mcp_demo/table_to_excel1.png" alt="table_to_excel1">
@@ -88,8 +93,7 @@ comments: true
- For the local Python library mode, you may also consider installing the corresponding optional dependencies:
- `paddleocr-mcp[local]`: includes PaddleOCR (without the PaddlePaddle framework).
- `paddleocr-mcp[local-cpu]`: based on `local`, additionally includes the CPU version of the PaddlePaddle framework.
- The local mode also supports running the server without installation via `uvx` (for CPU inference). For details, see [2.4 Using `uvx`](#24-使用-uvx).
- For the PaddleOCR official website service and self-hosted service modes, if you want to use them within MCP hosts such as Claude for Desktop, the server can also be run without installation via tools like `uvx`. For details, see [2. Using with Claude for Desktop](#2-在-claude-for-desktop-中使用).
- PaddleOCR also supports running the server without installation via `uvx`. For details, see [2. Using with Claude for Desktop](#2-在-claude-for-desktop-中使用).
To install the `paddleocr-mcp` library using pip:
@@ -97,14 +101,13 @@ comments: true
# Install from PyPI
pip install -U paddleocr-mcp
# Or, install from source
# git clone https://github.com/PaddlePaddle/PaddleOCR.git
# pip install -e mcp_server
# Install from source
git clone https://github.com/PaddlePaddle/PaddleOCR.git
pip install -e mcp_server
# Install optional dependencies via extras
# Also install PaddleOCR
pip install "paddleocr-mcp[local]"
# Also install PaddleOCR and the CPU version of the PaddlePaddle framework
pip install "paddleocr-mcp[local-cpu]"
```
@@ -123,7 +126,7 @@ paddleocr_mcp --help
### 2.1 Quick Start
The following uses the **PaddleOCR official website service** mode as an example to get you started quickly. This mode requires no complex local dependencies, making it well suited for a quick first try.
The following uses the **PaddleOCR official website service** mode as an example to get you started quickly.
1. **Install `paddleocr-mcp`**
@@ -187,6 +190,7 @@ paddleocr_mcp --help
You can configure the MCP server to run in different working modes as needed. The required steps differ by mode, as detailed below.
#### Mode 1: Local Python Library
1. Install `paddleocr-mcp`, the PaddlePaddle framework, and PaddleOCR. The PaddlePaddle framework and PaddleOCR can be installed manually by referring to the [PaddleOCR installation documentation](../installation.md), or together with `paddleocr-mcp` via `paddleocr-mcp[local]` or `paddleocr-mcp[local-cpu]`. To avoid dependency conflicts, **installing in an isolated virtual environment is strongly recommended**.
2. Modify the `claude_desktop_config.json` file according to the configuration example below.
3. Restart the MCP host.
@@ -253,7 +257,37 @@ paddleocr_mcp --help
For tasks other than text recognition, obtain the service base URL for the corresponding task from the PaddleOCR official website, and set `PADDLEOCR_MCP_PIPELINE` and `PADDLEOCR_MCP_SERVER_URL` accordingly (see Section 4 for parameter details).
#### Mode 3: Self-hosted Service
#### Mode 3: Qianfan Platform Service
1. Install `paddleocr-mcp`.
2. Obtain an API key by referring to the [Qianfan Platform official documentation](https://cloud.baidu.com/doc/qianfan-api/s/ym9chdsy5).
3. Modify the `claude_desktop_config.json` file according to the configuration example below. Set `PADDLEOCR_MCP_QIANFAN_API_KEY` to your Qianfan platform API key.
4. Restart the MCP host.
Configuration example:
```json
{
"mcpServers": {
"paddleocr-ocr": {
"command": "paddleocr_mcp",
"args": [],
"env": {
"PADDLEOCR_MCP_PIPELINE": "PaddleOCR-VL",
"PADDLEOCR_MCP_PPOCR_SOURCE": "qianfan",
"PADDLEOCR_MCP_SERVER_URL": "https://qianfan.baidubce.com/v2/ocr",
"PADDLEOCR_MCP_QIANFAN_API_KEY": "<your-api-key>"
}
}
}
}
```
**Note**:
- The Qianfan platform service currently supports only PaddleOCR-VL.
#### Mode 4: Self-hosted Service
1. In the environment where the PaddleOCR inference server needs to run, run the inference server by referring to the [PaddleOCR serving documentation](./serving.md).
2. Install `paddleocr-mcp` in the environment where the MCP server will run.
@@ -285,12 +319,13 @@ paddleocr_mcp --help
### 2.4 Using `uvx`
For the PaddleOCR official website service and self-hosted service modes, starting the MCP server via `uvx` is currently also supported. With this approach, manual installation of `paddleocr-mcp` is not required. The main steps are as follows:
PaddleOCR also supports starting the MCP server via `uvx`. With this approach, manual installation of `paddleocr-mcp` is not required. The main steps are as follows:
1. Install [uv](https://docs.astral.sh/uv/#installation).
2. Modify the `claude_desktop_config.json` file. Below are examples of two common modes started via `uvx`.
2. Modify the `claude_desktop_config.json` file. Examples:
Self-hosted service mode example:
```json
{
"mcpServers": {
@@ -311,7 +346,8 @@ paddleocr_mcp --help
}
```
Local mode (CPU, using the `local-cpu` extra) example:
Local mode (CPU inference, using the `local-cpu` extra) example:
```json
{
"mcpServers": {
@@ -331,11 +367,9 @@ paddleocr_mcp --help
}
```
For local mode dependencies, performance tuning, and pipeline configuration, see the [Mode 1: Local Python Library](#模式一本地-python-库) section.
Because a different startup method is used, the `command` and `args` settings in the configuration file differ significantly from the approach described in [2.1 Quick Start](#21-快速开始); however, the command-line arguments and environment variables supported by the MCP server itself (such as `PADDLEOCR_MCP_SERVER_URL`) can still be set in the same way.
Note: only the startup method differs (`uvx` pulls and runs the package); the available environment variables and command-line arguments remain the same as described earlier.
Because a different startup method is used, the `command` and `args` settings in the configuration file differ from the previously described approach; however, the command-line arguments and environment variables supported by the MCP server (such as `PADDLEOCR_MCP_SERVER_URL`) can still be set in the same way.
## 3. Running the Server
@@ -369,8 +403,8 @@ paddleocr_mcp --pipeline OCR --ppocr_source self_hosted --server_url http://127.
| Environment Variable | CLI Argument | Type | Description | Options | Default |
|:---------|:-----------|:-----|:-----|:-------|:-------|
| `PADDLEOCR_MCP_PIPELINE` | `--pipeline` | `str` | Pipeline to run. | `"OCR"`, `"PP-StructureV3"`, `"PaddleOCR-VL"` | `"OCR"` |
| `PADDLEOCR_MCP_PPOCR_SOURCE` | `--ppocr_source` | `str` | Source of PaddleOCR capabilities. | `"local"` (local Python library), `"aistudio"` (PaddleOCR official website service), `"self_hosted"` (self-hosted service) | `"local"` |
| `PADDLEOCR_MCP_SERVER_URL` | `--server_url` | `str` | Base URL of the underlying service (required in `aistudio` and `self_hosted` modes). | - | `None` |
| `PADDLEOCR_MCP_PPOCR_SOURCE` | `--ppocr_source` | `str` | Source of PaddleOCR capabilities. | `"local"` (local Python library), `"aistudio"` (PaddleOCR official website service), `"qianfan"` (Qianfan platform service), `"self_hosted"` (self-hosted service) | `"local"` |
| `PADDLEOCR_MCP_SERVER_URL` | `--server_url` | `str` | Base URL of the underlying service (required in `aistudio`, `qianfan`, and `self_hosted` modes). | - | `None` |
| `PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN` | `--aistudio_access_token` | `str` | AI Studio access token (required in `aistudio` mode). | - | `None` |
| `PADDLEOCR_MCP_TIMEOUT` | `--timeout` | `int` | Read timeout in seconds for requests to the underlying service. | - | `60` |
| `PADDLEOCR_MCP_DEVICE` | `--device` | `str` | Device for inference (effective in `local` mode only). | - | `None` |


@@ -38,9 +38,9 @@ def _parse_args() -> argparse.Namespace:
)
parser.add_argument(
"--ppocr_source",
choices=["local", "aistudio", "self_hosted"],
choices=["local", "aistudio", "qianfan", "self_hosted"],
default=os.getenv("PADDLEOCR_MCP_PPOCR_SOURCE", "local"),
help="Source of PaddleOCR functionality: local (local library), aistudio (AI Studio service), self_hosted (self-hosted server).",
help="Source of PaddleOCR functionality: local (local library), aistudio (AI Studio service), qianfan (Qianfan service), self_hosted (self-hosted server).",
)
parser.add_argument(
"--http",
@@ -85,6 +85,11 @@ def _parse_args() -> argparse.Namespace:
default=os.getenv("PADDLEOCR_MCP_AISTUDIO_ACCESS_TOKEN"),
help="AI Studio access token (required for AI Studio).",
)
parser.add_argument(
"--qianfan_api_key",
default=os.getenv("PADDLEOCR_MCP_QIANFAN_API_KEY"),
help="Qianfan API key (required for Qianfan).",
)
parser.add_argument(
"--timeout",
type=int,
@@ -105,7 +110,7 @@ def _validate_args(args: argparse.Namespace) -> None:
)
sys.exit(2)
if args.ppocr_source in ["aistudio", "self_hosted"]:
if args.ppocr_source in ["aistudio", "qianfan", "self_hosted"]:
if not args.server_url:
print("Error: The server base URL is required.", file=sys.stderr)
print(
@@ -123,6 +128,21 @@ def _validate_args(args: argparse.Namespace) -> None:
file=sys.stderr,
)
sys.exit(2)
elif args.ppocr_source == "qianfan":
if not args.qianfan_api_key:
print("Error: The Qianfan API key is required.", file=sys.stderr)
print(
"Please either set `--qianfan_api_key` or set the environment variable "
"`PADDLEOCR_MCP_QIANFAN_API_KEY`.",
file=sys.stderr,
)
sys.exit(2)
if args.pipeline not in ("PaddleOCR-VL",):
print(
f"{repr(args.pipeline)} is currently not supported when using the {repr(args.ppocr_source)} source.",
file=sys.stderr,
)
sys.exit(2)
async def async_main() -> None:
@@ -139,6 +159,7 @@ async def async_main() -> None:
device=args.device,
server_url=args.server_url,
aistudio_access_token=args.aistudio_access_token,
qianfan_api_key=args.qianfan_api_key,
timeout=args.timeout,
)
except Exception as e:


@@ -138,6 +138,7 @@ class PipelineHandler(abc.ABC):
device: Optional[str],
server_url: Optional[str],
aistudio_access_token: Optional[str],
qianfan_api_key: Optional[str],
timeout: Optional[int],
) -> None:
"""Initialize the pipeline handler.
@@ -149,12 +150,13 @@ class PipelineHandler(abc.ABC):
device: Device to run inference on.
server_url: Base URL for service mode.
aistudio_access_token: AI Studio access token.
qianfan_api_key: Qianfan API key.
timeout: Read timeout in seconds for HTTP requests.
"""
self._pipeline = pipeline
if ppocr_source == "local":
self._mode = "local"
elif ppocr_source in ("aistudio", "self_hosted"):
elif ppocr_source in ("aistudio", "qianfan", "self_hosted"):
self._mode = "service"
else:
raise ValueError(f"Unknown PaddleOCR source {repr(ppocr_source)}")
@@ -163,6 +165,7 @@ class PipelineHandler(abc.ABC):
self._device = device
self._server_url = server_url
self._aistudio_access_token = aistudio_access_token
self._qianfan_api_key = qianfan_api_key
self._timeout = timeout or 60
if self._mode == "local":
@@ -472,7 +475,9 @@ class SimpleInferencePipelineHandler(PipelineHandler):
raise RuntimeError("Server URL not configured")
endpoint = self._get_service_endpoint()
url = f"{self._server_url.rstrip('/')}/{endpoint.lstrip('/')}"
if endpoint:
endpoint = "/" + endpoint
url = f"{self._server_url.rstrip('/')}{endpoint}"
payload = self._prepare_service_payload(processed_input, file_type, **kwargs)
headers = {"Content-Type": "application/json"}
@@ -481,6 +486,10 @@ class SimpleInferencePipelineHandler(PipelineHandler):
if not self._aistudio_access_token:
raise RuntimeError("Missing AI Studio access token")
headers["Authorization"] = f"token {self._aistudio_access_token}"
elif self._ppocr_source == "qianfan":
if not self._qianfan_api_key:
raise RuntimeError("Missing Qianfan API key")
headers["Authorization"] = f"Bearer {self._qianfan_api_key}"
try:
timeout = httpx.Timeout(
@@ -908,6 +917,15 @@ class PaddleOCRVLHandler(_LayoutParsingHandler):
device=self._device,
)
def _get_service_endpoint(self) -> str:
return "layout-parsing" if self._ppocr_source != "qianfan" else "paddleocr"
def _transform_service_kwargs(self, kwargs: Dict[str, Any]) -> Dict[str, Any]:
kwargs = super()._transform_service_kwargs(kwargs)
if self._ppocr_source == "qianfan":
kwargs["model"] = "paddleocr-vl-0.9b"
return kwargs
_PIPELINE_HANDLERS: Dict[str, Type[PipelineHandler]] = {
"OCR": OCRHandler,

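Taken together, the branches above mean a Qianfan request is an ordinary HTTPS POST with Bearer authentication. Below is a minimal sketch of the request the handler ends up assembling for the `qianfan` source; the payload fields other than `model` come from `_prepare_service_payload` and are only illustrated here:

```python
import httpx

# Reconstructed from _get_service_endpoint(), _transform_service_kwargs(),
# and the header logic above. Payload fields besides "model" are illustrative.
server_url = "https://qianfan.baidubce.com/v2/ocr"
endpoint = "/" + "paddleocr"  # _get_service_endpoint() for the qianfan source
url = f"{server_url.rstrip('/')}{endpoint}"
headers = {
    "Content-Type": "application/json",
    "Authorization": "Bearer <your-api-key>",  # Bearer auth for Qianfan
}
payload = {"model": "paddleocr-vl-0.9b"}  # injected by _transform_service_kwargs()
response = httpx.post(url, json=payload, headers=headers, timeout=60)
print(response.status_code)
```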

@@ -4,7 +4,7 @@ build-backend = "setuptools.build_meta"
[project]
name = "paddleocr_mcp"
version = "0.3.0"
version = "0.4.0"
requires-python = ">=3.10"
dependencies = [
"mcp>=1.5.0",