Mirror of https://gitee.com/infiniflow/ragflow.git, synced 2025-12-06 15:29:03 +08:00
Docs: How to specify an ingestion pipeline when creating a dataset (#11670)
### What problem does this PR solve?

### Type of change

- [x] Documentation Update
````diff
@@ -419,17 +419,11 @@ Creates a dataset.
 - `"embedding_model"`: `string`
 - `"permission"`: `string`
 - `"chunk_method"`: `string`
-- "parser_config": `object`
-- "parse_type": `int`
-- "pipeline_id": `string`
+- `"parser_config"`: `object`
+- `"parse_type"`: `int`
+- `"pipeline_id"`: `string`
 
-Note: Choose exactly one ingestion mode when creating a dataset.
-- Chunking method: provide `"chunk_method"` (optionally with `"parser_config"`).
-- Ingestion pipeline: provide both `"parse_type"` and `"pipeline_id"` and do not provide `"chunk_method"`.
-
-These options are mutually exclusive. If all three of `chunk_method`, `parse_type`, and `pipeline_id` are omitted, the system defaults to `chunk_method = "naive"`.
-
-##### Request example
+##### A basic request example
 
 ```bash
 curl --request POST \
````
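The ingestion-mode rule documented in this part of the diff (use a built-in chunk method or an ingestion pipeline, never both, and fall back to `"naive"` when neither is given) can be expressed as a small client-side pre-check. This is a minimal POSIX-shell sketch, not RAGFlow's server-side validation: `resolve_ingestion_mode` is a hypothetical helper, and it treats an empty string as "field omitted".

```bash
# Hypothetical client-side helper (not part of RAGFlow): report which ingestion
# mode a create-dataset request selects. Pass "" for any omitted field.
resolve_ingestion_mode() {
  local chunk_method="$1" parse_type="$2" pipeline_id="$3"

  # chunk_method cannot be combined with either pipeline field.
  if [ -n "$chunk_method" ] && { [ -n "$parse_type" ] || [ -n "$pipeline_id" ]; }; then
    echo "error: chunk_method is mutually exclusive with parse_type/pipeline_id" >&2
    return 1
  fi

  if [ -n "$chunk_method" ]; then
    echo "chunk_method=$chunk_method"
  elif [ -n "$parse_type" ] && [ -n "$pipeline_id" ]; then
    echo "pipeline_id=$pipeline_id parse_type=$parse_type"
  elif [ -n "$parse_type" ] || [ -n "$pipeline_id" ]; then
    # The pipeline form needs both fields together.
    echo "error: parse_type and pipeline_id must be provided together" >&2
    return 1
  else
    # Nothing specified: the documented default.
    echo "chunk_method=naive"
  fi
}
```

For example, `resolve_ingestion_mode "" "" ""` prints `chunk_method=naive`, while `resolve_ingestion_mode book 1 d0bebe30ae2211f0970942010a8e0005` writes an error to stderr and returns 1.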
````diff
@@ -441,9 +435,11 @@ curl --request POST \
 }'
 ```
 
-##### Request example (ingestion pipeline)
+##### A request example specifying ingestion pipeline
 
-Use this form when specifying an ingestion pipeline (do not include `chunk_method`).
+:::caution WARNING
+You must *not* include `"chunk_method"` or `"parser_config"` when specifying an ingestion pipeline.
+:::
 
 ```bash
 curl --request POST \
````
````diff
@@ -452,15 +448,11 @@ curl --request POST \
 --header 'Authorization: Bearer <YOUR_API_KEY>' \
 --data '{
 "name": "test-sdk",
-"parse_type": <NUMBER_OF_FORMATS_IN_PARSE>,
+"parse_type": <NUMBER_OF_PARSERS_IN_YOUR_PARSER_COMPONENT>,
 "pipeline_id": "<PIPELINE_ID_32_HEX>"
 }'
 ```
 
-Notes:
-- `parse_type` is an integer. Replace `<NUMBER_OF_FORMATS_IN_PARSE>` with your pipeline's parse-type value.
-- `pipeline_id` must be a 32-character lowercase hexadecimal string.
-
 ##### Request parameters
 
 - `"name"`: (*Body parameter*), `string`, *Required*
````
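The pipeline form of the create-dataset request carries only `"name"`, `"parse_type"`, and `"pipeline_id"`. The sketch below assembles that body and checks the two documented constraints before sending: `parse_type` is an integer and `pipeline_id` is a 32-character lowercase hexadecimal string. `build_pipeline_body` is a hypothetical helper; it assumes `name` needs no JSON escaping, and it accepts only non-negative integer literals for `parse_type`, which is an assumption stricter than a bare `int`.

```bash
# Hypothetical helper: build the JSON body for the ingestion-pipeline form.
# Assumes $1 (name) contains no characters that need JSON escaping.
build_pipeline_body() {
  local name="$1" parse_type="$2" pipeline_id="$3"

  # parse_type: non-empty and digits only (a non-negative integer literal).
  case "$parse_type" in
    ''|*[!0123456789]*)
      echo "error: parse_type must be an integer" >&2
      return 1
      ;;
  esac

  # pipeline_id: lowercase hexadecimal digits only. The character class is
  # spelled out to avoid locale-dependent ranges such as a-f.
  case "$pipeline_id" in
    *[!0123456789abcdef]*)
      echo "error: pipeline_id must be lowercase hexadecimal" >&2
      return 1
      ;;
  esac
  # ...and exactly 32 characters long (this also rejects an empty value).
  if [ "${#pipeline_id}" -ne 32 ]; then
    echo "error: pipeline_id must be 32 characters long" >&2
    return 1
  fi

  # chunk_method and parser_config are deliberately absent in this mode.
  printf '{"name": "%s", "parse_type": %s, "pipeline_id": "%s"}\n' \
    "$name" "$parse_type" "$pipeline_id"
}
```

A possible use is `body=$(build_pipeline_body test-sdk 1 d0bebe30ae2211f0970942010a8e0005)`, then passing `--data "$body"` to the curl request instead of the literal JSON.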
````diff
@@ -488,7 +480,8 @@ Notes:
 - `"team"`: All team members can manage the dataset.
 
 - `"chunk_method"`: (*Body parameter*), `enum<string>`
-The chunking method of the dataset to create. Available options:
+The default chunk method of the dataset to create. Mutually exclusive with `"parse_type"` and `"pipeline_id"`. If you set `"chunk_method"`, do not include `"parse_type"` or `"pipeline_id"`.
+Available options:
 - `"naive"`: General (default)
 - `"book"`: Book
 - `"email"`: Email
````
````diff
@@ -501,7 +494,6 @@ Notes:
 - `"qa"`: Q&A
 - `"table"`: Table
 - `"tag"`: Tag
-- Mutually exclusive with `parse_type` and `pipeline_id`. If you set `chunk_method`, do not include `parse_type` or `pipeline_id`.
 
 - `"parser_config"`: (*Body parameter*), `object`
 The configuration settings for the dataset parser. The attributes in this JSON object vary with the selected `"chunk_method"`:
````
````diff
@@ -538,14 +530,25 @@ Notes:
 - Defaults to: `{"use_raptor": false}`.
 - If `"chunk_method"` is `"table"`, `"picture"`, `"one"`, or `"email"`, `"parser_config"` is an empty JSON object.
 
-- "parse_type": (*Body parameter*), `int`
-The ingestion pipeline parse type identifier. Required if and only if you are using an ingestion pipeline (together with `"pipeline_id"`). Must not be provided when `"chunk_method"` is set.
+- `"parse_type"`: (*Body parameter*), `int`
+The ingestion pipeline parse type identifier, i.e., the number of parsers in your **Parser** component.
+- Required (along with `"pipeline_id"`) if specifying an ingestion pipeline.
+- Must not be included when `"chunk_method"` is specified.
 
-- "pipeline_id": (*Body parameter*), `string`
-The ingestion pipeline ID. Required if and only if you are using an ingestion pipeline (together with `"parse_type"`).
-- Must not be provided when `"chunk_method"` is set.
+- `"pipeline_id"`: (*Body parameter*), `string`
+The ingestion pipeline ID. Can be found in the corresponding URL in the RAGFlow UI.
+- Required (along with `"parse_type"`) if specifying an ingestion pipeline.
+- Must be a 32-character lowercase hexadecimal string, e.g., `"d0bebe30ae2211f0970942010a8e0005"`.
+- Must not be included when `"chunk_method"` is specified.
 
-Note: If none of `chunk_method`, `parse_type`, and `pipeline_id` are provided, the system will default to `chunk_method = "naive"`.
+:::caution WARNING
+You can choose either of the following ingestion options when creating a dataset, but *not* both:
+
+- Use a built-in chunk method -- specify `"chunk_method"` (optionally with `"parser_config"`).
+- Use an ingestion pipeline -- specify both `"parse_type"` and `"pipeline_id"`.
+
+If none of `"chunk_method"`, `"parse_type"`, or `"pipeline_id"` are provided, the system defaults to `chunk_method = "naive"`.
+:::
 
 #### Response
 
````
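For the built-in chunk method option, the request body leaves out `"parse_type"` and `"pipeline_id"` entirely. This sketch builds such a body, uses `"naive"` when no method is given, and sends an empty `"parser_config"` object for `"table"`, `"picture"`, `"one"`, and `"email"`, as the parameter description states. `build_chunk_method_body` is hypothetical. Omitting `"parser_config"` for the other methods assumes the server then applies its own defaults, and `name` and `chunk_method` are assumed to need no JSON escaping.

```bash
# Hypothetical helper: build the JSON body for the built-in chunk method form.
# parse_type and pipeline_id are never emitted here (mutually exclusive).
build_chunk_method_body() {
  local name="$1" chunk_method="${2:-naive}"   # "naive" when omitted, per the docs

  case "$chunk_method" in
    table|picture|one|email)
      # Documented: parser_config is an empty JSON object for these methods.
      printf '{"name": "%s", "chunk_method": "%s", "parser_config": {}}\n' \
        "$name" "$chunk_method"
      ;;
    *)
      # Assumption: leaving parser_config out lets the server apply its defaults.
      printf '{"name": "%s", "chunk_method": "%s"}\n' "$name" "$chunk_method"
      ;;
  esac
}
```

For instance, `build_chunk_method_body my-dataset table` yields a body with `"parser_config": {}`, while `build_chunk_method_body my-dataset` requests the default `"naive"` method.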