ragflow

mirror of https://gitee.com/infiniflow/ragflow.git synced 2025-12-06 07:19:03 +08:00

Author	SHA1	Message	Date
天海蒼灆	8de6b97806	Feature (canvas): Add Api for download "message" component output's file (#11772 ) ### What problem does this PR solve? - Add an API for downloading the "message" component's output file - Change the attachment output type check from tuple to dictionary, because 'attachement' is not an instance of tuple - Update the message type to message_end to avoid the problem that content does not send an error message when the message type is `ans["data"]["content"]` ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) - [x] New Feature (non-breaking change which adds functionality)	2025-12-05 19:42:35 +08:00
Ted	ad03ede7cd	fix(sdk): add cancel_all_task_of call in stop_parsing endpoint (#11748 ) ## Problem The SDK API endpoint `DELETE /datasets/{dataset_id}/chunks` only updates database status but does not send cancellation signal via Redis, causing background parsing tasks to continue and eventually complete (status becomes DONE instead of CANCEL). ## Root Cause The SDK endpoint was missing the `cancel_all_task_of(id)` call that the web API (`api/apps/document_app.py`) uses to properly stop background tasks. ## Solution Added `cancel_all_task_of(id)` call in the `stop_parsing` function to send cancellation signal via Redis, consistent with the web API behavior. ## Related Issue Fixes #11745 Co-authored-by: tedhappy <tedhappy@users.noreply.github.com>	2025-12-04 19:29:06 +08:00
shirukai	fa7b857aa9	fix: resolve "'bool' object has no attribute 'items'" in SDK enabled … (#11725 ) ### What problem does this PR solve? Fixes the `AttributeError: 'bool' object has no attribute 'items'` error when updating the `enabled` parameter of a document via the Python SDK (Issue #11721). Background: When calling `Document.update({"enabled": True/False})` through the SDK, the server-side API returned a boolean `data=True` in the response (instead of a dictionary). The SDK's `_update_from_dict` method (in `base.py`) expects a dictionary to iterate over with `.items()`, leading to an immediate AttributeError during response parsing. This prevented successful synchronization of the updated `enabled` status to the local SDK object, even if the server-side database/update index operations succeeded. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) ### Additional Context (optional, for clarity) - Root Cause: Server returned `data=True` (boolean) for `enabled` parameter updates, violating the SDK's expectation of a dictionary-type `data` field. - Fix Logic: 1. Removed the separate `return get_result(data=True)` in the `enabled` update branch to unify response flow. 2. - Backward Compatibility: No breaking changes—other update scenarios (e.g., renaming documents, modifying chunk methods) remain unaffected, and the response format stays consistent. Co-authored-by: shirukai <shirukai@hollysysdigital.com>	2025-12-04 11:24:01 +08:00
Jin Hai	a7d40e9132	Update since 'File manager' is renamed to 'File' (#11698 ) ### What problem does this PR solve? Update some docs and comments, since 'File manager' is renamed to 'File' ### Type of change - [x] Documentation Update - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com> Co-authored-by: writinwaters <93570324+writinwaters@users.noreply.github.com>	2025-12-03 18:32:15 +08:00
hsparks-codes	4870d42949	feat: Auto-disable Raptor for structured data (Issue #11653 ) (#11676 ) ### What problem does this PR solve? Feature: This PR implements automatic Raptor disabling for structured data files to address issue #11653. Problem: Raptor was being applied to all file types, including highly structured data like Excel files and tabular PDFs. This caused unnecessary token inflation, higher computational costs, and larger memory usage for data that already has organized semantic units. Solution: Automatically skip Raptor processing for: - Excel files (.xls, .xlsx, .xlsm, .xlsb) - CSV files (.csv, .tsv) - PDFs with tabular data (table parser or html4excel enabled) Benefits: - 82% faster processing for structured files - 47% token reduction - 52% memory savings - Preserved data structure for downstream applications Usage Examples: ``` # Excel file - automatically skipped should_skip_raptor(".xlsx") # True # CSV file - automatically skipped should_skip_raptor(".csv") # True # Tabular PDF - automatically skipped should_skip_raptor(".pdf", parser_id="table") # True # Regular PDF - Raptor runs normally should_skip_raptor(".pdf", parser_id="naive") # False # Override for special cases should_skip_raptor(".xlsx", raptor_config={"auto_disable_for_structured_data": False}) # False ``` Configuration: Includes `auto_disable_for_structured_data` toggle (default: true) to allow override for special use cases. Testing: 44 comprehensive tests, 100% passing ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-03 17:02:29 +08:00
hsparks-codes	237a66913b	Feat: RAG evaluation (#11674 ) ### What problem does this PR solve? Feature: This PR implements a comprehensive RAG evaluation framework to address issue #11656. Problem: Developers using RAGFlow lack systematic ways to measure RAG accuracy and quality. They cannot objectively answer: 1. Are RAG results truly accurate? 2. How should configurations be adjusted to improve quality? 3. How to maintain and improve RAG performance over time? Solution: This PR adds a complete evaluation system with: - Dataset & test case management - Create ground truth datasets with questions and expected answers - Automated evaluation - Run RAG pipeline on test cases and compute metrics - Comprehensive metrics - Precision, recall, F1 score, MRR, hit rate for retrieval quality - Smart recommendations - Analyze results and suggest specific configuration improvements (e.g., "increase top_k", "enable reranking") - 20+ REST API endpoints - Full CRUD operations for datasets, test cases, and evaluation runs Impact: Enables developers to objectively measure RAG quality, identify issues, and systematically improve their RAG systems through data-driven configuration tuning. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-03 17:00:58 +08:00
Yongteng Lei	e3f40db963	Refa: make RAGFlow more asynchronous 2 (#11689 ) ### What problem does this PR solve? Make RAGFlow more asynchronous 2. #11551, #11579, #11619. ### Type of change - [x] Refactoring - [x] Performance Improvement	2025-12-03 14:19:53 +08:00
buua436	c8f608b2dd	Feat:support tts in agent (#11675 ) ### What problem does this PR solve? change: support tts in agent ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-03 12:03:59 +08:00
Yongteng Lei	5c81e01de5	Fix: incorrect async chat streamly output (#11679 ) ### What problem does this PR solve? Incorrect async chat streamly output. #11677. Disable beartype for #11666. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-03 11:15:45 +08:00
Kevin Hu	a6681d6366	Revert "Refa: make RAGFlow more asynchronous 2" (#11669 ) Reverts infiniflow/ragflow#11664	2025-12-02 19:42:05 +08:00
Yongteng Lei	627c11c429	Refa: make RAGFlow more asynchronous 2 (#11664 ) ### What problem does this PR solve? Make RAGFlow more asynchronous 2. #11551, #11579, #11619. ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Refactoring - [x] Performance Improvement	2025-12-02 18:57:07 +08:00
Kevin Hu	299c655e39	Fix: file manager KB link issue. (#11648 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-02 12:14:27 +08:00
buua436	b8c0fb4572	Feat:new api /sequence2txt and update QWenSeq2txt (#11643 ) ### What problem does this PR solve? change: new api /sequence2txt, update QWenSeq2txt and ZhipuSeq2txt ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-02 11:17:31 +08:00
Kevin Hu	81ae6cf78d	Feat: support uploading in dialog. (#11634 ) ### What problem does this PR solve? #9590 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-01 16:54:57 +08:00
Yongteng Lei	b6c4722687	Refa: make RAGFlow more asynchronous (#11601 ) ### What problem does this PR solve? Try to make this more asynchronous. Verified in chat and agent scenarios, reducing blocking behavior. #11551, #11579. However, the impact of these changes still requires further investigation to ensure everything works as expected. ### Type of change - [x] Refactoring	2025-12-01 14:24:06 +08:00
Kevin Hu	6ea4248bdc	Feat: support parent-child in search procedure. (#11629 ) ### What problem does this PR solve? #7996 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-12-01 14:03:09 +08:00
Kevin Hu	88a28212b3	Fix: Table parse method issue. (#11627 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-12-01 12:42:35 +08:00
dzikus	9a8ce9d3e2	fix: increase Quart RESPONSE_TIMEOUT and BODY_TIMEOUT for slow LLM responses (#11612 ) ### What problem does this PR solve? Quart framework has default RESPONSE_TIMEOUT and BODY_TIMEOUT of 60 seconds. This causes the frontend chat to hang exactly after 60 seconds when using slow LLM backends (e.g., Ollama on CPU, or remote APIs with high latency). This fix adds configurable timeout settings via environment variables with sensible defaults (600 seconds = 10 minutes) to match other timeout configurations in RAGFlow. Fixes issues with chat timeout when: - Using local Ollama on CPU (response time ~2 minutes) - Using remote LLM APIs with high latency - Processing complex RAG queries with many chunks ### Type of change - [X] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Grzegorz Sterniczuk <grzegorz@sternicz.uk>	2025-12-01 11:26:34 +08:00
Billy Bao	fa9b7b259c	Feat: create datasets from http api supports ingestion pipeline (#11597 ) ### What problem does this PR solve? Feat: create datasets from http api supports ingestion pipeline ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-28 19:55:24 +08:00
Kevin Hu	14616cf845	Feat: add child parent chunking method in backend. (#11598 ) ### What problem does this PR solve? #7996 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-28 19:25:32 +08:00
darion-yaphet	918d5a9ff8	[issue-11572]fix:metadata_condition filtering failed (#11573 ) ### What problem does this PR solve? When using the 'metadata_condition' for metadata filtering, if no documents match the filtering criteria, the system will return the search results of all documents instead of returning an empty result. When the metadata_condition has conditions but no matching documents, simply return an empty result. #11572 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue) Co-authored-by: Chenguang Wang <chenguangwang@deepglint.com>	2025-11-28 14:04:14 +08:00
Billy Bao	cf7fdd274b	Feat: add gmail connector (#11549 ) ### What problem does this PR solve? _Briefly describe what this PR aims to solve. Include background context that will help reviewers understand the purpose of the PR._ ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-28 13:09:40 +08:00
Yongteng Lei	9d8b96c1d0	Feat: add context for figure and table (#11547 ) ### What problem does this PR solve? Add context for figure table. ![demo_figure_table_context](https://github.com/user-attachments/assets/61b37fac-e22e-40a4-9665-9396c7b4103e) `==================()` for demonstrating purpose. ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-27 10:21:44 +08:00
天海蒼灆	a9259917c6	fix(files): replace hard coded status codes with constants (#11544 ) ### What problem does this PR solve? To solve the problem of error reporting caused by type errors when various types of exception returns are triggered ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-27 09:41:24 +08:00
Levi	12979a3f21	feat: improve metadata handling in connector service (#11421 ) ### What problem does this PR solve? - Update sync data source to handle metadata properly ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Kevin Hu <kevinhu.sh@gmail.com>	2025-11-26 19:55:48 +08:00
Zhichang Yu	40e84ca41a	Use Infinity single-field-multi-index (#11444 ) ### What problem does this PR solve? Use Infinity single-field-multi-index ### Type of change - [x] Refactoring - [x] Performance Improvement	2025-11-26 11:06:37 +08:00
Kevin Hu	f5faf0c94f	Feat: support operator in/not in for metadata filter. (#11503 ) ### What problem does this PR solve? #11376 #11378 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-25 12:44:26 +08:00
zhipeng	d5f8548200	Allow create super user when start rag server. (#10634 ) ### What problem does this PR solve? New options for the rag server scripts to create the super admin user when starting the server. ### Type of change - [x] New Feature (non-breaking change which adds functionality) --------- Co-authored-by: Zhichang Yu <yuzhichang@gmail.com> Co-authored-by: Jin Hai <haijin.chn@gmail.com>	2025-11-24 19:02:08 +08:00
Billy Bao	1009819801	Fix: coroutine object has no attribute get (#11472 ) ### What problem does this PR solve? Fix: coroutine object has no attribute get ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-24 12:21:33 +08:00
Kevin Hu	249296e417	Feat: API supports toc_enhance. (#11437 ) ### What problem does this PR solve? Close #11433 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-21 14:51:58 +08:00
Kevin Hu	820934fc77	Fix: no result if metadata returns none. (#11412 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-20 19:51:25 +08:00
Kevin Hu	06cef71ba6	Feat: add or logic operations for meta data filters. (#11404 ) ### What problem does this PR solve? #11376 #11387 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-20 14:31:12 +08:00
buua436	7c6d30f4c8	Fix:RagFlow not starting with Postgres DB (#11398 ) ### What problem does this PR solve? issue: #11293 change: RagFlow not starting with Postgres DB ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-20 12:49:13 +08:00
天海蒼灆	9f715d6bc2	Feature (canvas): Add mind tagging support (#11359 ) ### What problem does this PR solve? Resolve the issue of missing thinking labels when viewing pre-existing conversations ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-20 10:11:28 +08:00
Kevin Hu	c43bf1dcf5	Fix: refine error msg. (#11380 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-19 19:10:45 +08:00
Kevin Hu	1c201c4d54	Fix: circle imports issue. (#11374 ) ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-19 16:13:21 +08:00
Jin Hai	d1dcf3b43c	Refactor /stats API (#11363 ) ### What problem does this PR solve? One loop to get better performance ### Type of change - [x] Refactoring Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-19 12:27:45 +08:00
Kevin Hu	d1716d865a	Feat: Alter flask to Quart for async API serving. (#11275 ) ### What problem does this PR solve? #11277 ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-18 17:05:16 +08:00
Yongteng Lei	341e5904c8	Fix: No results can be found through the API /api/v1/dify/retrieval (#11338 ) ### What problem does this PR solve? No results can be found through the API /api/v1/dify/retrieval. #11307 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-18 15:42:31 +08:00
buua436	ded9bf80c5	Fix:limit random sampling range in check_embedding (#11337 ) ### What problem does this PR solve? issue: [#11319](https://github.com/infiniflow/ragflow/issues/11319) change: limit random sampling range in check_embedding ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-18 15:24:27 +08:00
buua436	912b6b023e	fix: update check_embedding failed info (#11321 ) ### What problem does this PR solve? change: update check_embedding failed info ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-18 09:39:45 +08:00
Jin Hai	bd4bc57009	Refactor: move mcp connection utilities to common (#11304 ) ### What problem does this PR solve? As title ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-17 15:34:17 +08:00
Billy Bao	0569b50fed	Fix: create dataset return type inconsistent (#11272 ) ### What problem does this PR solve? Fix: create dataset return type inconsistent #11167 ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-17 15:27:19 +08:00
Scott Davidson	6b64641042	Fix: default model base url extraction logic (#11263 ) ### What problem does this PR solve? Fixes an issue where default models which used the same factory but different base URLs would all be initialised with the default chat model's base URL and would ignore e.g. the embedding model's base URL config. For example, with the following service config, the embedding and reranker models would end up using the base URL for the default chat model (i.e. `llm1.example.com`): ```yaml ragflow: service_conf: user_default_llm: factory: OpenAI-API-Compatible api_key: not-used default_models: chat_model: name: llm1 base_url: https://llm1.example.com/v1 embedding_model: name: llm2 base_url: https://llm2.example.com/v1 rerank_model: name: llm3 base_url: https://llm3.example.com/v1/rerank llm_factories: factory_llm_infos: - name: OpenAI-API-Compatible logo: "" tags: "LLM,TEXT EMBEDDING,SPEECH2TEXT,MODERATION" status: "1" llm: - llm_name: llm1 base_url: 'https://llm1.example.com/v1' api_key: not-used tags: "LLM,CHAT,IMAGE2TEXT" max_tokens: 100000 model_type: chat is_tools: false - llm_name: llm2 base_url: https://llm2.example.com/v1 api_key: not-used tags: "TEXT EMBEDDING" max_tokens: 10000 model_type: embedding - llm_name: llm3 base_url: https://llm3.example.com/v1/rerank api_key: not-used tags: "RERANK,1k" max_tokens: 10000 model_type: rerank ``` ### Type of change - [X] Bug Fix (non-breaking change which fixes an issue)	2025-11-17 14:21:27 +08:00
Jin Hai	61cf430dbb	Minor tweaks (#11271 ) ### What problem does this PR solve? As title. ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-16 19:29:20 +08:00
Billy Bao	68e3b33ae4	Feat: extract message output to file (#11251 ) ### What problem does this PR solve? Feat: extract message output to file ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-14 19:52:11 +08:00
Lynn	b5f2cf16bc	Fix: check task executor alive and display status (#11270 ) ### What problem does this PR solve? Correctly check task executor alive and display status. ### Type of change - [x] Bug Fix (non-breaking change which fixes an issue)	2025-11-14 15:52:28 +08:00
buua436	e8f1a245a6	Feat:update check_embedding api (#11254 ) ### What problem does this PR solve? pr: #10854 change: update check_embedding api ### Type of change - [x] New Feature (non-breaking change which adds functionality)	2025-11-13 18:48:25 +08:00
Jin Hai	70a0f081f6	Minor tweaks (#11249 ) ### What problem does this PR solve? Fix some IDE warnings ### Type of change - [x] Refactoring --------- Signed-off-by: Jin Hai <haijin.chn@gmail.com>	2025-11-13 16:11:07 +08:00
buua436	871055b0fc	Feat:support API for generating knowledge graph and raptor (#11229 ) ### What problem does this PR solve? issue: [#11195](https://github.com/infiniflow/ragflow/issues/11195) change: support API for generating knowledge graph and raptor ### Type of change - [x] New Feature (non-breaking change which adds functionality) - [x] Documentation Update	2025-11-13 15:17:52 +08:00
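
The skip rules described in commit 4870d42949 (auto-disable Raptor for structured data) can be sketched as follows. This is a minimal illustration reconstructed from the commit message's usage examples, not the code merged in the PR: the `file_type`, `parser_id`, and `raptor_config` arguments follow those examples, while the `parser_config` argument and the `STRUCTURED_EXTENSIONS` name are assumptions introduced here.

```python
from typing import Optional

# Extensions the commit message lists as structured data.
STRUCTURED_EXTENSIONS = {".xls", ".xlsx", ".xlsm", ".xlsb", ".csv", ".tsv"}


def should_skip_raptor(
    file_type: str,
    parser_id: str = "",
    parser_config: Optional[dict] = None,  # hypothetical: carries the html4excel flag
    raptor_config: Optional[dict] = None,
) -> bool:
    """Return True when Raptor should be skipped for this file (sketch only)."""
    parser_config = parser_config or {}
    raptor_config = raptor_config or {}

    # The toggle defaults to true; setting it to False restores Raptor everywhere.
    if not raptor_config.get("auto_disable_for_structured_data", True):
        return False

    ext = file_type.lower()
    if ext in STRUCTURED_EXTENSIONS:
        return True

    # PDFs count as structured only when parsed as tables.
    if ext == ".pdf":
        return parser_id == "table" or bool(parser_config.get("html4excel", False))

    return False
```

Checking the override first keeps the opt-out independent of file type, which matches the last usage example in the commit message (an `.xlsx` file with the toggle set to `False` is not skipped).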

1201 Commits
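
Commit 237a66913b (RAG evaluation) names precision, recall, F1 score, MRR, and hit rate as its retrieval metrics. The sketch below shows the standard definitions of those metrics for ranked chunk IDs. It is an illustration only: the function names `retrieval_metrics` and `mean_metric` are hypothetical, the input is assumed to contain no duplicate IDs, and nothing here is taken from the PR's implementation.

```python
from typing import Dict, List, Sequence, Set


def retrieval_metrics(retrieved: Sequence[str], relevant: Set[str]) -> Dict[str, float]:
    """Per-query metrics for a ranked list of retrieved IDs (assumed unique)."""
    hits = sum(1 for doc_id in retrieved if doc_id in relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0

    # Reciprocal rank: 1 / position of the first relevant item, 0 if none is found.
    reciprocal_rank = 0.0
    for rank, doc_id in enumerate(retrieved, start=1):
        if doc_id in relevant:
            reciprocal_rank = 1.0 / rank
            break

    return {
        "precision": precision,
        "recall": recall,
        "f1": f1,
        "reciprocal_rank": reciprocal_rank,
        "hit": 1.0 if hits else 0.0,
    }


def mean_metric(per_query: List[Dict[str, float]], key: str) -> float:
    """Average one metric over queries: MRR uses 'reciprocal_rank', hit rate uses 'hit'."""
    return sum(m[key] for m in per_query) / len(per_query) if per_query else 0.0
```

MRR and hit rate are dataset-level numbers, so they come from averaging the per-query `reciprocal_rank` and `hit` values with `mean_metric`.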