CN118797019A - Task processing method, document dialogue method and document processing method

Info

Publication number: CN118797019A
Application number: CN202411274436.XA
Authority: CN
Inventors: 余海洋; 李永彬; 黄非
Original assignee: Alibaba China Co Ltd
Current assignee: Alibaba China Co Ltd
Priority date: 2024-09-11
Filing date: 2024-09-11
Publication date: 2024-10-18
Anticipated expiration: 2044-09-11
Also published as: CN118797019B

Abstract

Embodiments of the present application provide a task processing method, a document dialogue method and a document processing method. The task processing method comprises: acquiring task data for a target document and a document structure graph of the target document; and inputting the task data and the document structure graph into a task processing model to obtain a task processing result. The target document comprises a plurality of document blocks; the document structure graph is constructed based on the association relationships among the document blocks and a hierarchical structure graph of the target document; and the hierarchical structure graph is constructed with the document blocks as nodes and the hierarchical relationships among them as edges. The target document is represented using the coarse-grained hierarchical relationships between document blocks, and the fine-grained association relationships between document blocks are merged in, so that the target document is represented as a multi-granularity document structure graph. This provides more accurate and detailed document knowledge during task processing and improves the comprehensiveness and accuracy of the task processing result.

Description

Task processing method, document dialogue method and document processing method

Technical Field

The embodiments of the present application relate to the field of computer technology, and in particular to a task processing method, a document dialogue method, and a document processing method.

Background Art

With the development of computer technology, task processing for documents has gradually become a research focus. Take document question answering as an example: it is a natural language processing technology that allows users to pose questions to a document set in natural language and automatically extracts relevant information from those documents to form answers.

At present, a document is usually split into multiple document blocks, and task processing is performed on any one of the blocks. However, because the document knowledge contained in a single document block is limited, the comprehensiveness and accuracy of task processing are poor. A comprehensive and accurate task processing solution is therefore urgently needed.

Summary of the Invention

In view of this, an embodiment of the present application provides a task processing method. One or more embodiments of the present application also relate to a document dialogue method, a document processing method, a task processing device, a document dialogue device, a document processing device, a computing device, a computer-readable storage medium and a computer program product, so as to solve the technical defects existing in the prior art.

According to a first aspect of the embodiments of the present application, a task processing method is provided, including:

obtaining task data for a target document and a document structure graph of the target document;

inputting the task data and the document structure graph of the target document into a task processing model to obtain a task processing result, wherein the target document includes multiple document blocks, the document structure graph is constructed based on the association relationships between the multiple document blocks and a hierarchical structure graph of the target document, and the hierarchical structure graph is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges.

According to a second aspect of the embodiments of the present application, a document dialogue method is provided, including:

receiving dialogue data for a target document sent by a client;

inputting the dialogue data and a document structure graph of the target document into a task processing model to obtain a dialogue processing result, wherein the target document includes multiple document blocks, the document structure graph is constructed based on the association relationships between the multiple document blocks and a hierarchical structure graph of the target document, and the hierarchical structure graph is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges;

sending the dialogue processing result to the client.

According to a third aspect of the embodiments of the present application, a document processing method is provided, including:

obtaining a hierarchical structure graph of a target document, wherein the target document includes multiple document blocks, and the hierarchical structure graph is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges;

inputting an information extraction pattern for the target document and the hierarchical structure graph into a task processing model to obtain an information extraction result;

determining the association relationships between the multiple document blocks according to the information extraction result;

updating the edges in the hierarchical structure graph according to the association relationships to obtain a document structure graph of the target document.
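The last two steps of the third aspect can be sketched as follows. This is a minimal illustration and not the method of the application: it assumes that the information extraction result has already been obtained and is available as a mapping from block identifier to the set of items extracted from that block, and it assumes, purely for illustration, that two blocks are associated when they share at least one extracted item. The names `associations_from_extraction`, `update_edges` and the edge labels are hypothetical.

```python
from itertools import combinations
from typing import Dict, List, Set, Tuple

# (source block, target block, relation label); labels are illustrative only.
Edge = Tuple[str, str, str]


def associations_from_extraction(extracted: Dict[str, Set[str]]) -> List[Edge]:
    """Assumed rule: link every pair of blocks sharing an extracted item."""
    edges: List[Edge] = []
    for a, b in combinations(sorted(extracted), 2):
        if extracted[a] & extracted[b]:
            edges.append((a, b, "association"))
    return edges


def update_edges(hierarchy_edges: List[Edge],
                 association_edges: List[Edge]) -> List[Edge]:
    """Return the hierarchy edges plus the association edges, without duplicates."""
    merged = list(hierarchy_edges)
    seen = set(merged)
    for edge in association_edges:
        if edge not in seen:
            merged.append(edge)
            seen.add(edge)
    return merged


# Hierarchical structure graph: the document contains two sections.
hierarchy = [("doc", "sec1", "hierarchy"), ("doc", "sec2", "hierarchy")]

# Stand-in for the information extraction result produced by the model.
extracted = {"doc": set(), "sec1": {"Acme Corp"}, "sec2": {"Acme Corp", "2024"}}

doc_graph_edges = update_edges(hierarchy, associations_from_extraction(extracted))
```

Here the two sections both mention the same extracted item, so one association edge is added on top of the two hierarchy edges.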

According to a fourth aspect of the embodiments of the present application, a task processing device is provided, including:

a first acquisition module, configured to acquire task data for a target document and a document structure graph of the target document;

a first input module, configured to input the task data and the document structure graph of the target document into a task processing model to obtain a task processing result, wherein the target document includes multiple document blocks, the document structure graph is constructed based on the association relationships between the multiple document blocks and a hierarchical structure graph of the target document, and the hierarchical structure graph is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges.

According to a fifth aspect of the embodiments of the present application, a document dialogue device is provided, including:

a receiving module, configured to receive dialogue data for a target document sent by a client;

a second input module, configured to input the dialogue data and a document structure graph of the target document into a task processing model to obtain a dialogue processing result, wherein the target document includes multiple document blocks, the document structure graph is constructed based on the association relationships between the multiple document blocks and a hierarchical structure graph of the target document, and the hierarchical structure graph is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges;

a sending module, configured to send the dialogue processing result to the client.

According to a sixth aspect of the embodiments of the present application, a document processing device is provided, including:

a second acquisition module, configured to acquire a hierarchical structure graph of a target document, wherein the target document includes multiple document blocks, and the hierarchical structure graph is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges;

a third input module, configured to input an information extraction pattern for the target document and the hierarchical structure graph into a task processing model to obtain an information extraction result;

a determination module, configured to determine the association relationships between the multiple document blocks according to the information extraction result;

an updating module, configured to update the edges in the hierarchical structure graph according to the association relationships to obtain a document structure graph of the target document.

According to a seventh aspect of the embodiments of the present application, a computing device is provided, including:

a memory and a processor;

the memory is configured to store a computer program/instructions, and the processor is configured to execute the computer program/instructions; when executed by the processor, the computer program/instructions implement the steps of the method provided in the first aspect, the second aspect or the third aspect.

According to an eighth aspect of the embodiments of the present application, a computer-readable storage medium is provided, which stores a computer program/instructions; when executed by a processor, the computer program/instructions implement the steps of the method provided in the first aspect, the second aspect or the third aspect.

According to a ninth aspect of the embodiments of the present application, a computer program product is provided, including a computer program/instructions; when executed by a processor, the computer program/instructions implement the steps of the method provided in the first aspect, the second aspect or the third aspect.

A task processing method provided by an embodiment of the present application includes: obtaining task data for a target document and a document structure graph of the target document; and inputting the task data and the document structure graph of the target document into a task processing model to obtain a task processing result, wherein the target document includes multiple document blocks, the document structure graph is constructed based on the association relationships between the multiple document blocks and a hierarchical structure graph of the target document, and the hierarchical structure graph is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges.

Brief Description of the Drawings

FIG. 1 is an architecture diagram of a task processing system provided by an embodiment of the present application;

FIG. 2 is a flowchart of a task processing method provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a document structure graph in a task processing method provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a hierarchical structure graph in a task processing method provided by an embodiment of the present application;

FIG. 5 is a flowchart of a document dialogue method provided by an embodiment of the present application;

FIG. 6 is a flowchart of a document processing method provided by an embodiment of the present application;

FIG. 7 is a flowchart of the processing procedure of a document processing method provided by an embodiment of the present application;

FIG. 8 is a schematic structural diagram of a task processing device provided by an embodiment of the present application;

FIG. 9 is a schematic structural diagram of a document dialogue device provided by an embodiment of the present application;

FIG. 10 is a schematic structural diagram of a document processing device provided by an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a task platform provided by an embodiment of the present application;

FIG. 12 is a structural block diagram of a computing device provided by an embodiment of the present application.

Detailed Description

Many specific details are set forth in the following description to facilitate a full understanding of the present application. However, the present application can be implemented in many ways other than those described herein, and those skilled in the art can make similar generalizations without departing from the substance of the present application. The present application is therefore not limited by the specific implementations disclosed below.

The terms used in one or more embodiments of the present application are only for the purpose of describing specific embodiments and are not intended to limit one or more embodiments of the present application. The singular forms "a", "said" and "the" used in one or more embodiments of the present application and the appended claims are also intended to include plural forms, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" used in one or more embodiments of the present application refers to and includes any or all possible combinations of one or more of the associated listed items.

It should be understood that, although the terms first, second, etc. may be used in one or more embodiments of the present application to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of one or more embodiments of the present application, the first may also be referred to as the second, and similarly, the second may also be referred to as the first. Depending on the context, the word "if" as used herein may be interpreted as "at the time of", "when" or "in response to determining".

In addition, it should be noted that the user information (including but not limited to user device information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, displayed data, etc.) involved in one or more embodiments of the present application are all information and data authorized by the user or fully authorized by all parties. The collection, use and processing of the relevant data must comply with the relevant laws, regulations and standards of the relevant countries and regions, and a corresponding operation entry is provided for users to choose to authorize or refuse.

In one or more embodiments of the present application, a large model refers to a deep learning model with a large number of model parameters, typically hundreds of millions, tens of billions, hundreds of billions, trillions, or even more than ten trillion. A large model may also be called a foundation model. It is pre-trained on large-scale unlabeled corpora to produce a pre-trained model with more than one hundred million parameters. Such a model can adapt to a wide range of downstream tasks and generalizes well; examples include large language models (LLM) and multi-modal pre-training models.

In practical applications, a large model can be applied to different tasks after fine-tuning the pre-trained model with only a small number of samples. Large models can be widely used in natural language processing (NLP), computer vision and other fields. Specifically, they can be applied to computer vision tasks such as visual question answering (VQA), image captioning (IC) and image generation, as well as natural language processing tasks such as text-based sentiment classification, text summarization and machine translation. The main application scenarios of large models include digital assistants, intelligent robots, search, online education, office software, e-commerce and intelligent design.

First, the terms involved in one or more embodiments of the present application are explained.

Portable Document Format parser (PDF-Parser): a software tool or library used to parse the content of PDF files. Such a tool can extract text, images, tables and other elements from a PDF file and convert this information into other formats for further processing or analysis.

Optical character recognition (OCR): a technology that converts text in an image into editable and searchable text. OCR can be applied to scans, photos, books, newspapers and other images that contain printed or handwritten text.

Document chunking: a text processing technique that divides a long document into a series of smaller fragments, or chunks, that are easier to manage and process. Document chunking is applicable to natural language processing, information retrieval, text mining and large-scale document processing.

Aggregated question answering: an information retrieval and natural language processing technology that aims to collect and integrate relevant information from multiple sources to answer a user's question. A traditional question answering system usually provides an answer from a single source, whereas aggregated question answering attempts to provide a more comprehensive and accurate answer by combining information from different sources.

Chain-of-thought (CoT): a technique in the field of artificial intelligence, used especially with large models, that aims to improve a model's ability to solve tasks requiring complex reasoning. Its core idea is to break a complex problem down into a series of simpler, manageable steps, so that the model solves the problem step by step rather than attempting to solve the entire problem at once.

For document-based tasks, document representation is a very important preliminary step. At present, a document is usually parsed with layout recognition technologies such as PDF-Parser or OCR to extract its text, the document is chunked according to that text into multiple document blocks, and task processing is performed on any one of the blocks. Suppose a document has 1,000 characters and is too long to be input into the model in full. The 1,000 characters can then be cut into two document blocks of 500 characters each, and the document blocks related to the task data are input into the model for task processing. However, in this scheme the granularity of the document representation is too coarse and the document knowledge in a single document block is limited, which results in poor comprehensiveness and accuracy of task processing. The scheme also cannot handle tasks such as full-text summarization, aggregated question answering and chain-of-thought reasoning.
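The fixed-size chunking described in the example above can be sketched as follows. This is a minimal illustration of the existing scheme that the application criticizes, not of the proposed method; the function name `chunk_text` is hypothetical, and the unit of length is assumed to be characters.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500) -> List[str]:
    """Split text into consecutive blocks of at most chunk_size characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]


# A 1,000-character document is cut into two 500-character document blocks.
document = "x" * 1000
blocks = chunk_text(document)
```

Each block is produced without regard to headings or to relationships between blocks, which is why a model that sees only one block has limited document knowledge.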

To solve the above problems, the embodiments of the present application mine the document at multiple granularities to form a document knowledge representation in the form of a multi-granularity document structure graph. This improves the ability to represent unstructured multimodal documents, so that both fine-grained and coarse-grained task processing perform well, task processing is more comprehensive, and task processing results are more accurate. Specifically, task data for a target document and a document structure graph of the target document are obtained; the task data and the document structure graph of the target document are input into a task processing model to obtain a task processing result, wherein the target document includes multiple document blocks, the document structure graph is constructed based on the association relationships between the multiple document blocks and a hierarchical structure graph of the target document, and the hierarchical structure graph is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges.

The present application provides a task processing method, and also relates to a document dialogue method, a document processing method, a task processing device, a document dialogue device, a document processing device, a computing device, a computer-readable storage medium and a computer program product, which are described in detail one by one in the following embodiments.

Considering that the task processing model has a large number of parameters and that the computing resources of a client are limited, the task processing method proposed in the embodiments of the present application can be applied to the task processing system shown in FIG. 1, but is not limited thereto. Referring to FIG. 1, which shows an architecture diagram of a task processing system provided by an embodiment of the present application, the task processing system may include a client 100 and a server 200;

the client 100 is configured to send task data for a target document to the server 200;

the server 200 is configured to input the task data and a document structure graph of the target document into a task processing model to obtain a task processing result, wherein the target document includes multiple document blocks, the document structure graph is constructed based on the association relationships between the multiple document blocks and a hierarchical structure graph of the target document, and the hierarchical structure graph is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges; and to send the task processing result to the client 100;

the client 100 is further configured to receive the task processing result sent by the server 200.
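The division of work between the client 100 and the server 200 can be sketched as follows. This is only an illustration under stated assumptions: the application does not define a request format or a model interface, so `TaskRequest`, `Server`, `echo_model` and the dictionary used as a graph placeholder are all hypothetical, and network transport is omitted.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical placeholder for a document structure graph.
DocGraph = Dict[str, list]
# Hypothetical model interface: (task data, document structure graph) -> result.
TaskModel = Callable[[str, DocGraph], str]


@dataclass
class TaskRequest:
    document_id: str  # identifies the target document
    task_data: str    # e.g. a question about the target document


class Server:
    """Holds the task processing model and the graphs of known documents."""

    def __init__(self, model: TaskModel, graphs: Dict[str, DocGraph]) -> None:
        self.model = model
        self.graphs = graphs

    def handle(self, request: TaskRequest) -> str:
        # Look up the graph of the target document and run the model on it.
        graph = self.graphs[request.document_id]
        return self.model(request.task_data, graph)


def echo_model(task_data: str, graph: DocGraph) -> str:
    # Stand-in for the task processing model: reports what it received.
    return f"{task_data} -> {len(graph['nodes'])} blocks"


server = Server(echo_model, {"doc-1": {"nodes": ["b1", "b2"], "edges": []}})
# The client sends task data; the server returns the task processing result.
result = server.handle(TaskRequest("doc-1", "summarize"))
```

Keeping the model and the graphs on the server side reflects the motivation stated above: the client only sends task data and receives the result.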

As shown in FIG. 1, the task processing model is deployed in the server 200, and the server 200 can be connected to one or more clients 100 through a local area network connection, a wide area network connection, an Internet connection, or another type of data network. The client 100 may include, but is not limited to, a smartphone, a tablet computer, a laptop computer, a PDA, a personal computer, a smart home device, a vehicle-mounted device, etc. The client 100 can also interact with the user through a graphical user interface to invoke the task processing model, thereby implementing the task processing method provided in the embodiments of the present application.

It is worth noting that the task processing method provided in the embodiments of the present application is generally executed by the server. However, in other embodiments of the present application, when the resources of the client can meet the deployment and operating requirements of the task processing model, the client can have functions similar to those of the server and thus execute the task processing method provided in the embodiments of the present application. In still other embodiments, the task processing method can be executed jointly by the client and the server. Next, the implementation of the task processing method is described in detail, taking execution by the server as an example.

Referring to FIG. 2, which shows a flowchart of a task processing method provided by an embodiment of the present application, the method specifically includes the following steps:

Step 202: Obtain task data for a target document and a document structure graph of the target document.

In one or more embodiments of the present application, during task processing, task data for a target document may be acquired, so that task processing is performed based on the task data to obtain a task processing result.

It should be noted that the target document may be a document from different scenarios, such as a transaction document in a financial scenario, a knowledge document in an educational scenario, or a legal document in a legal scenario. The target document may take different forms, such as PDF, HyperText Markup Language (HTML), a Word document, a spreadsheet document, and so on. The target document may be in different languages, such as English or Chinese. In an optional embodiment of the present application, if the target document is not a Chinese document, language conversion may be performed on it to obtain a Chinese document. The target document may be a unimodal document or a multimodal document, where a unimodal document includes any one of text content, image content and table content, and a multimodal document includes at least two of them. Task data refers to the data and information involved in performing a target task related to the target document. Task data includes, but is not limited to, the document content of the target document, the document identifier, and the various parameters and settings used in processing the target document. The target task may be any of various types of task for the target document, such as an aggregated question answering task, a dialogue task, a summary extraction task, a chain-of-thought reasoning task, a document analysis task, a document writing task, and so on.
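The distinction between unimodal and multimodal documents can be sketched as a simple check. This is an illustration only, assuming that a unimodal document carries exactly one of the three content types named above and a multimodal document at least two; the function name `classify_modality` and the string labels are hypothetical.

```python
from typing import Iterable

# The three content types named in the description.
KNOWN_CONTENT_TYPES = {"text", "image", "table"}


def classify_modality(content_types: Iterable[str]) -> str:
    """Return 'unimodal' for one known content type, 'multimodal' for two or more."""
    present = set(content_types) & KNOWN_CONTENT_TYPES
    if not present:
        raise ValueError("document has no recognized content type")
    return "unimodal" if len(present) == 1 else "multimodal"
```

For example, a document containing only text is classified as unimodal, while a report containing both text and tables is classified as multimodal.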

实际应用中，获取针对目标文档的任务数据的方式有多种，具体根据实际情况进行选择，本申请实施例对此不做任何限定。本申请一种可能的实现方式中，可以接收客户端发送的针对目标文档的任务数据。本申请另一种可能的实现方式中，可以从其他数据库或数据获取设备中读取针对目标文档的任务数据。In practical applications, there are many ways to obtain task data for a target document, which can be selected based on actual conditions, and the embodiments of the present application do not impose any restrictions on this. In one possible implementation of the present application, task data for a target document sent by a client can be received. In another possible implementation of the present application, task data for a target document can be read from other databases or data acquisition devices.

Step 204: Input the task data and the document structure graph of the target document into the task processing model to obtain the task processing result, wherein the target document includes multiple document blocks, the document structure graph is constructed based on the association relationships between the multiple document blocks and the hierarchical structure graph of the target document, and the hierarchical structure graph is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges.

It should be noted that the document structure graph (DocGraph) represents both the document structure and the document content of the target document. It captures the relationships between elements of the target document through a graph data structure. The document structure graph consists of nodes and edges, where nodes represent the components of the target document (such as paragraphs, sentences, and words) and edges represent the connections between nodes. The document structure graph exposes the structured information in the target document, which supports a deeper understanding of its content and organization and makes the various target tasks easier to perform. Document blocks include, but are not limited to, words, phrases, sentences, and paragraphs. An association relationship generally refers to a connection between two or more document blocks. Association relationships include, but are not limited to, grammatical relationships (such as subject-predicate relationships), semantic relationships (such as causal relationships), hierarchical relationships (such as a chapter containing paragraphs), and entity relationships (such as synonymy).

The task processing model refers to a deep learning model that processes the target task according to the given task data and document structure graph and generates the corresponding processing result. The task processing model may be a large model, or a model trained on sample task data, sample document structures, and sample processing results. The task processing result depends on the task type of the target task: if the target task is a summary extraction task, the task processing result is the document summary of the target document; if the target task is a document question-answering task, the task processing result is the document reply. A document block refers to a logical unit or component of the target document. A document block includes at least one of text content, image content, and table content. An association relationship between document blocks means that the document blocks are semantically related, entity-related, and so on.

The hierarchical structure graph of the target document represents the hierarchical structure and the document content of the target document, and may also be called a hierarchical structure tree. A hierarchical structure tree is a tree data structure that describes the internal structure of a document. It organizes the content of the document as a tree: each node represents a part or element of the document, and the relationships between nodes reflect the hierarchical relationships between parts of the document content. Hierarchical relationships include parent-child relationships and sibling relationships. If a parent node has multiple child nodes, and each of those child nodes has only that one parent node, the relationship between the parent node and a child node is called a parent-child relationship. If nodes at the same level share the same parent node, the relationship between those nodes is called a sibling relationship. The hierarchical structure tree includes a root node, child nodes, and leaf nodes. The root node usually represents the target document. A child node is a sub-part of the target document, such as a document block. A leaf node is a node without child nodes and is usually the smallest segmentation unit of the target document. For example, the root node is the entire target document, its child nodes are the chapters of the target document, each chapter in turn has sections as its child nodes, and so on.
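The parent-child and sibling relationships described above can be illustrated with a small tree structure. The following is a minimal sketch only; the class name `Node` and its methods are hypothetical and are not taken from the application.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """One node of a hierarchical structure tree (hypothetical class)."""
    name: str
    # The back-reference is excluded from repr/compare to avoid recursion.
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)

    def add_child(self, name: str) -> "Node":
        """Create a child node; this records one parent-child relationship."""
        child = Node(name, parent=self)
        self.children.append(child)
        return child

    def siblings(self) -> List["Node"]:
        """Return the other nodes that share this node's parent."""
        if self.parent is None:
            return []
        return [n for n in self.parent.children if n is not self]

    def is_leaf(self) -> bool:
        """A leaf node is a node without child nodes."""
        return not self.children


# Root = the whole document, children = chapters, grandchildren = sections.
root = Node("target document")
ch1 = root.add_child("chapter 1")
ch2 = root.add_child("chapter 2")
sec11 = ch1.add_child("section 1.1")
sec12 = ch1.add_child("section 1.2")

print([n.name for n in sec11.siblings()])
print(ch2.is_leaf())
```

Here `sec11` and `sec12` are siblings because they share the parent `ch1`, and `ch2` is a leaf because no sections were added under it.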

In practical applications, there are many ways to input the task data and the document structure graph of the target document into the task processing model to obtain the task processing result. The specific way is selected according to the actual situation, and the embodiments of the present application do not limit it. In a first possible implementation of the present application, the task data and the document structure graph of the target document may be input directly into the task processing model to obtain the task processing result output by the model. In a second possible implementation of the present application, the task data, the document structure graph of the target document, and processing prompt information (a prompt) may be input into the task processing model to obtain the task processing result output by the model, where the processing prompt information guides the task processing model to generate the task processing result based on the task data and the document structure graph. In a third possible implementation of the present application, feature extraction may be performed separately on the task data and on the document structure graph of the target document, and the extracted features may be input into the task processing model to obtain the task processing result.
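As an illustration of the second implementation, the sketch below assembles a single text input from the prompt, a serialized graph, and the task data. The function name `build_model_input`, the section labels, and the JSON serialization are assumptions made for illustration; the application does not specify how the graph is serialized or how the model is invoked, so no model call is shown.

```python
import json


def build_model_input(task_data: str, doc_graph: dict, prompt: str) -> str:
    """Concatenate prompt, serialized graph, and task data (hypothetical layout)."""
    # Serializing the graph as JSON is an assumption, not the application's method.
    graph_text = json.dumps(doc_graph, ensure_ascii=False, sort_keys=True)
    return (
        f"{prompt}\n\n"
        f"[Document structure graph]\n{graph_text}\n\n"
        f"[Task]\n{task_data}"
    )


doc_graph = {"nodes": ["doc", "block1"], "edges": [["doc", "block1"]]}
model_input = build_model_input(
    task_data="Summarize chapter 1.",
    doc_graph=doc_graph,
    prompt="Answer using only the graph.",
)
print(model_input)
```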

By applying the solution of the embodiments of the present application, the target document is represented using multiple document blocks and the hierarchical relationships between them, which mines the coarse-grained information inside the target document. In addition, the association relationships between the multiple document blocks are incorporated, which further mines fine-grained information such as the text content, image content, and table content inside the target document. The target document is thus represented as a multi-granularity document structure graph, so that more accurate and detailed document knowledge can be provided during task processing, improving the comprehensiveness and accuracy of the task processing result.

In an optional embodiment of the present application, inputting the task data and the document structure graph of the target document into the task processing model to obtain the task processing result may include the following steps:

Perform feature extraction on the task data to obtain task data features;

Perform feature extraction on the document structure graph to obtain document structure features;

Input the task data features and the document structure features into the task processing model to obtain the task processing result.

It should be noted that feature extraction refers to the process of selecting and transforming a set of features from raw data such that the features effectively represent the key information in the raw data. Task data features are features extracted from the task data that help complete the target task. Document structure features are features extracted from the document structure graph that help complete the target task.

In practical applications, there are many ways to perform feature extraction on the task data to obtain the task data features. The specific way is selected according to the actual situation, and the embodiments of the present application do not limit it. In one possible implementation of the present application, the task data may be input into a bag-of-words (BoW) model to obtain the task data features. In another possible implementation of the present application, term frequency-inverse document frequency (TF-IDF) may be used to perform feature extraction on the task data to obtain the task data features.
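The two options named above can be sketched in a few lines of standard-library Python. This is a minimal illustration that assumes the task data has already been tokenized and uses the plain, unsmoothed form idf = log(N / df); the function names are hypothetical, and production libraries typically apply smoothing and normalization.

```python
import math
from collections import Counter
from typing import Dict, List


def bag_of_words(tokens: List[str]) -> Counter:
    """Bag-of-words features: each term mapped to its raw count."""
    return Counter(tokens)


def tf_idf(docs: List[List[str]]) -> List[Dict[str, float]]:
    """TF-IDF features for each tokenized text in `docs`."""
    n_docs = len(docs)
    # Document frequency: in how many texts each term occurs.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))
    features = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        features.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return features


docs = [
    ["summarize", "the", "contract"],
    ["translate", "the", "contract"],
]
features = tf_idf(docs)
print(bag_of_words(docs[0]))
print(features[0])
```

Terms that appear in every text (here "the" and "contract") receive a weight of zero, while terms specific to one text receive a positive weight.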

Further, there are many ways to perform feature extraction on the document structure graph to obtain the document structure features. The specific way is selected according to the actual situation, and the embodiments of the present application do not limit it. In one possible implementation of the present application, the document structure graph may be input into a graph neural network (GNN) to obtain the document structure features. In another possible implementation of the present application, the document structure graph may be input into a long short-term memory (LSTM) network to obtain the document structure features.
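To give an intuition for how a graph neural network derives features from graph structure, the sketch below performs one round of neighbor aggregation, in which each node's feature vector is replaced by the mean of its own vector and its neighbors' vectors. This is a deliberate simplification: a real GNN layer would also apply learned weight matrices and a nonlinearity, and the function name and toy features are assumptions.

```python
from typing import Dict, List, Tuple


def mean_aggregate(
    features: Dict[str, List[float]],
    edges: List[Tuple[str, str]],
) -> Dict[str, List[float]]:
    """One message-passing round with mean aggregation over undirected edges."""
    neighbors: Dict[str, List[str]] = {node: [] for node in features}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)
    updated = {}
    for node, own in features.items():
        # Average the node's own vector together with its neighbors' vectors.
        group = [own] + [features[n] for n in neighbors[node]]
        updated[node] = [sum(column) / len(group) for column in zip(*group)]
    return updated


features = {
    "doc": [1.0, 0.0],
    "block1": [0.0, 1.0],
    "block2": [0.0, 0.0],
}
edges = [("doc", "block1"), ("doc", "block2")]
updated = mean_aggregate(features, edges)
print(updated)
```

After one round, each document block's vector already carries information from the document node it is attached to, which is the basic mechanism by which structural features propagate through the graph.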

By applying the solution of the embodiments of the present application, feature extraction is performed on the task data and the document structure graph outside the task processing model, so that the inputs of the task processing model are the task data features and the document structure features, which widens the range of applications of the task processing model.

In an optional embodiment of the present application, the document structure graph of the target document may be obtained before the task data and the document structure graph are input into the task processing model. There are many ways to obtain the document structure graph of the target document. The specific way is selected according to the actual situation, and the embodiments of the present application do not limit it. In one possible implementation of the present application, the document structure graph of the target document may be read from another data acquisition device or database. In another possible implementation of the present application, the hierarchical structure graph of the target document may be obtained, and the document structure graph may be constructed based on the hierarchical structure graph. That is, before the task data and the document structure graph of the target document are input into the task processing model to obtain the task processing result, the following steps may also be included:

Obtain the hierarchical structure graph of the target document;

Input the information extraction schema for the target document and the hierarchical structure graph into the information extraction model to obtain the information extraction result;

It should be noted that the information extraction schema for the target document is a structured description that defines which information to extract from the target document and how that information should be organized. An information extraction schema usually contains the key entities to be extracted, their attributes, and the relationships between them. The information extraction model uses the information extraction schema of the target document to extract the information extraction result from the hierarchical structure graph of the target document. The information extraction model may be a pre-trained large model, or a neural network model trained on sample information extraction schemas, sample hierarchical structure graphs, and sample extraction results. The information extraction result may be in text form or in table form. The specific form is selected according to the actual situation, and the embodiments of the present application do not limit it.

For example, assume that the target document is a cooking tutorial for Dongpo beef and that the information extraction schema is "ingredients, method, chef's tip". The information extraction schema and the hierarchical structure graph of the target document are input into the information extraction model, and the information extraction result obtained is "Ingredients: beef, carrot, white radish, star anise, and bay leaves; Method: wash the beef, boil it in clean water until it is about 70% cooked, then take it out and cut it into cubes...; Chef's tip: add a few hawthorn berries when stewing the beef, so that the beef becomes tender faster and takes on a light hawthorn fragrance".

In practical applications, the information extraction schema for the target document and the hierarchical structure graph may be input directly into the information extraction model. Alternatively, extraction prompt information may be added to the input of the information extraction model, that is, the information extraction schema, the hierarchical structure graph, and the extraction prompt information are input into the information extraction model to obtain the information extraction result.

Further, when the association relationships between the multiple document blocks are determined according to the information extraction result, the information extraction results that share the same information extraction schema may be identified first, and then the document blocks to which these information extraction results belong may be determined. It follows that these document blocks share the same information extraction schema, that is, they have an association relationship. For example, the information extraction results "beef" and "white radish" share the same information extraction schema; if "beef" belongs to document block 1 and "white radish" belongs to document block 3, it is determined that document block 1 and document block 3 have an association relationship. After the association relationships are determined, edges may be added between the associated document blocks in the hierarchical structure graph to obtain the document structure graph of the target document.
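The procedure above can be sketched as follows. Each extraction result is assumed to be a triple of schema field, extracted value, and the identifier of the document block it came from; this triple layout, the function name `association_edges`, and the use of block 0 for the document root are illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple


def association_edges(
    extractions: List[Tuple[str, str, int]],
) -> Set[Tuple[int, int]]:
    """Link every pair of blocks whose extraction results share a schema field."""
    blocks_by_field: Dict[str, Set[int]] = defaultdict(set)
    for schema_field, _value, block_id in extractions:
        blocks_by_field[schema_field].add(block_id)
    edges: Set[Tuple[int, int]] = set()
    for blocks in blocks_by_field.values():
        # Sorting keeps each undirected edge in a single canonical form.
        for a, b in combinations(sorted(blocks), 2):
            edges.add((a, b))
    return edges


extractions = [
    ("ingredients", "beef", 1),
    ("ingredients", "white radish", 3),
    ("method", "wash the beef", 2),
]
# Hierarchy edges from the document root (0) to blocks 1-3.
hierarchy_edges = {(0, 1), (0, 2), (0, 3)}
doc_graph_edges = hierarchy_edges | association_edges(extractions)
print(sorted(doc_graph_edges))
```

Blocks 1 and 3 become connected because both contributed an "ingredients" result, while block 2 gains no association edge because it is the only block with a "method" result.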

By applying the solution of the embodiments of the present application, the information extraction schema is used to extract the information extraction result from the hierarchical structure graph, and the information extraction result is used to determine the fine-grained relationships between the multiple document blocks in the hierarchical structure graph, so that the document structure graph can provide more accurate and detailed document knowledge.

Referring to FIG. 3, FIG. 3 shows a schematic diagram of a document structure graph in a task processing method provided by an embodiment of the present application. As shown in FIG. 3, the document structure graph includes five types of nodes: document nodes, document block nodes, table nodes, image nodes, and text nodes. In FIG. 3, a solid single-headed arrow indicates a hierarchical relationship between the connected nodes, a dash-dot double-headed arrow indicates a semantic relationship between the connected nodes, a solid double-headed arrow indicates an entity relationship between the connected nodes, and a long-dash double-headed arrow indicates a document relationship between the connected nodes, where a document relationship means that the two nodes are the same node in the document structure graphs of different documents.
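The node and edge types of FIG. 3 could be modeled with two enumerations, as in the minimal sketch below. The enumeration names, member names, and the tuple layout of an edge are assumptions made for illustration.

```python
from enum import Enum


class NodeType(Enum):
    """The five node types described for FIG. 3."""
    DOCUMENT = "document"
    DOCUMENT_BLOCK = "document_block"
    TABLE = "table"
    IMAGE = "image"
    TEXT = "text"


class EdgeType(Enum):
    """The four relationship types described for FIG. 3."""
    HIERARCHY = "hierarchy"   # solid single-headed arrow
    SEMANTIC = "semantic"     # dash-dot double-headed arrow
    ENTITY = "entity"         # solid double-headed arrow
    DOCUMENT = "document"     # long-dash double-headed arrow


# A typed edge as a (source, target, type) triple (hypothetical layout).
edge = ("block_1", "block_3", EdgeType.ENTITY)
print(edge[2].value)
```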

In practical applications, there are many ways to obtain the hierarchical structure graph of the target document. The specific way is selected according to the actual situation, and the embodiments of the present application do not limit it. In one possible implementation of the present application, the hierarchical structure graph of the target document may be read from another data acquisition device or database. In another possible implementation of the present application, the hierarchical structure graph of the target document may be constructed based on the multiple document blocks in the target document and the hierarchical relationships between them.

In an optional embodiment of the present application, obtaining the hierarchical structure graph of the target document may include the following steps:

Obtain the target document, wherein the target document includes multiple document blocks, and each document block includes at least one of text content, image content, and table content;

Parse the multiple document blocks to determine the hierarchical relationships between the multiple document blocks;

Construct the hierarchical structure graph of the target document with the multiple document blocks as nodes and the hierarchical relationships as edges.

It should be noted that there are many ways to obtain the target document. The specific way is selected according to the actual situation, and the embodiments of the present application do not limit it. In one possible implementation of the present application, the target document sent by a user through a client may be received. In another possible implementation of the present application, the target document may be read from another data reading device or database.

In practical applications, after the target document is obtained, techniques such as OCR and layout recognition may be used to recognize the text in the target document and to identify the titles, page numbers, body text, tables, pictures, and so on that the text corresponds to, thereby determining the multiple document blocks in the target document. Then, the hierarchical relationships between the multiple document blocks are determined based on the position information of the document blocks (such as indentation, heading level, and paragraph spacing) and their format information (such as font size, bold, and italics), or by using a layout analysis tool.
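One simple way to turn heading levels into parent-child relationships is a stack-based pass over the blocks in document order. The sketch below assumes that a numeric level has already been assigned to each block from cues such as font size or indentation (level 1 for top-level headings, larger numbers for deeper content); the function name and input layout are hypothetical and are not the application's stated method.

```python
from typing import List, Optional, Tuple


def parents_from_levels(blocks: List[Tuple[str, int]]) -> List[Optional[int]]:
    """Return, for each block, the index of its parent block.

    `blocks` holds (text, level) pairs in document order. A result of None
    means the block attaches directly to the document root.
    """
    parents: List[Optional[int]] = []
    stack: List[int] = []  # indices of open ancestors, levels strictly increasing
    for index, (_text, level) in enumerate(blocks):
        # Close every open block that is at the same level or deeper.
        while stack and blocks[stack[-1]][1] >= level:
            stack.pop()
        parents.append(stack[-1] if stack else None)
        stack.append(index)
    return parents


blocks = [
    ("Chapter 1", 1),
    ("Section 1.1", 2),
    ("A paragraph of body text", 3),
    ("Section 1.2", 2),
    ("Chapter 2", 1),
]
parents = parents_from_levels(blocks)
print(parents)
```

Each (parent, child) pair produced this way corresponds to one hierarchy edge of the hierarchical structure graph.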

By applying the solution of the embodiments of the present application, the hierarchical structure graph of the target document is constructed with the multiple document blocks in the target document as nodes and the hierarchical relationships between them as edges. The graph intuitively shows the relationships between the main parts and sub-parts of the target document, supports a deeper understanding of the topic and line of development of the target document, and provides more accurate document knowledge for task processing.

In an optional embodiment of the present application, before the hierarchical structure graph of the target document is constructed with the multiple document blocks as nodes and the hierarchical relationships as edges, the following step may also be included:

Obtain the reading order of the multiple document blocks;

Constructing the hierarchical structure graph of the target document with the multiple document blocks as nodes and the hierarchical relationships as edges may include the following step:

Construct the hierarchical structure graph of the target document with the multiple document blocks as nodes and with the hierarchical relationships and the reading order as edges.

It should be noted that the reading order of the multiple document blocks refers to the logical order in which a reader goes through the parts of the document. The reading order is usually defined according to the structure and content layout of the document, so that the information can be understood correctly. Reading orders include, but are not limited to, a reading order based on physical position (for example, reading the document blocks from left to right and from top to bottom according to their actual positions on the page) and a reading order based on logical structure (for example, reading the title before the body text, or reading a chapter title before its sub-chapter titles).

In practical applications, there are many ways to obtain the reading order of the multiple document blocks. The specific way is selected according to the actual situation, and the embodiments of the present application do not limit it. In one possible implementation of the present application, the reading order of the multiple document blocks sent by a user through a client may be received. In another possible implementation of the present application, the reading order of the multiple document blocks may be selected at random from multiple candidate reading orders.

Further, after the reading order of the multiple document blocks is obtained, an initial hierarchical structure graph of the target document may first be constructed with the multiple document blocks as nodes and the hierarchical relationships as edges. Then, edges may be added between the document blocks in the initial hierarchical structure graph according to the reading order, to obtain the hierarchical structure graph of the target document.
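The two-stage construction above can be sketched as follows: the hierarchy edges form the initial graph, and one extra edge is then added between each pair of consecutive blocks in the reading order. The function name, the string node identifiers, and the edge labels are assumptions made for illustration.

```python
from typing import List, Tuple


def add_reading_order_edges(
    hierarchy_edges: List[Tuple[str, str]],
    reading_order: List[str],
) -> List[Tuple[str, str, str]]:
    """Return labeled edges: hierarchy edges plus consecutive reading-order edges."""
    edges = [(parent, child, "hierarchy") for parent, child in hierarchy_edges]
    # Pair each block with the block that is read immediately after it.
    for current, following in zip(reading_order, reading_order[1:]):
        edges.append((current, following, "reading_order"))
    return edges


hierarchy_edges = [("doc", "block1"), ("doc", "block2"), ("doc", "block3")]
reading_order = ["block1", "block2", "block3"]
graph_edges = add_reading_order_edges(hierarchy_edges, reading_order)
for edge in graph_edges:
    print(edge)
```

Keeping a label on every edge lets later steps distinguish structural edges from reading-order edges in the same graph.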

By applying the solution of the embodiments of the present application, the hierarchical structure graph of the target document is constructed with the multiple document blocks as nodes and with the hierarchical relationships and the reading order as edges, so that the hierarchical structure graph shows not only the hierarchical relationships between the multiple document blocks but also their reading order, providing more accurate document knowledge for task processing.

Referring to FIG. 4, FIG. 4 shows a schematic diagram of a hierarchical structure graph in a task processing method provided by an embodiment of the present application. As shown in FIG. 4, the root node of the hierarchical structure graph represents the target document, and the root node has two child nodes, document block 1 and document block 2. The child node document block 1 has three leaf nodes: table content, text content, and image content. Document block 1 is a multimodal document block that consists of four text fragments read in sequence: the first fragment includes text content, the second fragment includes text content and image content, the third fragment includes table content, and the fourth fragment includes text content and image content. In the hierarchical structure graph shown in FIG. 4, a solid single-headed arrow represents a hierarchical relationship between document blocks, and a dashed single-headed arrow represents an edge added based on the reading order of the multiple document blocks.

In an optional embodiment of the present application, because the target document may include content in non-text modalities such as images and tables, the summary information and key information of such content may be determined so that the task processing model can understand it more accurately. That is, before the information extraction schema for the target document and the hierarchical structure graph are input into the information extraction model to obtain the information extraction result, the following step may also be included:

Input the multiple document blocks into the information recognition model to obtain the summary information and key information corresponding to each of the multiple document blocks;

Inputting the information extraction schema for the target document and the hierarchical structure graph into the information extraction model to obtain the information extraction result includes:

Input the information extraction schema for the target document, the hierarchical structure graph, and the summary information and key information corresponding to each of the multiple document blocks into the information extraction model to obtain the information extraction result.

It should be noted that the information recognition model is used to identify the summary information and key information in a document block. The information recognition model may be a pre-trained large model, or a neural network model trained on sample summary information, sample key information, and sample document blocks. The summary information may be called the global information of a document block, and the key information may be called the local information of a document block. Global information refers to a descriptive summary of the document block, and local information refers to the key content within the document block.

In practical applications, when the multiple document blocks are input into the information recognition model, each document block may be input directly into the information recognition model to obtain its summary information and key information. Alternatively, recognition prompt information may be added to the input of the information recognition model, that is, each document block and the recognition prompt information are input into the information recognition model to obtain the summary information and key information of each document block. The recognition prompt information may be, for example, "Please perform summarization and key information identification on the document block to obtain summary information and key information", or "Please describe the global information and local information of the document block to obtain summary information and key information". The specific prompt is selected according to the actual situation, and the embodiments of the present application do not limit it.

For example, if a document block includes text content, a pre-trained large model may be used to perform summarization and key information identification on the document block to obtain the summary information and key information. If a document block includes table content, a pre-trained large model may likewise be used to perform summarization and key information identification on the document block. If a document block includes image content, a multimodal large model may be used to produce a global description and a local description of the document block to obtain the summary information and key information. For example, for a document block that includes image content, the summary information is "The picture shows a family having a picnic on a lawn", and the key information is "There are three people in the picture, two adults and one child; the adults are wearing red clothes and the child is wearing green clothes".

进一步地，获得各文档块分别对应的摘要信息和关键信息之后，可以将各文档块的摘要信息和关键信息以文本的形式存储在文档结构图中文档块对应的节点上，从而丰富文档结构图中的文档知识。Furthermore, after obtaining the summary information and key information corresponding to each document block, the summary information and key information of each document block can be stored in the form of text on the node corresponding to the document block in the document structure diagram, thereby enriching the document knowledge in the document structure diagram.
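As an illustration of storing summary information and key information as text on graph nodes, the following is a minimal sketch. The class names `BlockNode` and `StructureGraph` and the method `annotate` are hypothetical and do not appear in the application; the strings passed in stand for output that an information recognition model would return.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BlockNode:
    """One document block, stored as a node of the document structure graph."""
    block_id: str
    content_type: str   # "text", "table" or "image"
    summary: str = ""   # summary information (global description)
    key_info: str = ""  # key information (local description)


@dataclass
class StructureGraph:
    """Hypothetical container: block id -> node."""
    nodes: Dict[str, BlockNode] = field(default_factory=dict)

    def add_node(self, node: BlockNode) -> None:
        self.nodes[node.block_id] = node

    def annotate(self, block_id: str, summary: str, key_info: str) -> None:
        """Store model output as plain text on the node of the given block."""
        node = self.nodes[block_id]
        node.summary = summary
        node.key_info = key_info


graph = StructureGraph()
graph.add_node(BlockNode("img-1", "image"))
graph.annotate(
    "img-1",
    summary="The picture is a scene of a family having a picnic on the lawn",
    key_info="Three people: two adults in red clothes and one child in green clothes",
)
```

Because the descriptions are kept as plain text, image and table blocks can be handled downstream in the same way as text blocks.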

应用本申请实施例的方案，将针对目标文档的信息抽取模式、层次结构图和多个文档块分别对应的摘要信息和关键信息输入信息抽取模型，获得信息抽取结果，将文档块中的图像内容和表格内容用文本形式表示在文档结构图中，为任务处理过程提供更加全面、准确的文档知识。By applying the solution of the embodiment of the present application, the information extraction pattern, hierarchical structure diagram and summary information and key information corresponding to multiple document blocks of the target document are input into the information extraction model to obtain the information extraction result, and the image content and table content in the document block are represented in text form in the document structure diagram, providing more comprehensive and accurate document knowledge for the task processing process.

本申请一种可选的实施例中，获得目标文档的文档结构图之后，可以利用额外的文档块对文档结构图进行扩展，也即，上述根据关联关系，更新层次结构图中的边，获得目标文档的文档结构图之后，还可以包括以下步骤：In an optional embodiment of the present application, after the document structure diagram of the target document is obtained, the document structure diagram may be expanded using additional document blocks. That is, after the edges in the hierarchical structure diagram are updated according to the association relationship to obtain the document structure diagram of the target document, the following steps may further be included:

将扩展提示信息和文档结构图输入任务处理模型，获得待添加文档块；Input the extended prompt information and the document structure diagram into the task processing model to obtain the document block to be added;

利用待添加文档块，对文档结构图进行更新，获得更新后的文档结构图；Using the document block to be added, the document structure diagram is updated to obtain an updated document structure diagram;

将任务数据和目标文档的文档结构图输入任务处理模型，获得任务处理结果，可以包括以下步骤：Inputting the task data and the document structure diagram of the target document into the task processing model to obtain the task processing result may include the following steps:

将任务数据和目标文档的更新后的文档结构图输入任务处理模型，获得任务处理结果。The task data and the updated document structure diagram of the target document are input into the task processing model to obtain the task processing result.

需要说明的是，扩展提示信息用于引导任务处理模型生成与文档结构图中的各文档块相关的待添加文档块。待添加文档块是指与文档结构图中的各文档块相关的文档块。这里的相关可以是存在实体关系、语义关系、层级关系等等。待添加文档块的数量可以是一个，也可以是多个。例如目标文档为一篇有关户外活动的文档，文档结构图中包括文档块“营地”，将扩展提示信息和文档结构图输入任务处理模型，获得待添加文档块可以是“露营”、“帐篷”、“单人帐篷”。It should be noted that the extended prompt information is used to guide the task processing model to generate document blocks to be added that are related to each document block in the document structure diagram. The document blocks to be added refer to document blocks related to each document block in the document structure diagram. The relevance here can be the existence of entity relationships, semantic relationships, hierarchical relationships, etc. The number of document blocks to be added can be one or more. For example, the target document is a document about outdoor activities, and the document structure diagram includes the document block "camp". The extended prompt information and the document structure diagram are input into the task processing model, and the document blocks to be added can be "camping", "tent", and "single tent".

实际应用中，利用待添加文档块，对文档结构图进行更新，获得更新后的文档结构图的方式有多种，具体根据实际情况进行选择，本申请实施例对此不做任何限定。本申请一种可能的实现方式中，可以在文档结构图中，添加待添加文档块，并为待添加文档块和任意文档块之间添加边，获得更新后的文档结构图。本申请另一种可能的实现方式中，可以从文档结构图中的多个文档块中筛选出与待添加文档块非常相似的目标文档块，在文档结构图中，添加待添加文档块，并为待添加文档块和目标文档块之间添加边，获得更新后的文档结构图。In actual applications, there are many ways to update the document structure diagram using the document block to be added, and obtain the updated document structure diagram. The specific selection depends on the actual situation, and the embodiments of the present application do not impose any restrictions on this. In one possible implementation of the present application, the document block to be added can be added to the document structure diagram, and an edge is added between the document block to be added and any document block to obtain an updated document structure diagram. In another possible implementation of the present application, a target document block that is very similar to the document block to be added can be screened out from multiple document blocks in the document structure diagram, and the document block to be added is added to the document structure diagram, and an edge is added between the document block to be added and the target document block to obtain an updated document structure diagram.

应用本申请实施例的方案，将扩展提示信息和文档结构图输入任务处理模型，获得待添加文档块；利用待添加文档块，对文档结构图进行更新，获得更新后的文档结构图，从而更新后的文档结构图中包括更多的节点，使得更新后的文档结构图可以提供更加准确、详细的文档知识，提高了任务处理结果的全面性和准确性。By applying the solution of the embodiment of the present application, the extended prompt information and the document structure diagram are input into the task processing model to obtain the document block to be added; the document block to be added is used to update the document structure diagram to obtain an updated document structure diagram, so that the updated document structure diagram includes more nodes, so that the updated document structure diagram can provide more accurate and detailed document knowledge, thereby improving the comprehensiveness and accuracy of the task processing results.
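The expansion step can be sketched as follows. This is only an illustration under stated assumptions: `expand_graph` and `fake_model` are hypothetical names, the graph is reduced to a set of block labels and a set of edges, and `fake_model` is a hard-coded stand-in for the task processing model that returns each block to be added together with the existing block it relates to.

```python
from typing import Callable, List, Set, Tuple

Edge = Tuple[str, str]
Proposal = Tuple[str, str]  # (block to be added, existing block it relates to)


def expand_graph(
    nodes: Set[str],
    edges: Set[Edge],
    propose_blocks: Callable[[str, List[str]], List[Proposal]],
    prompt: str,
) -> Tuple[Set[str], Set[Edge]]:
    """Return an updated copy of the graph with the proposed blocks attached."""
    new_nodes = set(nodes)
    new_edges = set(edges)
    for new_block, related_block in propose_blocks(prompt, sorted(nodes)):
        if related_block not in new_nodes:
            continue  # ignore proposals that are not tied to a known block
        new_nodes.add(new_block)
        new_edges.add((related_block, new_block))
    return new_nodes, new_edges


def fake_model(prompt: str, blocks: List[str]) -> List[Proposal]:
    # Stand-in for the task processing model (assumption, not a real API).
    return [("camping", "camp"), ("tent", "camp"), ("single tent", "tent")]


nodes, edges = expand_graph(
    {"camp"}, set(), fake_model, "Suggest blocks related to the existing ones."
)
```

The input graph is copied rather than modified, so the original document structure graph remains available if the expansion is rejected.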

本申请一种可选的实施例中，上述利用待添加文档块，对文档结构图进行更新，获得更新后的文档结构图，可以包括以下步骤：In an optional embodiment of the present application, the above-mentioned updating of the document structure diagram by using the document block to be added to obtain the updated document structure diagram may include the following steps:

计算待添加文档块和信息抽取结果之间的相似指标；Calculate the similarity index between the document block to be added and the information extraction result;

根据相似指标，从多个文档块中筛选出目标文档块；Filter out a target document block from multiple document blocks based on similarity indicators;

在文档结构图中，添加待添加文档块，并为待添加文档块和目标文档块之间添加边，获得更新后的文档结构图。In the document structure graph, a document block to be added is added, and an edge is added between the document block to be added and the target document block to obtain an updated document structure graph.

需要说明的是，相似指标用于描述待添加文档块与信息抽取结果之间的相似程度。相似指标可以是相似度（如0.8），也可以是相似等级（如非常相似、相似、不相似）。目标文档块是指文档结构图中相似指标大于预设阈值的至少一个信息抽取结果对应的文档块，其中，预设阈值如0.7，具体根据实际情况进行设置。目标文档块也可以理解为文档结构图中与待添加文档块之间的相似指标较大的至少一个文档块。It should be noted that the similarity index is used to describe the degree of similarity between the document block to be added and the information extraction result. The similarity index can be a similarity (such as 0.8) or a similarity level (such as very similar, similar, dissimilar). The target document block refers to a document block corresponding to at least one information extraction result in the document structure diagram whose similarity index is greater than a preset threshold, wherein the preset threshold, such as 0.7, is set according to actual conditions. The target document block can also be understood as at least one document block in the document structure diagram that has a larger similarity index with the document block to be added.

实际应用中，计算待添加文档块和信息抽取结果之间的相似指标的方式有多种，具体根据实际情况进行选择，本申请实施例对此不做任何限定。本申请一种可能的实现方式中，可以利用相似度算法（如余弦相似度、欧氏距离等）计算待添加文档块和信息抽取结果之间的相似指标。本申请另一种可能的实现方式中，可以先对待添加文档块进行特征提取，获得待添加特征，对信息抽取结果进行特征提取，获得信息抽取特征，利用相似度算法计算待添加特征和信息抽取特征之间的相似指标，将该相似指标确定为待添加文档块和信息抽取结果之间的相似指标。进一步地，获取待添加文档块和信息抽取结果之间的相似指标之后，可以先确定相似指标较大的至少一个信息抽取结果，并将该至少一个信息抽取结果对应的文档块确定为目标文档块。In practical applications, there are many ways to calculate the similarity index between the document block to be added and the information extraction result, which can be selected according to the actual situation; the embodiments of the present application do not impose any restrictions on this. In one possible implementation of the present application, a similarity algorithm (such as cosine similarity or Euclidean distance) can be used to calculate the similarity index between the document block to be added and the information extraction result. In another possible implementation of the present application, feature extraction can first be performed on the document block to be added to obtain the features to be added, and on the information extraction result to obtain the information extraction features; the similarity algorithm is then used to calculate the similarity index between the features to be added and the information extraction features, and this similarity index is taken as the similarity index between the document block to be added and the information extraction result. Further, after the similarity index between the document block to be added and the information extraction results is obtained, at least one information extraction result with a larger similarity index can be determined, and the document block corresponding to that at least one information extraction result can be determined as the target document block.
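The three steps above (compute a similarity index, select target blocks, add edges) can be sketched as follows. The sketch assumes bag-of-words counts as the extracted features and cosine similarity as the similarity algorithm; the function names, the block ids and the sample texts are hypothetical, and the 0.7 threshold is the example value mentioned in the text.

```python
import math
from collections import Counter
from typing import Dict, List


def bag_of_words(text: str) -> Counter:
    """Very simple feature extraction: lower-cased word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def select_target_blocks(
    new_block_text: str,
    extraction_results: Dict[str, str],  # block id -> extracted text
    threshold: float = 0.7,
) -> List[str]:
    """Return ids of blocks whose extraction result exceeds the threshold."""
    new_features = bag_of_words(new_block_text)
    return [
        block_id
        for block_id, text in extraction_results.items()
        if cosine_similarity(new_features, bag_of_words(text)) > threshold
    ]


extraction_results = {
    "b1": "single tent for camping",
    "b2": "highest temperature in 2023",
}
targets = select_target_blocks("single tent camping", extraction_results)

# Add the new block and connect it to every selected target block.
edges = {(target, "new-block") for target in targets}
```

In this example only "b1" shares vocabulary with the block to be added, so it is the only block that receives an edge.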

应用本申请实施例的方案，在文档结构图中，添加与文档结构图中的各文档块相关的待添加文档块，并为待添加文档块和目标文档块之间添加边，获得更新后的文档结构图，使得更新后的文档结构图中的结构关系更加准确，进一步提高了任务处理结果的全面性和准确性。By applying the solution of the embodiment of the present application, in the document structure diagram, document blocks to be added that are related to each document block in the document structure diagram are added, and edges are added between the document blocks to be added and the target document blocks to obtain an updated document structure diagram, so that the structural relationship in the updated document structure diagram is more accurate, further improving the comprehensiveness and accuracy of the task processing results.

本申请一种可选的实施例中，上述根据关联关系，更新层次结构图中的边，获得目标文档的文档结构图之后，还可以包括以下步骤：In an optional embodiment of the present application, after the edges in the hierarchical structure diagram are updated according to the association relationship to obtain the document structure diagram of the target document, the following steps may also be included:

获取参考结构图，其中，参考结构图与文档结构图具有至少一个相同的节点；Acquire a reference structure graph, wherein the reference structure graph and the document structure graph have at least one common node;

根据至少一个相同的节点，合并文档结构图和参考结构图，获得合并后的文档结构图；According to at least one identical node, merge the document structure graph and the reference structure graph to obtain a merged document structure graph;

将任务数据和目标文档的合并后的文档结构图输入任务处理模型，获得任务处理结果。Input the task data and the merged document structure diagram of the target document into the task processing model to obtain the task processing result.

需要说明的是，参考结构图是指与文档结构图具有至少一个相同的节点的结构图。由于参考结构图与文档结构图具有至少一个相同的节点，因此，参考结构图对应的参考文档与文档结构图之间相关。It should be noted that the reference structure graph refers to a structure graph having at least one common node with the document structure graph. Since the reference structure graph and the document structure graph have at least one common node, the reference document corresponding to the reference structure graph is related to the document structure graph.

实际应用中，获取参考结构图的方式有多种，具体根据实际情况进行选择，本申请实施例对此不做任何限定。本申请一种可能的实现方式中，可以接收用户通过客户端发送的参考结构图。本申请另一种可能的实现方式中，可以从其他数据获取设备或数据库中读取参考结构图。In practical applications, there are many ways to obtain a reference structure diagram, which can be selected according to actual conditions, and the embodiments of the present application do not impose any restrictions on this. In one possible implementation of the present application, a reference structure diagram sent by a user through a client can be received. In another possible implementation of the present application, a reference structure diagram can be read from other data acquisition devices or databases.

进一步地，根据至少一个相同的节点，合并文档结构图和参考结构图时，可以将文档结构图和参考结构图中相同的节点进行重合，即可获得合并后的文档结构图。还可以创建一个新图，将文档结构图和参考结构图中的各节点和边复制至新图中，在复制过程中，确保至少一个相同的节点不会被重复添加，检查合并后的文档结构图以确保所有的边都正确连接到了相应的节点，并且没有遗漏信息。接着将任务数据和目标文档的合并后的文档结构图输入任务处理模型，获得任务处理结果。Furthermore, when merging the document structure graph and the reference structure graph according to the at least one identical node, the identical nodes in the two graphs can be overlapped to obtain the merged document structure graph. Alternatively, a new graph can be created and the nodes and edges of the document structure graph and the reference structure graph copied into it. During copying, it is ensured that the at least one identical node is not added more than once; the merged document structure graph is then checked to confirm that every edge is connected to the correct nodes and that no information is missing. The task data and the merged document structure graph of the target document are then input into the task processing model to obtain the task processing result.
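The "copy into a new graph" variant described above can be sketched as follows. The sketch assumes that two nodes are identical when they have the same node id, and that on a conflict the attributes from the document structure graph are kept; both choices, as well as the name `merge_graphs`, are assumptions made for illustration.

```python
from typing import Dict, Set, Tuple

Edge = Tuple[str, str]


def merge_graphs(
    doc_nodes: Dict[str, str],
    doc_edges: Set[Edge],
    ref_nodes: Dict[str, str],
    ref_edges: Set[Edge],
) -> Tuple[Dict[str, str], Set[Edge], Set[str]]:
    """Merge two graphs that share at least one node id."""
    shared = doc_nodes.keys() & ref_nodes.keys()
    if not shared:
        raise ValueError("graphs have no node in common")

    # Copy reference nodes first, then document nodes, so each shared
    # node appears once and keeps the document graph's attributes.
    merged_nodes = dict(ref_nodes)
    merged_nodes.update(doc_nodes)
    merged_edges = set(doc_edges) | set(ref_edges)

    # Consistency check: every edge must connect two existing nodes.
    for source, target in merged_edges:
        if source not in merged_nodes or target not in merged_nodes:
            raise ValueError(f"dangling edge: {source} -> {target}")
    return merged_nodes, merged_edges, set(shared)


doc_nodes = {"camp": "Camp site overview", "tent": "Tent types"}
doc_edges = {("camp", "tent")}
ref_nodes = {"tent": "Tent (reference)", "sleeping bag": "Sleeping bag ratings"}
ref_edges = {("tent", "sleeping bag")}

merged_nodes, merged_edges, shared = merge_graphs(
    doc_nodes, doc_edges, ref_nodes, ref_edges
)
```

Storing nodes in a dictionary keyed by id is what prevents the shared node from being added twice.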

应用本申请实施例的方案，通过根据参考结构图中与文档结构图相同的节点，合并文档结构图和参考结构图，获得合并后的文档结构图，从而实现了对参考结构图和文档结构图的合并，扩展了文档结构图中的文档知识，使得合并后的文档结构图可以提供更加准确、详细的文档知识，提高了任务处理结果的全面性和准确性。By applying the solution of the embodiment of the present application, the document structure diagram and the reference structure diagram are merged according to the nodes in the reference structure diagram that are the same as the document structure diagram to obtain a merged document structure diagram, thereby realizing the merging of the reference structure diagram and the document structure diagram, expanding the document knowledge in the document structure diagram, so that the merged document structure diagram can provide more accurate and detailed document knowledge, and improving the comprehensiveness and accuracy of the task processing results.

下述结合附图5，以本申请提供的任务处理方法在智能对话场景的应用为例，对所述任务处理方法进行进一步说明。其中，图5示出了本申请一个实施例提供的一种文档对话方法的流程图，具体包括以下步骤：The task processing method is further described below with reference to FIG. 5, taking its application in an intelligent dialogue scenario as an example. FIG. 5 shows a flowchart of a document dialogue method provided by an embodiment of the present application, which specifically includes the following steps:

步骤502：接收客户端发送的针对目标文档的对话数据。Step 502: Receive the conversation data for the target document sent by the client.

步骤504：将对话数据和目标文档的文档结构图输入任务处理模型，获得对话处理结果，其中，目标文档包括多个文档块，文档结构图基于多个文档块之间的关联关系和目标文档的层次结构图构建得到，层次结构图以多个文档块为节点，以多个文档块之间的层级关系为边构建得到。Step 504: Input the conversation data and the document structure diagram of the target document into the task processing model to obtain the conversation processing result, wherein the target document includes multiple document blocks, and the document structure diagram is constructed based on the association relationship between the multiple document blocks and the hierarchical structure diagram of the target document, and the hierarchical structure diagram is constructed with multiple document blocks as nodes and the hierarchical relationship between the multiple document blocks as edges.

步骤506：将对话处理结果发送至客户端。Step 506: Send the dialog processing result to the client.

需要说明的是，步骤502至步骤504的实现方式可以参考上述任务处理方法的实现方式，本申请实施例便不再进行赘述。对话数据可以是简单的对话问题，如2023年最高气温是多少，也可以是综合性问题，如“露营需要什么装备”等摘要、总结性问题，具体根据实际情况进行选择，本申请实施例对此不做任何限定。It should be noted that, for the implementation of steps 502 to 504, reference can be made to the implementation of the task processing method described above, which is not repeated here. The dialogue data can be a simple question, such as "What was the highest temperature in 2023?", or a comprehensive question that calls for summarization, such as "What equipment is needed for camping?". The specific selection is based on the actual situation, and the embodiments of the present application do not impose any limitation on this.
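Steps 502 to 506 can be sketched as a single request handler. This is a minimal sketch under stated assumptions: the graph is passed to the model as serialized text (the application does not prescribe an input format), `serialize_graph` and `handle_dialogue` are hypothetical names, and the model is any callable that maps a prompt string to a reply string.

```python
from typing import Callable, Dict, Set, Tuple

Edge = Tuple[str, str, str]  # (source id, target id, relation)


def serialize_graph(nodes: Dict[str, str], edges: Set[Edge]) -> str:
    """Flatten nodes and edges into text (an assumed input format)."""
    lines = [f"[{block_id}] {text}" for block_id, text in sorted(nodes.items())]
    lines += [f"{s} -{rel}-> {t}" for s, t, rel in sorted(edges)]
    return "\n".join(lines)


def handle_dialogue(
    question: str,
    nodes: Dict[str, str],
    edges: Set[Edge],
    model: Callable[[str], str],
) -> str:
    """Step 502: receive the question; step 504: query the model;
    the returned string is what step 506 sends back to the client."""
    prompt = (
        "Document structure graph:\n"
        f"{serialize_graph(nodes, edges)}\n\n"
        f"Question: {question}"
    )
    return model(prompt)


nodes = {"camp": "Choosing a camp site", "tent": "Tents and shelters"}
edges = {("camp", "tent", "hierarchy")}

# An identity function stands in for the model so the prompt can be inspected.
reply = handle_dialogue(
    "What equipment is needed for camping?", nodes, edges, lambda prompt: prompt
)
```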

应用本申请实施例的方案，通过利用多个文档块和多个文档块之间的层级关系表示目标文档，挖掘了目标文档内部的粗粒度信息，并且，融入多个文档块之间的关联关系，进一步挖掘了目标文档内部的细粒度信息，实现将目标文档表示为多粒度的文档结构图，从而在文档对话过程中，可以提供更加准确、详细的文档知识，提高了对话处理结果的全面性和准确性。By applying the solution of the embodiment of the present application, the target document is represented by utilizing multiple document blocks and the hierarchical relationship between the multiple document blocks, the coarse-grained information inside the target document is mined, and the association relationship between the multiple document blocks is integrated to further mine the fine-grained information inside the target document, thereby representing the target document as a multi-granular document structure diagram, thereby providing more accurate and detailed document knowledge during the document dialogue process, and improving the comprehensiveness and accuracy of the dialogue processing results.

参见图6，图6示出了本申请一个实施例提供的一种文档处理方法的流程图，具体包括以下步骤：Referring to FIG. 6 , FIG. 6 shows a flowchart of a document processing method provided by an embodiment of the present application, which specifically includes the following steps:

步骤602：获取目标文档的层次结构图，其中，目标文档包括多个文档块，层次结构图以多个文档块为节点，以多个文档块之间的层级关系为边构建得到。Step 602: Obtain a hierarchical structure graph of the target document, wherein the target document includes multiple document blocks, and the hierarchical structure graph is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges.

步骤604：将针对目标文档的信息抽取模式和层次结构图输入任务处理模型，获得信息抽取结果。Step 604: Input the information extraction pattern and hierarchical structure diagram for the target document into the task processing model to obtain the information extraction result.

步骤606：根据信息抽取结果，确定多个文档块之间的关联关系。Step 606: Determine the association relationship between multiple document blocks based on the information extraction result.

步骤608：根据关联关系，更新层次结构图中的边，获得目标文档的文档结构图。Step 608: Update the edges in the hierarchical structure diagram according to the association relationship to obtain the document structure diagram of the target document.

需要说明的是，步骤602至步骤608的实现方式可以参考上述任务处理方法的实现方式，本申请实施例便不再进行赘述。It should be noted that the implementation of steps 602 to 608 can refer to the implementation of the above-mentioned task processing method, and will not be described in detail in the embodiment of the present application.
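Steps 602 to 608 can be sketched as follows. The sketch assumes that the information extraction result is a set of entities per document block and that two blocks are associated when they share at least one entity; this association rule, the function names and the sample data are illustrative assumptions rather than the method defined in the application.

```python
from itertools import combinations
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (source id, target id, edge type)


def build_hierarchy(blocks: List[Tuple[str, Optional[str]]]) -> Tuple[Set[str], Set[Edge]]:
    """Step 602: blocks are nodes, parent-child relations are edges."""
    nodes = {block_id for block_id, _ in blocks}
    edges = {
        (parent, block_id, "hierarchy")
        for block_id, parent in blocks
        if parent is not None
    }
    return nodes, edges


def association_edges(extracted: Dict[str, Set[str]]) -> Set[Edge]:
    """Step 606: associate blocks that share an extracted entity."""
    edges = set()
    for a, b in combinations(sorted(extracted), 2):
        if extracted[a] & extracted[b]:
            edges.add((a, b, "association"))
    return edges


blocks = [("doc", None), ("sec1", "doc"), ("sec2", "doc")]
nodes, hierarchy = build_hierarchy(blocks)

# Step 604 would obtain this from the model; it is hard-coded here.
extracted = {"doc": set(), "sec1": {"tent"}, "sec2": {"tent", "stove"}}

# Step 608: the document structure graph holds both kinds of edges.
structure_edges = hierarchy | association_edges(extracted)
```

Keeping the edge type in each tuple lets later stages distinguish the coarse-grained hierarchy from the fine-grained associations.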

应用本申请实施例的方案，通过利用多个文档块和多个文档块之间的层级关系表示目标文档，挖掘了目标文档内部的粗粒度信息，并且，融入多个文档块之间的关联关系，进一步挖掘了目标文档内部的细粒度信息，实现将目标文档表示为多粒度的文档结构图，从而在任务处理过程中，可以提供更加准确、详细的文档知识。By applying the solution of the embodiment of the present application, the target document is represented by utilizing multiple document blocks and the hierarchical relationship between the multiple document blocks, the coarse-grained information inside the target document is mined, and the association relationship between the multiple document blocks is integrated to further mine the fine-grained information inside the target document, thereby representing the target document as a multi-granularity document structure diagram, thereby providing more accurate and detailed document knowledge during task processing.

本说明书一种可选的实施例中，上述根据关联关系，更新层次结构图中的边，获得目标文档的文档结构图之后，还可以包括以下步骤：In an optional embodiment of the present specification, after the edges in the hierarchical structure diagram are updated according to the association relationship to obtain the document structure diagram of the target document, the following steps may also be included:

接收客户端发送的文档检索数据；Receive document retrieval data sent by the client;

根据文档检索数据对文档结构图进行内容检索，获得文档检索数据对应的检索结果；Performing content retrieval on the document structure graph according to the document retrieval data to obtain retrieval results corresponding to the document retrieval data;

将检索结果发送至客户端。Send the search results to the client.

需要说明的是，文档检索数据用于从目标文档中检索出与客户端需求相对应的文档内容。文档检索数据包括但不限于查询词、查询参数。查询词用于描述客户端想要查找的信息，例如“衬衫的价格是多少”。查询参数如排序方式、返回结果的数量限制等，具体根据实际情况进行选择，本说明书实施例对此不做任何限定。检索结果是指根据文档检索数据从目标文档中找到的相关文档或文档片段，例如“衬衫的价格是99元”。It should be noted that the document retrieval data is used to retrieve document content corresponding to the client's needs from the target document. The document retrieval data includes but is not limited to query words and query parameters. The query words are used to describe the information that the client wants to find, such as "how much is the price of a shirt". Query parameters such as sorting method, number limit of returned results, etc. are selected according to actual conditions, and the embodiments of this specification do not impose any restrictions on this. The retrieval result refers to the relevant documents or document fragments found from the target document based on the document retrieval data, such as "the price of a shirt is 99 yuan".

实际应用中，根据文档检索数据对文档结构图进行内容检索，获得文档检索数据对应的检索结果的方式有多种，具体根据实际情况进行选择，本说明书实施例对此不做任何限定。本说明书一种可能的实现方式中，可以将文档检索数据中的关键词与文档结构图中的各节点进行匹配，将匹配程度较高的节点确定为文件检索数据对应的检索结果。本说明书另一种可能的实现方式中，可以利用文档搜索引擎根据文档检索数据对文档结构图进行内容检索，获得文档检索数据对应的检索结果。In actual applications, there are many ways to perform content retrieval on the document structure diagram according to the document retrieval data and obtain the retrieval results corresponding to the document retrieval data. The specific selection is based on the actual situation, and the embodiments of this specification do not impose any restrictions on this. In one possible implementation of this specification, the keywords in the document retrieval data can be matched with each node in the document structure diagram, and the nodes with a higher degree of matching can be determined as the retrieval results corresponding to the document retrieval data. In another possible implementation of this specification, a document search engine can be used to perform content retrieval on the document structure diagram according to the document retrieval data to obtain the retrieval results corresponding to the document retrieval data.
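The keyword matching implementation can be sketched as follows. The sketch assumes whitespace tokenization, scores each node by the number of query terms it contains, and treats `top_k` as the query parameter limiting the number of returned results; the function name and the sample nodes are hypothetical.

```python
from typing import Dict, List


def retrieve(query: str, nodes: Dict[str, str], top_k: int = 3) -> List[str]:
    """Return ids of the nodes that share the most terms with the query."""
    terms = set(query.lower().split())
    scored = []
    for block_id, text in nodes.items():
        score = len(terms & set(text.lower().split()))
        if score > 0:
            scored.append((score, block_id))
    # Highest score first; ties are broken by block id for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [block_id for _, block_id in scored[:top_k]]


nodes = {
    "b1": "the price of a shirt is 99 yuan",
    "b2": "shirts are made of cotton",
    "b3": "opening hours",
}
results = retrieve("shirt price", nodes)
```

Exact token matching misses "shirts" in block "b2", which is one reason a real system might prefer stemming or embedding-based matching.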

应用本说明书实施例的方法，通过将目标文档表示为多粒度的文档结构图，因此，根据文档检索数据对文档结构图进行内容检索，可以获得更加准确、详细的检索结果。By applying the method of the embodiment of the present specification, the target document is represented as a multi-granularity document structure graph. Therefore, by performing content retrieval on the document structure graph according to the document retrieval data, more accurate and detailed retrieval results can be obtained.

参见图7，图7示出了本申请一个实施例提供的一种文档处理方法的处理过程流程图，如图7所示，文档处理过程可以分为目标文档解析、层次结构图构建、摘要信息和关键信息识别、信息抽取、文档结构图构建以及文档结构图更新六个阶段，接下来，分别对这六个阶段进行详细说明。Refer to FIG. 7, which shows a processing flowchart of a document processing method provided by an embodiment of the present application. As shown in FIG. 7, the document processing process can be divided into six stages: target document parsing, hierarchical structure graph construction, summary information and key information identification, information extraction, document structure graph construction, and document structure graph update. These six stages are described in detail below.

目标文档解析：对目标文档进行解析，通过PDF-Parser或者OCR等版面识别技术，提取里面的文字信息，根据文字信息进行文档分块，将文档切分成多个文档块；Target document parsing: parse the target document, extract its text information using layout recognition technologies such as PDF-Parser or OCR, and split the document into multiple document blocks according to the text information;

层次结构图构建：解析多个文档块，确定多个文档块之间的层级关系；以多个文档块为节点，层级关系为边，构建目标文档的层次结构图；Hierarchical graph construction: parse multiple document blocks and determine the hierarchical relationship between the multiple document blocks; use multiple document blocks as nodes and hierarchical relationships as edges to construct a hierarchical graph of the target document;

摘要信息和关键信息识别：将目标文档的层次结构图中的多个文档块输入信息识别模型，获得多个文档块分别对应的摘要信息和关键信息；将多个文档块分别对应的摘要信息和关键信息存储在层次结构图的多个节点；Identification of summary information and key information: multiple document blocks in the hierarchical structure diagram of the target document are input into the information identification model to obtain summary information and key information corresponding to the multiple document blocks; the summary information and key information corresponding to the multiple document blocks are stored in multiple nodes of the hierarchical structure diagram;

信息抽取：将针对目标文档的信息抽取模式和存储了多个文档块分别对应的摘要信息和关键信息的层次结构图输入信息抽取模型，获得信息抽取结果；Information extraction: Input the information extraction pattern for the target document and the hierarchical structure diagram storing the summary information and key information corresponding to multiple document blocks into the information extraction model to obtain the information extraction result;

文档结构图构建：根据信息抽取结果，确定层次结构图中多个文档块之间的关联关系；根据关联关系，更新层次结构图中的边，获得目标文档的文档结构图；Document structure graph construction: According to the information extraction results, determine the association relationship between multiple document blocks in the hierarchical graph; according to the association relationship, update the edges in the hierarchical graph to obtain the document structure graph of the target document;

文档结构图更新：将扩展提示信息和文档结构图输入任务处理模型，获得待添加文档块；计算待添加文档块和信息抽取结果之间的相似指标；根据相似指标，从多个文档块中筛选出目标文档块；在目标文档的文档结构图中，添加待添加文档块，并为待添加文档块和目标文档块之间添加边，获得更新后的文档结构图。Document structure graph update: input the extended prompt information and the document structure graph into the task processing model to obtain the document block to be added; calculate the similarity index between the document block to be added and the information extraction result; filter out the target document block from multiple document blocks based on the similarity index; add the document block to be added to the document structure graph of the target document, and add edges between the document block to be added and the target document block to obtain an updated document structure graph.

与上述任务处理方法实施例相对应，本申请还提供了任务处理装置实施例，图8示出了本申请一个实施例提供的一种任务处理装置的结构示意图。如图8所示，该装置包括：Corresponding to the above-mentioned task processing method embodiment, the present application also provides a task processing device embodiment. FIG. 8 shows a schematic structural diagram of a task processing device provided by an embodiment of the present application. As shown in FIG. 8, the device includes:

第一获取模块802，被配置为获取针对目标文档的任务数据和目标文档的文档结构图；A first acquisition module 802 is configured to acquire task data for a target document and a document structure diagram of the target document;

第一输入模块804，被配置为将任务数据和目标文档的文档结构图输入任务处理模型，获得任务处理结果，其中，目标文档包括多个文档块，文档结构图基于多个文档块之间的关联关系和目标文档的层次结构图构建得到，层次结构图以多个文档块为节点，以多个文档块之间的层级关系为边构建得到。The first input module 804 is configured to input the task data and the document structure diagram of the target document into the task processing model to obtain the task processing result, wherein the target document includes multiple document blocks, and the document structure diagram is constructed based on the association relationship between the multiple document blocks and the hierarchical structure diagram of the target document, and the hierarchical structure diagram is constructed with multiple document blocks as nodes and the hierarchical relationship between the multiple document blocks as edges.

可选地，该装置还包括：第三获取模块，被配置为获取目标文档的层次结构图；将针对目标文档的信息抽取模式和层次结构图输入信息抽取模型，获得信息抽取结果；根据信息抽取结果，确定多个文档块之间的关联关系；根据关联关系，更新层次结构图中的边，获得目标文档的文档结构图。Optionally, the device also includes: a third acquisition module, configured to obtain a hierarchical structure diagram of the target document; input the information extraction pattern and hierarchical structure diagram for the target document into the information extraction model to obtain an information extraction result; determine the association relationship between multiple document blocks based on the information extraction result; update the edges in the hierarchical structure diagram based on the association relationship to obtain a document structure diagram of the target document.

可选地，该装置还包括：第四输入模块，被配置为将扩展提示信息和文档结构图输入任务处理模型，获得待添加文档块；利用待添加文档块，对文档结构图进行更新，获得更新后的文档结构图；第一输入模块804，进一步被配置为将任务数据和目标文档的更新后的文档结构图输入任务处理模型，获得任务处理结果。Optionally, the device also includes: a fourth input module, configured to input the extended prompt information and the document structure diagram into the task processing model to obtain the document block to be added; using the document block to be added, the document structure diagram is updated to obtain an updated document structure diagram; the first input module 804 is further configured to input the task data and the updated document structure diagram of the target document into the task processing model to obtain the task processing result.

可选地，第四输入模块，进一步被配置为计算待添加文档块和信息抽取结果之间的相似指标；根据相似指标，从多个文档块中筛选出目标文档块；在文档结构图中，添加待添加文档块，并为待添加文档块和目标文档块之间添加边，获得更新后的文档结构图。Optionally, the fourth input module is further configured to calculate a similarity index between the document block to be added and the information extraction result; filter out a target document block from multiple document blocks based on the similarity index; add the document block to be added to the document structure diagram, and add an edge between the document block to be added and the target document block to obtain an updated document structure diagram.

可选地，该装置还包括：第五输入模块，被配置为将多个文档块输入信息识别模型，获得多个文档块分别对应的摘要信息和关键信息；第三获取模块，进一步被配置为将针对目标文档的信息抽取模式、层次结构图和多个文档块分别对应的摘要信息和关键信息输入信息抽取模型，获得信息抽取结果。Optionally, the device also includes: a fifth input module, configured to input multiple document blocks into the information recognition model to obtain summary information and key information corresponding to the multiple document blocks respectively; a third acquisition module, further configured to input the information extraction pattern, hierarchical structure diagram and summary information and key information corresponding to the multiple document blocks for the target document into the information extraction model to obtain the information extraction result.

可选地，第三获取模块，进一步被配置为获取目标文档，其中，目标文档包括多个文档块，文档块包括文本内容、图像内容和表格内容中的至少一种；解析多个文档块，确定多个文档块之间的层级关系；以多个文档块为节点，层级关系为边，构建目标文档的层次结构图。Optionally, the third acquisition module is further configured to acquire a target document, wherein the target document includes multiple document blocks, and the document blocks include at least one of text content, image content and table content; parse the multiple document blocks to determine the hierarchical relationship between the multiple document blocks; and construct a hierarchical graph of the target document with the multiple document blocks as nodes and the hierarchical relationship as edges.

可选地，该装置还包括：第四获取模块，被配置为获取多个文档块的阅读顺序；第三获取模块，进一步被配置为以多个文档块为节点，层级关系和阅读顺序为边，构建目标文档的层次结构图。Optionally, the device also includes: a fourth acquisition module configured to acquire the reading order of multiple document blocks; and a third acquisition module further configured to construct a hierarchical structure diagram of the target document with multiple document blocks as nodes and hierarchical relationships and reading orders as edges.

可选地，该装置还包括：第五获取模块，被配置为获取参考结构图，其中，参考结构图与文档结构图具有至少一个相同的节点；根据至少一个相同的节点，合并文档结构图和参考结构图，获得合并后的文档结构图；第一输入模块804，进一步被配置为将任务数据和目标文档的合并后的文档结构图输入任务处理模型，获得任务处理结果。Optionally, the device also includes: a fifth acquisition module, configured to acquire a reference structure diagram, wherein the reference structure diagram and the document structure diagram have at least one identical node; based on the at least one identical node, merging the document structure diagram and the reference structure diagram to obtain a merged document structure diagram; and a first input module 804, further configured to input the merged document structure diagram of the task data and the target document into the task processing model to obtain a task processing result.

可选地，第一输入模块804，进一步被配置为对任务数据进行特征提取，获得任务数据特征；对文档结构图进行特征提取，获得文档结构特征；将任务数据特征和文档结构特征输入任务处理模型，获得任务处理结果。Optionally, the first input module 804 is further configured to perform feature extraction on the task data to obtain task data features; perform feature extraction on the document structure graph to obtain document structure features; input the task data features and document structure features into the task processing model to obtain task processing results.

应用本申请实施例的方案，由于任务处理装置中包括可以利用文档结构图对任务数据进行处理的第一输入模块，而文档结构图可以多粒度地表示目标文档，进一步可以向任务处理装置提供更加准确、详细的文档知识，提高了任务处理装置进行任务处理的全面性和准确性。By applying the solution of the embodiment of the present application, the task processing device includes a first input module that can use the document structure diagram to process task data, and the document structure diagram can represent the target document at multiple granularities. The task processing device is therefore provided with more accurate and detailed document knowledge, which improves the comprehensiveness and accuracy of its task processing.

The above is an illustrative solution of a task processing device of this embodiment. It should be noted that the technical solution of the task processing device belongs to the same concept as the technical solution of the task processing method described above; for details not described in the technical solution of the task processing device, refer to the description of the technical solution of the task processing method.

Corresponding to the document dialogue method embodiment described above, the present application also provides a document dialogue device embodiment. FIG. 9 shows a schematic structural diagram of a document dialogue device provided by an embodiment of the present application. As shown in FIG. 9, the device includes:

a receiving module 902, configured to receive dialogue data for a target document sent by a client;

a second input module 904, configured to input the dialogue data and the document structure diagram of the target document into the task processing model to obtain a dialogue processing result, wherein the target document includes multiple document blocks, the document structure diagram is constructed based on the association relationships between the multiple document blocks and the hierarchical structure diagram of the target document, and the hierarchical structure diagram is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges;

a sending module 906, configured to send the dialogue processing result to the client.
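The three modules above (receive, input to the model, send) can be sketched as a single handler. The application does not state how the document structure diagram is presented to the task processing model; flattening it to text is an assumption made here, and `serialize_graph`, `handle_dialogue`, and `fake_model` are hypothetical names. The model is a stand-in callable, not a real model API.

```python
def serialize_graph(graph):
    """Flatten the graph to text: one line per node, then one line per edge."""
    lines = [f"[{node_id}] {text}" for node_id, text in sorted(graph["nodes"].items())]
    lines += [f"{src} -{kind}-> {dst}" for src, dst, kind in sorted(graph["edges"])]
    return "\n".join(lines)


def handle_dialogue(dialogue_data, graph, model, send):
    """Take received dialogue data, query the model with the graph, send the result."""
    prompt = f"Document structure:\n{serialize_graph(graph)}\n\nUser: {dialogue_data}"
    result = model(prompt)  # stand-in for the task processing model
    send(result)            # stand-in for the reply channel to the client
    return result


def fake_model(prompt):
    # Placeholder: a real deployment would call a trained model here.
    return f"(answer based on {len(prompt.splitlines())} prompt lines)"


graph = {
    "nodes": {"sec1": "1 Installation", "p1": "Run the installer."},
    "edges": {("sec1", "p1", "hierarchy")},
}
sent = []  # collects what would be sent back to the client
handle_dialogue("How do I install it?", graph, fake_model, sent.append)
```

Passing `model` and `send` as callables keeps the sketch independent of any particular model service or network transport.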

By applying the solution of this embodiment of the present application, the document dialogue device includes the second input module, which can process the dialogue data using the document structure diagram. Because the document structure diagram represents the target document at multiple granularities, it provides the document dialogue device with more accurate and detailed document knowledge, which improves the comprehensiveness and accuracy of the dialogue processing performed by the device.

The above is an illustrative solution of a document dialogue device of this embodiment. It should be noted that the technical solution of the document dialogue device belongs to the same concept as the technical solution of the document dialogue method described above; for details not described in the technical solution of the document dialogue device, refer to the description of the technical solution of the document dialogue method.

Corresponding to the document processing method embodiment described above, the present application also provides a document processing device embodiment. FIG. 10 shows a schematic structural diagram of a document processing device provided by an embodiment of the present application. As shown in FIG. 10, the device includes:

a second acquisition module 1002, configured to acquire a hierarchical structure diagram of a target document, wherein the target document includes multiple document blocks, and the hierarchical structure diagram is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges;

a third input module 1004, configured to input an information extraction pattern for the target document and the hierarchical structure diagram into the task processing model to obtain an information extraction result;

a determination module 1006, configured to determine the association relationships between the multiple document blocks according to the information extraction result;

an updating module 1008, configured to update the edges in the hierarchical structure diagram according to the association relationships to obtain the document structure diagram of the target document.
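The four modules above form a pipeline, sketched below under stated assumptions: the information extraction result is taken to be a list of `(source block id, relation, target block id)` triples, and the model call is replaced by a stub. The names `build_document_structure_graph` and `fake_extract` and the triple format are hypothetical; the application does not specify them.

```python
def build_document_structure_graph(hierarchy_graph, extraction_pattern, extract):
    """Add association edges, derived from an extraction result, to the hierarchy graph.

    extract(pattern, graph) stands in for the task processing model and is assumed
    to return (source block id, relation, target block id) triples.
    """
    nodes = hierarchy_graph["nodes"]
    triples = extract(extraction_pattern, hierarchy_graph)
    # Keep only associations whose endpoints are distinct, known document blocks.
    associations = {
        (src, dst, relation)
        for src, relation, dst in triples
        if src in nodes and dst in nodes and src != dst
    }
    # Return a new graph so the input hierarchy graph is left unchanged.
    return {"nodes": dict(nodes), "edges": hierarchy_graph["edges"] | associations}


def fake_extract(pattern, graph):
    # Placeholder output; a real system would obtain these from a model.
    return [("p1", "explains", "fig1"), ("p1", "cites", "missing_block")]


hierarchy_graph = {
    "nodes": {
        "sec1": "1 Installation",
        "p1": "See Figure 1.",
        "fig1": "Figure 1: setup screen",
    },
    "edges": {("sec1", "p1", "hierarchy"), ("sec1", "fig1", "hierarchy")},
}
document_graph = build_document_structure_graph(
    hierarchy_graph, extraction_pattern=["explains", "cites"], extract=fake_extract
)
```

The resulting graph keeps the coarse-grained hierarchy edges and adds the fine-grained `explains` edge between the paragraph and the figure; the triple that points at an unknown block is dropped.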

Optionally, the device further includes a retrieval module configured to receive document retrieval data sent by a client; perform content retrieval on the document structure diagram according to the document retrieval data to obtain a retrieval result corresponding to the document retrieval data; and send the retrieval result to the client.
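One possible form of content retrieval over the document structure diagram is sketched below. It assumes simple keyword matching on block content followed by expansion along edges to neighbouring blocks; the application does not specify a retrieval algorithm, so the function `retrieve` and its `hops` parameter are illustrative only.

```python
def retrieve(graph, query, hops=1):
    """Return blocks whose text shares a word with the query, plus nearby blocks."""
    terms = set(query.lower().split())
    hits = {
        node_id
        for node_id, text in graph["nodes"].items()
        if terms & set(text.lower().split())
    }
    frontier = set(hits)
    for _ in range(hops):
        neighbours = set()
        # Follow edges in both directions from the current frontier.
        for src, dst, _kind in graph["edges"]:
            if src in frontier:
                neighbours.add(dst)
            if dst in frontier:
                neighbours.add(src)
        frontier = neighbours - hits
        hits |= frontier
    return {node_id: graph["nodes"][node_id] for node_id in sorted(hits)}


graph = {
    "nodes": {
        "sec1": "1 Installation",
        "p1": "Run the installer.",
        "sec2": "2 Licensing",
        "p2": "Accept the license terms.",
    },
    "edges": {("sec1", "p1", "hierarchy"), ("sec2", "p2", "hierarchy")},
}
results = retrieve(graph, "license")
```

Expanding along edges returns the matching paragraph together with its parent section, which is the kind of structural context a flat list of blocks would not provide.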

By applying the solution of this embodiment of the present application, the document processing device represents the target document using the multiple document blocks and the hierarchical relationships between them, which captures the coarse-grained information inside the target document. It then incorporates the association relationships between the multiple document blocks, which captures the fine-grained information inside the target document. The target document is thereby represented as a multi-granularity document structure diagram, so that more accurate and detailed document knowledge can be provided during task processing.

The above is an illustrative solution of a document processing device of this embodiment. It should be noted that the technical solution of the document processing device belongs to the same concept as the technical solution of the document processing method described above; for details not described in the technical solution of the document processing device, refer to the description of the technical solution of the document processing method.

Referring to FIG. 11, FIG. 11 shows a schematic structural diagram of a task platform provided by an embodiment of this specification. The task platform includes a request interface 1102 and a response unit 1104.

The request interface 1102 is used to receive a dialogue processing request sent by a client, wherein the dialogue processing request carries dialogue data for a target document.

The response unit 1104 is used to input the dialogue data and the document structure diagram of the target document into the task processing model to obtain a dialogue processing result, wherein the target document includes multiple document blocks, the document structure diagram is constructed based on the association relationships between the multiple document blocks and the hierarchical structure diagram of the target document, and the hierarchical structure diagram is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges.

In an optional embodiment of this specification, the task platform further includes a document processing interface.

The document processing interface is used to: acquire a hierarchical structure diagram of a target document, wherein the target document includes multiple document blocks, and the hierarchical structure diagram is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges; input an information extraction pattern for the target document and the hierarchical structure diagram into the task processing model to obtain an information extraction result; determine the association relationships between the multiple document blocks according to the information extraction result; and update the edges in the hierarchical structure diagram according to the association relationships to obtain the document structure diagram of the target document.

By applying the solution of this embodiment of the specification, the task platform represents the target document as a multi-granularity document structure diagram and thereby provides more accurate and detailed document knowledge. The platform can then process dialogue tasks in response to user requests and obtain more accurate and detailed dialogue processing results. This enables a personalized dialogue service and gives users an efficient, flexible, and easy-to-use document service platform, improving the user experience.

The above is an illustrative solution of a task platform of this embodiment. It should be noted that the technical solution of the task platform belongs to the same concept as the technical solutions of the document dialogue method and the document processing method described above; for details not described in the technical solution of the task platform, refer to the descriptions of the technical solutions of the document dialogue method and the document processing method.

FIG. 12 shows a structural block diagram of a computing device provided by an embodiment of the present application. The components of the computing device 1200 include, but are not limited to, a memory 1210 and a processor 1220. The processor 1220 is connected to the memory 1210 via a bus 1230, and a database 1250 is used to store data.

The computing device 1200 also includes an access device 1240, which enables the computing device 1200 to communicate via one or more networks 1260. Examples of such networks include a public switched telephone network (PSTN), a local area network (LAN), a wide area network (WAN), a personal area network (PAN), or a combination of communication networks such as the Internet. The access device 1240 may include one or more network interfaces of any type, wired or wireless (for example, a network interface card (NIC)), such as an IEEE 802.11 wireless local area network (WLAN) interface, a Worldwide Interoperability for Microwave Access (WiMAX) interface, an Ethernet interface, a universal serial bus (USB) interface, a cellular network interface, a Bluetooth interface, a near field communication (NFC) interface, and the like.

In an embodiment of the present application, the above components of the computing device 1200 and other components not shown in FIG. 12 may also be connected to one another, for example via a bus. It should be understood that the structural block diagram of the computing device shown in FIG. 12 is for illustrative purposes only and does not limit the scope of the present application. Those skilled in the art may add or replace other components as needed.

The computing device 1200 may be any type of stationary or mobile computing device, including a mobile computer or mobile computing device (for example, a tablet computer, a personal digital assistant, a laptop computer, a notebook computer, or a netbook), a mobile phone (for example, a smartphone), a wearable computing device (for example, a smart watch or smart glasses), or another type of mobile device, or a stationary computing device such as a desktop computer or a personal computer (PC). The computing device 1200 may also be a mobile or stationary server.

The processor 1220 is used to execute computer programs/instructions which, when executed by the processor, implement the steps of the task processing method, the document dialogue method, or the document processing method described above.

The above is an illustrative solution of a computing device of this embodiment. It should be noted that the technical solution of the computing device belongs to the same concept as the technical solutions of the task processing method, the document dialogue method, and the document processing method described above; for details not described in the technical solution of the computing device, refer to the description of the technical solution of the task processing method, the document dialogue method, or the document processing method.

An embodiment of the present application also provides a computer-readable storage medium storing computer programs/instructions which, when executed by a processor, implement the steps of the task processing method, the document dialogue method, or the document processing method described above.

The above is an illustrative solution of a computer-readable storage medium of this embodiment. It should be noted that the technical solution of the storage medium belongs to the same concept as the technical solutions of the task processing method, the document dialogue method, and the document processing method described above; for details not described in the technical solution of the storage medium, refer to the description of the technical solution of the task processing method, the document dialogue method, or the document processing method.

An embodiment of the present application also provides a computer program product including computer programs/instructions which, when executed by a processor, implement the steps of the task processing method, the document dialogue method, or the document processing method described above.

The above is an illustrative solution of a computer program product of this embodiment. It should be noted that the technical solution of the computer program product belongs to the same concept as the technical solutions of the task processing method, the document dialogue method, and the document processing method described above; for details not described in the technical solution of the computer program product, refer to the description of the technical solution of the task processing method, the document dialogue method, or the document processing method.

Specific embodiments of the present application have been described above. Other embodiments fall within the scope of the appended claims. In some cases, the actions or steps recited in the claims may be performed in an order different from that in the embodiments and still achieve the desired results. In addition, the processes depicted in the accompanying drawings do not necessarily require the particular order shown, or a sequential order, to achieve the desired results. In some implementations, multitasking and parallel processing are also possible or may be advantageous.

The computer instructions include computer program code, which may be in source code form, object code form, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and the like. It should be noted that the content covered by the computer-readable medium may be expanded or reduced as required by patent practice; for example, in some jurisdictions, according to patent practice, computer-readable media do not include electrical carrier signals and telecommunication signals.

It should be noted that, for brevity, each of the foregoing method embodiments is described as a series of combined actions. Those skilled in the art should understand, however, that the embodiments of the present application are not limited by the described order of actions, because according to the embodiments of the present application some steps may be performed in a different order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and that the actions and modules involved are not necessarily all required by the embodiments of the present application.

In the above embodiments, the description of each embodiment has its own emphasis. For parts not described in detail in one embodiment, refer to the relevant descriptions of the other embodiments.

The preferred embodiments of the present application disclosed above are intended only to help explain the present application. The optional embodiments do not describe every detail exhaustively, nor do they limit the invention to the specific implementations described. Many modifications and variations are evidently possible in light of the content of the embodiments of the present application. These embodiments were selected and described in detail in order to better explain the principles and practical applications of the embodiments of the present application, so that those skilled in the art can understand and use the present application well. The present application is limited only by the claims and their full scope and equivalents.

Claims

1. A task processing method, comprising:

Acquiring task data for a target document and a document structure diagram of the target document;

Inputting the task data and the document structure diagram of the target document into a task processing model to obtain a task processing result, wherein the target document includes multiple document blocks, the document structure diagram is constructed based on the association relationships between the multiple document blocks and a hierarchical structure diagram of the target document, and the hierarchical structure diagram is constructed with the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges.

2. The method according to claim 1, further comprising, before inputting the task data and the document structure diagram of the target document into the task processing model to obtain the task processing result:

Acquiring the hierarchical structure diagram of the target document;

Inputting the information extraction pattern for the target document and the hierarchical structure diagram into the information extraction model to obtain an information extraction result;

Determining the association relationship between the plurality of document blocks according to the information extraction result;

According to the association relationship, the edges in the hierarchical structure graph are updated to obtain the document structure graph of the target document.

3. The method according to claim 2, further comprising, after updating the edges in the hierarchical structure diagram according to the association relationship to obtain the document structure diagram of the target document:

Inputting the extended prompt information and the document structure diagram into the task processing model to obtain the document block to be added;

Using the document block to be added, updating the document structure diagram to obtain an updated document structure diagram;

The step of inputting the task data and the document structure diagram of the target document into a task processing model to obtain a task processing result includes:

The task data and the updated document structure diagram of the target document are input into a task processing model to obtain a task processing result.

4. The method according to claim 3, wherein the updating of the document structure diagram using the document block to be added to obtain an updated document structure diagram comprises:

Calculating a similarity index between the document block to be added and the information extraction result;

Filtering a target document block from the multiple document blocks according to the similarity index;

In the document structure graph, the document block to be added is added, and an edge is added between the document block to be added and the target document block to obtain an updated document structure graph.

5. The method according to claim 2, before inputting the information extraction pattern for the target document and the hierarchical structure diagram into the information extraction model to obtain the information extraction result, further comprising:

Inputting the multiple document blocks into an information recognition model to obtain summary information and key information respectively corresponding to the multiple document blocks;

The step of inputting the information extraction pattern for the target document and the hierarchical structure diagram into the information extraction model to obtain the information extraction result includes:

The information extraction pattern for the target document, the hierarchical structure diagram, and the summary information and key information respectively corresponding to the plurality of document blocks are input into the information extraction model to obtain an information extraction result.

6. The method according to claim 2, wherein obtaining the hierarchical structure diagram of the target document comprises:

Acquiring a target document, wherein the target document includes a plurality of document blocks, and each document block includes at least one of text content, image content, and table content;

Parsing the multiple document blocks to determine the hierarchical relationship between the multiple document blocks;

A hierarchical graph of the target document is constructed with the multiple document blocks as nodes and the hierarchical relationships as edges.

7. The method according to claim 6, before constructing the hierarchical structure graph of the target document with the multiple document blocks as nodes and the hierarchical relationships as edges, further comprising:

Obtaining a reading order of the plurality of document blocks;

The step of constructing a hierarchical structure graph of the target document using the multiple document blocks as nodes and the hierarchical relationships as edges includes:

A hierarchical structure graph of the target document is constructed with the multiple document blocks as nodes and the hierarchical relationship and the reading order as edges.

8. The method according to claim 2, further comprising, after updating the edges in the hierarchical structure diagram according to the association relationship to obtain the document structure diagram of the target document:

Acquiring a reference structure graph, wherein the reference structure graph and the document structure graph have at least one identical node;

According to the at least one identical node, merging the document structure graph and the reference structure graph to obtain a merged document structure graph;

Inputting the task data and the merged document structure graph of the target document into the task processing model to obtain the task processing result.

9. The method according to any one of claims 1 to 8, wherein the step of inputting the task data and the document structure diagram of the target document into a task processing model to obtain a task processing result comprises:

Extracting features from the task data to obtain task data features;

Extracting features from the document structure graph to obtain document structure features;

The task data features and the document structure features are input into a task processing model to obtain a task processing result.

10. A document dialogue method, comprising:

Receiving dialogue data for a target document sent by a client;

Inputting the dialog data and the document structure graph of the target document into the task processing model to obtain a dialog processing result, wherein the target document includes a plurality of document blocks, the document structure graph is constructed based on the association relationship between the plurality of document blocks and the hierarchical structure graph of the target document, and the hierarchical structure graph is constructed with the plurality of document blocks as nodes and the hierarchical relationship between the plurality of document blocks as edges;

Sending the dialogue processing result to the client.

11. A document processing method, comprising:

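Acquiring a hierarchical structure graph of a target document, wherein the target document includes a plurality of document blocks, and the hierarchical structure graph is constructed with the plurality of document blocks as nodes and the hierarchical relationships between the plurality of document blocks as edges;

Inputting the information extraction pattern for the target document and the hierarchical structure diagram into the task processing model to obtain an information extraction result;

Determining the association relationship between the plurality of document blocks according to the information extraction result;

Updating the edges in the hierarchical structure diagram according to the association relationship to obtain the document structure diagram of the target document.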

12. The method according to claim 11, further comprising, after updating the edges in the hierarchical structure diagram according to the association relationship to obtain the document structure diagram of the target document:

Receiving document retrieval data sent by a client;

Performing content retrieval on the document structure graph according to the document retrieval data to obtain retrieval results corresponding to the document retrieval data;

Sending the retrieval results to the client.

13. A task platform, comprising a request interface and a response unit;

The request interface is used to receive a dialog processing request sent by a client, wherein the dialog processing request carries dialog data for a target document;

The response unit is used to input the dialogue data and the document structure diagram of the target document into the task processing model to obtain a dialogue processing result, wherein the target document includes multiple document blocks, and the document structure diagram is constructed based on the association relationship between the multiple document blocks and the hierarchical structure diagram of the target document, and the hierarchical structure diagram is constructed with the multiple document blocks as nodes and the hierarchical relationship between the multiple document blocks as edges.

14. The task platform according to claim 13, further comprising a document processing interface;

The document processing interface is used to obtain a hierarchical structure diagram of a target document, wherein the target document includes multiple document blocks, and the hierarchical structure diagram is constructed using the multiple document blocks as nodes and the hierarchical relationships between the multiple document blocks as edges; inputting an information extraction pattern for the target document and the hierarchical structure diagram into a task processing model to obtain an information extraction result; determining the association relationship between the multiple document blocks based on the information extraction result; and updating the edges in the hierarchical structure diagram based on the association relationship to obtain a document structure diagram of the target document.

15. A computing device comprising:

A memory and a processor;

The memory is used to store computer programs/instructions, and the processor is used to execute the computer programs/instructions. When the computer programs/instructions are executed by the processor, the steps of the method described in any one of claims 1 to 9, claim 10, or any one of claims 11 to 12 are implemented.

16. A computer-readable storage medium storing a computer program/instruction, which, when executed by a processor, implements the steps of the method described in any one of claims 1 to 9 or claim 10 or any one of claims 11 to 12.

17. A computer program product, comprising a computer program/instruction, which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 9 or claim 10 or any one of claims 11 to 12.