CN115546815A

CN115546815A - Table identification method, device, equipment and storage medium

Info

Publication number: CN115546815A
Application number: CN202211362565.5A
Authority: CN
Inventors: 吴嘉嘉; 张银田; 殷兵; 胡金水; 刘聪
Original assignee: iFlytek Co Ltd
Current assignee: iFlytek Co Ltd
Priority date: 2022-11-02
Filing date: 2022-11-02
Publication date: 2022-12-30

Abstract

The present application provides a table recognition method, device, equipment, and storage medium. The specific implementation is: detecting and determining a table area from an image to be tested; determining corresponding text features and position features based on the table area; fusing the text features and the position features to obtain table features; and using the table features to determine a table recognition result in the image to be tested. According to the technical solution of the present application, tables can be recognized accurately.

Description

Table recognition method, device, equipment and storage medium

Technical field

The present application relates to the technical field of artificial intelligence, and in particular to a table recognition method, device, equipment and storage medium.

Background

With the development of information technology, the importance of electronic tables is beyond doubt. However, the tables encountered in business processing are not limited to simple Excel and Word documents; many exist as table images in the form of scanned PDFs and pictures.

Typically, table recognition on images containing tables relies on cell detection alone. The recognized tables are inaccurate and still require manual adjustment afterwards.

Summary of the invention

To solve the above problems, the present application proposes a table recognition method, device, equipment and storage medium that can effectively improve the accuracy of table recognition.

According to a first aspect of the embodiments of the present application, a table recognition method is provided, including:

detecting and determining a table area from an image to be tested;

determining corresponding text features and position features based on the table area;

fusing the text features and the position features to obtain table features;

using the table features to determine a table recognition result in the image to be tested.

According to a second aspect of the embodiments of the present application, a table recognition device is provided, including:

a determining module, configured to detect and determine a table area from an image to be tested;

a processing module, configured to determine corresponding text features and position features based on the table area;

a fusion module, configured to fuse the text features and the position features to obtain table features;

a recognition module, configured to use the table features to determine a table recognition result in the image to be tested.

A third aspect of the present application provides an electronic device, including:

a memory and a processor;

the memory is connected to the processor and is used to store a program;

the processor implements the above table recognition method by running the program in the memory.

A fourth aspect of the present application provides a storage medium on which a computer program is stored; when the computer program is run by a processor, the above table recognition method is implemented.

An embodiment of the above application has the following advantages or beneficial effects:

When recognizing a table from an image to be tested, the technical solution proposed in this application determines the corresponding text features and position features based on the table area in the image, and fuses the text features and position features to obtain table features. Features of the table are thus extracted along multiple dimensions, so the table in the image can be recognized more accurately using the table features, which improves the recognition accuracy.

Brief description of the drawings

To illustrate the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are merely embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from the provided drawings without creative effort.

FIG. 1 is a schematic flowchart of a table recognition method provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of a full-line table provided by an embodiment of the present application;

FIG. 3 is a schematic diagram of a borderless table (a table without ruling lines) provided by an embodiment of the present application;

FIG. 4 is a schematic diagram of a few-line table provided by an embodiment of the present application;

FIG. 5 is a schematic diagram of cell division provided by an embodiment of the present application;

FIG. 6 is a schematic diagram of text line division provided by an embodiment of the present application;

FIG. 7 is a schematic flowchart of a table recognition method provided by an embodiment of the present application;

FIG. 8 is a schematic diagram of cell and text line division provided by an embodiment of the present application;

FIG. 9 is a detailed schematic diagram of feature fusion provided by an embodiment of the present application;

FIG. 10 is a detailed schematic diagram of table recognition provided by an embodiment of the present application;

FIG. 11 is a schematic structural diagram of a table recognition device provided by an embodiment of the present application;

FIG. 12 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.

Detailed description

The technical solutions of the embodiments of the present application are applicable to various table recognition scenarios, for example, smart office scenarios and text processing.

At present, as artificial intelligence technologies mature, smart office has become a need for most users. Most existing table recognition approaches process table images using cell detection alone, which performs poorly on few-line tables.

Therefore, it is necessary to provide a table recognition method, device, equipment and storage medium that can recognize tables more accurately.

The technical solutions of the embodiments of the present application may, by way of example, be applied to hardware such as processors, electronic devices and servers (including cloud servers), or be packaged as a software program and run. When the hardware executes the processing of the technical solutions, or when the software program is run, the text features and position features determined from the table area in the image to be tested are fused to obtain table features, and the table features are then used to generate the table in the image to be tested. The embodiments only describe the specific processing of the technical solutions by way of example and do not limit their specific form of implementation; any technical implementation capable of executing the processing of the technical solutions may be adopted by the embodiments of the present application.

The technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in this application without creative effort fall within the scope of protection of this application.

Exemplary method

FIG. 1 is a flowchart of a table recognition method according to an embodiment of the present application. In an exemplary embodiment, the table recognition method specifically includes:

S110. Detect and determine a table area from an image to be tested;

S120. Determine corresponding text features and position features based on the table area;

S130. Fuse the text features and the position features to obtain table features;

S140. Use the table features to determine a table recognition result in the image to be tested.
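Steps S110 to S140 can be read as a simple pipeline. The sketch below is illustrative only: the `TableElement` type, the `recognize_table` function and the toy stand-ins are hypothetical names that do not appear in the application, and each stage would be a trained model in practice.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2); an assumed convention


@dataclass
class TableElement:
    text: str  # recognized text of a cell or text line
    box: Box   # bounding box inside the image


def recognize_table(
    image: object,
    detect: Callable[[object], List[TableElement]],           # S110
    text_feature: Callable[[TableElement], Sequence[float]],  # S120
    pos_feature: Callable[[TableElement], Sequence[float]],   # S120
    decode: Callable[[List[List[float]]], str],               # S140
) -> str:
    elements = detect(image)
    # S130: fuse by concatenating the two feature vectors of each element.
    table_features = [
        list(text_feature(e)) + list(pos_feature(e)) for e in elements
    ]
    return decode(table_features)


# Toy stand-ins so the sketch runs end to end; none of them is a real model.
def toy_detect(_image: object) -> List[TableElement]:
    return [
        TableElement("Acc", (0, 0, 10, 5)),
        TableElement("BLEU-2", (10, 0, 20, 5)),
    ]


result = recognize_table(
    image=None,
    detect=toy_detect,
    text_feature=lambda e: [float(len(e.text))],
    pos_feature=lambda e: list(e.box),
    decode=lambda feats: f"{len(feats)} elements, {len(feats[0])} dims each",
)
```

Passing the stages in as callables keeps the sketch neutral about which detector, feature extractor or decoder is used, which matches the application's statement that the implementation form is not limited.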

In step S110, by way of example, the image to be tested is an image containing a table. A table consists of one or more rows of cells and displays numbers and other items for quick reference and analysis; the items in a table are organized into rows and columns. As shown in FIGS. 2 to 4, tables come in many types. Depending on whether ruling lines are present, they can be divided into lined tables and borderless tables, and lined tables can be further divided into few-line tables (such as three-line tables) and full-line tables. Optionally, the image to be tested may be a screenshot of an article or web page, or a photographed image. The table area is the area of the image to be tested in which the table is located. Optionally, the table area may be the entire table in the image, or part of the table, for example, the area of one table row or the area of each cell. Specifically, object detection or instance segmentation may be used to detect the table area in the image to be tested, for example, YOLO (You Only Look Once), Faster RCNN (Faster Regions with CNN features), or a generative adversarial network (GAN).

In step S120, by way of example, the text features represent information about the text in the table. Optionally, the text features may include text semantic features and text format features. The text semantic features represent the meaning of the text. The text format features represent the visual information of the text; for example, they may include text color, font size, font, whether the text is bold, whether it is underlined, font distribution, and whether the text has a background color. The position features represent the position of the table area, for example, coordinates, row information in the table, and column information in the table.
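As a concrete illustration of text format features, the attributes listed above can be packed into a fixed-length numeric vector. This is a hand-written encoding assumed for illustration; the `TextFormat` type, the `format_vector` function and the 72-point normalization constant are hypothetical, and the application itself extracts format features with a learned model.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TextFormat:
    color: Tuple[int, int, int]  # RGB, each channel 0..255
    font_size: float             # in points
    bold: bool
    underline: bool
    has_background: bool


def format_vector(fmt: TextFormat, max_font_size: float = 72.0) -> List[float]:
    """Encode format attributes as values in [0, 1]."""
    r, g, b = fmt.color
    return [
        r / 255,
        g / 255,
        b / 255,
        min(fmt.font_size / max_font_size, 1.0),  # clip very large fonts
        float(fmt.bold),
        float(fmt.underline),
        float(fmt.has_background),
    ]


# A bold, black, 18-point header cell without underline or background.
header = TextFormat(color=(0, 0, 0), font_size=18, bold=True,
                    underline=False, has_background=False)
header_vec = format_vector(header)
```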

Optionally, when the table area is the entire table, the text features are those of all text in the table and the position features are those of the entire table. When the table area is the area of one table row, the text features and position features are those of each row, so the table yields as many text features and position features as it has rows. When the table area is the area of each cell, the text features and position features are those of each cell, so the table yields as many text features and position features as it has cells.

Specifically, a neural network may be trained on table images, text features and position features to obtain a trained neural network model, which then recognizes the table area and outputs the text features and position features. Alternatively, the table area may be input into a pre-trained text recognition model to obtain the text features and into a pre-trained position recognition model to obtain the position features. The text recognition model may be, for example, a convolutional recurrent neural network (CRNN); the position recognition model may be a U-net model or any model that can recognize coordinates in an image, which is not limited here.

In step S130, by way of example, the table features reflect the position of the table and its specific content. Optionally, the text features and position features may be concatenated to obtain the table features. Optionally, the text features and position features may instead be concatenated into fused features, which are then input into an encoder or a neural network (for example, a feedforward network such as a convolutional neural network (CNN), or a recurrent neural network (RNN)) to obtain the table features.
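The two fusion options in step S130 can be sketched as follows. The first is plain concatenation; the second passes the concatenated vector through an encoder, represented here by a single hypothetical linear layer (`linear_encode`) standing in for the CNN or RNN named in the text. The weights are toy values, not trained parameters.

```python
from typing import List, Sequence


def fuse(text_feat: Sequence[float], pos_feat: Sequence[float]) -> List[float]:
    """Option 1: the table feature is the concatenation of both vectors."""
    return list(text_feat) + list(pos_feat)


def linear_encode(
    fused: Sequence[float], weights: Sequence[Sequence[float]]
) -> List[float]:
    """Option 2 (stand-in): project the fused vector with one linear layer.

    `weights` has one row per output dimension.
    """
    return [sum(w * x for w, x in zip(row, fused)) for row in weights]


fused = fuse([0.2, 0.7], [0.1, 0.5, 0.3, 0.9])
encoded = linear_encode([1.0, 2.0, 3.0], [[1, 0, 0], [0, 1, 1]])
```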

In step S140, by way of example, the table recognition result represents the table structure and/or table content in the image to be tested. Optionally, the table features may be processed by a pre-trained neural network to obtain the table recognition result. Alternatively, the table features may be combined according to the position information they contain to obtain the table recognition result.
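The second option, combining elements according to their position information, can be approximated by a simple rule-based grouping. The sketch below assumes each element is reduced to its text and center point and that elements whose vertical centers lie within a tolerance belong to the same row; the function name, tuple layout and tolerance are assumptions, and the sample values are toy data.

```python
from typing import List, Tuple

Element = Tuple[str, float, float]  # (text, center_x, center_y)


def group_into_rows(
    elements: List[Element], y_tolerance: float = 5.0
) -> List[List[str]]:
    """Group elements into rows by center_y, then order each row by center_x."""
    rows: List[List[Element]] = []
    for el in sorted(elements, key=lambda e: e[2]):
        # Same row if close enough to the previous element's vertical center.
        if rows and abs(el[2] - rows[-1][-1][2]) <= y_tolerance:
            rows[-1].append(el)
        else:
            rows.append([el])
    return [
        [text for text, _, _ in sorted(row, key=lambda e: e[1])]
        for row in rows
    ]


grid = group_into_rows([
    ("Acc", 60, 10),
    ("Model type", 10, 11),
    ("0.91", 60, 30),
    ("Vision only", 10, 29),
])
```

A fixed tolerance is fragile for skewed or densely packed tables, which is one reason a learned decoder may be preferable.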

In the technical solution of the present application, when recognizing a table from an image to be tested, the corresponding text features and position features are determined based on the table area in the image, and the text features and position features are fused to obtain table features. Features of the table are thus extracted along multiple dimensions, so the table in the image can be recognized more accurately using the table features, which improves the recognition accuracy.

In one implementation, determining corresponding text features and position features based on the table area includes:

determining the text features and position features corresponding to each table element in the table area, where the table elements include cells and/or text lines.

By way of example, the table area may include multiple table elements, and the table elements include cells and/or text lines. Cells are divided according to the lines in the table: for a full-line table, cells are determined by the row and column ruling lines; for a three-line table, cells are determined by the row ruling lines above and below them. A text line may be a single character, a word, a phrase, a sentence, and so on; it may also be text composed of characters and symbols, which is not limited here. Specifically, both cells and text lines may be determined through object detection.
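The line-based division of cells described above can be sketched geometrically. The sketch assumes the ruling lines have already been located and are given as sorted coordinates; `cells_from_lines` is a hypothetical helper. For a three-line table only the left and right table edges are passed as column lines, so each band between two row lines becomes one cell.

```python
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def cells_from_lines(
    row_lines: Sequence[float], col_lines: Sequence[float]
) -> List[Box]:
    """Return one box per region enclosed by consecutive row and column lines."""
    ys = sorted(row_lines)
    xs = sorted(col_lines)
    return [
        (x1, y1, x2, y2)
        for y1, y2 in zip(ys, ys[1:])
        for x1, x2 in zip(xs, xs[1:])
    ]


# Full-line table: 2 row bands x 2 column bands -> 4 cells.
full_line_cells = cells_from_lines([0, 10, 20], [0, 50, 100])

# Three-line table: 3 row lines, only the outer edges as columns -> 2 cells.
three_line_cells = cells_from_lines([0, 20, 60], [0, 100])
```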

Optionally, when the table elements include cells or text lines, determining the text features and position features corresponding to each table element in the table area includes:

determining cell detection results or text line detection results in the table area;

determining corresponding text features and position features based on the cell detection results;

or, determining corresponding text features and position features based on the text line detection results.

In this embodiment, as shown in FIG. 5, when the table elements include cells, the table area is detected and two cell detection results are determined according to the row ruling lines. The upper cell in FIG. 5 is called the first cell detection result and the lower cell the second cell detection result. Text recognition and position recognition are performed on the first cell detection result to obtain its text features and position features, and likewise on the second cell detection result. It can be understood that the text feature of the first cell detection result covers Model type, Acc, BLEU-2 and BLEU-4, and its position feature is the position information of the first cell; the same applies to the second cell detection result. In this way, the positional relationships between cells and the textual relationships within cells can be determined accurately, so the table can be recognized more accurately.

In this embodiment, as shown in FIG. 6, when the table elements include text lines, the table area is detected and twelve text line detection results are determined according to the text content. Text recognition and position recognition are performed on each text line detection result to obtain its text features and position features. For example, text features and position features are determined for Model type, and then for Vision only. Repeating this yields twelve text features and twelve position features. In this way, the positional relationships between text lines and the meaning of each text line can be determined accurately, so the table can be recognized more accurately.

Preferably, as shown in FIG. 7, when the table elements include cells and text lines, determining the text features and position features corresponding to each table element in the table area includes:

S710. Determine cell detection results and text line detection results in the table area;

S720. Determine corresponding text features and position features based on the cell detection results;

S730. Determine corresponding text features and position features based on the text line detection results.

In this embodiment, as shown in FIG. 8, when the table elements include cells and text lines, table element (token) detection may be performed on the table area. Specifically, Faster RCNN may be used to detect the table area and determine the cell detection results and text line detection results in it. Text recognition and position recognition are performed on each cell detection result to obtain its text features and position features. For example, if a cell contains Model type, Acc, BLEU-2 and BLEU-4, the text feature of that cell detection result covers Model type, Acc, BLEU-2 and BLEU-4. The position feature of a cell detection result is the position information of the cell, for example, the position of its center point or of its vertices.
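The two position descriptions mentioned above, center point and vertices, can both be derived from a detected bounding box. The sketch assumes axis-aligned boxes in `(x1, y1, x2, y2)` form; the helper names and the normalization by image size are illustrative choices not stated in the application.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def center(box: Box) -> Tuple[float, float]:
    x1, y1, x2, y2 = box
    return ((x1 + x2) / 2, (y1 + y2) / 2)


def vertices(box: Box) -> List[Tuple[float, float]]:
    """Corners in clockwise order starting from the top-left."""
    x1, y1, x2, y2 = box
    return [(x1, y1), (x2, y1), (x2, y2), (x1, y2)]


def normalized(box: Box, width: float, height: float) -> Box:
    """Scale coordinates to [0, 1] so features do not depend on image size."""
    x1, y1, x2, y2 = box
    return (x1 / width, y1 / height, x2 / width, y2 / height)


cell_box = (10, 20, 50, 40)
cell_center = center(cell_box)
cell_vertices = vertices(cell_box)
cell_norm = normalized(cell_box, width=100, height=80)
```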

Text recognition and position recognition are performed on each text line detection result to obtain its text features and position features. For example, text features and position features are determined for Model type, and then for Acc. Repeating this yields multiple text features and position features. In this way, not only the positional relationships between cells and the textual relationships within cells are determined, but also the positional relationships between text lines and the meaning of each text line. The relationships between cells and text lines in a few-line table are thus determined bottom-up, so few-line tables can be recognized more accurately.

In one implementation, when the table elements include cells, fusing the text features and the position features to obtain table features includes:

for each cell, concatenating the text features and position features corresponding to the cell to obtain the table features corresponding to the cell.

By way of example, each cell is recognized by the text recognition model to obtain its content. If a cell is found to contain multiple pieces of content, they may be concatenated to obtain the text feature of the cell. The position information of each cell is recognized by the position recognition model to obtain the position feature. The text feature and position feature are concatenated to obtain the table feature, which fuses multimodal features so that the table feature reflects multiple dimensions.

In one implementation, when the table elements include text lines, fusing the text features and the position features to obtain table features includes:

for each text line, concatenating the text features and position features corresponding to the text line to obtain the table features corresponding to the text line.

By way of example, each text line is recognized by the text recognition model to obtain its text feature. The position information of each text line is recognized by the position recognition model to obtain the position feature. The text feature and position feature are concatenated to obtain the table feature, which fuses multimodal features so that the table feature reflects multiple dimensions.

In one implementation, the text features include text format features and semantic features;

correspondingly, fusing the text features and the position features to obtain table features includes:

fusing the text format features, the semantic features and the position features to obtain the table features.

By way of example, the table area may be recognized by a trained neural network model that outputs the text format features, semantic features and position features. Alternatively, the table area may be input into a pre-trained visual classification model to obtain the text format features, into a pre-trained text recognition model to obtain the semantic features, and into a pre-trained position recognition model to obtain the position features.

In this embodiment, as shown in FIG. 9, for each table element in the table area, text format features are extracted by a CNN, semantic features are extracted by BERT (Bidirectional Encoder Representations from Transformers), and position features are determined by a position encoder (position embedding). The text format features, semantic features and position features are concatenated to obtain the table feature of each table element, so that every table feature is multi-dimensional and the table can be represented multimodally. When the table area includes multiple table elements, a corresponding number of table features is obtained.
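The three-way concatenation in FIG. 9 can be sketched as below. The application does not specify the form of the position embedding, so a sinusoidal embedding of each box coordinate is assumed here. The CNN and BERT outputs are represented by precomputed placeholder vectors whose sizes (8 and 16) are arbitrary, and all function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def sinusoidal_embedding(
    value: float, dim: int = 4, base: float = 10000.0
) -> List[float]:
    """Embed one scalar as alternating sin/cos values; `dim` must be even."""
    out: List[float] = []
    for i in range(0, dim, 2):
        freq = 1.0 / (base ** (i / dim))
        out.append(math.sin(value * freq))
        out.append(math.cos(value * freq))
    return out


def position_feature(box: Box, dim_per_coord: int = 4) -> List[float]:
    """Concatenate the embeddings of the four box coordinates."""
    feat: List[float] = []
    for coord in box:
        feat.extend(sinusoidal_embedding(coord, dim_per_coord))
    return feat


def fuse_element(
    format_feat: Sequence[float],    # stand-in for the CNN output
    semantic_feat: Sequence[float],  # stand-in for the BERT output
    box: Box,
) -> List[float]:
    return list(format_feat) + list(semantic_feat) + position_feature(box)


table_feature = fuse_element([0.0] * 8, [0.0] * 16, (0, 0, 10, 5))
```

With these placeholder sizes each table element yields a 40-dimensional feature: 8 format, 16 semantic and 16 position values.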

In one implementation, using the table features to determine the table recognition result in the image to be tested includes:

determining a prediction sequence of the table based on the table features;

determining the table topology in the image to be tested according to the prediction sequence.

By way of example, the prediction sequence may be a sequence defined in Hypertext Markup Language, that is, an HTML-annotated sequence. The table features may be input into a preset classification model to obtain the prediction sequence of the table in the image to be tested. The classification model is trained on training data; optionally, the training data consists of the table features of table areas in image samples and the sequence annotations of those table areas. The table features of a table area serve as the input of the classification model and the corresponding sequence annotation as its target. Specifically, the classification model may be a transformer model.
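To make the sequence annotation concrete, the sketch below builds a target token sequence for a regular grid. The token vocabulary (`<table>`, `<tr>`, `<td>` and their closing tags) is an assumption; the application states only that the sequence is HTML-annotated, and merged cells are not handled here.

```python
from typing import List


def structure_tokens(n_rows: int, n_cols: int) -> List[str]:
    """Return the HTML structure tokens for an n_rows x n_cols table."""
    tokens = ["<table>"]
    for _ in range(n_rows):
        tokens.append("<tr>")
        tokens.extend(["<td>", "</td>"] * n_cols)
        tokens.append("</tr>")
    tokens.append("</table>")
    return tokens


# Target sequence for a table with one row and two columns.
label = structure_tokens(1, 2)
```

During training, a sequence such as `label` would be the target paired with the table features of the same table area.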

Specifically, the table features are input into the preset classification model, which outputs the prediction sequence. The rows and columns of the table in the image to be tested can be determined from the prediction sequence, which gives the table topology. Converting table structure recognition into sequence prediction in this way improves the accuracy of table recognition. Further, the text features may be added to the table topology to fill in the content of the table.
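Recovering rows and columns from a predicted sequence, and then filling in the recognized text, can be sketched as follows. The token vocabulary is the same assumed one (`<tr>`, `<td>` and so on), the helper names are hypothetical, and the cell texts are toy values supplied in reading order.

```python
from typing import Iterable, List


def topology_from_tokens(tokens: Iterable[str]) -> List[int]:
    """Return the number of cells in each row of the predicted sequence."""
    cells_per_row: List[int] = []
    for tok in tokens:
        if tok == "<tr>":
            cells_per_row.append(0)
        elif tok == "<td>":
            if not cells_per_row:
                raise ValueError("<td> encountered before any <tr>")
            cells_per_row[-1] += 1
    return cells_per_row


def fill_cells(tokens: Iterable[str], texts: Iterable[str]) -> str:
    """Insert recognized texts after each <td>, in reading order."""
    remaining = iter(texts)
    out: List[str] = []
    for tok in tokens:
        out.append(tok)
        if tok == "<td>":
            out.append(next(remaining, ""))  # leave the cell empty if texts run out
    return "".join(out)


predicted = [
    "<table>",
    "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>",
    "<tr>", "<td>", "</td>", "<td>", "</td>", "</tr>",
    "</table>",
]
topology = topology_from_tokens(predicted)
html = fill_cells(predicted, ["Model type", "Acc", "Vision only", "0.91"])
```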

In this embodiment, as shown in FIG. 10, token detection is first performed on the table area to determine the cell detection results and text line detection results. The text format features, semantic features and position features corresponding to each cell detection result and each text line detection result are then determined. For each cell detection result, the text format features, semantic features and position features are concatenated to obtain its table feature; the same is done for each text line detection result. This yields bottom-up multimodal table features. The table features are then input into the transformer model, which outputs an HTML-annotated prediction sequence. The table topology can be output directly from this sequence, achieving top-down table recognition, so the table structure can be recognized accurately even for few-line tables.

Exemplary device

Correspondingly, FIG. 11 is a schematic structural diagram of a table recognition device according to an embodiment of the present application. In an exemplary embodiment, the present application further provides a table recognition device, which includes:

a determining module 1110, configured to detect and determine a table area from an image to be tested;

a processing module 1120, configured to determine corresponding text features and position features based on the table area;

a fusion module 1130, configured to fuse the text features and the position features to obtain table features; and

a recognition module 1140, configured to determine a table recognition result in the image to be tested by using the table features.
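The data flow between the four modules can be sketched as a simple composition of callables. The class name `TableRecognizer`, its field names and the stub functions below are hypothetical placeholders chosen for illustration; they are not the implementation described in the application:

```python
from dataclasses import dataclass
from typing import Any, Callable, Tuple


@dataclass
class TableRecognizer:
    """Hypothetical wiring of modules 1110 to 1140 as plain callables."""
    determine: Callable[[Any], Any]              # 1110: image -> table area
    process: Callable[[Any], Tuple[Any, Any]]    # 1120: area -> (text features, position features)
    fuse: Callable[[Any, Any], Any]              # 1130: features -> table features
    recognize: Callable[[Any], Any]              # 1140: table features -> recognition result

    def run(self, image: Any) -> Any:
        area = self.determine(image)
        text_feats, pos_feats = self.process(area)
        table_feats = self.fuse(text_feats, pos_feats)
        return self.recognize(table_feats)


# Stub modules operating on made-up data, only to show the order of the calls.
demo = TableRecognizer(
    determine=lambda image: image["table"],
    process=lambda area: (area["text"], area["pos"]),
    fuse=lambda text, pos: [t + p for t, p in zip(text, pos)],
    recognize=lambda feats: {"n_tokens": len(feats)},
)
result = demo.run({"table": {"text": [[0.1], [0.2]], "pos": [[0.0, 0.0], [0.5, 0.5]]}})
```

Keeping each module behind a plain callable means any one of them could be swapped without touching the others, which matches the logical (rather than physical) division of modules described later in the specification.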

In one embodiment, the processing module 1120 is further configured to:

In one embodiment, where the table elements include cells and text lines, determining the text features and position features corresponding to each table element in the table area includes:

determining cell detection results and text line detection results, respectively, in the table area;

determining the corresponding text features and position features based on the text line detection results.

In one embodiment, where the table elements include cells, the fusion module 1130 is further configured to:

In one embodiment, where the table elements include text lines, the fusion module 1130 is further configured to:

Correspondingly, the fusion module 1130 is further configured to:

In one embodiment, the recognition module 1140 is further configured to:

The table recognition device provided in this embodiment belongs to the same inventive concept as the table recognition method provided in the above embodiments of the present application. It can execute the table recognition method provided in any of the above embodiments, and has the functional modules and beneficial effects corresponding to that method. For technical details not described exhaustively in this embodiment, refer to the specific processing of the table recognition method provided in the above embodiments; they are not repeated here.

Exemplary electronic device

Another embodiment of the present application further provides an electronic device. As shown in FIG. 12, the device includes:

a memory 1200 and a processor 1210;

wherein the memory 1200 is connected to the processor 1210 and is configured to store a program; and

the processor 1210 is configured to implement the table recognition method disclosed in any of the above embodiments by running the program stored in the memory 1200.

Specifically, the electronic device may further include a bus, a communication interface 1220, an input device 1230 and an output device 1240.

The processor 1210, the memory 1200, the communication interface 1220, the input device 1230 and the output device 1240 are connected to one another through the bus, where:

The bus may include a path for transferring information between the components of the computer system.

The processor 1210 may be a general-purpose processor, such as a general-purpose central processing unit (CPU) or a microprocessor; an application-specific integrated circuit (ASIC); or one or more integrated circuits for controlling execution of the program of the solution of the present invention. It may also be a digital signal processor (DSP), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

The processor 1210 may include a main processor, and may further include a baseband chip, a modem, and the like.

The memory 1200 stores the program for executing the technical solution of the present invention, and may also store an operating system and other key services. Specifically, the program may include program code, and the program code includes computer operation instructions. More specifically, the memory 1200 may include a read-only memory (ROM), other types of static storage devices that can store static information and instructions, a random access memory (RAM), other types of dynamic storage devices that can store information and instructions, a disk memory, a flash memory, and so on.

The input device 1230 may include a device for receiving data and information input by a user, such as a keyboard, a mouse, a camera, a scanner, a light pen, a voice input device, a touch screen, a pedometer or a gravity sensor.

The output device 1240 may include a device that allows information to be output to a user, such as a display screen, a printer or a speaker.

The communication interface 1220 may include any transceiver-type device for communicating with other devices or communication networks, such as Ethernet, a radio access network (RAN) or a wireless local area network (WLAN).

The processor 1210 executes the program stored in the memory 1200 and invokes other devices, and can thereby implement each step of any table recognition method provided by the above embodiments of the present application.

Exemplary computer program product and storage medium

In addition to the above methods and devices, an embodiment of the present application may also be a computer program product comprising computer program instructions which, when run by a processor, cause the processor to perform the steps of the table recognition method according to the various embodiments of the present application described in the "Exemplary method" section of this specification.

The program code of the computer program product for performing the operations of the embodiments of the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.

In addition, an embodiment of the present application may also be a storage medium on which a computer program is stored. When executed by a processor, the computer program performs the steps of the table recognition method according to the various embodiments of the present application described in the "Exemplary method" section of this specification.

For the specific operation of the above electronic device, and of the above computer program product and the computer program on the storage medium when run by a processor, refer to the above method embodiments; the details are not repeated here.

For simplicity of description, the foregoing method embodiments are each expressed as a series of combined actions. Those skilled in the art should understand, however, that the present application is not limited by the described order of actions, because according to the present application certain steps may be performed in other orders or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are preferred embodiments, and that the actions and modules involved are not necessarily required by the present application.

It should be noted that the embodiments in this specification are described in a progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Because the device embodiments are basically similar to the method embodiments, their description is relatively brief; for related details, refer to the corresponding description of the method embodiments.

The steps in the methods of the embodiments of the present application may be reordered, combined or deleted according to actual needs, and the technical features described in the embodiments may be replaced or combined.

The modules and sub-modules in the devices and terminals of the embodiments of the present application may be combined, divided or deleted according to actual needs.

In the several embodiments provided in the present application, it should be understood that the disclosed terminals, devices and methods may be implemented in other ways. For example, the terminal embodiments described above are merely illustrative. The division into modules or sub-modules is only a division by logical function, and other divisions are possible in an actual implementation: multiple sub-modules or modules may be combined or integrated into another module, and some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices or modules, and may be electrical, mechanical or of another form.

Modules or sub-modules described as separate components may or may not be physically separate, and components described as modules or sub-modules may or may not be physical modules or sub-modules; that is, they may be located in one place or distributed over multiple network modules or sub-modules. Some or all of the modules or sub-modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional modules or sub-modules in the embodiments of the present application may be integrated into one processing module, may each exist physically on their own, or two or more of them may be integrated into one module. The integrated modules or sub-modules may be implemented in the form of hardware or in the form of software functional modules or sub-modules.

Those skilled in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate clearly the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be regarded as going beyond the scope of the present application.

The steps of the methods or algorithms described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software unit executed by a processor, or in a combination of the two. The software unit may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.

Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise" and any variants thereof are intended to cover a non-exclusive inclusion, so that a process, method, article or device that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article or device that includes the element.

The above description of the disclosed embodiments enables those skilled in the art to implement or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the present application. Therefore, the present application is not limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A table recognition method, characterized by comprising:

detecting and determining a table area from an image to be tested;

determining corresponding text features and position features based on the table area;

fusing the text features and the position features to obtain table features; and

determining a table recognition result in the image to be tested by using the table features.

2. The method according to claim 1, wherein determining corresponding text features and position features based on the table area comprises:

determining the text features and position features corresponding to each table element in the table area, the table elements including cells and/or text lines.

3. The method according to claim 2, wherein, where the table elements include cells and text lines, determining the text features and position features corresponding to each table element in the table area comprises:

determining cell detection results and text line detection results, respectively, in the table area;

determining the corresponding text features and position features based on the cell detection results; and

determining the corresponding text features and position features based on the text line detection results.

4. The method according to claim 2, wherein, where the table elements include cells, fusing the text features and the position features to obtain table features comprises:

for each cell, concatenating the text feature and the position feature corresponding to the cell to obtain the table feature corresponding to the cell.

5. The method according to claim 2, wherein, where the table elements include text lines, fusing the text features and the position features to obtain table features comprises:

for each text line, concatenating the text feature and the position feature corresponding to the text line to obtain the table feature corresponding to the text line.

6. The method according to claim 1, wherein the text features include text format features and semantic features;

correspondingly, fusing the text features and the position features to obtain table features comprises:

fusing the text format features, the semantic features and the position features to obtain the table features.

7. The method according to claim 1, wherein determining a table recognition result in the image to be tested by using the table features comprises:

determining a predicted sequence of the table from the table features; and

determining the table topology in the image to be tested according to the predicted sequence.

8. A table recognition device, characterized by comprising:

a determining module, configured to detect and determine a table area from an image to be tested;

a processing module, configured to determine corresponding text features and position features based on the table area;

a fusion module, configured to fuse the text features and the position features to obtain table features; and

a recognition module, configured to determine a table recognition result in the image to be tested by using the table features.

9. An electronic device, characterized by comprising:

a memory and a processor;

wherein the memory is connected to the processor and is configured to store a program; and

the processor is configured to implement the table recognition method according to any one of claims 1 to 7 by running the program in the memory.

10. A storage medium, characterized in that a computer program is stored on the storage medium, and when the computer program is run by a processor, the table recognition method according to any one of claims 1 to 7 is implemented.