Detailed Description
In order that those skilled in the art may better understand the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings. It is apparent that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without inventive effort shall fall within the scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present invention and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the invention described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
It should be noted that references to "one" and "a plurality" in this disclosure are intended to be illustrative rather than limiting, and those skilled in the art will appreciate that "a" or "an" is to be construed as "one or more" unless the context clearly indicates otherwise.
The names of messages or information interacted between the devices in the embodiments of the present invention are for illustrative purposes only and are not intended to limit the scope of such messages or information.
It can be appreciated that, before the technical solutions disclosed in the embodiments of the present invention are used, the user should be informed, in an appropriate manner and in accordance with the relevant laws and regulations, of the type, usage range, usage scenario, etc. of the personal information involved in the present invention, and the user's authorization should be obtained.
For example, in response to receiving an active request from a user, prompt information is sent to the user to explicitly indicate that the operation the user is requesting will require obtaining and using the user's personal information. The user can thus autonomously decide, according to the prompt information, whether to provide personal information to the software or hardware, such as an electronic device, application program, server, or storage medium, that executes the operations of the technical solution.
As an optional but non-limiting implementation, in response to receiving an active request from the user, the prompt information may be sent to the user in the form of, for example, a popup window, in which the prompt information may be presented as text. In addition, the popup window may carry a selection control with which the user may choose "Agree" or "Disagree" to providing personal information to the electronic device.
It will be appreciated that the above-described notification and user authorization acquisition process is merely illustrative and not limiting of the implementation of the present invention, and that other ways of satisfying relevant legal regulations may be applied to the implementation of the present invention.
It will be appreciated that the data (including but not limited to the data itself, the acquisition or use of the data) involved in the present technical solution should comply with the corresponding legal regulations and the requirements of the relevant regulations.
Example 1
Fig. 1 is a flowchart of a document processing method according to a first embodiment of the present invention. The embodiment is applicable to document processing of power documents in the power industry. The method may be executed by a document processing device, which may be implemented in the form of hardware and/or software, and which may optionally be integrated in an electronic device such as a mobile terminal, a personal computer (PC), or a server.
As shown in fig. 1, the method may specifically include:
S101, responding to an uploading operation of a document image of a first document in the power system, acquiring the document image, and determining the document content of the first document according to the document image.
In an embodiment of the present invention, the first document may refer to a power document generated and used at various stages in the power system. The first document records information such as technical parameters, operation rules, management regulations, engineering progress, safety records and the like of the power system. By way of example, the first document may be a file, chart, report, or the like. The document image may refer to image information obtained after image preprocessing of the first document. The image preprocessing comprises image denoising, correction, binarization and the like. Document content may refer to specific literal content recorded in a first document.
Specifically, in response to an uploading operation of a document image of a first document in the power system, the first document may be acquired through a network interface or a local uploading manner. Further, a document image of the first document can be obtained after image preprocessing of the first document. Finally, the document content of the first document can be obtained after the document image of the first document is subjected to character recognition, and data support is provided for subsequent document content analysis and information extraction.
Alternatively, image preprocessing of the first document may be performed by removing noise in the image using a gaussian filter or median filter algorithm, correcting the image tilt by edge detection and affine transformation to ensure that the text is in a horizontal state, and converting the image into a black and white binary image using an adaptive thresholding method to highlight the text outline.
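By way of illustration only, the adaptive-thresholding step described above can be sketched with a simple local-mean rule (in practice a library routine such as OpenCV's adaptive thresholding would typically be used, alongside Gaussian filtering and affine tilt correction); the function name, window size, and offset below are illustrative assumptions, not part of the claimed method:

```python
import numpy as np

def adaptive_binarize(img, block=15, offset=10):
    """Binarize a grayscale image with a local-mean adaptive threshold.

    Each pixel is compared against the mean of a (block x block)
    neighbourhood minus `offset`; pixels darker than that threshold
    are treated as text (1), the rest as background (0).
    """
    h, w = img.shape
    pad = block // 2
    padded = np.pad(img.astype(np.float64), pad, mode="edge")
    out = np.zeros((h, w), dtype=np.uint8)
    for y in range(h):
        for x in range(w):
            # Neighbourhood of the current pixel in the padded image.
            local = padded[y:y + block, x:x + block]
            out[y, x] = 1 if img[y, x] < local.mean() - offset else 0
    return out
```

Because the threshold adapts to the local neighbourhood, dark text strokes are marked even under uneven illumination, which is the property that "highlights the text outline" in the step above.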
Optionally, the character recognition of the document image of the first document may be performed by an optical character recognition (Optical Character Recognition, OCR) technique. Specifically, feature extraction is first performed on the image by a convolutional neural network (Convolutional Neural Network, CNN) to recognize and locate the character regions; the recognized character regions are then line-cut so that each line of characters is accurately separated; finally, each character is recognized by a recurrent neural network (Recurrent Neural Network, RNN) with long short-term memory (Long Short-Term Memory, LSTM) units.
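The line-cutting step mentioned above can be sketched with a horizontal projection profile, a common baseline technique (full systems would use the CNN detector described above); the function name and gap handling are illustrative assumptions:

```python
import numpy as np

def split_lines(binary_img):
    """Split a binarized page (1 = ink, 0 = background) into text-line
    row ranges using a horizontal projection profile: rows whose ink
    count is zero separate consecutive text lines."""
    profile = binary_img.sum(axis=1)  # ink pixels per row
    lines, start = [], None
    for y, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = y                  # a text line begins
        elif ink == 0 and start is not None:
            lines.append((start, y))   # the line ends at this blank row
            start = None
    if start is not None:
        lines.append((start, binary_img.shape[0]))
    return lines
```

Each returned `(top, bottom)` pair delimits one line of characters, which can then be passed to the per-character recognizer.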
For example, if the first document is a power equipment maintenance report, key information such as the power equipment name, maintenance date, and fault type can be identified, and the recognition result can be corrected and optimized through a language model and context information to obtain the final document content. If the first document is a multi-page document, the first document is split into pages so that each page can be processed individually.
S102, analyzing the content of the document to obtain a named entity of the first document, and determining the document type of the first document according to the named entity and the document content.
In the embodiment of the invention, the named entity can refer to a specific name or identification related to the power industry, which is identified in the document content of the first document. By way of example, the named entity may be a power device, a technical term, a fault type or an operating procedure, or the like.
Specifically, the named entity of the first document can be obtained by performing content analysis on the document content of the first document. Further, a document type of the first document may be determined based on the named entity of the first document and the document content. By way of example, the document type of the first document may be determined based on a preset document type determination manner in combination with the named entity of the first document and the document content.
S103, naming the first document according to the document type and the named entity, carrying out page adjustment on the first document according to the document type and the named entity to obtain a second document, and storing the second document according to the document naming.
Specifically, the first document may be named and page adjusted according to the document type and the named entity to form the second document. Further, the second document may be stored according to the document naming to facilitate searching and management.
For example, the first document may be named in combination with the document type and the named entity based on a preset document naming manner; the method can be used for carrying out page adjustment on the first document by combining the document type and the named entity based on a preset page adjustment mode, and can be used for storing the second document by combining the document naming based on a preset storage mode.
By way of example, the second document may be stored by an intelligent archiving system to achieve archiving of the document. The intelligent archiving system first determines, using a predefined document classification scheme, the storage position of the document in the document management system according to its type, content, and metadata information. It then automatically generates the index information of the document, including the document name, type, creation time, keywords, and the like, and stores the index information in the index database of the intelligent archiving system to facilitate subsequent retrieval and management. At the same time, the document is version-controlled: if the document is an update of an existing document, a new version is created and the historical versions are retained. The intelligent archiving system also sets corresponding access rights for the document according to a preset access right determination scheme, ensuring the security of sensitive information.
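The index-generation and versioning behaviour described above can be sketched as follows; this is an illustrative in-memory toy (a real archiving system would persist records to a database and enforce access rights), and all names are assumptions:

```python
from datetime import datetime, timezone

class ArchiveIndex:
    """Toy sketch of the archiving index: stores per-document metadata
    and retains prior versions when a document name is stored again."""

    def __init__(self):
        self._index = {}  # document name -> list of version records

    def store(self, name, doc_type, keywords, content):
        record = {
            "type": doc_type,
            "keywords": keywords,
            "content": content,
            "created": datetime.now(timezone.utc).isoformat(),
            # Re-storing an existing name creates the next version.
            "version": len(self._index.get(name, [])) + 1,
        }
        self._index.setdefault(name, []).append(record)
        return record["version"]

    def latest(self, name):
        """Return the most recent version record for a document."""
        return self._index[name][-1]
```

Storing under the same document name twice yields versions 1 and 2, with the full history retained, mirroring the version-control behaviour described above.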
As an optional implementation of the embodiment of the present invention, performing page adjustment on the first document according to the document type and the named entity to obtain the second document includes: performing page layout segmentation on the first document through an intelligent segmentation algorithm to obtain a plurality of page elements of the first document; matching the plurality of page elements with template elements in a preset format template; and adjusting the page elements in the first document according to the matching result to obtain the second document.
In the embodiment of the invention, the intelligent segmentation algorithm can adopt image processing technology, such as edge detection and region segmentation, to identify each page element in the document, including a title, a text, a table, a picture and the like. Standard layouts of different types of documents are defined in the layout templates, including specifications of margins, font sizes, line spacing, etc.
Specifically, after the naming of the document of the first document is determined, the page layout of the first document can be divided through an intelligent dividing algorithm, and a plurality of page elements of the first document are identified. And matching the identified plurality of page elements with template elements in a preset layout template to analyze and adjust page layout of the page elements in the first document according to the matching result, thereby obtaining a second document.
According to the technical solution of this embodiment, the document image is acquired in response to the uploading operation of the document image of the first document in the power system, and the document content of the first document is determined according to the document image, so that automatic acquisition of the document image and the document content is achieved, manual intervention is reduced, and data support is provided for subsequent content analysis and information extraction. The document content is parsed to obtain the named entity of the first document, and the document type of the first document is determined according to the named entity and the document content, so that key information in the document is identified and the accuracy of document type identification is ensured. The first document is named according to the document type and the named entity, page adjustment is performed on the first document according to the document type and the named entity to obtain the second document, and the second document is stored according to the document name, so that the accuracy and archiving efficiency of document management are improved.
Example 2
Fig. 2 is a flowchart of a document processing method according to a second embodiment of the present invention. The technical solution of this embodiment further refines, based on the technical solution of the foregoing embodiment, the process of parsing the document content to obtain the named entity of the first document. For a specific implementation, reference is made to the description of this embodiment. Technical features that are the same as or similar to those of the foregoing embodiment are not described again herein.
As shown in fig. 2, the method may specifically include:
S201, responding to an uploading operation of a document image of a first document in the power system, acquiring the document image, and determining the document content of the first document according to the document image.
S202, performing text segmentation processing on the document content, and segmenting the document content into a text sequence containing a plurality of text fragments according to a preset text length.
Specifically, based on a preset text length, the document content may be subjected to text segmentation processing to form a text sequence. The preset text length may refer to the preset number of characters used when the document content is segmented. The text sequence comprises a plurality of text fragments of the preset text length. For example, if the document content includes 10000 characters and the preset text length is 256 characters, the document content may be divided, using a sliding-window technique, into a text sequence containing approximately 40 text fragments, where adjacent fragments overlap so as to maintain the continuity of the context.
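The sliding-window segmentation described above can be sketched as follows; the window and overlap sizes are the illustrative values from the example, and the function name is an assumption:

```python
def segment_text(content, window=256, overlap=32):
    """Split document content into overlapping fragments of at most
    `window` characters; consecutive fragments share `overlap`
    characters so context is not lost at fragment boundaries."""
    step = window - overlap
    fragments = []
    for start in range(0, len(content), step):
        fragments.append(content[start:start + window])
        if start + window >= len(content):
            break  # the last fragment already reaches the end
    return fragments
```

Each fragment ends with the same characters that begin the next one, so an entity falling across a boundary still appears whole in at least one fragment.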
And S203, performing vocabulary matching on each text segment in the text sequence by utilizing a predefined electric power domain dictionary to obtain vocabulary information corresponding to each character in the text segment.
In particular, the predefined power domain dictionary may be built up by power terminology. Exemplary power terminology includes transformers, circuit breakers, insulation, and the like. And further, performing vocabulary matching on each text segment in the text sequence by using a predefined power domain dictionary so as to obtain vocabulary information corresponding to each character in each text segment.
For example, the vocabulary matching of each text segment in the text sequence may employ a maximum forward matching algorithm, which scans each text segment from left to right and preferentially matches the longest word. Meanwhile, for each character, its position within the matched word may be labeled: the beginning character of a word is labeled B, a middle character is labeled M, the ending character is labeled E, and a single-character word is labeled S. For example, for a dictionary term such as "main transformer", the first character is labeled B, the intermediate characters M, and the last character E.
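The maximum forward matching with B/M/E/S position labels described above can be sketched as follows; the function name and maximum word length are illustrative assumptions:

```python
def forward_match_labels(text, dictionary, max_len=6):
    """Label each character with its position in the longest dictionary
    word found by maximum forward matching: B (begin), M (middle),
    E (end) for multi-character words, S for unmatched single
    characters."""
    labels, i = [], 0
    while i < len(text):
        match = None
        # Try the longest candidate first (maximum forward matching).
        for size in range(min(max_len, len(text) - i), 1, -1):
            if text[i:i + size] in dictionary:
                match = size
                break
        if match:
            labels.extend(["B"] + ["M"] * (match - 2) + ["E"])
            i += match
        else:
            labels.append("S")
            i += 1
    return labels
```

For a text whose prefix matches a three-character dictionary entry, the labels come out as B, M, E, with any trailing unmatched character labeled S.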
S204, coding the text sequence, the vocabulary information and the predefined entity class description text by using a pre-training language model and an encoder to obtain a contextually relevant character vector representation and a tag knowledge vector.
In the embodiment of the invention, the pre-trained language model may refer to a language model, pre-trained on large corpora, that is capable of encoding text. Illustratively, the pre-trained language model may be a Bidirectional Encoder Representations from Transformers (BERT) model based on the Transformer architecture. The predefined entity class description text may refer to predefined text that explains an entity class, such as "device name: the name of the power device mentioned in the text". A character vector representation may refer to a representation of a character in vector form. The tag knowledge vector may refer to a vector representing tag knowledge.
Specifically, the text sequence, the vocabulary information and the predefined entity class description text are input into a pre-training language model and an encoder, and the context-related character vector representation and the tag knowledge vector can be obtained after the encoding process.
Illustratively, when the pre-trained language model is a BERT model, the BERT model includes 12 layers of Transformer encoders, each layer having 12 attention heads; the BERT model can capture long-distance dependencies through the self-attention mechanism and incorporate the vocabulary information into the output of each layer. Further, for the text sequence "[CLS] main transformer fault [SEP]", the BERT model outputs a sequence of 768-dimensional vectors, each vector corresponding to the contextual representation of one character. Meanwhile, for the predefined entity class description text, the BERT model may adopt independent parameters to distinguish different input types and output a vector representation for each entity class.
The encoding may comprise the following steps: performing embedding processing on the text sequence and the vocabulary information through the embedding layer of the pre-trained language model to obtain initial character embeddings and vocabulary embeddings; performing vocabulary information fusion processing at each layer of the pre-trained language model through an attention mechanism, according to the initial character embeddings and vocabulary embeddings, to obtain intermediate character representations fused with vocabulary information; performing multi-level feature extraction processing on the intermediate character representations through the multi-head self-attention mechanism and feedforward neural network of the pre-trained language model to obtain the contextually relevant character vector representations; encoding the predefined entity category description text through the pre-trained language model to obtain an initial tag knowledge vector; and performing tag knowledge enhancement processing on the initial tag knowledge vector to obtain the tag knowledge vector.
Specifically, the text sequence and the vocabulary information can be subjected to embedding processing through an embedding layer of the pre-training language model, so that initial character embedding and vocabulary embedding are obtained. Furthermore, the initial character embedding and the vocabulary embedding can be subjected to vocabulary information fusion processing by utilizing a multi-head self-attention mechanism through each layer in the pre-training language model, so that the intermediate character representation of the fused vocabulary information is obtained. And finally, carrying out multi-level feature extraction processing on the intermediate character representation based on a multi-head self-attention mechanism and a feedforward neural network in the pre-training language model to obtain the contextually relevant character vector representation. Wherein the multi-headed self-attention mechanism allows the model to simultaneously focus on different representation subspaces, enhancing the performance of the model in capturing complex language features.
Illustratively, when the pre-trained language model is a BERT model, the embedding layer of the BERT model includes character embeddings, position embeddings, and segment embeddings. For the text sequence, each character is mapped to a vector space of fixed dimension, typically 768 dimensions; at the same time, the vocabulary information is also converted into vector representations of the same dimension. For example, for the input sequence "main transformer fault", each character obtains an initial 768-dimensional vector representation, and the corresponding vocabulary information "B-device, O-failure" is also converted into vector form. Furthermore, each layer of the BERT model performs vocabulary information fusion processing through the multi-head attention mechanism to obtain the intermediate character representation fused with vocabulary information. Each attention head generates an attention output; the final attention-layer output is obtained by concatenating and linearly transforming the outputs of the individual heads, and the feedforward neural network then performs nonlinear feature extraction on this output to obtain the contextually relevant character vector representation.
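A single attention head of the kind described above can be sketched in NumPy; this illustrates scaled dot-product attention only, not the full BERT internals, and all shapes and names are illustrative assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def attention_head(Q, K, V):
    """Scaled dot-product attention: each query position attends to
    all key positions and returns a weighted sum of the values."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)   # (n_query, n_key)
    weights = softmax(scores, axis=-1)
    return weights @ V, weights
```

In multi-head attention, several such heads run in parallel on projected subspaces and their outputs are concatenated and linearly transformed, as described above.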
Specifically, the predefined entity class description text can be encoded through the pre-trained language model to obtain an initial tag knowledge vector. Furthermore, tag knowledge enhancement processing is performed on the initial tag knowledge vector through a nonlinear transformation to obtain the tag knowledge vector, enhancing the expressiveness of the tag knowledge. Illustratively, the initial tag knowledge vector is transformed by a multilayer perceptron (Multilayer Perceptron, MLP), where the MLP contains two hidden layers, each using a rectified linear unit (Rectified Linear Unit, ReLU) activation function, and the last layer uses a hyperbolic tangent activation function to ensure that the output lies within the range [-1, 1], so that the tag knowledge vector can better capture the semantic information of the entity class.
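The enhancement MLP described above (two ReLU hidden layers followed by a tanh output) can be sketched as follows; the layer sizes and names are illustrative assumptions:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def enhance_label_vector(k, params):
    """Transform an initial tag knowledge vector through two ReLU
    hidden layers and a tanh output layer, so the enhanced vector
    lies in [-1, 1]. `params` holds three (weight, bias) pairs."""
    (W1, b1), (W2, b2), (W3, b3) = params
    h = relu(W1 @ k + b1)       # first hidden layer
    h = relu(W2 @ h + b2)       # second hidden layer
    return np.tanh(W3 @ h + b3) # bounded output in [-1, 1]
```

The tanh output guarantees every component of the enhanced tag knowledge vector stays within [-1, 1], as the embodiment requires.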
And S205, carrying out feature fusion on the character vector representation and the tag knowledge vector to obtain fusion feature representation, and carrying out labeling processing on each character in the text sequence based on the fusion feature representation by using a preset conditional random field to obtain a named entity of the first document.
In an embodiment of the invention, the conditional random field (Conditional Random Field, CRF) may refer to a discriminative probabilistic graphical model for sequence labeling tasks. The CRF layer defines a transition matrix A, where A_ij represents the score of transitioning from tag i to tag j. For a sequence of length n, the CRF calculates the score of a tag sequence y as:
score(y) = Σ_i f_i(y_i) + Σ_i A(y_(i-1), y_i);
where f_i(y_i) represents the emission score of the ith character being labeled y_i. The CRF uses the Viterbi algorithm to find the highest-scoring sequence among all possible tag sequences.
Specifically, the fused feature representation may be obtained by feature-fusing the character vector representation and the tag knowledge vector. Each character in the text sequence is then labeled, based on the fused feature representation, using the preset conditional random field, so as to obtain the named entity of the first document. For example, for the sequence "main transformer fault", the tag sequence output by the CRF may be "B-device, I-device, O, B-failure".
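The Viterbi decoding used by the CRF can be sketched as follows; emission scores per character and the tag-transition matrix are taken as inputs, and the function name is an illustrative assumption:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Find the highest-scoring tag sequence given per-character
    emission scores (n x num_tags) and a tag-to-tag transition
    matrix (num_tags x num_tags), as a CRF decoder would."""
    n, num_tags = emissions.shape
    score = emissions[0].copy()
    backptr = np.zeros((n, num_tags), dtype=int)
    for i in range(1, n):
        # cand[j, k]: best score ending at tag k via previous tag j.
        cand = score[:, None] + transitions + emissions[i][None, :]
        backptr[i] = cand.argmax(axis=0)
        score = cand.max(axis=0)
    # Trace back the best path from the best final tag.
    best = [int(score.argmax())]
    for i in range(n - 1, 0, -1):
        best.append(int(backptr[i, best[-1]]))
    return best[::-1], float(score.max())
```

With zero transition scores, the decoder simply picks the best emission at each position; non-zero transitions let it penalize implausible tag sequences such as an I-device tag following O.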
By way of example, the character vector representations and the tag knowledge vector may be feature-fused through a gating mechanism. For each character vector w_i, the gating value g_i may be calculated as:
g_i = σ(W_g · [w_i; v_i; k]);
where v_i denotes the vocabulary information vector, k denotes the tag knowledge vector, and W_g denotes a learnable parameter matrix. The fused feature a_i is then obtained as:
a_i = g_i ⊙ w_i + (1 - g_i) ⊙ (W_v · v_i + W_k · k);
where W_v and W_k each denote a learnable parameter matrix, and ⊙ denotes element-wise multiplication.
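This gated fusion can be sketched directly in NumPy; the dimensions are illustrative, and the parameter matrices would be learned in practice:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(w_i, v_i, k, Wg, Wv, Wk):
    """Fuse a character vector w_i with its vocabulary vector v_i and
    the tag knowledge vector k: a sigmoid gate decides, per dimension,
    how much of the character vector versus the projected auxiliary
    information enters the fused feature."""
    g = sigmoid(Wg @ np.concatenate([w_i, v_i, k]))  # gate in (0, 1)
    return g * w_i + (1.0 - g) * (Wv @ v_i + Wk @ k)
```

When the gate saturates near 1 the fused feature follows the character vector; near 0 it follows the vocabulary and tag knowledge projections; with a zero gate matrix it is an even blend of both sources.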
S206, determining the document type of the first document according to the named entity and the document content.
S207, naming the first document according to the document type and the named entity, performing page adjustment on the first document according to the document type and the named entity to obtain a second document, and storing the second document according to the document naming.
According to the technical solution of this embodiment, the document image is acquired in response to the uploading operation of the document image of the first document in the power system, and the document content of the first document is determined according to the document image, so that automatic acquisition of the document image and the document content is achieved, manual intervention is reduced, and data support is provided for subsequent content analysis and information extraction. Text segmentation processing is performed on the document content, which is segmented, according to the preset text length, into a text sequence containing a plurality of text fragments; vocabulary matching is performed on each text fragment in the text sequence using the predefined power domain dictionary to obtain the vocabulary information corresponding to each character, so that key information in the document can be accurately located. The text sequence, the vocabulary information, and the predefined entity category description text are encoded using the pre-trained language model and the encoder to obtain the contextually relevant character vector representations and the tag knowledge vector; feature fusion is performed on the character vector representations and the tag knowledge vector to obtain the fused feature representation, and each character in the text sequence is labeled, based on the fused feature representation, using the preset conditional random field, so that the named entity of the first document is accurately identified. The document type of the first document is then determined according to the named entity and the document content; the first document is named according to the document type and the named entity, page adjustment is performed on the first document according to the document type and the named entity to obtain the second document, and the second document is stored according to the document name, so that the archiving accuracy and management efficiency of the document are improved.
Example 3
Fig. 3 is a flowchart of a document processing method according to a third embodiment of the present invention. The technical solution of this embodiment further refines, based on the technical solutions of the foregoing embodiments, the process of determining the document type of the first document according to the named entity and the document content. For a specific implementation, reference is made to the description of this embodiment. Technical features that are the same as or similar to those of the foregoing embodiments are not described again herein.
As shown in fig. 3, the method may specifically include:
S301, responding to an uploading operation of a document image of a first document in the power system, acquiring the document image, and determining document content of the first document according to the document image.
S302, analyzing the content of the document to obtain a named entity of the first document.
S303, inserting the named entity into a preset position in the document content to obtain an enhanced text sequence, and performing coding processing on the enhanced text sequence to obtain text characteristic representation.
Specifically, the identified named entities can be inserted into preset positions of the document content as special markers to form an enhanced text sequence, so as to highlight important entities and provide more information for subsequent text feature extraction. For example, the document content "110kV main transformer insulation aging fault" may be converted into "[equipment]110kV main transformer[/equipment] [fault type]insulation aging fault[/fault type]". Further, the enhanced text sequence may be encoded using a pre-trained language model to generate a text feature representation that contains the entity information and is contextually relevant.
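The marker-insertion step can be sketched as a simple span-wrapping function; the function name and span format are illustrative assumptions:

```python
def insert_entity_markers(text, entities):
    """Wrap each recognized entity span in bracketed type markers,
    producing an enhanced text sequence. `entities` is a list of
    (start, end, type) character spans, assumed non-overlapping."""
    out, pos = [], 0
    for start, end, etype in sorted(entities):
        out.append(text[pos:start])                     # text before the entity
        out.append(f"[{etype}]{text[start:end]}[/{etype}]")
        pos = end
    out.append(text[pos:])                              # trailing text
    return "".join(out)
```

Given character spans for the equipment and fault-type entities, this reproduces the bracketed form shown in the example above.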
S304, performing feature extraction processing on image content in the document image through a pre-trained feature extraction model to obtain image feature representation.
In the embodiment of the invention, the pre-trained feature extraction model may refer to a model pre-trained for extracting features of image content. By way of example, the pre-trained feature extraction model may be the 16-layer VGG-16 model from the Visual Geometry Group (Visual Geometry Group, VGG). VGG-16 is a deep convolutional neural network comprising 13 convolutional layers and 3 fully-connected layers.
Specifically, the image content in the document image can be input into the pre-trained feature extraction model; after passing through each layer of the feature extraction model, the output of the last layer is used as the image feature representation, so that high-level visual features in the image, such as equipment shape and fault phenomena, can be extracted. For example, the image content in the document image may be resized to 224×224 and then passed sequentially through the layers of the feature extraction model, with a 4096-dimensional vector output as the image feature representation.
S305, performing feature screening on the text feature representation and the image feature representation through a preset multi-mode collaborative pooling module to obtain the multi-mode representation after dimension reduction.
Specifically, after the text feature representation and the image feature representation of the document content are obtained, feature screening can be performed on the text feature representation and the image feature representation through a preset multi-modal collaborative pooling module so as to obtain the multi-modal representation after dimension reduction. The multi-modal collaborative pooling module may refer to a technology of fusing multi-modal features, and is used for effectively combining data of different modalities. The multi-modal collaborative pooling module can globally pool the text feature representation and the image feature representation to obtain two global feature representations, so that the multi-modal representation after dimension reduction corresponding to the document content can be obtained based on the two global features.
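The global pooling and dimension reduction described above can be sketched as follows; this is an illustrative mean-pooling variant (the claimed module is only described as "globally pooling" the two representations), and the projection matrices are assumptions that would be learned in practice:

```python
import numpy as np

def collaborative_pool(text_feats, image_feats, Wt, Wi):
    """Globally average-pool the text (m x d) and image (l x d)
    feature matrices into one global vector each, then project both
    to a shared reduced dimension via Wt and Wi."""
    t_global = text_feats.mean(axis=0)   # global text feature
    i_global = image_feats.mean(axis=0)  # global image feature
    return Wt @ t_global, Wi @ i_global
```

The two projected vectors form the dimension-reduced multi-modal representation that is handed to the feature fusion network in the next step.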
S306, performing feature fusion processing on the multi-modal representation through a preset feature fusion network to obtain a fused multi-modal document representation.
Specifically, feature fusion processing can be performed on the multi-modal representation through a preset feature fusion network, so that a fused multi-modal document representation is obtained. The feature fusion network may refer to a pre-trained network that can fuse features represented by multiple modes. The feature fusion network may be a cross-modal multi-granularity interaction fusion network for processing multi-modal data, for example.
Performing feature fusion processing on the multi-modal representation through the preset feature fusion network to obtain the fused multi-modal document representation may include: performing coarse-grained feature migration processing on the text feature representation and the image feature representation in the multi-modal representation through the feature fusion network to obtain initial cross-modal features; processing the text features and the image features in the initial cross-modal features through an attention mechanism in the feature fusion network to obtain a document-aware visual representation and a visually enhanced text representation; and performing multi-level feature extraction processing on the document-aware visual representation and the visually enhanced text representation through an encoder of a Transformer model in the feature fusion network to obtain the fused multi-modal document representation.
In an embodiment of the invention, the feature fusion network includes a modal mixture bias module, an attention mechanism, and an encoder of a Transformer model. The modal mixture bias module is used for performing the coarse-grained feature migration processing, and takes the text feature representation V ∈ R^{m×d} and the image feature representation I ∈ R^{l×d} as inputs, where m represents the number of text paragraphs, l represents the number of image features, and d represents the feature dimension; it controls the information flow by means of a gating mechanism, namely:
G = sigmoid(W_fl [I; V] + b_fl);
where W_fl ∈ R^{2d×d} and b_fl ∈ R^d are learnable parameters, and [I; V] denotes the concatenation of the image and text features.
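The gating formula can be sketched as follows. The interpretation of [I; V] as feature-wise concatenation and the use of the gate G to mix the two modalities are assumptions for illustration; W_fl and b_fl would be learned in practice.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
I_vec = rng.random(d)                   # one image feature row
V_vec = rng.random(d)                   # one text feature row
W_fl = rng.standard_normal((2 * d, d))  # learnable parameter, R^{2d x d}
b_fl = np.zeros(d)                      # learnable bias, R^d

z = np.concatenate([I_vec, V_vec]) @ W_fl + b_fl
G = 1.0 / (1.0 + np.exp(-z))            # sigmoid gate, each entry in (0, 1)
gated = G * V_vec + (1.0 - G) * I_vec   # assumed gate-controlled information flow
print(G.shape)  # (8,)
```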
In the embodiment of the invention, the attention mechanism is used for performing fine-grained feature interaction by calculating the similarity matrix S ∈ R^{m×l} between the text features and the image features, where l represents the number of image features. The calculation formula is as follows:
S = V W_sl N_1^T;
where N_1 ∈ R^{l×d} represents the image features and W_sl ∈ R^{d×d} represents a learnable parameter matrix. Further, the document-aware visual representation and the visually enhanced text representation may be generated from the similarity matrix. For example, the attention weight matrix A_c = softmax(S^T) can be obtained by a softmax operation, and the document-aware visual representation is then calculated from A_c and the text features.
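The fine-grained interaction above can be sketched as follows. The dimensions are chosen so that S lands in R^{m×l} as stated; the exact way A_c is applied to the text features to form the document-aware visual representation is an assumption for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
m, l, d = 5, 3, 8
V = rng.random((m, d))                  # text features, R^{m x d}
N1 = rng.random((l, d))                 # image features, R^{l x d}
W_sl = rng.standard_normal((d, d))      # learnable parameter matrix

S = V @ W_sl @ N1.T                     # similarity matrix, R^{m x l}
# softmax over each row of S^T: every image feature attends over the m paragraphs
A_c = np.exp(S.T) / np.exp(S.T).sum(axis=1, keepdims=True)
doc_aware_visual = A_c @ V              # assumed document-aware visual representation
print(S.shape, A_c.shape, doc_aware_visual.shape)  # (5, 3) (3, 5) (3, 8)
```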
In an embodiment of the present invention, the Transformer encoder is formed by stacking a plurality of identical layers, each layer comprising a multi-head self-attention mechanism and a feedforward neural network. Further, for the input sequence, the output of each Transformer layer is combined with its input through a residual connection and layer normalization. After multiple iterations, the model can capture deeper cross-modal interaction information.
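The residual-plus-layer-normalization pattern of one encoder layer can be sketched as follows. This is a toy single-head version with untrained weights, shown only to make the sublayer structure concrete; real layers use multi-head attention and learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

def encoder_layer(x, Wq, Wk, Wv, W1, W2):
    # self-attention sublayer, then residual connection + layer normalization
    q, k, v = x @ Wq, x @ Wk, x @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = np.exp(scores) / np.exp(scores).sum(axis=-1, keepdims=True)
    x = layer_norm(x + attn @ v)
    # feedforward sublayer, then residual connection + layer normalization
    return layer_norm(x + np.maximum(x @ W1, 0.0) @ W2)

n, d, h = 4, 8, 16                      # sequence length, model dim, hidden dim
x = rng.random((n, d))
out = encoder_layer(x, *(rng.standard_normal(s) * 0.1 for s in
                         [(d, d), (d, d), (d, d), (d, h), (h, d)]))
print(out.shape)  # (4, 8)
```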
Specifically, coarse-grained feature migration processing can be performed by the modal mixture bias module in the feature fusion network, so that the text feature representation and the image feature representation in the multi-modal representation are preliminarily fused to obtain the initial cross-modal features. For example, for a power document containing the text description "main transformer oil temperature anomaly" and a corresponding device picture, the modal mixture bias module may preliminarily fuse the "oil temperature anomaly" information in the text with the visual features in the image. Furthermore, fine-grained interaction processing can be performed on the text features and image features in the initial cross-modal features through the attention mechanism in the feature fusion network, so as to obtain the document-aware visual representation and the visually enhanced text representation. Finally, multi-level feature extraction processing can be performed on the document-aware visual representation and the visually enhanced text representation through the encoder of the Transformer model in the feature fusion network, so that the text feature representation and the image feature representation are effectively fused and the fused multi-modal document representation is obtained. By using the feature fusion network, the semantic information of the original text and image is retained, and the complex interaction relationship between text and image is captured, thereby providing more comprehensive document understanding for the power document classification task.
For example, for a transformer fault report, the fused representation not only includes the fault type and parameter information in the text description, but also integrates visual clues in the images, such as abnormal equipment appearance or meter readings, thereby providing rich and accurate feature information for subsequent classification tasks and improving the model's understanding of complex power documents.
S307, classifying the multi-mode document representation through a preset feature aggregation network and a full connection layer to obtain the document type of the first document.
Specifically, the fused multi-modal document representation can be classified through a preset feature aggregation network and a fully connected layer, so as to obtain the document type of the first document. The feature aggregation network captures the relationships between paragraphs using multiple document-level Transformer layers, each of which comprises a multi-head self-attention mechanism and a feedforward neural network. Furthermore, the features output by the feature aggregation network after aggregating the multi-modal document representation can be fed into the fully connected layer and mapped to the predefined document-type space to obtain a probability distribution over the types, so that the type with the highest probability is selected as the document type of the first document.
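The final classification step can be sketched as follows. The document-type labels and the weights are illustrative stand-ins, not the trained classifier; the sketch only shows the fully-connected mapping, the softmax over the predefined type space, and the argmax selection described above.

```python
import numpy as np

rng = np.random.default_rng(0)
types = ["equipment failure report", "maintenance plan", "inspection record"]
doc_repr = rng.random(16)                       # aggregated multi-modal representation
W = rng.standard_normal((16, len(types)))       # fully connected layer weights (untrained)
b = np.zeros(len(types))

logits = doc_repr @ W + b
probs = np.exp(logits) / np.exp(logits).sum()   # softmax: probability per document type
predicted = types[int(np.argmax(probs))]        # type with the highest probability
print(round(float(probs.sum()), 6), predicted)
```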
S308, naming the first document according to the document type and the named entity, carrying out page adjustment on the first document according to the document type and the named entity to obtain a second document, and storing the second document according to the document naming.
According to the technical scheme of this embodiment, the document image is acquired in response to an uploading operation of the document image of the first document in the power system, and the document content of the first document is determined from the document image, so that the document image and content are obtained automatically, manual intervention is reduced, and data support is provided for subsequent content analysis and information extraction. The document content is analyzed to obtain the named entity of the first document; the named entity is inserted at a preset position in the document content to obtain an enhanced text sequence, and the enhanced text sequence is encoded to obtain the text feature representation; feature extraction is performed on the image content in the document image through the pre-trained feature extraction model to obtain the image feature representation; feature screening is performed on the text feature representation and the image feature representation through the preset multi-modal collaborative pooling module, so that the multi-modal representation after dimension reduction is determined based on the text and image feature representations; feature fusion processing is performed on the multi-modal representation through the preset feature fusion network to obtain the fused multi-modal document representation; and the multi-modal document representation is classified through the preset feature aggregation network and the fully connected layer, so that the document type of the first document is accurately obtained. The first document is then named and its pages are adjusted according to the document type and the named entity to obtain the second document, and the second document is stored according to the document naming, thereby improving the archiving accuracy and management efficiency of documents.
Example IV
Fig. 4 is a flowchart of a document processing method according to a fourth embodiment of the present invention. On the basis of the foregoing embodiments, the technical solution of this embodiment further refines the process of naming the first document according to the document type and the named entity. Reference is made to the description of this embodiment for a specific implementation. Technical features that are the same as or similar to those of the foregoing embodiments are not described again herein.
As shown in fig. 4, the method may specifically include:
S401, responding to an uploading operation of a document image of a first document in the power system, acquiring the document image, and determining the document content of the first document according to the document image.
S402, analyzing the content of the document to obtain a named entity of the first document, and determining the document type of the first document according to the named entity and the document content.
S403, acquiring a naming template corresponding to the first document from a preset naming rule base according to the document type, and screening information of the named entity to obtain a core element for naming the document.
In the embodiment of the invention, the naming rule base may refer to a database containing various power document types and their corresponding naming templates. For example, for the "equipment failure report" type, the naming template may be "[device name]_[failure type]_[date]".
Specifically, based on the document type of the first document, a naming template corresponding to that type may be obtained from the naming rule base. Furthermore, a predefined importance scoring algorithm may be used to screen the identified named entities for key information, considering factors such as entity type, occurrence frequency, and contextual relevance, so as to screen out the core elements used for naming the document from the plurality of named entities.
S404, carrying out combination processing on the core elements through a naming template and an intelligent filling algorithm to obtain initial naming.
Specifically, based on the naming template and an intelligent filling algorithm, the initial naming of the first document can be obtained by combining the core elements. The intelligent filling algorithm takes into account the particular naming habits and specifications of the power industry; for example, dates are uniformly formatted as "YYYYMMDD", and standard abbreviations are used for device names. For example, for a fault report describing a main transformer oil temperature anomaly, an initial document name such as "Main Transformer_Oil Temperature Anomaly Fault Report_20240327" may be generated.
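The template-filling step can be sketched as follows. The bracketed slot syntax and the YYYYMMDD date rule follow the text; the helper name and the specific element values are hypothetical.

```python
from datetime import date

def fill_naming_template(template: str, elements: dict[str, str]) -> str:
    """Replace each [slot] in the template with its screened core element."""
    name = template
    for slot, value in elements.items():
        name = name.replace(f"[{slot}]", value)
    return name

elements = {
    "device name": "Main Transformer",
    "fault type": "Oil Temperature Anomaly",
    "date": date(2024, 3, 27).strftime("%Y%m%d"),   # unified YYYYMMDD format
}
initial_name = fill_naming_template("[device name]_[fault type]_[date]", elements)
print(initial_name)  # Main Transformer_Oil Temperature Anomaly_20240327
```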
Combining the core elements through the naming template and the intelligent filling algorithm to obtain the initial naming may include: performing semantic analysis processing on the core elements through a natural language processing algorithm to obtain semantic features and importance scores of the elements; sorting the core elements according to the semantic features and importance scores to obtain an ordered element sequence; matching the element sequence against the naming template through a dynamic programming algorithm to obtain an element filling scheme; and filling the naming template according to the element filling scheme through a context-aware text generation algorithm to obtain the initial naming.
Specifically, the core elements may be semantically analyzed by natural language processing algorithms; for example, a pre-trained word embedding model can convert each core element into a dense vector representation, after which an attention mechanism and a bidirectional long short-term memory network capture the semantic features of each element in context. Further, the importance score of each core element is obtained by combining its frequency, position, and semantic relevance to the other elements in the document; for example, the importance score of each element can be calculated using a hybrid algorithm based on Term Frequency-Inverse Document Frequency (TF-IDF) and the TextRank text ranking algorithm. For a transformer fault report, core elements such as "main transformer" and "insulation aging" may receive higher importance scores. The core elements are then prioritized based on the semantic features and importance scores to obtain an ordered element sequence. For example, an improved quicksort may be used, with the importance score as the primary sort key and semantic similarity as the secondary key, ensuring that important core elements are ranked first and semantically similar elements are clustered together, which facilitates the generation of consistent document names. The elements in the ordered sequence are then matched against the preset naming template through a dynamic programming algorithm, which fills elements into the corresponding positions of the template to obtain an element filling scheme.
The state of the dynamic programming algorithm can be defined as dp[i][j], representing the best score for filling the first j slots of the template with the first i elements; the transition equation combines the importance scores of the elements, the semantic matching degree between elements and slots, and the continuity of the filling. The time complexity of the algorithm is O(mn), where m represents the number of elements and n represents the number of template slots. Finally, intelligent filling processing is performed on the naming template according to the element filling scheme through a context-aware text generation algorithm to generate the initial naming of the first document. For example, a Transformer-based pre-trained language model takes the elements of the filling scheme as input and, combined with the professional terminology and naming standards of the power industry, generates a document name that conforms to natural language usage, so that the generated name both accurately conveys the document content and complies with industry standards.
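The dp[i][j] recurrence can be sketched as follows. The score function here is a toy stand-in (the text combines importance, semantic matching, and continuity); the O(mn) table and the skip-or-fill transition match the description above.

```python
def match_elements_to_slots(elements, slots, score):
    """dp[i][j]: best score after considering the first i elements
    and filling the first j template slots."""
    m, n = len(elements), len(slots)
    NEG = float("-inf")
    dp = [[NEG] * (n + 1) for _ in range(m + 1)]
    choice = [[None] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = 0.0                         # zero slots filled: score 0
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            dp[i][j] = dp[i - 1][j]            # transition 1: skip element i
            use = dp[i - 1][j - 1] + score(elements[i - 1], slots[j - 1])
            if use > dp[i][j]:                 # transition 2: fill slot j with element i
                dp[i][j] = use
                choice[i][j] = i - 1
    filling, i, j = {}, m, n                   # backtrack the filling scheme
    while j > 0 and i > 0:
        if choice[i][j] is not None:
            filling[slots[j - 1]] = elements[choice[i][j]]
            j -= 1
        i -= 1
    return dp[m][n], filling

# Toy score: +1 when an element's tag matches the slot name (hypothetical tags).
elements = [("Main Transformer", "device"),
            ("Oil Temperature Anomaly", "fault"),
            ("20240327", "date")]
slots = ["device", "fault", "date"]
best, filling = match_elements_to_slots(
    elements, slots, lambda e, s: 1.0 if e[1] == s else 0.0)
print(best, filling["device"][0])  # 3.0 Main Transformer
```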
S405, performing duplicate retrieval on the document library according to the initial naming to obtain a retrieval result, and determining the document naming of the first document according to the initial naming and the retrieval result.
Specifically, an efficient string matching algorithm, such as the Knuth-Morris-Pratt (KMP) algorithm, may be used to quickly check whether a document name identical to the initial naming of the first document already exists in the document library, yielding the retrieval result. If the retrieval result shows that an identical document name exists in the library, the initial naming is adjusted until it differs from every document name in the library, and the adjusted name is used as the document naming of the first document. Illustratively, an incrementing numeric suffix or a timestamp is appended to the end of the initial naming to ensure the uniqueness of the document naming.
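The uniqueness adjustment can be sketched as follows. The document library is modeled as a simple in-memory set for illustration; a real system would query its storage backend instead.

```python
def ensure_unique_name(initial: str, library: set[str]) -> str:
    """Append an incrementing numeric suffix until the name is unique."""
    if initial not in library:
        return initial
    counter = 1
    while f"{initial}_{counter}" in library:
        counter += 1
    return f"{initial}_{counter}"

library = {"Main Transformer_Oil Temperature Anomaly_20240327",
           "Main Transformer_Oil Temperature Anomaly_20240327_1"}
unique = ensure_unique_name("Main Transformer_Oil Temperature Anomaly_20240327", library)
print(unique)  # Main Transformer_Oil Temperature Anomaly_20240327_2
```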
S406, performing page adjustment on the first document according to the document type and the named entity to obtain a second document, and storing the second document according to the document naming.
According to the technical scheme of this embodiment, the document image is acquired in response to an uploading operation of the document image of the first document in the power system, and the document content of the first document is determined from the document image, so that the document image and content are obtained automatically, manual intervention is reduced, and data support is provided for subsequent content analysis and information extraction. Content analysis is performed on the document content to obtain the named entity of the first document, and the document type of the first document is determined according to the named entity and the document content, so that key information in the document is identified and the accuracy of document type identification is ensured. A naming template corresponding to the first document is acquired from the preset naming rule base according to the document type, and information screening is performed on the named entities to obtain the core elements for naming the document; the core elements are combined through the naming template and the intelligent filling algorithm to obtain the initial naming; duplicate retrieval is then performed on the document library according to the initial naming to obtain a retrieval result, and the document naming of the first document is determined according to the initial naming and the retrieval result, so that the document naming is both accurate and unique. Finally, the second document is stored according to the document naming, improving the accuracy and efficiency of document archiving and management.
Example V
Fig. 5 is a block diagram of a document processing apparatus according to a fifth embodiment of the present invention. The embodiment can be applied to document processing of power documents in the power industry. As shown in fig. 5, the document processing apparatus includes a first determining module 501, a second determining module 502, and a document processing module 503. The first determining module 501 is configured to, in response to an uploading operation of a document image of a first document in a power system, acquire the document image and determine the document content of the first document according to the document image. The second determining module 502 is configured to perform content analysis on the document content to obtain a named entity of the first document, and to determine the document type of the first document according to the named entity and the document content. The document processing module 503 is configured to name the first document according to the document type and the named entity, perform page adjustment on the first document according to the document type and the named entity to obtain a second document, and store the second document according to the document naming.
According to the technical scheme of this embodiment, the first determining module 501 acquires the document image in response to an uploading operation of the document image of the first document in the power system and determines the document content of the first document according to the document image, so that the document image and content are obtained automatically, manual intervention is reduced, and data support is provided for subsequent content analysis and information extraction. The second determining module 502 performs content analysis on the document content to obtain the named entity of the first document and determines the document type of the first document according to the named entity and the document content, so that key information in the document is identified and the accuracy of document type identification is ensured. The document processing module 503 names the first document according to the document type and the named entity, performs page adjustment on the first document according to the document type and the named entity to obtain the second document, and stores the second document according to the document naming, thereby improving the accuracy and efficiency of document archiving and management.
On the basis of any of the above optional solutions, optionally, the second determining module 502 includes a text segmentation unit, a first acquiring unit, a second acquiring unit, and a third acquiring unit. The text segmentation unit is configured to perform text segmentation processing on the document content, segmenting it into a text sequence comprising a plurality of text fragments according to a preset text length. The first acquiring unit is configured to perform vocabulary matching on each text fragment in the text sequence using a predefined power-domain dictionary, so as to obtain the vocabulary information corresponding to each character in the fragment. The second acquiring unit is configured to encode the text sequence, the vocabulary information, and predefined entity category description texts through a pre-trained language model and an encoder, so as to obtain context-related character vector representations and label knowledge vectors. The third acquiring unit is configured to perform feature fusion on the character vector representations and the label knowledge vectors to obtain a fused feature representation, and to label each character in the text sequence using a preset conditional random field based on the fused feature representation, so as to obtain the named entity of the first document.
On the basis of any of the above optional technical solutions, optionally, the second acquiring unit is specifically configured to: embed the text sequence and the vocabulary information through the embedding layer of the pre-trained language model to obtain initial character embeddings and vocabulary embeddings; perform vocabulary information fusion in each layer of the pre-trained language model according to the initial character embeddings and vocabulary embeddings through an attention mechanism, so as to obtain intermediate character representations fusing the vocabulary information; perform multi-level feature extraction on the intermediate character representations through the multi-head self-attention mechanism and feedforward neural network of the pre-trained language model to obtain context-related character vector representations; encode the predefined entity category description texts through the pre-trained language model to obtain initial label knowledge vectors; and perform label knowledge enhancement on the initial label knowledge vectors to obtain the label knowledge vectors.
On the basis of any of the above optional solutions, optionally, the second determining module 502 further includes a fourth acquiring unit, a fifth acquiring unit, a sixth acquiring unit, a seventh acquiring unit, and an eighth acquiring unit. The fourth acquiring unit is configured to insert the named entity at a preset position in the document content to obtain an enhanced text sequence, and to encode the enhanced text sequence to obtain the text feature representation. The fifth acquiring unit is configured to perform feature extraction on the image content in the document image through the pre-trained feature extraction model to obtain the image feature representation. The sixth acquiring unit is configured to perform feature screening on the text feature representation and the image feature representation through the preset multi-modal collaborative pooling module to obtain the multi-modal representation after dimension reduction. The seventh acquiring unit is configured to perform feature fusion processing on the multi-modal representation through the preset feature fusion network to obtain the fused multi-modal document representation. The eighth acquiring unit is configured to classify the multi-modal document representation through the preset feature aggregation network and the fully connected layer to obtain the document type of the first document.
Based on any of the above optional solutions, optionally, the feature fusion network includes a modal mixture bias module, an attention mechanism, and an encoder of a Transformer model. Correspondingly, the seventh acquiring unit is specifically configured to: perform coarse-grained feature migration processing on the text feature representation and the image feature representation in the multi-modal representation through the modal mixture bias module in the feature fusion network to obtain initial cross-modal features; process the text features and image features in the initial cross-modal features through the attention mechanism in the feature fusion network to obtain a document-aware visual representation and a visually enhanced text representation; and perform multi-level feature extraction processing on the document-aware visual representation and the visually enhanced text representation through the encoder of the Transformer model in the feature fusion network to obtain the fused multi-modal document representation.
On the basis of any of the above optional technical solutions, the document processing module 503 may include a core element acquiring unit, an initial naming acquiring unit, and a document naming determining unit. The core element acquiring unit is configured to acquire a naming template corresponding to the first document from the preset naming rule base according to the document type, and to perform information screening on the named entities to obtain the core elements for naming the document. The initial naming acquiring unit is configured to combine the core elements through the naming template and the intelligent filling algorithm to obtain the initial naming. The document naming determining unit is configured to perform duplicate retrieval on the document library according to the initial naming to obtain a retrieval result, and to determine the document naming of the first document according to the initial naming and the retrieval result.
On the basis of any of the above optional technical solutions, optionally, the initial naming acquiring unit is specifically configured to: perform semantic analysis processing on the core elements through a natural language processing algorithm to obtain semantic features and importance scores of the elements; sort the core elements according to the semantic features and importance scores to obtain an ordered element sequence; match the element sequence against the naming template through a dynamic programming algorithm to obtain an element filling scheme; and fill the naming template according to the element filling scheme through a context-aware text generation algorithm to obtain the initial naming.
In addition to any of the above alternatives, the document processing module 503 may further include a page layout segmentation unit and a page element adjustment unit. The page layout segmentation unit is configured to perform page layout segmentation on the first document through an intelligent segmentation algorithm to obtain a plurality of page elements of the first document. The page element adjustment unit is configured to match the plurality of page elements against template elements in a preset format template, and to adjust the page elements in the first document according to the matching result, so as to obtain the second document.
The document processing device provided by the embodiment of the invention can execute the document processing method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example VI
Fig. 6 is a schematic structural diagram of an electronic device for implementing a document processing method according to a sixth embodiment of the present invention. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. Electronic devices may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices (e.g., helmets, glasses, watches, etc.), and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be exemplary only, and are not meant to limit implementations of the invention described and/or claimed herein.
As shown in fig. 6, the electronic device 10 includes at least one processor 11, and a memory, such as a Read Only Memory (ROM) 12, a Random Access Memory (RAM) 13, etc., communicatively connected to the at least one processor 11, in which the memory stores a computer program executable by the at least one processor, and the processor 11 may perform various appropriate actions and processes according to the computer program stored in the Read Only Memory (ROM) 12 or the computer program loaded from the storage unit 18 into the Random Access Memory (RAM) 13. In the RAM 13, various programs and data required for the operation of the electronic device 10 may also be stored. The processor 11, the ROM 12 and the RAM 13 are connected to each other via a bus 14. An input/output (I/O) interface 15 is also connected to bus 14.
Various components in the electronic device 10 are connected to the I/O interface 15, including an input unit 16, such as a keyboard, mouse, etc., an output unit 17, such as various types of displays, speakers, etc., a storage unit 18, such as a magnetic disk, optical disk, etc., and a communication unit 19, such as a network card, modem, wireless communication transceiver, etc. The communication unit 19 allows the electronic device 10 to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processor 11 may be a variety of general and/or special purpose processing components having processing and computing capabilities. Some examples of the processor 11 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various specialized Artificial Intelligence (AI) computing chips, various processors running machine learning model algorithms, Digital Signal Processors (DSPs), and any suitable processor, controller, or microcontroller. The processor 11 performs the respective methods and processes described above, such as the document processing method.
In some embodiments, the document processing method may be implemented as a computer program tangibly embodied on a computer-readable storage medium, such as storage unit 18. In some embodiments, part or all of the computer program may be loaded and/or installed onto the electronic device 10 via the ROM 12 and/or the communication unit 19. When the computer program is loaded into RAM 13 and executed by processor 11, one or more steps of the document processing method described above may be performed. Alternatively, in other embodiments, processor 11 may be configured to perform the document processing method in any other suitable manner (e.g., by means of firmware).
Various implementations of the systems and techniques described here above may be implemented in digital electronic circuitry, integrated circuit systems, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various embodiments may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, operable to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
A computer program for carrying out the methods of the present invention may be written in any combination of one or more programming languages. These computer programs may be provided to a processor of a general purpose computer, a special purpose computer, or other programmable data processing apparatus, such that the computer programs, when executed by the processor, cause the functions/acts specified in the flowchart and/or block diagram block or blocks to be implemented. The computer program may execute entirely on the machine, partly on the machine as a stand-alone software package, partly on the machine and partly on a remote machine, or entirely on the remote machine or server.
In the context of the present invention, a computer-readable storage medium may be a tangible medium that can contain or store a computer program for use by or in connection with an instruction execution system, apparatus, or device. The computer-readable storage medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. Alternatively, the computer-readable medium may be a machine-readable signal medium. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a Read-Only Memory (ROM), an Erasable Programmable Read-Only Memory (EPROM or flash memory), an optical fiber, a portable Compact Disc Read-Only Memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on an electronic device having a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user can provide input to the electronic device. Other kinds of devices may also be used to provide for interaction with a user, for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback), and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include a Local Area Network (LAN), a Wide Area Network (WAN), a blockchain network, and the Internet.
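As a non-limiting sketch of such an arrangement, the following shows a minimal back-end component (a data server) and a client interacting with it over a communication network, using Python's standard-library HTTP facilities. The endpoint path and payload are illustrative assumptions:

```python
# Illustrative sketch: a minimal back-end component (an HTTP data server)
# and a client exchanging data over a communication network. The response
# payload and the endpoint path are hypothetical.
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer

class DataHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # The back-end component serves data to the requesting client.
        body = json.dumps({"status": "ok"}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # suppress request logging in this sketch

def run_demo() -> dict:
    # Port 0 asks the OS for any free port; the server runs in a
    # background thread while the client issues a request to it.
    server = HTTPServer(("127.0.0.1", 0), DataHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        url = f"http://127.0.0.1:{server.server_port}/data"
        with urllib.request.urlopen(url) as response:
            return json.loads(response.read().decode("utf-8"))
    finally:
        server.shutdown()

print(run_demo())  # {'status': 'ok'}
```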
The computing system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also referred to as a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical host and Virtual Private Server (VPS) services.
It should be appreciated that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present invention may be performed in parallel, sequentially, or in a different order, so long as the desired results of the technical solution of the present invention are achieved; the present invention is not limited in this respect.
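As an illustrative sketch only, the point about step ordering can be shown with two hypothetical, mutually independent steps (the step functions below are assumptions for illustration, assuming Python's `concurrent.futures`): when steps do not depend on each other, the same result is obtained whether they run sequentially or in parallel.

```python
# Illustrative sketch: two hypothetical, independent steps of a flow
# yield the same combined result whether executed sequentially or in
# parallel, so their ordering does not affect the desired outcome.
from concurrent.futures import ThreadPoolExecutor

def step_a(x: int) -> int:
    return x + 1  # hypothetical step

def step_b(x: int) -> int:
    return x * 2  # hypothetical step

def run_sequential(x: int) -> tuple:
    # Steps executed one after the other.
    return (step_a(x), step_b(x))

def run_parallel(x: int) -> tuple:
    # The same steps executed concurrently.
    with ThreadPoolExecutor(max_workers=2) as pool:
        future_a = pool.submit(step_a, x)
        future_b = pool.submit(step_b, x)
        return (future_a.result(), future_b.result())

print(run_sequential(3) == run_parallel(3))  # True
```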
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.