Detailed Description
The following description of the exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding, and these details are to be considered exemplary only. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted from the following description for clarity and conciseness.
Fig. 1 is a schematic diagram according to a first embodiment of the present application. As shown in fig. 1, the method for establishing an information extraction model according to this embodiment may specifically include the following steps:
s101, obtaining a plurality of training data, and training a deep learning model for preset times by using the training data;
s102, obtaining a plurality of texts to be processed marked with subjects, and predicting the plurality of texts to be processed by using the deep learning model to obtain predicates and objects corresponding to the marked subjects in the texts to be processed;
s103, obtaining confidence data according to each text to be processed, the subject in each text to be processed, and the predicate and the object corresponding to the subject;
s104, after the deep learning model is trained by using the training data and the confidence data, returning to the step of obtaining the texts to be processed, and repeating the above steps in a loop until the training of the deep learning model reaches a termination condition, thereby obtaining the information extraction model.
In the method for establishing the information extraction model provided by this embodiment, after the deep learning model is trained for the preset number of times by using the training data, the deep learning model is used to predict the obtained texts to be processed that are labeled with subjects, and confidence data is obtained according to each text to be processed, the subject labeled in it, and the prediction result; the deep learning model is then trained by combining the training data and the confidence data. In this way, the deep learning model can autonomously mine the relation information in the texts to be processed and thereby continuously learn new knowledge, which simplifies the training steps of the information extraction model, improves its training efficiency, and enables the established information extraction model to extract relation information from texts of different relation categories.
In this embodiment, the training data obtained by executing S101 is labeled data, and each training data is composed of a training text, a subject in the training text, and a predicate and an object corresponding to the subject.
For example, for the training text "The author of the novel The Three-Body Problem is Liu Cixin", the subject of the training text is "The Three-Body Problem", the predicate corresponding to the subject "The Three-Body Problem" is "author", and the object corresponding to the subject "The Three-Body Problem" is "Liu Cixin"; the present embodiment therefore takes ("The author of the novel The Three-Body Problem is Liu Cixin", "The Three-Body Problem", "author", "Liu Cixin") as one piece of training data.
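For illustration, the (text, subject, predicate, object) quadruple described above can be held in a simple container; the class and field names here are assumptions for the sketch, not part of the application:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class TrainingSample:
    """One piece of labeled training data: (text, subject, predicate, object)."""
    text: str
    subject: str
    predicate: str
    obj: str  # "object" shadows a Python builtin name, so "obj" is used here

sample = TrainingSample(
    text="The author of the novel The Three-Body Problem is Liu Cixin",
    subject="The Three-Body Problem",
    predicate="author",
    obj="Liu Cixin",
)
```

A collection of such samples then forms the labeled training set used in S101.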
Specifically, the deep learning model in the present embodiment is composed of a pre-trained model (e.g., ERNIE), a predicate predictor, and an object predictor; the pre-trained model is used for obtaining a semantic vector, the predicate predictor is used for predicting a predicate according to the semantic vector, and the object predictor is used for predicting an object according to the semantic vector and the predicted predicate.
Therefore, in the present embodiment, when the deep learning model is trained for the preset number of times by using the training data in step S101, the optional implementation manners that can be adopted are: for each training data, after converting the training text into a vector sequence, splicing the vector sequence with the position information of the subject in the training text; taking the splicing result as the input of a pre-training model to obtain a semantic vector output by the pre-training model; the semantic vector is used as the input of a predicate predictor, and a predicate corresponding to a subject is used as the output of the predicate predictor; and taking the semantic vector and a vector corresponding to the predicate in the semantic vector as the input of the object predictor, taking the object corresponding to the subject as the output of the object predictor, and training the deep learning model for a preset number of times. The preset times in this embodiment may be set by a user, or may be fixed times, for example, one time.
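The first step above, splicing the token vector sequence with the position information of the subject, can be sketched as follows. The one-dimensional toy "embeddings" and the 0/1 position marker are assumptions for illustration only, not the encoding actually used by the pre-trained model:

```python
def subject_position_markers(tokens, subject_tokens):
    """Return one marker per token: 1.0 inside the subject span, 0.0 elsewhere."""
    markers = [0.0] * len(tokens)
    n = len(subject_tokens)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == subject_tokens:
            for j in range(i, i + n):
                markers[j] = 1.0
    return markers

def splice_with_subject(token_vectors, markers):
    """Append the subject-position marker to each token vector (the 'splicing' step)."""
    return [vec + [m] for vec, m in zip(token_vectors, markers)]

tokens = ["The", "author", "of", "Three-Body", "is", "Liu", "Cixin"]
toy_vectors = [[float(len(t))] for t in tokens]  # toy 1-dim "embeddings"
markers = subject_position_markers(tokens, ["Three-Body"])
model_input = splice_with_subject(toy_vectors, markers)
```

In a real implementation the splicing result would then be fed to the pre-trained model to obtain the semantic vector, as described above.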
In the embodiment, the deep learning model is trained for a preset number of times by using training data, so that the trained deep learning model has certain capability of extracting relationship information, that is, a predicate and an object corresponding to a subject in a text can be predicted according to the input text and the subject in the text.
After performing S101 to train the deep learning model for the preset number of times by using the obtained training data, the present embodiment performs S102 to obtain a plurality of texts to be processed that are labeled with subjects, and predicts the plurality of texts to be processed by using the deep learning model, so as to obtain the predicate and the object corresponding to the labeled subject in each text to be processed.
In this embodiment, when S102 is executed to obtain a plurality of texts to be processed labeled with subjects, the following optional implementation manner can be adopted: acquiring a plurality of texts containing relation words (for example, from a network) as the plurality of texts to be processed, and labeling the subject in each text to be processed.
It can be understood that, in the embodiment, a manner of presetting a relation word list may be adopted to obtain a text containing relation words from a network, where the relation word list may contain words that can be used as relation words; in this embodiment, it may also be determined whether the text acquired from the network is a text containing the relation word by using a relation word recognition model obtained through pre-training, that is, the relation word recognition model may recognize whether the text contains the relation word.
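The preset relation-word-list check described above can be sketched as a simple membership test; the word list itself is a hypothetical example, since a real list would be far larger:

```python
# Hypothetical preset relation word list; a real list would be far larger.
RELATION_WORDS = {"author", "president", "capital", "founder", "director"}

def contains_relation_word(text):
    """Return True if any token of the text appears in the relation word list."""
    tokens = (tok.strip('.,;:!?') for tok in text.lower().split())
    return any(tok in RELATION_WORDS for tok in tokens)
```

A text that fails this check would simply not be collected as a text to be processed; the alternative, as noted above, is a pre-trained relation word recognition model.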
Research shows that, in order for the deep learning model to better and autonomously mine the relation information in new texts, the acquired texts need to contain relation words; moreover, by acquiring texts containing relation words, the deep learning model can extract relation information from texts of different relation categories. For example, in the text "The president of country A is B", "president" is the relation word between the country name "A" and the person name "B".
In this embodiment, when S102 is executed to predict the plurality of texts to be processed by using the deep learning model to obtain the predicate and the object corresponding to the labeled subject in each text to be processed, the following optional implementation manner can be adopted: for each text to be processed, after converting the text to be processed into a vector sequence, splicing the vector sequence with the position information of the subject labeled in the text to be processed; taking the splicing result as the input of the pre-trained model to obtain the semantic vector output by the pre-trained model; taking the semantic vector as the input of the predicate predictor, and obtaining the predicate corresponding to the subject according to the output of the predicate predictor; and taking the semantic vector and the vector corresponding to the predicate in the semantic vector as the input of the object predictor, and obtaining the object corresponding to the subject according to the output of the object predictor.
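The two-stage cascade described above, predicting the predicate first and then conditioning the object predictor on the semantic vector together with the predicate's vectors, can be sketched as follows; the predictor callables here are stubs assumed for illustration, not the actual trained components:

```python
def predict_triple(semantic_vectors, predicate_predictor, object_predictor):
    """Cascade: predicate first, then object conditioned on the predicate's vectors.

    predicate_predictor(semantic_vectors) -> (predicate, indices of its tokens)
    object_predictor(semantic_vectors, predicate_vectors) -> object
    """
    predicate, predicate_span = predicate_predictor(semantic_vectors)
    predicate_vectors = [semantic_vectors[i] for i in predicate_span]
    obj = object_predictor(semantic_vectors, predicate_vectors)
    return predicate, obj

# Stub predictors standing in for the trained predicate/object predictors.
def stub_predicate_predictor(vectors):
    return "author", [1]

def stub_object_predictor(vectors, predicate_vectors):
    return "Liu Cixin"

result = predict_triple([[0.0], [1.0], [2.0]],
                        stub_predicate_predictor, stub_object_predictor)
```

The key design point is that the object predictor sees both the full semantic vector and the slice belonging to the predicted predicate, so the object prediction is conditioned on the predicate.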
After executing S102 to obtain, by using the deep learning model, the predicate and the object corresponding to the subject labeled in each text to be processed, the present embodiment executes S103 to obtain confidence data according to each text to be processed, the subject in each text to be processed, and the predicate and the object corresponding to the subject.
It is understood that, when the confidence data is obtained by executing S103, all the texts to be processed, the subject of each text to be processed, and the predicate and the object corresponding to the subject, which are obtained in S102, may be used as the confidence data.
However, before the training of the deep learning model reaches the termination condition, the accuracy of the prediction results that the deep learning model obtains from a text to be processed and its subject is relatively unstable, so the prediction results for some texts to be processed have relatively low accuracy. If prediction results with relatively low accuracy are used for the subsequent training of the deep learning model, the training speed of the model is reduced and its training time is prolonged.
In order to ensure that the confidence data has higher accuracy, in this embodiment, when S103 is executed to obtain the confidence data according to each text to be processed, the subject in each text to be processed, and the predicate and the object corresponding to the subject, the following optional implementation manner can be adopted: determining the confidence of each text to be processed; selecting the texts to be processed whose confidence exceeds a preset threshold; and taking each selected text to be processed, the subject of the text to be processed, and the predicate and the object corresponding to the subject as the confidence data.
When determining the confidence of the text to be processed, the embodiment may use a confidence model obtained through pre-training, where the confidence model may obtain the confidence of the text to be processed according to the text to be processed, a subject in the text to be processed, and at least one of a predicate and an object corresponding to the subject; the confidence may also be calculated according to information such as the number of sources of the text to be processed, the subjects in the text to be processed, and the occurrence numbers of predicates and objects corresponding to the subjects.
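A frequency-based confidence heuristic of the second kind mentioned above, scoring a predicted triple by how many independent source texts agree on it, can be sketched as follows together with the threshold filtering; this particular formula is an assumption for illustration, as the embodiment does not fix one:

```python
from collections import Counter

def triple_confidences(predicted_triples):
    """Confidence of each (subject, predicate, object) triple, taken as the
    fraction of source-text predictions that agree on that exact triple."""
    counts = Counter(predicted_triples)
    total = len(predicted_triples)
    return {triple: count / total for triple, count in counts.items()}

def select_confident(predicted_triples, threshold):
    """Keep only predictions whose triple confidence exceeds the preset threshold."""
    confidences = triple_confidences(predicted_triples)
    return [t for t in predicted_triples if confidences[t] > threshold]
```

Under this heuristic a triple extracted consistently from many sources is kept as confidence data, while a one-off extraction is discarded.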
It is understood that, after the confidence data is obtained in step S103, the obtained confidence data may be entered into the knowledge base, so that the confidence data may be directly obtained from the knowledge base in the subsequent training process of the deep learning model.
After the confidence data is obtained in S103, S104 is executed to train the deep learning model by using the training data and the confidence data, and then the step of obtaining the texts to be processed is executed again; this process is repeated in a loop until the training of the deep learning model reaches the termination condition, and the information extraction model is obtained.
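The overall loop of S101 through S104 can be sketched as the following self-training procedure; the `model` interface (`train`, `predict`), the `confidence_of` callable, and the round-budget termination condition are all assumptions made for this sketch:

```python
def build_information_extraction_model(model, training_data, fetch_labeled_texts,
                                       confidence_of, threshold=0.9, max_rounds=10):
    """Self-training loop: S101 initial training, then S102-S104 repeated."""
    model.train(training_data)                       # S101: preset-times training
    for _ in range(max_rounds):                      # termination: round budget
        texts = fetch_labeled_texts()                # S102: texts labeled with subjects
        predictions = [model.predict(text) for text in texts]
        confidence_data = [p for p in predictions    # S103: keep confident predictions
                           if confidence_of(p) > threshold]
        model.train(training_data + confidence_data) # S104: retrain and loop
    return model
```

Because `fetch_labeled_texts` is called anew on every round, the model keeps training on freshly mined confidence data, which is what lets it continuously learn new knowledge.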
Similar to the way the deep learning model is trained by using the training data when S101 is executed, when S104 is executed to train the deep learning model by using the confidence data, the present embodiment may adopt the following optional implementation manner: for each piece of confidence data, after converting the text to be processed into a vector sequence, splicing the vector sequence with the position information of the subject in the text to be processed; taking the splicing result as the input of the pre-trained model to obtain the semantic vector output by the pre-trained model; taking the semantic vector as the input of the predicate predictor, and the predicate corresponding to the subject as the output of the predicate predictor; and taking the semantic vector and the vector corresponding to the predicate in the semantic vector as the input of the object predictor, taking the object corresponding to the subject as the output of the object predictor, and training the deep learning model.
The way of performing S104 to train the deep learning model by using the training data in the present embodiment is the same as the way of performing S101 to train the deep learning model by using the training data in the present embodiment, and is not described herein again.
That is to say, in the process of establishing the information extraction model, the deep learning model is trained by using two training modes, and since the second training mode can complete training by combining training data and continuously acquired confidence data, the purpose that the deep learning model continuously learns new knowledge through continuously updated data is achieved, so that the established information extraction model can have the capability of extracting relationship information from texts of different relationship categories.
In this embodiment, the termination condition for training the deep learning model in S104 may be that the number of times the deep learning model has been trained exceeds a termination number, or that the accuracy of the relation information extracted by the deep learning model exceeds a termination accuracy.
With the information extraction model obtained in S104, a predicate and an object corresponding to a subject in a text can be output based on the input text and the subject in the text. The text input to the information extraction model may be a text containing relation words.
By the method provided by the embodiment, the deep learning model is trained by continuously acquiring the confidence data, so that the deep learning model can autonomously mine the relationship information in different texts, new knowledge is continuously learned, the training steps of the information extraction model are simplified, the training efficiency of the information extraction model is improved, and the established information extraction model has the capability of extracting the relationship information from the texts in different relationship types.
Fig. 2 is a schematic diagram according to a second embodiment of the present application. As shown in fig. 2, the apparatus for creating an information extraction model according to this embodiment includes:
the first training unit 201 is configured to obtain a plurality of training data, and train the deep learning model for a preset number of times by using the training data;
the prediction unit 202 is configured to obtain a plurality of texts to be processed, which are marked with subjects, and predict the plurality of texts to be processed by using the deep learning model to obtain predicates and objects corresponding to the marked subjects in the texts to be processed;
the processing unit 203 is configured to obtain confidence data according to each text to be processed, a subject in each text to be processed, and a predicate and an object corresponding to the subject;
and the second training unit 204 is configured to, after the deep learning model is trained by using the training data and the confidence data, return to the step of acquiring the texts to be processed, and repeat the above steps in a loop until the training of the deep learning model reaches a termination condition, so as to obtain the information extraction model.
The training data acquired by the first training unit 201 in this embodiment is labeled data, and each training data is composed of a training text, a subject in the training text, and a predicate and an object corresponding to the subject.
Specifically, the deep learning model in the present embodiment is composed of a pre-trained model (e.g., ERNIE), a predicate predictor, and an object predictor; the pre-trained model is used for obtaining a semantic vector, the predicate predictor is used for predicting a predicate according to the semantic vector, and the object predictor is used for predicting an object according to the semantic vector and the predicted predicate.
Therefore, when the first training unit 201 in this embodiment performs training on the deep learning model for a preset number of times by using the training data, the optional implementation manner that can be adopted is as follows: for each training data, after converting the training text into a vector sequence, splicing the vector sequence with the position information of the subject in the training text; taking the splicing result as the input of a pre-training model to obtain a semantic vector output by the pre-training model; the semantic vector is used as the input of a predicate predictor, and a predicate corresponding to a subject is used as the output of the predicate predictor; and taking the semantic vector and a vector corresponding to the predicate in the semantic vector as the input of the object predictor, taking the object corresponding to the subject as the output of the object predictor, and training the deep learning model for a preset number of times. The preset times in this embodiment may be set by a user, or may be fixed times, for example, one time.
The first training unit 201 in this embodiment trains the deep learning model a preset number of times using training data, so that the trained deep learning model has a certain capability of extracting relationship information, that is, a predicate and an object corresponding to a subject in a text can be predicted from the input text and the subject in the text.
In the present embodiment, after the first training unit 201 performs training on the deep learning model for the preset number of times by using the acquired training data, the prediction unit 202 acquires a plurality of texts to be processed, to which the subjects are labeled, and predicts the plurality of texts to be processed by using the deep learning model, so as to obtain the predicates and objects corresponding to the labeled subjects in each text to be processed.
When the prediction unit 202 in this embodiment obtains the plurality of texts to be processed labeled with subjects, the following optional implementation manner can be adopted: acquiring a plurality of texts containing relation words (for example, from a network) as the plurality of texts to be processed, and labeling the subject in each text to be processed.
It is understood that the prediction unit 202 may obtain the text containing the relation words from the network by presetting a relation word list, where the relation word list may include words that can be used as the relation words; the prediction unit 202 may also determine whether the text obtained from the network is a text containing the relation words by using a relation word recognition model obtained through pre-training, that is, the relation word recognition model can recognize whether the text contains the relation words.
When the prediction unit 202 in this embodiment predicts the plurality of texts to be processed by using the deep learning model to obtain the predicate and the object corresponding to the labeled subject in each text to be processed, the following optional implementation manner can be adopted: for each text to be processed, after converting the text to be processed into a vector sequence, splicing the vector sequence with the position information of the subject labeled in the text to be processed; taking the splicing result as the input of the pre-trained model to obtain the semantic vector output by the pre-trained model; taking the semantic vector as the input of the predicate predictor, and obtaining the predicate corresponding to the subject according to the output of the predicate predictor; and taking the semantic vector and the vector corresponding to the predicate in the semantic vector as the input of the object predictor, and obtaining the object corresponding to the subject according to the output of the object predictor.
In the present embodiment, after the predicate and the object corresponding to the subject labeled in the text to be processed are obtained by the prediction unit 202 using the deep learning model, the confidence data is obtained by the processing unit 203 from each text to be processed, the subject in each text to be processed, and the predicate and the object corresponding to the subject.
It is understood that, when obtaining the confidence data, the processing unit 203 in this embodiment may use all the texts to be processed, the subject of each text to be processed, and the predicate and object corresponding to the subject, which are acquired by the prediction unit 202, as the confidence data.
In order to ensure that the confidence data has higher accuracy, when the processing unit 203 in this embodiment obtains the confidence data according to each text to be processed, the subject in each text to be processed, and the predicate and the object corresponding to the subject, the following optional implementation manner can be adopted: determining the confidence of each text to be processed; selecting the texts to be processed whose confidence exceeds a preset threshold; and taking each selected text to be processed, the subject of the text to be processed, and the predicate and the object corresponding to the subject as the confidence data.
When determining the confidence of the text to be processed, the processing unit 203 may use a confidence model obtained by pre-training to obtain the confidence; the processing unit 203 may also calculate the confidence level according to information such as the number of sources of the text to be processed, the subjects in the text to be processed, and the number of occurrences of the predicates and objects corresponding to the subjects.
It is understood that, after obtaining the confidence data, the processing unit 203 in this embodiment may enter the obtained confidence data into the knowledge base, so as to obtain the confidence data directly from the knowledge base in the subsequent training process of the deep learning model.
In this embodiment, after the processing unit 203 obtains the confidence data, the second training unit 204 trains the deep learning model by using the training data and the confidence data, and then the step of obtaining the texts to be processed is executed again; this process is repeated in a loop until the training of the deep learning model reaches the termination condition, and the information extraction model is obtained.
Similar to the way the first training unit 201 trains the deep learning model by using the training data, the second training unit 204 in the present embodiment may adopt the following optional implementation manner when training the deep learning model by using the confidence data: for each piece of confidence data, after converting the text to be processed into a vector sequence, splicing the vector sequence with the position information of the subject in the text to be processed; taking the splicing result as the input of the pre-trained model to obtain the semantic vector output by the pre-trained model; taking the semantic vector as the input of the predicate predictor, and the predicate corresponding to the subject as the output of the predicate predictor; and taking the semantic vector and the vector corresponding to the predicate in the semantic vector as the input of the object predictor, taking the object corresponding to the subject as the output of the object predictor, and training the deep learning model.
The way in which the second training unit 204 trains the deep learning model using the training data in this embodiment is the same as the way in which the first training unit 201 trains the deep learning model using the training data in this embodiment, and is not described herein again.
In this embodiment, the termination condition for the second training unit 204 to train the deep learning model may be that the number of times the deep learning model has been trained exceeds a termination number, or that the accuracy of the relation information extracted by the deep learning model exceeds a termination accuracy.
In the present embodiment, the information extraction model obtained by the second training unit 204 can output a predicate and an object corresponding to a subject in a text based on the input text and the subject in the text. The text input to the information extraction model may be a text containing relation words.
There is also provided, in accordance with an embodiment of the present application, an electronic device, a computer-readable storage medium, and a computer program product.
FIG. 3 illustrates a schematic block diagram of an example electronic device 300 that can be used to implement embodiments of the present disclosure. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be examples only, and are not meant to limit implementations of the disclosure described and/or claimed herein.
As shown in fig. 3, the device 300 includes a computing unit 301 that can perform various appropriate actions and processes according to a computer program stored in a Read-Only Memory (ROM) 302 or a computer program loaded from a storage unit 308 into a Random Access Memory (RAM) 303. Various programs and data required for the operation of the device 300 can also be stored in the RAM 303. The computing unit 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Various components in device 300 are connected to I/O interface 305, including: an input unit 306 such as a keyboard, a mouse, or the like; an output unit 307 such as various types of displays, speakers, and the like; a storage unit 308 such as a magnetic disk, optical disk, or the like; and a communication unit 309 such as a network card, modem, wireless communication transceiver, etc. The communication unit 309 allows the device 300 to exchange information/data with other devices via a computer network such as the internet and/or various telecommunication networks.
The computing unit 301 may be any of various general-purpose and/or special-purpose processing components having processing and computing capabilities. Some examples of the computing unit 301 include, but are not limited to, a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), various dedicated Artificial Intelligence (AI) computing chips, various computing units running machine learning model algorithms, a Digital Signal Processor (DSP), and any suitable processor, controller, or microcontroller. The computing unit 301 executes the respective methods and processes described above, such as the method of establishing an information extraction model. For example, in some embodiments, the method of establishing an information extraction model may be implemented as a computer software program tangibly embodied in a machine-readable medium, such as the storage unit 308. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device 300 via the ROM 302 and/or the communication unit 309. When the computer program is loaded into the RAM 303 and executed by the computing unit 301, one or more steps of the method of establishing an information extraction model described above may be performed. Alternatively, in other embodiments, the computing unit 301 may be configured to perform the method of establishing an information extraction model by any other suitable means (e.g., by means of firmware).
Various implementations of the systems and techniques described above may be implemented in digital electronic circuitry, integrated circuitry, Field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), Systems on a Chip (SOCs), Complex Programmable Logic Devices (CPLDs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include being implemented in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor that receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
Program code for implementing the methods of the present disclosure may be written in any combination of one or more programming languages. These program codes may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus, such that the program codes, when executed by the processor or controller, cause the functions/operations specified in the flowchart and/or block diagram to be performed. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), the internet, and blockchain networks.
The computer system may include clients and servers. A client and a server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or a cloud host, which is a host product in a cloud computing service system and overcomes the defects of difficult management and weak service expansibility in traditional physical hosts and Virtual Private Server (VPS) services. The server may also be a server of a distributed system, or a server incorporating a blockchain.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present disclosure may be executed in parallel, sequentially, or in different orders, as long as the desired results of the technical solutions disclosed in the present disclosure can be achieved, and the present disclosure is not limited herein.
The above detailed description should not be construed as limiting the scope of the disclosure. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present disclosure should be included in the scope of protection of the present disclosure.