CN112597307A - Extraction method, device and equipment of figure action related data and storage medium - Google Patents

Info

Publication number
CN112597307A
Authority
CN
China
Prior art keywords
text data
analysis
data
preset
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011545182.2A
Other languages
Chinese (zh)
Inventor
蔡壮壮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
OneConnect Financial Technology Co Ltd Shanghai
Original Assignee
OneConnect Financial Technology Co Ltd Shanghai
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by OneConnect Financial Technology Co Ltd Shanghai
Priority to CN202011545182.2A
Publication of CN112597307A
Priority to PCT/CN2021/124629 (WO2022134779A1)

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/355: Creation or modification of classes or clusters
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33: Querying
    • G06F16/335: Filtering based on additional data, e.g. user or group profiles
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/211: Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/253: Grammatical analysis; Style critique
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/30: Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the field of artificial intelligence and discloses a method, apparatus, device and storage medium for extracting data related to character actions. Text data is parsed syntactically and tagged with parts of speech by the Chinese natural language processing HanLP algorithm, and data related to actions that are taking place is screened out, which improves the accuracy of data extraction and reduces noise in the extracted data set. The method for extracting character-action-related data includes: acquiring preset text data; classifying the preset text data and screening out text data containing character information to obtain initial text data; performing word segmentation and part-of-speech tagging on the initial text data to generate intermediate text data; performing dependency syntax analysis and semantic dependency analysis on the intermediate text data to generate analysis text data; and filtering the analysis text data to generate target text data. In addition, the invention relates to blockchain technology: the target text data may be stored in a blockchain.

Figure 202011545182

Description

Extraction method, device and equipment of figure action related data and storage medium
Technical Field
The invention relates to the field of natural language processing, and in particular to a method, an apparatus, a device and a storage medium for extracting character-action-related data.
Background
Natural language processing comprises two parts: natural language understanding and natural language generation. Natural-language communication between humans and machines means that a computer can both understand the meaning of a natural language text and express a given intention or thought in natural language text; the former is called natural language understanding and the latter natural language generation. Natural language processing is an important direction in computer science and artificial intelligence. The Chinese natural language processing HanLP algorithm is a text data extraction algorithm that covers word segmentation, part-of-speech tagging, entity recognition and the like.
In recent years, driven by big data and deep learning, natural language processing technology has developed rapidly. Current algorithms for extracting subject-predicate-object and predicate-object structures from text data fall roughly into two groups: methods based on deep learning and methods based on language rules. Deep-learning methods require a large amount of labeled data and perform poorly when extracting language that describes character actions. Rule-based methods have large errors, do not meet the requirements of extracting character-action-related data, and produce noisy extracted data.
Disclosure of Invention
The invention provides a method, an apparatus, a device and a storage medium for extracting character-action-related data. Text data is parsed syntactically and tagged with parts of speech by the Chinese natural language processing HanLP algorithm, and data related to actions that are taking place is screened out based on subject-predicate-object grammatical relations and modal verbs, thereby improving the accuracy of data extraction and reducing noise in the extracted data set.
A first aspect of the invention provides a method for extracting character-action-related data, comprising: acquiring preset text data, wherein the preset text data is text data from novels that contains character actions; classifying the preset text data and screening out text data containing character information to obtain initial text data; performing word segmentation and part-of-speech tagging on the initial text data based on a preset Chinese natural language processing HanLP algorithm to generate intermediate text data; performing dependency syntax analysis and semantic dependency analysis on the intermediate text data based on the preset HanLP algorithm to generate analysis text data; and filtering the analysis text data to obtain target text data containing a plurality of character actions.
Optionally, in a first implementation of the first aspect of the present invention, classifying the preset text data and screening out text data containing character information to obtain initial text data includes: classifying the preset text data according to preset classification rules, screening out text data containing character pronouns or character names, and generating classified text data; and identifying target punctuation marks in the classified text data, deleting text data containing character dialogue according to the target punctuation marks, and generating initial text data, wherein the target punctuation marks indicate character dialogue.
Optionally, in a second implementation of the first aspect of the present invention, performing word segmentation and part-of-speech tagging on the initial text data based on the preset Chinese natural language processing HanLP algorithm to generate intermediate text data includes: splitting the initial text data into sentences at punctuation marks to obtain a sentence-splitting result; segmenting the sentence-splitting result into words based on the preset Chinese natural language processing HanLP algorithm to obtain a word segmentation result; and tagging the word segmentation result with parts of speech based on the preset Chinese natural language processing HanLP algorithm and a preset HanLP part-of-speech tag set to generate intermediate text data.
Optionally, in a third implementation of the first aspect of the present invention, performing dependency syntax analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm to generate analysis text data includes: calling the preset Chinese natural language processing HanLP algorithm to identify and analyze the relations between grammatical components in the intermediate text data, and, when the core relation of an object points to a verb predicate, extracting the core subject-predicate relation to generate first analysis text data; calling the preset Chinese natural language processing HanLP algorithm to analyze semantic associations in the intermediate text data, determining the relation type, screening out text data containing a construction relation, and generating second analysis text data; and merging the first analysis text data and the second analysis text data to generate analysis text data.
Optionally, in a fourth implementation of the first aspect of the present invention, filtering the analysis text data to generate target text data that includes the extracted plurality of character actions includes: acquiring the analysis text data, filtering out text data in the analysis text data that contains modal verbs, and generating filtered text data; and normalizing the filtered text data to generate target text data containing a plurality of character actions.
Optionally, in a fifth implementation of the first aspect of the present invention, filtering out text data in the analysis text data that contains modal verbs and generating filtered text data includes: identifying text data in the analysis text data that contains modal verbs, wherein a modal verb indicates a character action that has not occurred; and deleting the text data containing modal verbs to generate filtered text data.
Optionally, in a sixth implementation of the first aspect of the present invention, after performing dependency syntax analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm to generate analysis text data, and before filtering the analysis text data to generate target text data, the method further includes: identifying whether the analysis text data contains character actions that occurred in the past; if it does not, retaining the analysis text data; and if it does, deleting the related data containing the character actions that occurred in the past.
A second aspect of the present invention provides an apparatus for extracting character-action-related data, comprising: an acquisition module, configured to acquire preset text data, wherein the preset text data is text data from novels that contains character actions; a classification module, configured to classify the preset text data and screen out text data containing character information to obtain initial text data; a word segmentation module, configured to perform word segmentation and part-of-speech tagging on the initial text data based on a preset Chinese natural language processing HanLP algorithm to generate intermediate text data; an analysis module, configured to perform dependency syntax analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm to generate analysis text data; and a filtering module, configured to filter the analysis text data to obtain target text data containing a plurality of character actions.
Optionally, in a first implementation manner of the second aspect of the present invention, the classification module includes: the classification unit is used for classifying the preset text data according to a preset classification rule, screening out text data containing character pronouns or character names and generating classified text data; and the deleting unit is used for identifying a target punctuation mark in the classified text data, deleting text data containing character conversation according to the target punctuation mark and generating initial text data, wherein the target punctuation mark is used for indicating character conversation.
Optionally, in a second implementation manner of the second aspect of the present invention, the word segmentation module includes: a sentence dividing unit, configured to perform sentence dividing processing on the initial text data through punctuation marks to obtain a sentence dividing result; the word segmentation unit is used for carrying out word segmentation processing on the sentence segmentation result based on a preset Chinese natural language processing HanLP algorithm to obtain a word segmentation result; and the part-of-speech tagging unit is used for carrying out part-of-speech tagging on the word segmentation result based on the preset HanLP algorithm for Chinese natural language processing and a preset HanLP part-of-speech tagging set so as to generate intermediate text data.
Optionally, in a third implementation manner of the second aspect of the present invention, the analysis module includes: the first analysis unit is used for calling the preset Chinese natural language processing HanLP algorithm to identify and analyze the relation between grammatical elements in the intermediate text data, and when the core relation of the object points to a verb predicate, the core subject-predicate relation is extracted to generate first analysis text data; the second analysis unit is used for calling the preset Chinese natural language processing HanLP algorithm to analyze semantic association in the intermediate text data, determining the relation type, screening out text data containing the construction relation and generating second analysis text data; and the merging unit is used for merging the first analysis text data and the second analysis text data to generate analysis text data.
Optionally, in a fourth implementation of the second aspect of the present invention, the filtering module includes: a filtering unit, configured to filter out text data in the analysis text data that contains modal verbs to generate filtered text data; and a normalization unit, configured to normalize the filtered text data to generate target text data containing a plurality of character actions.
Optionally, in a fifth implementation of the second aspect of the present invention, the filtering unit is specifically configured to: identify text data in the analysis text data that contains modal verbs, wherein a modal verb indicates a character action that has not occurred; and delete the text data containing modal verbs to generate filtered text data.
Optionally, in a sixth implementation manner of the second aspect of the present invention, the apparatus further includes: and the identification module is used for identifying whether the analysis text data contains the character behavior action which occurs in the past or not, keeping the analysis text data when the analysis text data does not contain the character behavior action which occurs in the past, and deleting the related data containing the character behavior action which occurs in the past when the analysis text data contains the character behavior action which occurs in the past.
A third aspect of the present invention provides a character motion-related data extraction device, including: a memory and at least one processor, the memory having instructions stored therein; the at least one processor calls the instructions in the memory to enable the extraction device of the human action related data to execute the extraction method of the human action related data.
A fourth aspect of the present invention provides a computer-readable storage medium having stored therein instructions, which, when run on a computer, cause the computer to execute the above-described extraction method of character motion-related data.
In the technical scheme provided by the invention, preset text data is acquired, wherein the preset text data is text data from novels that contains character actions; the preset text data is classified, and text data containing character information is screened out to obtain initial text data; word segmentation and part-of-speech tagging are performed on the initial text data based on a preset Chinese natural language processing HanLP algorithm to generate intermediate text data; dependency syntax analysis and semantic dependency analysis are performed on the intermediate text data based on the preset HanLP algorithm to generate analysis text data; and the analysis text data is filtered to obtain target text data containing a plurality of character actions. In the embodiment of the invention, text data is parsed syntactically and tagged with parts of speech by the Chinese natural language processing HanLP algorithm, and data related to actions that are taking place is screened out based on subject-predicate-object grammatical relations and modal verbs, so that the accuracy of data extraction is improved and noise in the extracted data set is reduced.
Drawings
FIG. 1 is a schematic diagram of an embodiment of the method for extracting character-action-related data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of another embodiment of the method for extracting character-action-related data according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an embodiment of the apparatus for extracting character-action-related data according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of another embodiment of the apparatus for extracting character-action-related data according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an embodiment of the device for extracting character-action-related data according to an embodiment of the present invention.
Detailed Description
The embodiment of the invention provides a method, an apparatus, a device and a storage medium for extracting character-action-related data. Text data is parsed syntactically and tagged with parts of speech by the Chinese natural language processing HanLP algorithm, and data related to actions that are taking place is screened out based on subject-predicate-object grammatical relations and modal verbs, thereby improving the accuracy of data extraction and reducing noise in the extracted data set.
The terms "first," "second," "third," "fourth," and the like in the description and in the claims, as well as in the drawings, if any, are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It will be appreciated that the data so used may be interchanged under appropriate circumstances such that the embodiments described herein may be practiced otherwise than as specifically illustrated or described herein. Furthermore, the terms "comprises," "comprising," or "having," and any variations thereof, are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
For convenience of understanding, a detailed flow of the embodiment of the present invention is described below, and referring to fig. 1, an embodiment of the method for extracting data related to a character action according to the embodiment of the present invention includes:
101. Acquire preset text data, wherein the preset text data is text data from novels that contains character actions.
The server acquires preset text data, wherein the preset text data is text data from novels that contains character actions. The server uses a crawler to obtain a plurality of novel texts under a specified tag from the network, and builds the preset data set from these novel texts.
It is to be understood that the executing subject of the present invention may be an extracting apparatus of data related to human actions, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as an execution subject.
102. Classify the preset text data and screen out the text data containing character information to obtain initial text data.
The server classifies the preset text data and screens out the text data containing character information to obtain initial text data. Specifically, the server classifies the preset text data according to preset classification rules, screens out text data containing character pronouns or character names, and generates classified text data; the server then filters the classified text data, identifies target punctuation marks and deletes text data containing character dialogue to generate initial text data, wherein the target punctuation marks indicate character dialogue. The server divides the preset text data into two classes according to whether it contains character information and discards text data that does not, for example "the dog runs in the yard", "birds chirp outside the window", "the squirrel uses its big fluffy tail as a quilt", and the like. It keeps text data that includes character pronouns or character names, where the character pronouns include I (we), you, he (they) and she (they). The target punctuation mark is the combination of a colon and double quotation marks, which indicates character dialogue. Although text data with character dialogue contains character information, it is not suitable for analyzing and extracting character-action-related data in this scheme and therefore needs to be removed.
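The classification and dialogue-removal rules of step 102 can be sketched in a few lines of Python. This is only an illustrative sketch: the pronoun list follows the examples given above, while the name list `NAMES`, the regular expression `DIALOGUE` and the function names are assumptions introduced here, not part of the disclosed scheme.

```python
import re

# Character pronouns named in the description: 我(们), 你(们), 他(们), 她(们).
PRONOUNS = ("我", "你", "他", "她")
# Hypothetical list of character names; a real system would need its own source.
NAMES = ("小明",)
# Assumed dialogue marker: a colon followed by an opening double quotation mark
# (full-width or ASCII variants).
DIALOGUE = re.compile(r'[:：]\s*["“]')


def has_character(text, names=NAMES):
    """Return True if the text mentions a character pronoun or a known name."""
    return any(p in text for p in PRONOUNS) or any(n in text for n in names)


def is_dialogue(text):
    """Return True if the text contains the colon + double-quote combination."""
    return DIALOGUE.search(text) is not None


def select_initial_text(lines, names=NAMES):
    """Keep lines that mention a character and are not character dialogue."""
    return [t for t in lines if has_character(t, names) and not is_dialogue(t)]


samples = ["狗在院子里跑。", "小明正在吃饭。", "他说：“我们走吧。”"]
initial = select_initial_text(samples)
```

With these samples, only the narrative sentence about 小明 survives: the first sentence has no character, and the third is dialogue.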
103. Perform word segmentation and part-of-speech tagging on the initial text data based on a preset Chinese natural language processing HanLP algorithm to generate intermediate text data.
The server performs word segmentation and part-of-speech tagging on the initial text data based on the preset Chinese natural language processing HanLP algorithm to generate intermediate text data. Specifically, the server splits the initial text data into sentences at punctuation marks to obtain a sentence-splitting result; the server segments the sentence-splitting result into words based on the preset Chinese natural language processing HanLP algorithm to obtain a word segmentation result; and the server tags the word segmentation result with parts of speech based on the preset Chinese natural language processing HanLP algorithm and a preset HanLP part-of-speech tag set to generate intermediate text data. The word is the most basic unit of text, and word segmentation is the most basic step in natural language processing. Word segmentation algorithms are divided into dictionary methods and statistical methods: a method based on a dictionary and manual rules matches the string to be analyzed against the entries of a dictionary according to a given strategy, while a statistical method relies on the frequency with which basic character strings occur in a corpus. Each punctuation mark has a corresponding regular expression; the initial text data is split at the punctuation marks so that a long sentence is divided into several short sentences, yielding first text data.
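As a rough illustration of the two ideas in this step, the sketch below splits text at sentence-ending punctuation with a regular expression and then segments each sentence by forward maximum matching against a dictionary. It is not HanLP's segmenter: the punctuation set, the toy dictionary, the `max_len` parameter and the function names are all assumptions made for the example.

```python
import re

# Assumed set of sentence-ending punctuation marks (full-width and ASCII).
SENTENCE = re.compile(r'[^。！？；!?;]+[。！？；!?;]?')


def split_sentences(text):
    """Split text into short sentences, keeping the closing punctuation mark."""
    return [s.strip() for s in SENTENCE.findall(text) if s.strip()]


def forward_max_match(sentence, dictionary, max_len=4):
    """Dictionary-based segmentation: at each position take the longest
    dictionary entry; fall back to a single character if nothing matches."""
    tokens = []
    i = 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


toy_dictionary = {"小明", "正在", "吃饭"}  # hypothetical entries
sentences = split_sentences("小明正在吃饭。他很开心！")
tokens = forward_max_match(sentences[0], toy_dictionary)
```

Forward maximum matching is one simple dictionary strategy; statistical segmenters resolve ambiguous boundaries that a greedy match like this one gets wrong.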
The Chinese natural language processing toolkit HanLP consists of a series of models and algorithms and aims to promote the use of natural language processing in production environments. It is characterized by complete functionality, high performance, a clear architecture, up-to-date corpora and customizability. In this scheme, the text data is first segmented into words by HanLP; for example, for the input "Xiaoming is eating" (literally "Xiaoming currently eating"), the segmentation result is "Xiaoming", "currently", "eating". Part-of-speech tagging is the process of labeling each word in the segmentation result with its correct part of speech, that is, determining whether each word is a noun, a verb, an adjective or another part of speech. In this scheme, the segmentation result is tagged using a preset HanLP part-of-speech tag set: the part of speech of "Xiaoming" is "noun", that of "currently" is "adverb", and that of "eating" is "verb".
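The tagging example above can be mimicked with a lookup table. The sketch below is a stand-in for a real tagger, not a call into HanLP: the lexicon `TOY_TAGS`, the tag abbreviations (`nr` for a person name, `d` for an adverb, `v` for a verb, `w` for punctuation, `x` for unknown) and the function names are assumptions chosen for illustration.

```python
# Hypothetical mini-lexicon mapping words to part-of-speech tags.
TOY_TAGS = {"小明": "nr", "正在": "d", "吃饭": "v", "。": "w"}


def tag_tokens(tokens, lexicon=TOY_TAGS, default="x"):
    """Attach a part-of-speech tag to every token; unknown words get `default`."""
    return [(token, lexicon.get(token, default)) for token in tokens]


def verbs(tagged):
    """Return the words whose tag marks them as verbs."""
    return [word for word, tag in tagged if tag.startswith("v")]


tagged = tag_tokens(["小明", "正在", "吃饭"])
```

Here `tagged` pairs each word with its tag, and `verbs(tagged)` picks out the predicate candidate that later steps build on.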
104. Perform dependency syntax analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm to generate analysis text data.
The server performs dependency syntax analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm to generate analysis text data. Dependency parsing (DP) analyzes the dependency relationships between the components of a language unit to reveal its syntactic structure; that is, it identifies grammatical components in a sentence such as subject, predicate and object, and attributive, adverbial and complement, and analyzes the relationships between these components. Semantic dependency parsing (SDP) analyzes the semantic associations between the language units of a sentence and presents them as a dependency structure. Semantic dependency parsing is not affected by syntactic structure: language units with a direct semantic association are connected directly by dependency arcs labeled with the corresponding semantic relations, which is an important difference between semantic dependency parsing and syntactic parsing. For example, "Xiaoming has eaten an apple" and "an apple has been eaten by Xiaoming" have different syntactic structures and produce different syntactic analysis results, but the semantic relations between their language units do not change, and they express the same semantic information: Xiaoming performs an eating action, and that action is applied to an apple.
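To make the dependency structure concrete, the sketch below represents a parse as a list of arcs and reads the subject, predicate and object off the core verb. The parse is written by hand rather than produced by HanLP, and the relation labels (`HED` for the core relation, `SBV` for subject-verb, `VOB` for verb-object, `RAD` for a right adjunct), the `Arc` class and `extract_svo` are hypothetical names used only for this example.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Arc:
    idx: int   # 1-based position of the token in the sentence
    word: str
    pos: str   # part-of-speech tag
    head: int  # position of the head token; 0 is the virtual root
    rel: str   # dependency relation label (hypothetical abbreviations)


def extract_svo(arcs: List[Arc]) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (subject, predicate, object) when the core relation points at a
    verb that has a subject; the object may be None. Otherwise return None."""
    root = next((a for a in arcs if a.rel == "HED"), None)
    if root is None or not root.pos.startswith("v"):
        return None
    subject = next((a.word for a in arcs
                    if a.head == root.idx and a.rel == "SBV"), None)
    obj = next((a.word for a in arcs
                if a.head == root.idx and a.rel == "VOB"), None)
    if subject is None:
        return None
    return (subject, root.word, obj)


# Hand-written parse of 小明吃了苹果 ("Xiaoming has eaten an apple").
parse = [
    Arc(1, "小明", "nr", 2, "SBV"),
    Arc(2, "吃", "v", 0, "HED"),
    Arc(3, "了", "u", 2, "RAD"),
    Arc(4, "苹果", "n", 2, "VOB"),
]
triple = extract_svo(parse)
```

For this parse the triple is (小明, 吃, 苹果). A passive sentence would yield different arcs, which is why the scheme also consults semantic dependencies.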
105. Filter the analysis text data to obtain target text data containing a plurality of character actions.
The server filters the analysis text data to obtain target text data containing a plurality of character actions. Specifically, the server acquires the analysis text data, filters out the text data in it that contains modal verbs, and generates filtered text data; the server then normalizes the filtered text data to generate target text data, wherein the target text data includes the extracted plurality of character actions. After subject-predicate-object character actions have been screened out, a sentence in which a modal verb modifies the predicate verb does not qualify: the modal verb makes the sentence describe an action or state at some future time, so the character action has not yet occurred. For example, in "Xiaoming will go out to swing", the action of swinging has not yet happened, so the related text data needs to be filtered out and deleted.
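The modal-verb filter can be sketched as a membership test over segmented sentences. The modal-verb list below is an assumption for illustration (the source does not enumerate the verbs it uses), and the function names are hypothetical. The normalization step is left out because the source does not define it.

```python
# Assumed list of Chinese modal verbs; the actual list is not given in the source.
MODAL_VERBS = frozenset({"会", "要", "想", "能", "应该", "可以", "将"})


def has_modal(tokens):
    """Return True if any token of the segmented sentence is a modal verb."""
    return any(token in MODAL_VERBS for token in tokens)


def filter_unrealized(sentences):
    """Drop segmented sentences whose action has not happened yet."""
    return [tokens for tokens in sentences if not has_modal(tokens)]


segmented = [
    ["小明", "正在", "吃饭"],        # action in progress: kept
    ["小明", "要", "出去", "荡秋千"],  # modal verb 要: removed
]
kept = filter_unrealized(segmented)
```

Matching whole tokens rather than raw characters avoids false hits such as the 要 inside 重要. A fuller version would also check that the modal actually modifies the predicate verb in the dependency parse.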
In the embodiment of the invention, text data is parsed syntactically and tagged with parts of speech by the Chinese natural language processing HanLP algorithm, and data related to actions that are taking place is screened out based on subject-predicate-object grammatical relations and modal verbs, so that the accuracy of data extraction is improved and noise in the extracted data set is reduced.
Referring to fig. 2, another embodiment of the method for extracting data related to human actions according to the embodiment of the present invention includes:
201. Acquire preset text data, wherein the preset text data is text data from novels that contains character actions.
The server acquires preset text data, wherein the preset text data is text data from novels that contains character actions. The server uses a crawler to obtain a plurality of novel texts under a specified tag from the network, and builds the preset data set from these novel texts.
It is to be understood that the executing subject of the present invention may be an extracting apparatus of data related to human actions, and may also be a terminal or a server, which is not limited herein. The embodiment of the present invention is described by taking a server as an execution subject.
202. Classify the preset text data and screen out the text data containing character information to obtain initial text data.
The server classifies the preset text data and screens out the text data containing character information to obtain the initial text data. Specifically, the server classifies the preset text data according to preset classification rules, screens out the text data containing character pronouns or character names, and generates classified text data; the server then filters the classified text data, identifies target punctuation marks, and deletes the text data containing character dialogue to generate the initial text data, wherein the target punctuation marks indicate character dialogue. The server divides the preset text data into two classes according to whether character information is contained, and removes the text data that does not contain it, for example "the dog runs in the yard", "birds chirp outside the window" and "the squirrel uses its big fluffy tail as a quilt"; it keeps the text data containing character pronouns or character names, where the character pronouns include I (we), you (singular and plural), he (they) and she (they). The target punctuation mark is the combination of a colon and double quotation marks, which indicates character dialogue. Although text data with character dialogue contains character information, it is not suitable for analyzing and extracting character behavior and action data in this scheme, and therefore needs to be removed.
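The classification and dialogue-removal rules of step 202 can be sketched as follows. This is a minimal illustration, not the patent's implementation: the pronoun tuple, the `names` argument and the function names are assumptions introduced here, and a plain substring test will also match words that merely contain a pronoun character.

```python
import re

# Assumed pronoun characters, following the examples in the text
# (I, you, he, she; the plural forms contain the same characters).
PERSON_PRONOUNS = ("我", "你", "他", "她")

# A colon followed by an opening double quotation mark signals character dialogue.
DIALOGUE_MARK = re.compile(r'[:：]\s*["“]')


def has_character_info(sentence, names=()):
    """Return True if the sentence contains a person pronoun or a known name."""
    return (any(p in sentence for p in PERSON_PRONOUNS)
            or any(n in sentence for n in names))


def select_initial_text(sentences, names=()):
    """Keep sentences with character information, then drop dialogue sentences."""
    classified = [s for s in sentences if has_character_info(s, names)]
    return [s for s in classified if not DIALOGUE_MARK.search(s)]
```

A caller would pass the candidate sentences together with whatever list of character names is available for the novel being processed.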
203. Perform word segmentation and part-of-speech tagging on the initial text data based on a preset Chinese natural language processing HanLP algorithm to generate intermediate text data.
The server performs word segmentation and part-of-speech tagging on the initial text data based on the preset Chinese natural language processing HanLP algorithm to generate the intermediate text data. Specifically, the server splits the initial text data into sentences by punctuation marks to obtain a sentence division result; the server performs word segmentation on the sentence division result based on the preset HanLP algorithm to obtain a word segmentation result; and the server performs part-of-speech tagging on the word segmentation result based on the preset HanLP algorithm and a preset HanLP part-of-speech tag set to generate the intermediate text data. Words are the most basic unit of text, and word segmentation is the most basic step in natural language processing. Word segmentation algorithms fall into dictionary-based methods and statistical methods: a method based on a dictionary and hand-written rules matches the string to be analyzed against the entries in the dictionary according to a certain strategy, while a statistical method relies on the frequency with which basic character strings occur in a corpus. Each punctuation mark has a corresponding regular expression; the initial text data is split by punctuation so that a long sentence is divided into several short sentences, giving the sentence division result.
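The punctuation-based sentence division described above can be sketched with a single regular expression. The set of sentence-ending marks and the function name are assumptions made for this illustration; the text does not list the exact punctuation marks used.

```python
import re

# Assumed set of sentence-ending marks (full-width and ASCII variants).
# Each match is a run of non-terminal characters plus an optional terminal mark.
_SENTENCE = re.compile(r"[^。！？；!?;]+[。！？；!?;]?")


def split_sentences(text):
    """Split text into short sentences, keeping the closing punctuation mark."""
    return [m.strip() for m in _SENTENCE.findall(text) if m.strip()]
```

Keeping the closing mark on each piece lets later steps still see punctuation that may matter, such as the colon and quotation marks used to detect dialogue.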
HanLP is a Chinese natural language processing toolkit consisting of a series of models and algorithms. It aims to promote the application of natural language processing in production environments, and is characterized by complete functionality, high performance, a clear architecture, up-to-date corpora and customizability. In this scheme, the text data is first segmented by HanLP: for example, for the input "Xiaoming is eating", the segmentation result is "Xiaoming", "is (currently)" and "eating". Part-of-speech tagging is the process of labeling each word in the segmentation result with its correct part of speech, that is, determining whether each word is a noun, a verb, an adjective or another part of speech. In this scheme, the segmentation result is tagged with a preset HanLP part-of-speech tag set: "Xiaoming" is tagged as a noun, "is (currently)" as an adverb, and "eating" as a verb.
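To make the dictionary-based approach concrete, the sketch below segments a sentence by forward maximum matching and then looks up a coarse part of speech. It is a toy stand-in and does not reproduce HanLP's own segmenter or tag set; the lexicon entries, tag names and function names are hypothetical.

```python
# Toy lexicon mapping words to coarse parts of speech (hypothetical entries).
LEXICON = {"小明": "noun", "正在": "adverb", "吃饭": "verb"}


def forward_max_match(sentence, lexicon, max_len=4):
    """Dictionary-based segmentation: at each position take the longest entry."""
    tokens, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            piece = sentence[i:i + size]
            # A single character is always accepted so the loop makes progress.
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    return tokens


def segment_and_tag(sentence, lexicon=LEXICON):
    """Return (word, part_of_speech) pairs; unknown words are tagged 'unknown'."""
    return [(tok, lexicon.get(tok, "unknown"))
            for tok in forward_max_match(sentence, lexicon)]
```

A production system would rely on the toolkit's trained models instead, since pure dictionary matching cannot resolve ambiguous segmentations.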
204. Call the preset Chinese natural language processing HanLP algorithm to identify and analyze the relations between grammatical components in the intermediate text data, and when the core relation of the object points to the predicate verb, extract the core subject-predicate-object relation to generate first analysis text data.
The server calls the preset Chinese natural language processing HanLP algorithm to identify and analyze the relations between grammatical components in the intermediate text data, and when the core relation of the object points to the predicate verb, extracts the core subject-predicate-object relation to generate the first analysis text data. For example, in "Xiaoming is playing a game in the room", "Xiaoming" is the nominal subject, the progressive marker ("is ...ing") is an adverbial modifier, "in" is a prepositional modifier, "room" is the location governed by the preposition, "inside" is a localizer attached to "room", "playing" is the verb predicate, and "game" is the direct object. The verb "play" is the core word, so the sentence can be reduced to "Xiaoming plays game", which contains the subject-predicate-object relation.
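The extraction rule of step 204 can be illustrated on an already parsed sentence. The sketch assumes a dependency parse is available as a list of token records; the field names, the relation labels (`root`, `nsubj`, `dobj` and so on) and the sample parse are assumptions for illustration and may differ from the labels a given HanLP model emits.

```python
# Hypothetical dependency parse of a sentence meaning
# "Xiaoming is playing a game in the room"; head 0 marks the core word.
PARSED = [
    {"id": 1, "form": "小明", "pos": "noun", "head": 6, "deprel": "nsubj"},
    {"id": 2, "form": "正", "pos": "adverb", "head": 6, "deprel": "advmod"},
    {"id": 3, "form": "在", "pos": "prep", "head": 6, "deprel": "prep"},
    {"id": 4, "form": "房间", "pos": "noun", "head": 5, "deprel": "lobj"},
    {"id": 5, "form": "里", "pos": "localizer", "head": 3, "deprel": "pobj"},
    {"id": 6, "form": "玩", "pos": "verb", "head": 0, "deprel": "root"},
    {"id": 7, "form": "游戏", "pos": "noun", "head": 6, "deprel": "dobj"},
]


def extract_svo(tokens):
    """Return (subject, predicate, object) when the core word is a verb, else None."""
    root = next((t for t in tokens if t["deprel"] == "root"), None)
    if root is None or root["pos"] != "verb":
        return None

    def child(label):
        # First dependent of the core verb carrying the given relation label.
        return next((t["form"] for t in tokens
                     if t["head"] == root["id"] and t["deprel"] == label), None)

    subject, obj = child("nsubj"), child("dobj")
    if subject is None or obj is None:
        return None
    return (subject, root["form"], obj)
```

Sentences whose core word is not a verb, or that lack a subject or object attached to it, yield `None` and are left out of the first analysis text data in this sketch.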
205. Call the preset Chinese natural language processing HanLP algorithm to analyze the semantic associations in the intermediate text data, determine the relation type, screen out the text data containing the agent relation, and generate second analysis text data.
The server calls the preset Chinese natural language processing HanLP algorithm to analyze the semantic associations in the intermediate text data, determines the relation type, screens out the text data containing the agent relation, and generates the second analysis text data. The relation types include the agent relation, experiencer relation, affection relation, possessor relation, patient relation, content relation, product relation, origin relation, dative relation and comparison role. For example, in "Xiaoming sends her flowers", the semantic relation type is the agent relation and "sending flowers" is a specific action performed by the character, which meets the screening condition of this scheme. In "Xiaoming is eating in the room while watching television and talking", the sentence contains several predicate verbs, "eat", "watch" and "talk", which stand in a succession relation, and this also meets the screening condition of this scheme.
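Screening by semantic role (the agent relation, 施事关系, named in claim 4) can be sketched on pre-computed semantic dependency arcs. The arc format, the role abbreviations such as `Agt`, and the function names are assumptions for illustration; an actual semantic dependency parser may use different labels.

```python
AGENT = "Agt"  # assumed label for the agent relation


def has_agent_relation(arcs):
    """arcs: iterable of (dependent, head, role) triples for one sentence."""
    return any(role == AGENT for _dependent, _head, role in arcs)


def screen_by_agent(parsed_sentences):
    """parsed_sentences: list of (sentence, arcs) pairs; keep agent sentences."""
    return [sentence for sentence, arcs in parsed_sentences
            if has_agent_relation(arcs)]
```

The second analysis text data in this sketch is simply the list of sentences in which some word acts as the agent of a predicate.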
206. Merge the first analysis text data and the second analysis text data to generate analysis text data.
The server merges the first analysis text data and the second analysis text data to generate the analysis text data. In this scheme, word segmentation, part-of-speech tagging, syntactic analysis and semantic analysis are all based on the HanLP algorithm. Each layer produces an independent data result, which can be used on its own or passed to the next layer for further analysis.
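One way to picture the merge in step 206 is an order-preserving union of the two result lists. The text only says the two data sets are combined; removing duplicates is an assumption of this sketch, as is the function name.

```python
def merge_analysis(first, second):
    """Concatenate two result lists, keeping the first occurrence of each item."""
    merged, seen = [], set()
    for item in list(first) + list(second):
        if item not in seen:
            seen.add(item)
            merged.append(item)
    return merged
```

A sentence found by both the syntactic and the semantic analysis then appears only once in the analysis text data.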
207. Filter the analysis text data to obtain target text data containing a plurality of character behaviors and actions.
The server filters the analysis text data to obtain target text data containing a plurality of character behaviors and actions. Specifically, the server acquires the analysis text data, filters out the text data containing modal verbs, and generates filtered text data; the server then normalizes the filtered text data to generate the target text data, which comprises the extracted character behaviors and actions. After the subject-predicate-object character actions have been screened out, a sentence in which a modal verb modifies the predicate verb does not meet the condition: the modal verb makes the sentence express an action or state at some future time, so the character action has not yet occurred. For example, in "Xiaoming will go out to play on the swing", the act of swinging has not yet happened, so the related text data needs to be filtered out and deleted.
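The modal-verb filter (情态动词 in claims 5 and 6) and the normalization step can be sketched on segmented sentences. The modal word list is an assumed, incomplete sample, and the text does not define "normalization", so joining tokens and removing duplicates is purely an assumption here. Matching on word form alone will also catch uses of these words as main verbs; a part-of-speech check would be stricter.

```python
# Assumed sample of Chinese modal words marking actions that have not happened yet.
MODAL_VERBS = {"会", "要", "将", "将要", "想", "能", "可以", "应该"}


def contains_modal(tokens):
    """Return True if any token of the segmented sentence is a modal word."""
    return any(tok in MODAL_VERBS for tok in tokens)


def filter_and_normalize(segmented):
    """segmented: list of token lists. Drop modal sentences, then deduplicate."""
    target, seen = [], set()
    for tokens in segmented:
        if contains_modal(tokens):
            continue
        text = "".join(tok.strip() for tok in tokens)
        if text and text not in seen:
            seen.add(text)
            target.append(text)
    return target
```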
In the embodiment of the invention, syntactic analysis and part-of-speech tagging are performed on the text data through the Chinese natural language processing HanLP algorithm, and data related to behaviors and actions that are actually taking place is screened out based on the subject-predicate-object grammatical relation and modal verbs, which improves the accuracy of data extraction and reduces noise in the extracted data set.
The method for extracting data related to human actions in the embodiment of the present invention is described above. Referring to fig. 3, an embodiment of the apparatus for extracting data related to human actions in the embodiment of the present invention includes:
the acquisition module 301 is configured to acquire preset text data, where the preset text data is novel text data containing character behaviors and actions;
the classification module 302 is configured to classify preset text data, and screen out text data including character information to obtain initial text data;
the word segmentation module 303 is configured to perform word segmentation processing and part-of-speech tagging on the initial text data based on a preset Chinese natural language processing HanLP algorithm, and generate intermediate text data;
the analysis module 304 is used for performing dependency syntax analysis and semantic dependency analysis on the intermediate text data based on a preset Chinese natural language processing HanLP algorithm to generate analysis text data;
and the filtering module 305 is configured to filter the analysis text data to obtain target text data including a plurality of character behaviors.
In the embodiment of the invention, syntactic analysis and part-of-speech tagging are performed on the text data through the Chinese natural language processing HanLP algorithm, and data related to behaviors and actions that are actually taking place is screened out based on the subject-predicate-object grammatical relation and modal verbs, which improves the accuracy of data extraction and reduces noise in the extracted data set.
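The five-module structure (modules 301 to 305) can be pictured as a chain of stages in which each module consumes the previous module's output. The class and field names below are hypothetical, and each stage is an arbitrary callable supplied by the caller, so this shows only the wiring and not the modules themselves.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class ExtractionDevice:
    acquire: Callable[[], List[str]]             # acquisition module 301
    classify: Callable[[List[str]], List[str]]   # classification module 302
    segment: Callable[[List[str]], Any]          # word segmentation module 303
    analyze: Callable[[Any], Any]                # analysis module 304
    filter_: Callable[[Any], List[str]]          # filtering module 305

    def run(self) -> List[str]:
        """Run the stages in order and return the target text data."""
        initial = self.classify(self.acquire())
        intermediate = self.segment(initial)
        analysis = self.analyze(intermediate)
        return self.filter_(analysis)
```

Because each stage is injected, individual modules can be replaced or tested in isolation without changing the overall flow.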
Referring to fig. 4, another embodiment of the device for extracting data related to human actions according to the embodiment of the present invention includes:
the acquisition module 301 is configured to acquire preset text data, where the preset text data is novel text data containing character behaviors and actions;
the classification module 302 is configured to classify preset text data, and screen out text data including character information to obtain initial text data;
the word segmentation module 303 is configured to perform word segmentation processing and part-of-speech tagging on the initial text data based on a preset Chinese natural language processing HanLP algorithm, and generate intermediate text data;
the analysis module 304 is used for performing dependency syntax analysis and semantic dependency analysis on the intermediate text data based on a preset Chinese natural language processing HanLP algorithm to generate analysis text data;
and the filtering module 305 is configured to filter the analysis text data to obtain target text data including a plurality of character behaviors.
Optionally, the classification module 302 includes:
a classification unit 3021 configured to classify preset text data according to preset classification rules, screen out text data including a character pronoun or a character name, and generate classified text data;
a deleting unit 3022 configured to recognize a target punctuation mark in the classified text data and delete text data containing character dialogue according to the target punctuation mark to generate initial text data, the target punctuation mark being used to indicate character dialogue.
Optionally, the word segmentation module 303 includes:
a clause unit 3031, configured to perform clause processing on the initial text data through punctuation marks to obtain a clause result;
a word segmentation unit 3032, configured to perform word segmentation processing on the sentence segmentation result based on a preset Chinese natural language processing HanLP algorithm, to obtain a word segmentation result;
and a part-of-speech tagging unit 3033, configured to perform part-of-speech tagging on the word segmentation result based on a preset Chinese natural language processing HanLP algorithm and a preset HanLP part-of-speech tagging set, and generate intermediate text data.
Optionally, the analysis module 304 includes:
the first analysis unit 3041 is configured to invoke a preset Chinese natural language processing HanLP algorithm to identify and analyze relations between grammatical components in the intermediate text data, and when the core relation of the object points to the predicate verb, extract the core subject-predicate-object relation to generate first analysis text data;
a second analysis unit 3042, configured to invoke a preset Chinese natural language processing HanLP algorithm to analyze semantic associations in the intermediate text data, determine a relation type, screen out text data containing the agent relation, and generate second analysis text data;
a merging unit 3043, configured to merge the first analysis text data and the second analysis text data to generate analysis text data.
Optionally, the filtering module 305 includes:
the filtering unit 3051, configured to filter out text data containing modal verbs from the analysis text data, and generate filtered text data;
and a normalization unit 3052, configured to perform normalization processing on the filtered text data, and generate target text data including a plurality of character behaviors.
Optionally, after the analyzing module 304 and before the filtering module 305, the device for extracting the data related to the human actions further includes:
and the recognition module 306 is used for recognizing whether the analysis text data contains the character behavior action which occurs in the past or not, keeping the analysis text data when the analysis text data does not contain the character behavior action which occurs in the past, and deleting the related data containing the character behavior action which occurs in the past when the analysis text data contains the character behavior action which occurs in the past.
Specifically, for example, in "Xiaoming has already eaten", "eaten" is the verb predicate, but a past-tense marker appears in the sentence, so the semantic relation expresses a state of Xiaoming in the past rather than an action being performed now; the related text data therefore needs to be deleted.
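The check performed by the recognition module 306 can be sketched as a lookup for past-time markers in a segmented sentence. The marker set is an assumed sample (words meaning "already", "once" and the experiential aspect particle); the text does not specify how past actions are detected, and the function names are hypothetical.

```python
# Assumed sample of words marking an action as already completed.
PAST_MARKERS = {"已经", "曾经", "过"}


def is_past_action(tokens):
    """Return True if the segmented sentence carries a past-time marker."""
    return any(tok in PAST_MARKERS for tok in tokens)


def drop_past_actions(segmented):
    """Keep only the sentences that do not describe a past action."""
    return [tokens for tokens in segmented if not is_past_action(tokens)]
```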
In the embodiment of the invention, syntactic analysis and part-of-speech tagging are performed on the text data through the Chinese natural language processing HanLP algorithm, and data related to behaviors and actions that are actually taking place is screened out based on the subject-predicate-object grammatical relation and modal verbs, which improves the accuracy of data extraction and reduces noise in the extracted data set.
Figs. 3 and 4 above describe the extraction device of the data related to human actions in the embodiment of the present invention in detail from the perspective of modular functional entities; the extraction device of the data related to human actions in the embodiment of the present invention is described in detail below from the perspective of hardware processing.
Fig. 5 is a schematic structural diagram of a device for extracting human action related data. The device 500 for extracting human action related data may vary considerably with configuration or performance, and may include one or more central processing units (CPUs) 510 (e.g., one or more processors), a memory 520, and one or more storage media 530 (e.g., one or more mass storage devices) storing applications 533 or data 532. The memory 520 and the storage medium 530 may be transient or persistent storage. The program stored on the storage medium 530 may include one or more modules (not shown), each of which may include a series of instruction operations for the extraction device 500. Further, the processor 510 may be configured to communicate with the storage medium 530 and to execute, on the extraction device 500, the series of instruction operations in the storage medium 530.
The human-action-related data extraction device 500 may also include one or more power supplies 540, one or more wired or wireless network interfaces 550, one or more input-output interfaces 560, and/or one or more operating systems 531, such as Windows Server, Mac OS X, Unix, Linux, FreeBSD, and the like. Those skilled in the art will appreciate that the configuration shown in fig. 5 does not limit the extraction device for human action related data, which may include more or fewer components than those shown, combine some components, or use a different arrangement of components.
The invention also provides a device for extracting data related to human actions, which comprises a memory and a processor, wherein computer readable instructions are stored in the memory, and when being executed by the processor, the computer readable instructions cause the processor to execute the steps of the method for extracting data related to human actions in the embodiments.
The invention also provides a computer readable storage medium, which can be a non-volatile computer readable storage medium, and can also be a volatile computer readable storage medium, wherein the computer readable storage medium has stored therein instructions, which when run on a computer, cause the computer to execute the steps of the method for extracting the data related to the human actions.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to the use of the blockchain node, and the like.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. A blockchain is essentially a decentralized database: a series of data blocks linked by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. The blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for extracting character action related data, characterized in that the method comprises: acquiring preset text data, the preset text data being text data containing character behaviors and actions; classifying the preset text data and screening out text data containing character information to obtain initial text data; performing word segmentation and part-of-speech tagging on the initial text data based on a preset Chinese natural language processing HanLP algorithm to generate intermediate text data; performing dependency syntax analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm to generate analysis text data; and filtering the analysis text data to obtain target text data containing a plurality of character behaviors and actions.
2. The method for extracting character action related data according to claim 1, characterized in that classifying the preset text data and screening out text data containing character information to obtain initial text data comprises: classifying the preset text data according to preset classification rules, screening out text data containing character pronouns or character names, and generating classified text data; and identifying target punctuation marks in the classified text data and deleting text data containing character dialogue according to the target punctuation marks to generate the initial text data, the target punctuation marks being used to indicate character dialogue.
3. The method for extracting character action related data according to claim 1, characterized in that performing word segmentation and part-of-speech tagging on the initial text data based on the preset Chinese natural language processing HanLP algorithm to generate intermediate text data comprises: splitting the initial text data into sentences by punctuation marks to obtain a sentence division result; performing word segmentation on the sentence division result based on the preset Chinese natural language processing HanLP algorithm to obtain a word segmentation result; and performing part-of-speech tagging on the word segmentation result based on the preset Chinese natural language processing HanLP algorithm and a preset HanLP part-of-speech tag set to generate the intermediate text data.
4. The method for extracting character action related data according to claim 1, characterized in that performing dependency syntax analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm to generate analysis text data comprises: calling the preset Chinese natural language processing HanLP algorithm to identify and analyze the relations between grammatical components in the intermediate text data, and when the core relation of the object points to the predicate verb, extracting the core subject-predicate-object relation to generate first analysis text data; calling the preset Chinese natural language processing HanLP algorithm to analyze the semantic associations in the intermediate text data, determining the relation type and screening out text data containing the agent relation to generate second analysis text data; and merging the first analysis text data and the second analysis text data to generate the analysis text data.
5. The method for extracting character action related data according to claim 1, characterized in that filtering the analysis text data to generate target text data, the target text data comprising the extracted plurality of character behaviors and actions, comprises: filtering out text data containing modal verbs from the analysis text data to generate filtered text data; and normalizing the filtered text data to generate target text data containing a plurality of character behaviors and actions.
6. The method for extracting character action related data according to claim 5, characterized in that filtering out text data containing modal verbs from the analysis text data to generate filtered text data comprises: identifying text data containing modal verbs in the analysis text data, the modal verbs being used to indicate character behaviors and actions that have not yet occurred; and deleting the text data containing modal verbs to generate the filtered text data.
7. The method for extracting character action related data according to any one of claims 1-6, characterized in that, after performing dependency syntax analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm to generate analysis text data, and before filtering the analysis text data to generate target text data, the method further comprises: identifying whether the analysis text data contains character behaviors and actions that occurred in the past; when the analysis text data does not contain character behaviors and actions that occurred in the past, retaining the analysis text data; and when the analysis text data contains character behaviors and actions that occurred in the past, deleting the related data containing those past character behaviors and actions.
8. An apparatus for extracting character action related data, characterized in that the apparatus comprises: an acquisition module, configured to acquire preset text data, the preset text data being novel text data containing character behaviors and actions; a classification module, configured to classify the preset text data and screen out text data containing character information to obtain initial text data; a word segmentation module, configured to perform word segmentation and part-of-speech tagging on the initial text data based on a preset Chinese natural language processing HanLP algorithm to generate intermediate text data; an analysis module, configured to perform dependency syntax analysis and semantic dependency analysis on the intermediate text data based on the preset Chinese natural language processing HanLP algorithm to generate analysis text data; and a filtering module, configured to filter the analysis text data to obtain target text data containing a plurality of character behaviors and actions.
9. A device for extracting character action related data, characterized in that the device comprises a memory and at least one processor, the memory storing instructions; the at least one processor invokes the instructions in the memory to cause the device to execute the method for extracting character action related data according to any one of claims 1-7.
10. A computer-readable storage medium storing instructions, characterized in that the instructions, when executed by a processor, implement the method for extracting character action related data according to any one of claims 1-7.
CN202011545182.2A 2020-12-23 2020-12-23 Extraction method, device and equipment of figure action related data and storage medium Pending CN112597307A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011545182.2A CN112597307A (en) 2020-12-23 2020-12-23 Extraction method, device and equipment of figure action related data and storage medium
PCT/CN2021/124629 WO2022134779A1 (en) 2020-12-23 2021-10-19 Method, apparatus and device for extracting character action related data, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011545182.2A CN112597307A (en) 2020-12-23 2020-12-23 Extraction method, device and equipment of figure action related data and storage medium

Publications (1)

Publication Number Publication Date
CN112597307A (en) 2021-04-02

Family

ID=75200609

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011545182.2A Pending CN112597307A (en) 2020-12-23 2020-12-23 Extraction method, device and equipment of figure action related data and storage medium

Country Status (2)

Country Link
CN (1) CN112597307A (en)
WO (1) WO2022134779A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113065332A (en) * 2021-04-22 2021-07-02 深圳壹账通智能科技有限公司 Text processing method, device and equipment based on reading model and storage medium
WO2022134779A1 (en) * 2020-12-23 2022-06-30 深圳壹账通智能科技有限公司 Method, apparatus and device for extracting character action related data, and storage medium

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116627926A (en) * 2023-05-30 2023-08-22 济南浪潮数据技术有限公司 A method, system, device and storage medium for log compression and analysis
CN117609518B (en) * 2024-01-17 2024-04-26 江西科技师范大学 Hierarchical Chinese entity relation extraction method and system for centering structure
CN120449874B (en) * 2025-07-09 2025-09-19 北京达佳互联信息技术有限公司 Motion information processing method, device, terminal, electronic equipment and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106920200A (en) * 2015-12-28 2017-07-04 中国移动通信集团公司 A kind of information processing method and device
CN108182232A (en) * 2017-12-27 2018-06-19 掌阅科技股份有限公司 Personage's methods of exhibiting, electronic equipment and computer storage media based on e-book
CN110309393A (en) * 2019-03-28 2019-10-08 平安科技(深圳)有限公司 Data processing method, device, equipment and readable storage medium storing program for executing
CN110309513A (en) * 2019-07-09 2019-10-08 北京金山数字娱乐科技有限公司 A kind of method and apparatus of context dependent analysis
CN111126201A (en) * 2019-12-11 2020-05-08 上海众源网络有限公司 Character recognition method and device in script

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9361293B2 (en) * 2013-09-18 2016-06-07 International Business Machines Corporation Using renaming directives to bootstrap industry-specific knowledge and lexical resources
CN110457676B (en) * 2019-06-26 2022-06-21 平安科技(深圳)有限公司 Evaluation information extraction method and device, storage medium and computer equipment
CN111177401A (en) * 2019-12-12 2020-05-19 西安交通大学 A method for extracting knowledge from free text in power grid
CN112597307A (en) * 2020-12-23 2021-04-02 深圳壹账通智能科技有限公司 Extraction method, device and equipment of figure action related data and storage medium

Also Published As

Publication number Publication date
WO2022134779A1 (en) 2022-06-30

Similar Documents

Publication Publication Date Title
JP7346609B2 (en) Systems and methods for performing semantic exploration using natural language understanding (NLU) frameworks
US12380277B2 (en) Written-modality prosody subsystem in a natural language understanding (NLU) framework
CN112182252B (en) Intelligent drug question answering method and equipment based on drug knowledge graph
CN112597307A (en) Extraction method, device and equipment of figure action related data and storage medium
Gardent et al. Creating training corpora for nlg micro-planning
KR101498331B1 (en) System for extracting term from document containing text segment
JP6676110B2 (en) Utterance sentence generation apparatus, method and program
WO2017198031A1 (en) Semantic parsing method and apparatus
KR20220064016A (en) Method for extracting construction safety accident based data mining using big data
JP2005165958A (en) Information search system, information search support system, method and program thereof
CN110929520A (en) Non-named entity object extraction method and device, electronic equipment and storage medium
CN112580331A (en) Method and system for establishing knowledge graph of policy text
CN117313695B (en) Text sensitivity detection method, device, electronic device and readable storage medium
Karlsson et al. A process model of morphology and lexicon.
CN119007870A (en) Training method, training device, training electronic device, training storage medium and training computer program product for language understanding model of molecular processing
Azad et al. Picking pearl from seabed: Extracting artefacts from noisy issue triaging collaborative conversations for hybrid cloud services
Alashqar Automatic generation of uml diagrams from scenario-based user requirements
Margan et al. LaNCoA: a python toolkit for language networks construction and analysis
CN112347786B (en) Artificial intelligence scoring training method and device
WO2021221535A1 (en) System and method for augmenting a training set for machine learning algorithms
CN109800430B (en) Semantic understanding method and system
CN113032529B (en) English phrase recognition method, device, medium and electronic equipment
HK40050553A (en) Method, apparatus and device for extracting character action-related data, and storage medium
US11971915B2 (en) Language processor, language processing method and language processing program
JP3691773B2 (en) Sentence analysis method and sentence analysis apparatus capable of using the method

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40050553

Country of ref document: HK

SE01 Entry into force of request for substantive examination