CN114328822B - A contract text intelligent analysis method based on deep data mining - Google Patents

Info

Publication number
CN114328822B
CN114328822B CN202111485260.9A CN202111485260A CN114328822B CN 114328822 B CN114328822 B CN 114328822B CN 202111485260 A CN202111485260 A CN 202111485260A CN 114328822 B CN114328822 B CN 114328822B
Authority
CN
China
Prior art keywords
contract
word
words
contract text
confusion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111485260.9A
Other languages
Chinese (zh)
Other versions
CN114328822A (en)
Inventor
焦洪林
陆向东
朱坚
王雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fujia Newland Software Engineering Co ltd
Original Assignee
Fujia Newland Software Engineering Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fujia Newland Software Engineering Co ltd
Priority to CN202111485260.9A
Publication of CN114328822A
Application granted
Publication of CN114328822B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an intelligent contract text analysis method based on deep data mining, belonging to the technical field of text processing. The method comprises: step S10, obtaining a contract text to be analyzed and historical contract texts to form a contract text set; step S20, preprocessing the contract text set; step S30, extracting keywords from the contract text to be analyzed and from the historical contract texts based on a multi-feature word weight formula, to obtain first keywords and second keywords respectively; step S40, searching for similar historical contract texts based on the first keywords and the second keywords; step S50, finding confusion words in the contract text to be analyzed based on language model perplexity and an N-Gram language model, and matching the correct words corresponding to the confusion words; and step S60, displaying the first keywords, the similar historical contract texts, the confusion words and the correct words, completing the intelligent analysis of the contract text to be analyzed. The method greatly improves the quality and efficiency of contract text analysis.

Description

Contract text intelligent analysis method based on deep data mining
Technical Field
The invention relates to the technical field of text processing, in particular to an intelligent contract text analysis method based on deep data mining.
Background
In recent years, with the development of internet technology, corporate legal staff have needed to analyze, manage and draft large numbers of contracts in electronic form within a short time. How to quickly and accurately obtain summary information from various contracts, and how to manage and edit those contracts, is the main problem to be solved at present. Compared with other documents, contract text has the following characteristics:
1. Clear topic type. Contract texts are usually edited and managed by department, affiliated institution and business line, and almost every document is classified by department or business type.
2. Specialized vocabulary. The words used in a contract text are generally specialized terms drawn from the relevant department and topic, rather than the everyday and internet vocabulary found in novels, forums, microblogs and similar documents.
3. Standardized content. Contract text consists mostly of declarative sentences without much modification or description, so errors made when drafting a contract are mostly wrong words and wrong specialized terms, and rarely grammatical or semantic errors.
4. No abstract. Contract texts generally have no abstract and sometimes use a meeting or report name as the document title, which does not provide enough summary information.
Given these characteristics, no method in the prior art can accurately extract keywords from contract text, match similar contracts and correct content errors, so contract text analysis is error-prone and inefficient. How to provide an intelligent contract text analysis method based on deep data mining that improves the quality and efficiency of contract text analysis has therefore become a technical problem in urgent need of a solution.
Disclosure of Invention
The technical problem to be solved by the invention is to provide an intelligent contract text analysis method based on deep data mining that improves the quality and efficiency of contract text analysis.
The invention discloses an intelligent contract text analysis method based on deep data mining, which comprises the following steps:
Step S10, acquiring a contract text to be analyzed and a large number of historical contract texts to form a contract text set;
Step S20, preprocessing the contract text set;
Step S30, extracting keywords from the preprocessed contract text to be analyzed and from the preprocessed historical contract texts based on a multi-feature word weight formula, to obtain a plurality of first keywords and a plurality of second keywords respectively;
Step S40, searching for similar historical contract texts based on the first keywords and the second keywords;
Step S50, finding confusion words in the contract text to be analyzed based on language model perplexity and an N-Gram language model, and matching the correct words corresponding to the confusion words;
Step S60, displaying the first keywords of the contract text to be analyzed, the similar historical contract texts, the confusion words and the correct words, completing the intelligent analysis of the contract text to be analyzed.
Further, in step S10, acquiring a large number of historical contract texts specifically comprises:
setting a time span, and acquiring, based on the time span, a large number of historical contract texts from different departments in different regions.
Further, the step S20 specifically includes:
Step S21, searching the contract text set for duplicate contract texts based on the contract titles, and merging the duplicates;
Step S22, removing the noise data from each contract text in the contract text set;
Step S23, creating a sensitive word lexicon, and filtering the sensitive words out of each contract text in the contract text set based on the sensitive word lexicon.
Further, in step S22, the noise data includes at least URL addresses, special symbols, emoticons, pictures and zero-width characters.
Further, in step S30, the multi-feature word weight formula is specifically:
$$W_{\text{NEW-TF-IDF}} = W_{\text{TF-IDF}} \times W_{word}$$
$$W_{\text{TF-IDF}} = TF(i) \times IDF(i)$$
$$W_{word} = \alpha W_l + \beta W_c + \gamma W_{len}$$
where $W_{\text{NEW-TF-IDF}}$ represents the multi-feature word weight; $W_{\text{TF-IDF}}$ represents the TF-IDF feature weight; $W_{word}$ represents the word weight, which combines the position weight, part-of-speech weight and word length weight; TF(i) represents the term frequency of the i-th word; IDF(i) represents the inverse document frequency of the i-th word, that is, the fewer contract texts contain the i-th word, the larger the value; $n_i$ represents the number of times the i-th word appears; n represents the total number of all keywords; N represents the total number of contract texts; df(i) represents the number of documents in which the i-th word appears; α, β and γ represent weight coefficients; $W_l$ represents the position weight; $W_c$ represents the part-of-speech weight; $W_{len}$ represents the word length weight; $i_{len}$ represents the word length of the i-th word; and avg(len) represents the average word length.
Further, the step S40 specifically includes:
Step S41, concatenating the first keywords with the corresponding contract title to obtain first key information, and concatenating the second keywords with the corresponding contract titles to obtain second key information;
Step S42, inputting the first key information and the second key information in turn into a BERT model followed by an average pooling layer for feature extraction, to obtain a first feature sentence vector and second feature sentence vectors;
Step S43, calculating in turn the cosine similarity between the first feature sentence vector and each second feature sentence vector, and matching similar historical contract texts based on the cosine similarity.
Further, in step S43, the cosine similarity is calculated as:
$$sim = \frac{\sum_{i=1}^{m} x_i y_i}{\sqrt{\sum_{i=1}^{m} x_i^2}\,\sqrt{\sum_{i=1}^{m} y_i^2}}$$
where sim represents the cosine similarity, $x_i$ represents the i-th component of the first feature sentence vector, $y_i$ represents the i-th component of the second feature sentence vector, and m represents the dimension of the feature sentence vectors.
Further, the step S50 specifically includes:
Step S51, setting a likelihood threshold and creating a confusion word set, the confusion word set containing a one-to-one correspondence between a plurality of confusion words and correct words;
Step S52, calculating in turn, based on language model perplexity, the likelihood estimate of each sentence in the contract text to be analyzed, and judging whether the likelihood estimate is lower than the likelihood threshold; if so, a suspected confusion word is present and the method proceeds to step S53;
Step S53, ranking the sentences containing suspected confusion words with an N-Gram language model, and selecting the highest-scoring words as confusion words based on the ranking results;
Step S54, matching the correct words corresponding to the confusion words using the confusion word set.
The invention has the advantages that:
On the basis of the traditional TF-IDF feature weight, multi-feature word weights are constructed by combining the position weight, part-of-speech weight and word length weight of each word, and keywords are extracted based on these multi-feature word weights. Because the position, part of speech and word length of each word are fully considered, the accuracy of keyword extraction is greatly improved. Similar historical contract texts are found through keyword matching, which greatly improves search efficiency compared with full-text search. Confusion words are found through language model perplexity and an N-Gram language model, and the correct words corresponding to the confusion words are matched from the created confusion word set, so that content errors in the contract text to be analyzed are corrected. Compared with traditional manual analysis, the quality and efficiency of contract text analysis are greatly improved.
Drawings
The invention is further described below with reference to embodiments and the accompanying drawings.
FIG. 1 is a flow chart of a method for intelligent analysis of contract text based on deep data mining of the present invention.
Detailed Description
The general idea of the technical solution in the embodiments of the application is as follows: multi-feature word weights are constructed by combining the position weight, part-of-speech weight and word length weight of each word and are used to extract keywords, which improves the accuracy of keyword extraction; similar historical contract texts are found through keyword matching, which improves search efficiency; confusion words are found through language model perplexity and an N-Gram language model, and the correct words corresponding to the confusion words are matched from the created confusion word set, which corrects content errors and thereby improves the quality and efficiency of contract text analysis.
Referring to FIG. 1, a preferred embodiment of the intelligent contract text analysis method based on deep data mining according to the invention comprises the following steps:
Step S10, acquiring a contract text to be analyzed and a large number of historical contract texts to form a contract text set;
Step S20, preprocessing the contract text set, that is, removing invalid data to improve text processing efficiency;
Step S30, extracting keywords from the preprocessed contract text to be analyzed and from the preprocessed historical contract texts based on a multi-feature word weight formula, to obtain a plurality of first keywords and a plurality of second keywords respectively; because contract texts are often long and reading one from start to finish takes a long time, extracting keywords lets staff quickly grasp the important information in the contract text;
Step S40, searching for similar historical contract texts based on the first keywords and the second keywords, so that reference information can conveniently be obtained from the similar historical contract texts;
Step S50, finding confusion words in the contract text to be analyzed based on language model perplexity (PPL) and an N-Gram language model, and matching the correct words corresponding to the confusion words;
Step S60, displaying the first keywords of the contract text to be analyzed, the similar historical contract texts, the confusion words and the correct words, completing the intelligent analysis of the contract text to be analyzed; the correct words can also be used to automatically replace the corresponding confusion words.
In step S10, acquiring a large number of historical contract texts specifically comprises:
setting a time span, and acquiring, based on the time span, a large number of historical contract texts from different departments in different regions, so as to enrich the sample.
The step S20 specifically includes:
Step S21, searching the contract text set for duplicate contract texts based on the contract titles, and merging the duplicates;
Step S22, removing the noise data from each contract text in the contract text set;
Step S23, creating a sensitive word lexicon, and filtering the sensitive words out of each contract text in the contract text set based on the sensitive word lexicon.
In step S22, the noise data includes at least URL addresses, special symbols, emoticons, pictures and zero-width characters.
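The preprocessing of steps S21 to S23 can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the regular expressions, the use of Unicode category "So" to catch emoticons, the "first occurrence wins" merge rule, and the function names `strip_noise`, `merge_duplicates` and `filter_sensitive` are all hypothetical choices that the text does not specify.

```python
import re
import unicodedata

# Assumed patterns: the text names the noise types but not how to detect them.
URL_RE = re.compile(r"https?://\S+|www\.\S+")
ZERO_WIDTH_RE = re.compile("[\u200b\u200c\u200d\u2060\ufeff]")


def strip_noise(text):
    """Step S22 sketch: drop URLs, zero-width characters and emoji-like symbols."""
    text = URL_RE.sub("", text)
    text = ZERO_WIDTH_RE.sub("", text)
    # Unicode category "So" (Symbol, other) covers most emoji and pictographs.
    text = "".join(ch for ch in text if unicodedata.category(ch) != "So")
    # Collapse the runs of spaces left behind by the removals.
    return re.sub(r"[ \t]+", " ", text).strip()


def merge_duplicates(contracts):
    """Step S21 sketch: keep one (title, body) pair per title.

    Assumption: the first occurrence wins; the text only says duplicates
    are merged.
    """
    merged = {}
    for title, body in contracts:
        merged.setdefault(title.strip(), body)
    return list(merged.items())


def filter_sensitive(text, lexicon, mask="*"):
    """Step S23 sketch: mask every lexicon word found in the text."""
    for word in lexicon:
        text = text.replace(word, mask * len(word))
    return text
```

Masking rather than deleting sensitive words is one possible design; it keeps sentence length stable for any later language-model scoring, but the text does not say which behavior is intended.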
In step S30, the multi-feature word weight formula is specifically:
$$W_{\text{NEW-TF-IDF}} = W_{\text{TF-IDF}} \times W_{word}$$
$$W_{\text{TF-IDF}} = TF(i) \times IDF(i)$$
$$W_{word} = \alpha W_l + \beta W_c + \gamma W_{len}$$
where $W_{\text{NEW-TF-IDF}}$ represents the multi-feature word weight; $W_{\text{TF-IDF}}$ represents the TF-IDF feature weight; $W_{word}$ represents the word weight, which combines the position weight, part-of-speech weight and word length weight; TF(i) represents the term frequency of the i-th word; IDF(i) represents the inverse document frequency of the i-th word, that is, the fewer contract texts contain the i-th word, the larger the value, indicating that the i-th word discriminates well between types; $n_i$ represents the number of times the i-th word appears; n represents the total number of all keywords; N represents the total number of contract texts; df(i) represents the number of documents in which the i-th word appears; α, β and γ represent weight coefficients, preferably 0.6, 0.3 and 0.1 respectively; $W_l$ represents the position weight; $W_c$ represents the part-of-speech weight; $W_{len}$ represents the word length weight; $i_{len}$ represents the word length of the i-th word; and avg(len) represents the average word length.
According to the TF-IDF algorithm, a feature word that occurs frequently enough in a text and rarely enough in the other texts of the text set is a keyword of that text. However, the structure of the TF-IDF algorithm is too simple to reflect the importance of a word or the positional distribution of feature words, and it provides no effective way to adjust word weights, so its accuracy is limited. In particular, TF-IDF does not reflect the importance of a word's position, part of speech or word length. In a contract, content in different structural parts conveys different information, so weights should be assigned according to these structural characteristics. The invention therefore improves the traditional TF-IDF algorithm in light of the characteristics of the sample data: feature words with different positions, parts of speech and word lengths in the contract are given different coefficients, which are multiplied by the TF-IDF values of the feature words to enhance the expressiveness of the text representation.
Because the title of a contract text generally summarizes the main content of the contract, words appearing in the title are more likely to be keywords. Words appearing at the beginning or end may reflect implicit or related keywords of the contract and also deserve attention. The position weight is therefore set highest for the contract title, second highest for the beginning and end, and lowest for all other positions.
Parts of speech in Chinese fall into two classes, content words and function words. Content words include nouns, verbs, adjectives, pronouns, numerals, measure words and the like; function words include prepositions, conjunctions, interjections, particles and the like. Keywords are most often nouns or noun phrases, followed by verbs, adverbs and other modifiers.
Keywords that are too short cannot convey the information they stand for, while keywords that are too long carry so much information that they can be segmented further. Statistics on the word lengths of the segmented contract texts show that keyword lengths generally fall in the range [2, 7], so words that are too long or too short need to be filtered out.
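The weighting described above can be illustrated with a short sketch. Only the coefficients 0.6, 0.3, 0.1 and the length range [2, 7] come from the text. The concrete forms TF = n_i / n, IDF = log(N / df(i)) and W_len = len / avg(len) are assumptions chosen to match the symbol definitions, since the corresponding formulas are not reproduced here; the numeric position and part-of-speech weights are hypothetical values that only respect the stated ordering.

```python
import math

ALPHA, BETA, GAMMA = 0.6, 0.3, 0.1  # preferred coefficients given in the text

# Hypothetical numbers: the text fixes only the ordering (title > edge > body,
# nouns ahead of verbs and other modifiers), not the values.
POSITION_WEIGHT = {"title": 1.0, "edge": 0.8, "body": 0.5}
POS_WEIGHT = {"noun": 1.0, "verb": 0.6, "other": 0.3}


def multi_feature_weight(word, position, pos, count, total_words,
                         n_docs, doc_freq, avg_len):
    """Return W_NEW-TF-IDF = TF * IDF * (alpha*W_l + beta*W_c + gamma*W_len)."""
    if not 2 <= len(word) <= 7:
        return 0.0  # words outside the [2, 7] length range are filtered out
    tf = count / total_words             # assumed form: n_i / n
    idf = math.log(n_docs / doc_freq)    # assumed form: log(N / df(i))
    w_len = len(word) / avg_len          # assumed form: word length / avg(len)
    w_word = (ALPHA * POSITION_WEIGHT[position]
              + BETA * POS_WEIGHT[pos]
              + GAMMA * w_len)
    return tf * idf * w_word
```

With these assumptions, a five-character noun in the title outweighs the same word in the body, because only the position term of the word weight changes.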
The step S40 specifically includes:
Step S41, concatenating the first keywords with the corresponding contract title to obtain first key information, and concatenating the second keywords with the corresponding contract titles to obtain second key information;
Step S42, inputting the first key information and the second key information in turn into a BERT model followed by an average pooling layer for feature extraction, to obtain a first feature sentence vector and second feature sentence vectors;
Step S43, calculating in turn the cosine similarity between the first feature sentence vector and each second feature sentence vector, and matching similar historical contract texts based on the cosine similarity.
In step S43, the cosine similarity is calculated as:
$$sim = \frac{\sum_{i=1}^{m} x_i y_i}{\sqrt{\sum_{i=1}^{m} x_i^2}\,\sqrt{\sum_{i=1}^{m} y_i^2}}$$
where sim represents the cosine similarity, $x_i$ represents the i-th component of the first feature sentence vector, $y_i$ represents the i-th component of the second feature sentence vector, and m represents the dimension of the feature sentence vectors. The larger the value of sim, the smaller the angle between the two feature sentence vectors and the higher the similarity; finally, the historical contract text with the highest similarity is returned.
Similar contract text search matches similar historical contract texts to the contract text currently being drafted or managed, giving the relevant personnel related material for reference. Semantic search for similar contract texts is in essence a judgment of the semantic similarity between a source text and a target text. Traditional semantic matching focuses on lexical semantics, surface-form matching and syntactic similarity: predefined text features must be extracted and a similarity detection algorithm written to obtain the similarity between texts. Neural-network-based methods instead consider, when the model is built, how to distinguish the semantic differences between two texts and how to model the relation between them. Because contract texts are long, feature vectors extracted from the full text do not represent the key information of the contract well, and the similar results finally retrieved differ considerably from the true ones. For this reason, step S41 builds the key information from the extracted keywords and the contract title rather than from the full text.
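The pooling and matching of steps S42 and S43 can be sketched in plain Python. The BERT encoder itself is not shown: the token vectors passed to `mean_pool` are a hypothetical stand-in for its per-token outputs, and the function names and the dictionary layout of `history` are assumptions made for illustration.

```python
import math


def mean_pool(token_vectors):
    """Average token vectors component-wise into one sentence vector.

    The token vectors stand in for BERT outputs, which are not computed here.
    """
    n = len(token_vectors)
    return [sum(column) / n for column in zip(*token_vectors)]


def cosine(x, y):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    norm = math.sqrt(sum(a * a for a in x)) * math.sqrt(sum(b * b for b in y))
    return dot / norm if norm else 0.0


def most_similar(query_vector, history):
    """Return the title of the historical contract with the highest similarity.

    `history` maps a contract title to its pooled sentence vector.
    """
    return max(history, key=lambda title: cosine(query_vector, history[title]))
```

For a large history set, the sentence vectors of the historical contracts would normally be computed once and cached, so that only the contract under analysis needs to be encoded at query time; this is a common design choice rather than something the text prescribes.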
The step S50 specifically includes:
Step S51, setting a likelihood threshold and creating a confusion word set, the confusion word set containing a one-to-one correspondence between a plurality of confusion words and correct words; the set can be updated as required and is therefore highly extensible;
Step S52, calculating in turn, based on language model perplexity (PPL), the likelihood estimate of each sentence in the contract text to be analyzed, and judging whether the likelihood estimate is lower than the likelihood threshold; if so, a suspected confusion word is present and the method proceeds to step S53;
Language model perplexity is the inverse of the probability that the language model assigns to the text, normalized by the sentence length, and is given by:
$$PPL(S) = \sqrt[N]{\prod_{i=1}^{N} \frac{1}{P(w_i)}}$$
where S represents the input text, N represents the sentence length, and $P(w_i)$ represents the probability of the i-th word;
Step S53, ranking the sentences containing suspected confusion words with an N-Gram language model, and selecting the highest-scoring words as confusion words based on the ranking results;
Correctness is judged by the N-Gram language model, a statistical, probabilistic model that scores a text. The model operates on an ordered sequence of N words: if the occurrence of a word depends only on the one word before it, the model is a bigram model (Bi-Gram, N = 2); if it depends on the two words before it, the model is a trigram model (Tri-Gram, N = 3); and so on. Assuming a sentence s in the contract text consists of a sequence of words $q_1, q_2, \dots, q_n$ in a specific order, then by the chain rule the probability of the sentence s is:
$$P(s) = P(q_1)\,P(q_2 \mid q_1) \cdots P(q_n \mid q_1, \dots, q_{n-1}) = \prod_{i=1}^{n} P(q_i \mid q_1, \dots, q_{i-1})$$
The N-Gram language model assumes that the probability of any word depends only on the N-1 words before it, namely:
$$P(q_i \mid q_1, \dots, q_{i-1}) \approx P(q_i \mid q_{i-N+1}, \dots, q_{i-1})$$
When the trigram model is used, the i-th word depends only on the previous 2 words, namely:
$$P(q_i \mid q_1, \dots, q_{i-1}) \approx P(q_i \mid q_{i-2}, q_{i-1})$$
Step S54, matching the correct words corresponding to the confusion words using the confusion word set.
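One possible arrangement of steps S51 to S54 is sketched below. It is an illustration under stated assumptions rather than the patented procedure: the trigram model is a simple count-based model with add-one smoothing (the text names no smoothing method), a sentence is flagged when its perplexity exceeds a threshold (equivalent to its likelihood falling below one), and a replacement from the confusion word set is accepted only if it lowers the perplexity. The names `TrigramModel`, `is_suspect` and `correct` are hypothetical.

```python
import math
from collections import Counter


def perplexity(word_probs):
    """Geometric mean of 1/P(w_i), computed in log space for stability."""
    return math.exp(-sum(math.log(p) for p in word_probs) / len(word_probs))


class TrigramModel:
    """Count-based trigram model with add-one smoothing (an assumption)."""

    def __init__(self, corpus):
        self.tri, self.ctx, self.vocab = Counter(), Counter(), set()
        for sent in corpus:
            padded = ["<s>", "<s>"] + sent
            self.vocab.update(sent)
            for i in range(2, len(padded)):
                self.tri[tuple(padded[i - 2:i + 1])] += 1
                self.ctx[tuple(padded[i - 2:i])] += 1

    def word_probs(self, sent):
        """P(q_i | q_{i-2}, q_{i-1}) for every word of the sentence."""
        padded = ["<s>", "<s>"] + sent
        return [
            (self.tri[tuple(padded[i - 2:i + 1])] + 1)
            / (self.ctx[tuple(padded[i - 2:i])] + len(self.vocab))
            for i in range(2, len(padded))
        ]


def is_suspect(sent, model, ppl_threshold):
    """Step S52 sketch: flag a sentence whose perplexity is above the threshold."""
    return perplexity(model.word_probs(sent)) > ppl_threshold


def correct(sent, model, confusion_set):
    """Steps S53-S54 sketch: try each confusion-set replacement, keep the best."""
    best, best_ppl = sent, perplexity(model.word_probs(sent))
    for i, word in enumerate(sent):
        if word in confusion_set:
            candidate = sent[:i] + [confusion_set[word]] + sent[i + 1:]
            candidate_ppl = perplexity(model.word_probs(candidate))
            if candidate_ppl < best_ppl:
                best, best_ppl = candidate, candidate_ppl
    return best
```

In use, the model would be trained on the segmented historical contract texts, `is_suspect` would be applied to each sentence of the contract under analysis, and `correct` would be run only on the flagged sentences.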
In summary, the invention has the advantages that:
On the basis of the traditional TF-IDF feature weight, multi-feature word weights are constructed by combining the position weight, part-of-speech weight and word length weight of each word, and keywords are extracted based on these multi-feature word weights. Because the position, part of speech and word length of each word are fully considered, the accuracy of keyword extraction is greatly improved. Similar historical contract texts are found through keyword matching, which greatly improves search efficiency compared with full-text search. Confusion words are found through language model perplexity and an N-Gram language model, and the correct words corresponding to the confusion words are matched from the created confusion word set, so that content errors in the contract text to be analyzed are corrected. Compared with traditional manual analysis, the quality and efficiency of contract text analysis are greatly improved.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that the specific embodiments described are illustrative only and not intended to limit the scope of the invention, and that equivalent modifications and variations of the invention in light of the spirit of the invention will be covered by the claims of the present invention.

Claims (8)

1. An intelligent contract text analysis method based on deep data mining, characterized by comprising the following steps:
Step S10, acquiring a contract text to be analyzed and a large number of historical contract texts to form a contract text set;
Step S20, preprocessing the contract text set;
Step S30, extracting keywords from the preprocessed contract text to be analyzed and from the preprocessed historical contract texts based on a multi-feature word weight formula, to obtain a plurality of first keywords and a plurality of second keywords respectively;
Step S40, searching for similar historical contract texts based on the first keywords and the second keywords;
Step S50, finding confusion words in the contract text to be analyzed based on language model perplexity and an N-Gram language model, and matching the correct words corresponding to the confusion words;
Step S60, displaying the first keywords of the contract text to be analyzed, the similar historical contract texts, the confusion words and the correct words, completing the intelligent analysis of the contract text to be analyzed.
2. The method for intelligent analysis of contract text based on deep data mining according to claim 1, wherein in step S10, acquiring a large number of historical contract texts specifically comprises:
setting a time span, and acquiring, based on the time span, a large number of historical contract texts from different departments in different regions.
3. The method for intelligent analysis of contract text based on deep data mining according to claim 1, wherein said step S20 comprises the following steps:
Step S21, searching the contract text set for duplicate contract texts based on the contract titles, and merging the duplicates;
Step S22, removing the noise data from each contract text in the contract text set;
Step S23, creating a sensitive word lexicon, and filtering the sensitive words out of each contract text in the contract text set based on the sensitive word lexicon.
4. The method for intelligent analysis of contract text based on deep data mining according to claim 3, wherein in step S22, the noise data includes at least URL addresses, special symbols, emoticons, pictures and zero-width characters.
5. The method for intelligent analysis of contract text based on deep data mining according to claim 1, wherein in step S30, the multi-feature word weight formula is specifically:
$$W_{\text{NEW-TF-IDF}} = W_{\text{TF-IDF}} \times W_{word}$$
$$W_{\text{TF-IDF}} = TF(i) \times IDF(i)$$
$$W_{word} = \alpha W_l + \beta W_c + \gamma W_{len}$$
where $W_{\text{NEW-TF-IDF}}$ represents the multi-feature word weight; $W_{\text{TF-IDF}}$ represents the TF-IDF feature weight; $W_{word}$ represents the word weight, which combines the position weight, part-of-speech weight and word length weight; TF(i) represents the term frequency of the i-th word; IDF(i) represents the inverse document frequency of the i-th word, that is, the fewer contract texts contain the i-th word, the larger the value; $n_i$ represents the number of times the i-th word appears; n represents the total number of all keywords; N represents the total number of contract texts; df(i) represents the number of documents in which the i-th word appears; α, β and γ represent weight coefficients; $W_l$ represents the position weight; $W_c$ represents the part-of-speech weight; $W_{len}$ represents the word length weight; $i_{len}$ represents the word length of the i-th word; and avg(len) represents the average word length.
6. The method for intelligent analysis of contract text based on deep data mining according to claim 1, wherein the step S40 specifically comprises the following steps:
Step S41, concatenating the first keywords with the corresponding contract title to obtain first key information, and concatenating the second keywords with the corresponding contract titles to obtain second key information;
Step S42, inputting the first key information and the second key information in turn into a BERT model followed by an average pooling layer for feature extraction, to obtain a first feature sentence vector and second feature sentence vectors;
Step S43, calculating in turn the cosine similarity between the first feature sentence vector and each second feature sentence vector, and matching similar historical contract texts based on the cosine similarity.
7. The method for intelligent analysis of contract text based on deep data mining according to claim 6, wherein in step S43, the cosine similarity is calculated as:
$$sim = \frac{\sum_{i=1}^{m} x_i y_i}{\sqrt{\sum_{i=1}^{m} x_i^2}\,\sqrt{\sum_{i=1}^{m} y_i^2}}$$
where sim represents the cosine similarity, $x_i$ represents the i-th component of the first feature sentence vector, $y_i$ represents the i-th component of the second feature sentence vector, and m represents the dimension of the feature sentence vectors.
8. The method for intelligent analysis of contract text based on deep data mining according to claim 1, wherein said step S50 comprises the following steps:
Step S51, setting a likelihood threshold and creating a confusion word set, the confusion word set containing a one-to-one correspondence between a plurality of confusion words and correct words;
Step S52, calculating in turn, based on language model perplexity, the likelihood estimate of each sentence in the contract text to be analyzed, and judging whether the likelihood estimate is lower than the likelihood threshold; if so, a suspected confusion word is present and the method proceeds to step S53;
Step S53, ranking the sentences containing suspected confusion words with an N-Gram language model, and selecting the highest-scoring words as confusion words based on the ranking results;
Step S54, matching the correct words corresponding to the confusion words using the confusion word set.
CN202111485260.9A 2021-12-07 2021-12-07 A contract text intelligent analysis method based on deep data mining Active CN114328822B (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
CN202111485260.9A (CN114328822B) | 2021-12-07 | 2021-12-07 | A contract text intelligent analysis method based on deep data mining

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
CN202111485260.9A (CN114328822B) | 2021-12-07 | 2021-12-07 | A contract text intelligent analysis method based on deep data mining

Publications (2)

Publication Number | Publication Date
CN114328822A | 2022-04-12
CN114328822B | 2025-04-04

Family

ID=81049667

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
CN202111485260.9A (Active, CN114328822B) | A contract text intelligent analysis method based on deep data mining | 2021-12-07 | 2021-12-07

Country Status (1)

Country Link
CN (1) CN114328822B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116955536A (en) * 2023-07-26 2023-10-27 北京长城电子商务有限公司 Contract-based automatic matching rule algorithm
CN118520862B (en) * 2024-07-17 2024-11-12 沈阳慧筑云科技有限公司 A method for intelligently generating contract templates based on user contract habits
CN118607522B (en) * 2024-08-08 2024-10-18 沈阳慧筑云科技有限公司 A personalized user behavior prompting method based on big data and big language model

Citations (2)

Publication number Priority date Publication date Assignee Title
CN110134952A (en) * 2019-04-29 2019-08-16 华南师范大学 A method, device and storage medium for rejecting wrong text
CN110765765A (en) * 2019-09-16 2020-02-07 平安科技(深圳)有限公司 Contract key clause extraction method and device based on artificial intelligence and storage medium

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
JP5203324B2 (en) * 2009-09-16 2013-06-05 日本電信電話株式会社 Text analysis apparatus, method and program for typographical error
CN108334533B (en) * 2017-10-20 2021-12-24 腾讯科技(深圳)有限公司 Keyword extraction method and device, storage medium and electronic device
CN111241814B (en) * 2019-12-31 2023-04-28 中移(杭州)信息技术有限公司 Error correction method, device, electronic equipment and storage medium for speech recognition text

Also Published As

Publication number Publication date
CN114328822A (en) 2022-04-12

Similar Documents

Publication Publication Date Title
CN110442760B (en) A synonym mining method and device for question answering retrieval system
US12135939B2 (en) Systems and methods for deviation detection, information extraction and obligation deviation detection
US9971974B2 (en) Methods and systems for knowledge discovery
CN108304375B (en) Information identification method and equipment, storage medium and terminal thereof
CN114328822B (en) A contract text intelligent analysis method based on deep data mining
US7376634B2 (en) Method and apparatus for implementing Q&A function and computer-aided authoring
US9792277B2 (en) System and method for determining the meaning of a document with respect to a concept
US9183274B1 (en) System, methods, and data structure for representing object and properties associations
US8370129B2 (en) System and methods for quantitative assessment of information in natural language contents
CN104794169B (en) A kind of subject terminology extraction method and system based on sequence labelling model
US20050080613A1 (en) System and method for processing text utilizing a suite of disambiguation techniques
CN112632969B (en) Incremental industry dictionary updating method and system
CN108124477A (en) Segmenter is improved based on pseudo- data to handle natural language
JP2005526317A (en) Method and system for automatically searching a concept hierarchy from a document corpus
CN113886604A (en) Job knowledge map generation method and system
CN113515939B (en) System and method for extracting key information of investigation report text
CN111767733A (en) A document classification method based on statistical word segmentation
US20140089246A1 (en) Methods and systems for knowledge discovery
CN118797005A (en) Intelligent question-answering method, device, electronic device, storage medium and product
CN113076740A (en) Synonym mining method and device in government affair service field
Hirpassa Information extraction system for Amharic text
CN113392189B (en) News text processing method based on automatic word segmentation
Lazemi et al. Persian plagirisim detection using CNN s
Chakraborty et al. N-Gram based Assamese Question Pattern Extraction and Probabilistic Modelling
CN112559768B (en) Short text mapping and recommendation method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant