CN114328822B - A contract text intelligent analysis method based on deep data mining - Google Patents
- Publication number
- CN114328822B (application CN202111485260.9A)
- Authority
- CN
- China
- Prior art keywords
- contract
- word
- words
- contract text
- confusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a contract text intelligent analysis method based on deep data mining, which belongs to the technical field of text processing and comprises: step S10, obtaining a contract text to be analyzed and historical contract texts to form a contract text set; step S20, preprocessing the contract text set; step S30, extracting keywords from the contract text to be analyzed and the historical contract texts respectively based on a multi-feature word weight formula, to obtain first keywords and second keywords; step S40, retrieving similar historical contract texts based on the first keywords and the second keywords; step S50, searching for confusion words in the contract text to be analyzed based on the language model perplexity and an N-Gram language model, and matching the correct words corresponding to the confusion words; and step S60, displaying the first keywords, the similar historical contract texts, the confusion words and the correct words, thereby completing the intelligent analysis of the contract text to be analyzed. The method has the advantage of greatly improving the quality and efficiency of contract text analysis.
Description
Technical Field
The invention relates to the technical field of text processing, in particular to an intelligent contract text analysis method based on deep data mining.
Background
In recent years, with the development of Internet technology, corporate legal staff face the demand of rapidly analyzing, managing and writing a large number of contracts in the form of electronic documents within a short time. How to quickly and accurately obtain summary information from various contracts, and to manage and edit them, is the main problem to be solved at present. Compared with other documents, contract texts have the following characteristics:
1. Clear topic type: contract texts are usually edited and managed by department, affiliated institution and business situation, and each document basically has a department or business type classification.
2. Specialized wording: the words used in contract texts are generally special terms within the corresponding domain, determined by the department and topic, rather than the varied colloquial and internet expressions found in documents such as novels, forums and microblogs.
3. Standardized content: the content of a contract text generally consists of declarative sentences and does not contain excessive modification or descriptive content, so errors made when writing contract texts are mostly wrong words and wrong special terms, and grammatical or semantic errors are rarely involved.
4. No abstract: a contract text generally has no abstract, and sometimes takes a meeting or report name as the document title, which cannot provide enough summary information about the document.
Based on these characteristics of contract texts, no method in the prior art can accurately extract keywords from contract texts, match similar contracts and correct their content, so analyzing contract texts is error-prone and inefficient. Therefore, how to provide a contract text intelligent analysis method based on deep data mining, so as to improve the quality and efficiency of contract text analysis, has become a technical problem to be solved urgently.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a contract text intelligent analysis method based on deep data mining, so as to improve the quality and efficiency of contract text analysis.
The invention discloses a contract text intelligent analysis method based on deep data mining, which comprises the following steps:
Step S10, acquiring a contract text to be analyzed and a large number of historical contract texts to form a contract text set;
Step S20, preprocessing the contract text set;
Step S30, extracting keywords from the preprocessed contract text to be analyzed and the preprocessed historical contract texts respectively based on a multi-feature word weight formula, to obtain a plurality of first keywords and a plurality of second keywords;
Step S40, retrieving similar historical contract texts based on the first keywords and the second keywords;
Step S50, searching for confusion words in the contract text to be analyzed based on the language model perplexity and an N-Gram language model, and matching the correct words corresponding to the confusion words;
And step S60, displaying the first keywords of the contract text to be analyzed, the similar historical contract texts, the confusion words and the correct words, and completing the intelligent analysis of the contract text to be analyzed.
Further, in the step S10, acquiring a large number of historical contract texts specifically comprises:
Setting a time span, and acquiring, based on the time span, a large number of historical contract texts from different departments in different regions.
Further, the step S20 specifically includes:
Step S21, searching for repeated contract texts in the contract text set based on the contract titles, and merging the repeated contract texts;
Step S22, eliminating noise data from each contract text in the contract text set;
Step S23, creating a sensitive word stock, and filtering the sensitive words of all contract texts in the contract text set based on the sensitive word stock.
Further, in the step S22, the noise data includes at least URL addresses, special symbols, emoticons, pictures, and zero-width characters.
Further, in the step S30, the multi-feature term weight formula specifically includes:
W_NEW-TF-IDF = W_TF-IDF × W_word;
W_TF-IDF = TF(i) × IDF(i), where TF(i) = n_i / N and IDF(i) = log( n / df(i) );
W_word = α·W_l + β·W_c + γ·W_len;
wherein W_NEW-TF-IDF represents the multi-feature word weight, W_TF-IDF represents the weighted feature weight, W_word represents the word weight composed of the position weight, the part-of-speech weight and the word length weight, TF(i) represents the word frequency of the i-th word, IDF(i) represents the inverse document frequency of the i-th word (the fewer contract texts contain the i-th word, the larger its value), n_i represents the number of times the i-th word appears, N represents the total number of all keywords, n represents the total number of contract texts, df(i) represents the number of documents in which the i-th word appears, α, β and γ all represent weight coefficients, W_l represents the position weight, W_c represents the part-of-speech weight, W_len represents the word length weight, i_len represents the word length of the i-th word, and avg(len) represents the average word length.
Further, the step S40 specifically includes:
step S41, splicing the first keywords and the corresponding contract titles to obtain first key information, and splicing the second keywords and the corresponding contract titles to obtain second key information;
step S42, inputting the first key information and the second key information into a BERT model and an average pooling layer in sequence for feature extraction to obtain a first feature sentence vector and a second feature sentence vector;
And step S43, sequentially calculating cosine similarity of the first feature sentence vector and each second feature sentence vector, and matching similar historical contract texts based on the cosine similarity.
Further, in the step S43, the cosine similarity is calculated as:
sim = ( Σ_{i=1..m} x_i·y_i ) / ( √(Σ_{i=1..m} x_i²) × √(Σ_{i=1..m} y_i²) );
where sim represents the cosine similarity, x_i represents the i-th component of the first feature sentence vector, y_i represents the i-th component of the second feature sentence vector, and m represents the dimension of the feature sentence vectors.
Further, the step S50 specifically includes:
Step S51, setting a likelihood threshold value and creating a confusion word set, wherein the confusion word set comprises one-to-one correspondences between a plurality of confusion words and their correct words;
Step S52, sequentially calculating the likelihood estimation value of each sentence in the contract text to be analyzed based on the language model perplexity, judging whether the likelihood estimation value is lower than the likelihood threshold value, and if so, indicating that a suspected confusion word exists and proceeding to step S53;
Step S53, scoring the sentences in which suspected confusion words exist through the N-Gram language model, and selecting the word with the highest score as the confusion word based on the ranking result;
And step S54, matching correct words corresponding to the confusion words by using the confusion word set.
The invention has the advantages that:
On the basis of the traditional weighted feature weight, the multi-feature word weight is constructed by combining the position weight, the part-of-speech weight and the word length weight of a word, and keywords are extracted based on the multi-feature word weight; since the position, part-of-speech and word length characteristics of words are fully considered, the accuracy of keyword extraction is greatly improved. Similar historical contract texts are retrieved through keyword matching, which greatly improves retrieval efficiency compared with full-text retrieval. Confusion words are found through the language model perplexity and the N-Gram language model, and the correct words corresponding to the confusion words are matched based on the created confusion word set, so that content error correction of the contract text to be analyzed is realized. Compared with traditional manual analysis, the quality and efficiency of contract text analysis are greatly improved.
Drawings
The invention will be further described below by way of embodiments with reference to the accompanying drawings.
FIG. 1 is a flow chart of a method for intelligent analysis of contract text based on deep data mining of the present invention.
Detailed Description
The general idea of the technical solution of the embodiment of the application is as follows: the multi-feature word weight is constructed by combining the position weight, the part-of-speech weight and the word length weight of a word to extract keywords, so as to improve the accuracy of keyword extraction; similar historical contract texts are retrieved through keyword matching, so as to improve retrieval efficiency; confusion words are found through the language model perplexity and an N-Gram language model, and the correct words corresponding to the confusion words are matched based on the created confusion word set, so as to realize content error correction; thereby the quality and efficiency of contract text analysis are improved.
Referring to FIG. 1, a preferred embodiment of the contract text intelligent analysis method based on deep data mining according to the present invention comprises the following steps:
Step S10, acquiring a contract text to be analyzed and a large number of historical contract texts to form a contract text set;
Step S20, preprocessing the contract text set, namely removing invalid data to improve text processing efficiency;
Step S30, extracting keywords from the preprocessed contract text to be analyzed and the preprocessed historical contract texts respectively based on a multi-feature word weight formula, to obtain a plurality of first keywords and a plurality of second keywords; since contract texts are often long and browsing the whole text would take considerable time, extracting keywords allows staff to obtain the important information of a contract text conveniently and quickly;
Step S40, retrieving similar historical contract texts based on the first keywords and the second keywords, so that reference information can conveniently be obtained from the similar historical contract texts;
Step S50, searching for confusion words in the contract text to be analyzed based on the language model perplexity (PPL) and an N-Gram language model, and matching the correct words corresponding to the confusion words;
And step S60, displaying the first keywords of the contract text to be analyzed, the similar historical contract texts, the confusion words and the correct words, completing the intelligent analysis of the contract text to be analyzed, and automatically replacing the corresponding confusion words with the correct words.
In the step S10, acquiring a large number of historical contract texts specifically comprises:
Setting a time span, and acquiring, based on the time span, a large number of historical contract texts from different departments in different regions, so as to improve the richness of the samples.
The step S20 specifically includes:
Step S21, searching for repeated contract texts in the contract text set based on the contract titles, and merging the repeated contract texts;
Step S22, eliminating noise data from each contract text in the contract text set;
Step S23, creating a sensitive word stock, and filtering the sensitive words of all contract texts in the contract text set based on the sensitive word stock.
In the step S22, the noise data includes at least URL addresses, special symbols, emoticons, pictures, and zero-width characters.
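As a concrete illustration of steps S21 to S23, the following minimal Python sketch merges contracts that share a title, strips the kinds of noise listed above and filters sensitive words. The regular expressions and the placeholder sensitive-word list are assumptions made for illustration; the patent does not specify them.

```python
import re

# Hypothetical sensitive-word stock; the patent only states that such a stock is created.
SENSITIVE_WORDS = {"placeholder_sensitive_word"}

def merge_duplicates(contracts: list[tuple[str, str]]) -> list[tuple[str, str]]:
    """Step S21: merge contract texts that share the same contract title (keep the first occurrence)."""
    merged: dict[str, str] = {}
    for title, body in contracts:
        merged.setdefault(title, body)
    return list(merged.items())

def remove_noise(text: str) -> str:
    """Step S22: remove URL addresses, zero-width characters, emoticons and other special symbols."""
    text = re.sub(r"https?://\S+|www\.\S+", "", text)                 # URL addresses
    text = re.sub(r"[\u200b\u200c\u200d\ufeff]", "", text)            # zero-width characters
    text = re.sub(r"[\U0001F300-\U0001FAFF\u2600-\u27BF]", "", text)  # emoticons / pictographs
    text = re.sub(r"[^\w\s，。；：、！？（）.,;:!?()%\-]", "", text)     # other special symbols
    return text

def filter_sensitive(text: str) -> str:
    """Step S23: filter sensitive words using the sensitive word stock."""
    for word in SENSITIVE_WORDS:
        text = text.replace(word, "")
    return text
```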
In the step S30, the multi-feature term weight formula specifically includes:
W_NEW-TF-IDF = W_TF-IDF × W_word;
W_TF-IDF = TF(i) × IDF(i), where TF(i) = n_i / N and IDF(i) = log( n / df(i) );
W_word = α·W_l + β·W_c + γ·W_len;
Wherein W_NEW-TF-IDF represents the multi-feature word weight, W_TF-IDF represents the weighted feature weight, W_word represents the word weight composed of the position weight, the part-of-speech weight and the word length weight, TF(i) represents the word frequency of the i-th word, IDF(i) represents the inverse document frequency of the i-th word (the fewer contract texts contain the i-th word, the larger its value, indicating that the i-th word distinguishes contract types well), n_i represents the number of times the i-th word appears, N represents the total number of all keywords, n represents the total number of contract texts, df(i) represents the number of documents in which the i-th word appears, α, β and γ represent weight coefficients whose values are preferably 0.6, 0.3 and 0.1 respectively, W_l represents the position weight, W_c represents the part-of-speech weight, W_len represents the word length weight, i_len represents the word length of the i-th word, and avg(len) represents the average word length.
According to the TF-IDF algorithm, a feature word that occurs frequently enough in a text and rarely enough in the other texts of the whole text set is a keyword of that text. However, the structure of the TF-IDF algorithm is too simple to effectively reflect the importance of words and the positional distribution of feature words, and it cannot effectively adjust word weights, so its accuracy is limited. In particular, TF-IDF does not reflect the importance of a word's position, part of speech or word length, whereas for a contract the information conveyed by content at different structural positions differs, and weights should be assigned according to these structural characteristics (the contract title, for example, deserves a higher weight). Therefore, the invention improves the traditional TF-IDF algorithm by combining the characteristics of the sample data: feature words at different positions, with different parts of speech and with different word lengths in a contract are given different coefficients, which are multiplied by the TF-IDF values of those feature words to enhance the text representation effect.
Since the title of a contract text can generally summarize its main content, words appearing in the title are more likely to be keywords; words appearing at the beginning or the end may reflect hidden or related keywords of the contract and should also receive appropriate attention. Therefore the position weight of the contract title is set highest, the position weight of the beginning or the end is second, and the position weight of other positions is smallest.
Parts of speech in Chinese are divided into two categories, content words and function words. Content words include nouns, verbs, adjectives, pronouns, numerals, measure words and the like; function words include prepositions, conjunctions, interjections, auxiliary words and the like. Keywords are usually mainly nouns or noun phrases, followed by verbs, adverbs and other modifiers.
A keyword that is too short cannot carry enough information, while a keyword that is too long carries more information but can usually be segmented further. Statistics on the word lengths of the segmented contract texts show that keyword lengths generally fall within [2, 7], so words that are too long or too short need to be filtered out.
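To make the weighting above concrete, the following Python sketch computes W_NEW-TF-IDF for one candidate word, using the preferred coefficients α=0.6, β=0.3, γ=0.1. The numeric values in the position and part-of-speech tables, and the form W_len = i_len / avg(len), are illustrative assumptions; the patent fixes only the relative ordering of these weights.

```python
import math
from collections import Counter

ALPHA, BETA, GAMMA = 0.6, 0.3, 0.1  # preferred weight coefficients from the description

# Illustrative tables: the patent only fixes the ordering (title > beginning/end > other positions;
# nouns > verbs/adverbs > other parts of speech), not these exact numbers.
POSITION_WEIGHT = {"title": 1.0, "beginning_or_end": 0.7, "other": 0.4}
POS_WEIGHT = {"noun": 1.0, "verb": 0.6, "adverb": 0.5, "other": 0.3}

def multi_feature_weight(word, position, pos_tag, doc_tokens, all_docs_tokens):
    """W_NEW-TF-IDF = W_TF-IDF x W_word for one candidate word of one contract text."""
    tf = Counter(doc_tokens)[word] / len(doc_tokens)                  # TF(i) = n_i / N
    df = sum(1 for doc in all_docs_tokens if word in doc)             # df(i): contracts containing the word
    idf = math.log(len(all_docs_tokens) / max(df, 1))                 # IDF(i): rarer across contracts -> larger
    w_tfidf = tf * idf

    avg_len = sum(len(w) for w in doc_tokens) / len(doc_tokens)
    w_len = len(word) / avg_len if 2 <= len(word) <= 7 else 0.0       # filter lengths outside [2, 7]; form assumed
    w_word = (ALPHA * POSITION_WEIGHT.get(position, POSITION_WEIGHT["other"])
              + BETA * POS_WEIGHT.get(pos_tag, POS_WEIGHT["other"])
              + GAMMA * w_len)
    return w_tfidf * w_word
```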
The step S40 specifically includes:
Step S41, splicing the first keywords and the corresponding contract titles to obtain first key information, and splicing the second keywords and the corresponding contract titles to obtain second key information;
step S42, inputting the first key information and the second key information into a BERT model and an average pooling layer in sequence for feature extraction to obtain a first feature sentence vector and a second feature sentence vector;
And step S43, sequentially calculating cosine similarity of the first feature sentence vector and each second feature sentence vector, and matching similar historical contract texts based on the cosine similarity.
In the step S43, the cosine similarity is calculated as:
sim = ( Σ_{i=1..m} x_i·y_i ) / ( √(Σ_{i=1..m} x_i²) × √(Σ_{i=1..m} y_i²) );
wherein sim represents the cosine similarity, x_i represents the i-th component of the first feature sentence vector, y_i represents the i-th component of the second feature sentence vector, and m represents the dimension of the feature sentence vectors. The larger the value of sim, the smaller the angle between the two feature sentence vectors and the higher the similarity; the historical contract text with the highest similarity is finally returned.
Similar contract text retrieval matches similar historical contract texts for the contract text currently being written or managed, providing relevant references for the personnel involved. Semantic retrieval of similar contract texts is essentially a judgment of the semantic similarity between a source text and a target text. Traditional semantic matching is biased toward lexical semantics, surface-form matching and syntactic similarity: text features must be defined and extracted in advance, and a similarity detection algorithm must be written to obtain the similarity between texts. A neural-network-based method, by contrast, considers how to distinguish the semantic differences between two texts and how to model the relevance between them when the model is constructed. Because contract texts are long, if feature vectors were compared over the full text, the extracted feature vectors could not represent the key information of the contract well, and the retrieved results would differ greatly from the actually similar contracts; this is why the first and second key information are built from the keywords and contract titles rather than from the full text.
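One possible implementation of steps S41 to S43 with the Hugging Face transformers library is sketched below. The model name bert-base-chinese and the top-k interface are assumptions for illustration; the patent only requires a BERT model followed by average pooling and cosine similarity over the spliced "keywords + contract title" strings.

```python
import torch
from transformers import AutoModel, AutoTokenizer

# "bert-base-chinese" is an illustrative choice of BERT checkpoint, not mandated by the patent.
tokenizer = AutoTokenizer.from_pretrained("bert-base-chinese")
model = AutoModel.from_pretrained("bert-base-chinese")

def sentence_vector(key_info: str) -> torch.Tensor:
    """Steps S41-S42: encode 'keywords + contract title' and average-pool the token embeddings."""
    inputs = tokenizer(key_info, return_tensors="pt", truncation=True, max_length=512)
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state        # shape (1, seq_len, hidden_size)
    mask = inputs["attention_mask"].unsqueeze(-1)          # ignore padding positions
    return (hidden * mask).sum(dim=1) / mask.sum(dim=1)    # average pooling -> shape (1, hidden_size)

def most_similar(first_info: str, second_infos: list[str], top_k: int = 1):
    """Step S43: rank historical contracts by cosine similarity to the contract to be analyzed."""
    query = sentence_vector(first_info)
    scored = [(info, torch.nn.functional.cosine_similarity(query, sentence_vector(info)).item())
              for info in second_infos]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]
```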
The step S50 specifically includes:
Step S51, setting a likelihood threshold value, and creating a confusion word set, wherein the confusion word set comprises one-to-one correspondences between a plurality of confusion words and their correct words and can be updated as required, which gives it strong extensibility;
Step S52, sequentially calculating the likelihood estimation value of each sentence in the contract text to be analyzed based on the language model perplexity (PPL), judging whether the likelihood estimation value is lower than the likelihood threshold value, and if so, indicating that a suspected confusion word exists and proceeding to step S53;
The language model perplexity is the inverse of the probability that the language model assigns to the text, normalized by the sentence length; the formula is:
PPL(S) = ( ∏_{i=1..N} 1/P(w_i) )^(1/N);
wherein S represents the input text, N represents the sentence length, and P(w_i) represents the probability of the i-th word;
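As a small numerical illustration of the perplexity check in step S52, the sketch below computes PPL in log space and treats the geometric mean of the word probabilities (i.e. 1/PPL) as the likelihood estimate compared against the threshold; that interpretation of the likelihood value, and the threshold value shown, are assumptions.

```python
import math

def perplexity(word_probs: list[float]) -> float:
    """PPL(S) = (prod_i 1/P(w_i)) ** (1/N), computed in log space for numerical stability."""
    n = len(word_probs)
    return math.exp(-sum(math.log(p) for p in word_probs) / n)

def likelihood_estimate(word_probs: list[float]) -> float:
    """Assumed likelihood value for step S52: the geometric mean of the word probabilities (= 1/PPL)."""
    return 1.0 / perplexity(word_probs)

LIKELIHOOD_THRESHOLD = 0.05  # illustrative value; the patent leaves the threshold to be set in step S51
word_probs = [0.20, 0.01, 0.15]                                     # per-word probabilities from the language model
suspected = likelihood_estimate(word_probs) < LIKELIHOOD_THRESHOLD  # True -> proceed to step S53
```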
Step S53, scoring the sentences in which suspected confusion words exist through the N-Gram language model, and selecting the word with the highest score as the confusion word based on the ranking result;
The correctness is judged by means of a statistical, probability-based N-Gram language model that scores the text. Applying it requires an ordered word sequence containing N words: if the occurrence of a word depends only on the one word in front of it, the binary model Bi-Gram (N=2) is used; if it depends on the two words in front of it, the ternary model Tri-Gram (N=3) is used; and so on. Assuming that a sentence s in the contract text consists of a series of words q_1, q_2, …, q_n in a specific order, then according to the chain rule the probability of occurrence of the sentence s is:
P(s) = P(q_1) · P(q_2 | q_1) · P(q_3 | q_1, q_2) · … · P(q_n | q_1, q_2, …, q_{n-1});
The N-Gram language model assumes that the probability of occurrence of any word is related only to the N−1 words in front of it, namely:
P(q_i | q_1, …, q_{i-1}) ≈ P(q_i | q_{i-N+1}, …, q_{i-1});
When the ternary model is used for modelling, the i-th word is related only to the 2 words in front of it, namely:
P(s) = ∏_{i=1..n} P(q_i | q_{i-2}, q_{i-1}).
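A minimal sketch of estimating the Tri-Gram probabilities used above from a corpus of segmented historical contract sentences follows; the add-one smoothing and the sentence-boundary markers are assumptions made so that unseen word sequences do not receive zero probability.

```python
from collections import Counter

def train_trigram(sentences: list[list[str]]):
    """Count trigrams and their bigram contexts over already-segmented sentences."""
    tri, bi = Counter(), Counter()
    for words in sentences:
        padded = ["<s>", "<s>"] + words + ["</s>"]
        for i in range(2, len(padded)):
            tri[(padded[i - 2], padded[i - 1], padded[i])] += 1
            bi[(padded[i - 2], padded[i - 1])] += 1
    return tri, bi

def trigram_prob(tri, bi, q_prev2, q_prev1, q_i, vocab_size):
    """P(q_i | q_{i-2}, q_{i-1}) with add-one smoothing (the smoothing scheme is an assumption)."""
    return (tri[(q_prev2, q_prev1, q_i)] + 1) / (bi[(q_prev2, q_prev1)] + vocab_size)

def sentence_prob(tri, bi, words, vocab_size):
    """P(s) = prod_i P(q_i | q_{i-2}, q_{i-1}) under the ternary (Tri-Gram) model."""
    padded = ["<s>", "<s>"] + words + ["</s>"]
    prob = 1.0
    for i in range(2, len(padded)):
        prob *= trigram_prob(tri, bi, padded[i - 2], padded[i - 1], padded[i], vocab_size)
    return prob
```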
And step S54, matching correct words corresponding to the confusion words by using the confusion word set.
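Steps S51 and S54 amount to maintaining a one-to-one lookup table from confusion words to their correct counterparts and applying it to the flagged words; a minimal sketch is given below. The example entry is hypothetical, since the patent does not publish the contents of its confusion word set.

```python
# Hypothetical entries; the patent only requires a one-to-one mapping that can be updated as needed.
CONFUSION_SET = {
    "权力": "权利",  # illustrative pair of easily confused terms in contract text
}

def correct_confusion_words(sentence_words: list[str], flagged: list[str]) -> list[str]:
    """Step S54: replace each flagged confusion word with its correct counterpart, if present in the set."""
    return [CONFUSION_SET.get(w, w) if w in flagged else w for w in sentence_words]

# Example: the word flagged in step S53 is replaced before the result is displayed in step S60.
corrected = correct_confusion_words(["合同", "双方", "权力"], flagged=["权力"])
```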
In summary, the invention has the advantages that:
On the basis of the traditional weighted feature weight, the multi-feature word weight is constructed by combining the position weight, the part-of-speech weight and the word length weight of a word, and keywords are extracted based on the multi-feature word weight; since the position, part-of-speech and word length characteristics of words are fully considered, the accuracy of keyword extraction is greatly improved. Similar historical contract texts are retrieved through keyword matching, which greatly improves retrieval efficiency compared with full-text retrieval. Confusion words are found through the language model perplexity and the N-Gram language model, and the correct words corresponding to the confusion words are matched based on the created confusion word set, so that content error correction of the contract text to be analyzed is realized. Compared with traditional manual analysis, the quality and efficiency of contract text analysis are greatly improved.
While specific embodiments of the invention have been described above, it will be appreciated by those skilled in the art that the specific embodiments described are illustrative only and not intended to limit the scope of the invention, and that equivalent modifications and variations of the invention in light of the spirit of the invention will be covered by the claims of the present invention.
Claims (8)
1. A contract text intelligent analysis method based on deep data mining, characterized by comprising the following steps:
Step S10, acquiring a contract text to be analyzed and a large number of historical contract texts to form a contract text set;
Step S20, preprocessing the contract text set;
Step S30, extracting keywords from the preprocessed contract text to be analyzed and the preprocessed historical contract texts respectively based on a multi-feature word weight formula, to obtain a plurality of first keywords and a plurality of second keywords;
Step S40, retrieving similar historical contract texts based on the first keywords and the second keywords;
Step S50, searching for confusion words in the contract text to be analyzed based on the language model perplexity and an N-Gram language model, and matching the correct words corresponding to the confusion words;
And step S60, displaying the first keywords of the contract text to be analyzed, the similar historical contract texts, the confusion words and the correct words, and completing the intelligent analysis of the contract text to be analyzed.
2. The method for intelligent analysis of contract text based on deep data mining according to claim 1, wherein in the step S10, the step of obtaining a large amount of historical contract text is specifically as follows:
Setting a time span, and acquiring, based on the time span, a large number of historical contract texts from different departments in different regions.
3. The method for intelligent analysis of contract text based on deep data mining according to claim 1, wherein said step S20 comprises the following steps:
Step S21, searching for repeated contract texts in the contract text set based on the contract titles, and merging the repeated contract texts;
Step S22, eliminating noise data from each contract text in the contract text set;
Step S23, creating a sensitive word stock, and filtering the sensitive words of all contract texts in the contract text set based on the sensitive word stock.
4. The method for intelligent analysis of contract text based on deep data mining according to claim 3, wherein in step S22, the noise data includes at least URL addresses, special symbols, emoticons, pictures and zero-width characters.
5. The method for intelligent analysis of contract text based on deep data mining according to claim 1, wherein in the step S30, the multi-feature word weight formula is specifically:
W_NEW-TF-IDF = W_TF-IDF × W_word;
W_TF-IDF = TF(i) × IDF(i), where TF(i) = n_i / N and IDF(i) = log( n / df(i) );
W_word = α·W_l + β·W_c + γ·W_len;
wherein W_NEW-TF-IDF represents the multi-feature word weight, W_TF-IDF represents the weighted feature weight, W_word represents the word weight composed of the position weight, the part-of-speech weight and the word length weight, TF(i) represents the word frequency of the i-th word, IDF(i) represents the inverse document frequency of the i-th word (the fewer contract texts contain the i-th word, the larger its value), n_i represents the number of times the i-th word appears, N represents the total number of all keywords, n represents the total number of contract texts, df(i) represents the number of documents in which the i-th word appears, α, β and γ all represent weight coefficients, W_l represents the position weight, W_c represents the part-of-speech weight, W_len represents the word length weight, i_len represents the word length of the i-th word, and avg(len) represents the average word length.
6. The method for intelligent analysis of contract text based on deep data mining according to claim 1, wherein the step S40 specifically comprises the following steps:
step S41, splicing the first keywords and the corresponding contract titles to obtain first key information, and splicing the second keywords and the corresponding contract titles to obtain second key information;
step S42, inputting the first key information and the second key information into a BERT model and an average pooling layer in sequence for feature extraction to obtain a first feature sentence vector and a second feature sentence vector;
And step S43, sequentially calculating cosine similarity of the first feature sentence vector and each second feature sentence vector, and matching similar historical contract texts based on the cosine similarity.
7. The intelligent analysis method of contract text based on deep data mining according to claim 6, wherein in the step S43, the cosine similarity is calculated as:
sim = ( Σ_{i=1..m} x_i·y_i ) / ( √(Σ_{i=1..m} x_i²) × √(Σ_{i=1..m} y_i²) );
where sim represents the cosine similarity, x_i represents the i-th component of the first feature sentence vector, y_i represents the i-th component of the second feature sentence vector, and m represents the dimension of the feature sentence vectors.
8. The method for intelligent analysis of contract text based on deep data mining according to claim 1, wherein said step S50 comprises the following steps:
Step S51, setting a likelihood threshold value and creating a confusion word set, wherein the confusion word set comprises one-to-one correspondences between a plurality of confusion words and their correct words;
Step S52, sequentially calculating the likelihood estimation value of each sentence in the contract text to be analyzed based on the language model perplexity, judging whether the likelihood estimation value is lower than the likelihood threshold value, and if so, indicating that a suspected confusion word exists and proceeding to step S53;
Step S53, scoring the sentences in which suspected confusion words exist through the N-Gram language model, and selecting the word with the highest score as the confusion word based on the ranking result;
And step S54, matching correct words corresponding to the confusion words by using the confusion word set.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111485260.9A CN114328822B (en) | 2021-12-07 | 2021-12-07 | A contract text intelligent analysis method based on deep data mining |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111485260.9A CN114328822B (en) | 2021-12-07 | 2021-12-07 | A contract text intelligent analysis method based on deep data mining |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114328822A CN114328822A (en) | 2022-04-12 |
CN114328822B true CN114328822B (en) | 2025-04-04 |
Family
ID=81049667
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111485260.9A Active CN114328822B (en) | 2021-12-07 | 2021-12-07 | A contract text intelligent analysis method based on deep data mining |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114328822B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116955536A (en) * | 2023-07-26 | 2023-10-27 | 北京长城电子商务有限公司 | Contract-based automatic matching rule algorithm |
CN118520862B (en) * | 2024-07-17 | 2024-11-12 | 沈阳慧筑云科技有限公司 | A method for intelligently generating contract templates based on user contract habits |
CN118607522B (en) * | 2024-08-08 | 2024-10-18 | 沈阳慧筑云科技有限公司 | A personalized user behavior prompting method based on big data and big language model |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110134952A (en) * | 2019-04-29 | 2019-08-16 | 华南师范大学 | A method, device and storage medium for rejecting wrong text |
CN110765765A (en) * | 2019-09-16 | 2020-02-07 | 平安科技(深圳)有限公司 | Contract key clause extraction method and device based on artificial intelligence and storage medium |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP5203324B2 (en) * | 2009-09-16 | 2013-06-05 | 日本電信電話株式会社 | Text analysis apparatus, method and program for typographical error |
CN108334533B (en) * | 2017-10-20 | 2021-12-24 | 腾讯科技(深圳)有限公司 | Keyword extraction method and device, storage medium and electronic device |
CN111241814B (en) * | 2019-12-31 | 2023-04-28 | 中移(杭州)信息技术有限公司 | Error correction method, device, electronic equipment and storage medium for speech recognition text |
- 2021-12-07: CN202111485260.9A filed in China; granted as patent CN114328822B (status: Active)
Patent Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110134952A (en) * | 2019-04-29 | 2019-08-16 | 华南师范大学 | A method, device and storage medium for rejecting wrong text |
CN110765765A (en) * | 2019-09-16 | 2020-02-07 | 平安科技(深圳)有限公司 | Contract key clause extraction method and device based on artificial intelligence and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN114328822A (en) | 2022-04-12 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||