CN106649263A - Multi-word expression extraction method and device - Google Patents
- Publication number: CN106649263A (application CN201610990921.6A)
- Authority
- CN
- China
- Prior art keywords
- mutual information
- word
- information
- documents
- jump
- Prior art date
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Probability & Statistics with Applications (AREA)
- Machine Translation (AREA)
Abstract
The present invention relates to a multi-word expression extraction method and device, comprising: preprocessing a document library to form a vocabulary set; computing the mutual information of adjacent words across the documents; obtaining the jump information before and after each point of the mutual information sequence; combining the mutual information with the jump information to form two-dimensional mutual information; and clustering the two-dimensional mutual information to screen out multi-word expressions and build a multi-word expression library. The invention avoids the manually set threshold that one-dimensional mutual information requires and its poor adaptability to different data. It is not limited to two-word (binary) structures: multi-word expressions of arbitrary word combinations are obtained in a single pass, without step-by-step processing, which effectively improves the utilization of multi-word expressions and the accuracy of multi-word expression library construction.
Description
Technical Field
The present invention relates to the technical field of statistical machine translation and cross-language information retrieval, and in particular to a multi-word expression extraction method and device.
Background Art
A multi-word expression is a combination of multiple words that has grammatical, semantic, or pragmatic properties and a complete meaning. Recognizing multi-word expressions improves the efficiency and accuracy of tasks such as word segmentation, part-of-speech tagging, and machine translation. In machine translation in particular, correctly identifying multi-word expressions in the source language helps select a suitable translation and avoids the unnatural or even unintelligible target-language output that results from translating the component words separately.
Extraction methods for multi-word expressions fall broadly into statistical and rule-based approaches. Rule-based methods typically target one specific construction, such as verb-phrase structures, or are limited to a particular domain. Statistical methods, by contrast, can extract multi-word expressions regardless of form, using statistical information to extract expressions of any structure and domain without distinction. Existing statistical methods, however, face several problems: one-dimensional mutual information requires a manually set threshold and adapts poorly to different data; the methods are limited to two-word (binary) structures and cannot obtain longer multi-word combinations in a single pass; they must be applied step by step; and the resulting multi-word expression libraries have low accuracy.
Summary of the Invention
The primary object of the present invention is to provide a method that obtains multi-word expressions of arbitrary word combinations in a single pass, without step-by-step processing, effectively improving the utilization of extracted multi-word expressions and the accuracy of multi-word expression library construction.
To achieve the above object, the present invention adopts the following technical solution: a multi-word expression extraction method comprising the following steps in order:
(1) preprocessing a document library with word segmentation and part-of-speech tagging to form source-language documents;
(2) computing the mutual information of adjacent words across the documents, and further computing the jump information before and after each point of the mutual information sequence;
(3) combining the mutual information sequence and the jump information sequence into a two-dimensional mutual information set;
(4) using a classifier on the two-dimensional mutual information set to mark multi-word-expression inliers and outliers, and linking runs of inliers to construct multi-word expressions.
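The four steps can be sketched end to end in Python. This is a minimal illustration under stated assumptions, not the patented implementation: plain PMI stands in for the patent's MI formula (whose exact form, including the document counts N and N_{x,y}, is not reproduced here), |MI[i+1] − MI[i]| stands in for the jump information, and an above-median-MI rule stands in for the unspecified classifier. Documents are assumed already tokenized (step 1 done).

```python
import math
from collections import Counter

def extract_mwes(docs):
    # step 1 is assumed done: docs is a list of tokenized documents
    words = [w for d in docs for w in d]
    pairs = [p for d in docs for p in zip(d, d[1:])]
    wc, pc, M = Counter(words), Counter(pairs), len(words)

    # step 2: mutual information per distinct adjacent pair (plain PMI here;
    # the patent's formula also involves document counts N and N_xy)
    uniq = sorted(pc)
    mi = {p: math.log((pc[p] / M) / ((wc[p[0]] / M) * (wc[p[1]] / M)))
          for p in uniq}
    vals = [mi[p] for p in uniq]
    # jump information: absolute change to the next value in the sequence
    jump = {uniq[i]: abs(vals[i + 1] - vals[i]) for i in range(len(uniq) - 1)}

    # step 3: two-dimensional mutual information set of points (MI_i, f_i)
    pts = {p: (mi[p], jump.get(p, 0.0)) for p in uniq}

    # step 4: stand-in classifier -- pairs with above-median MI are "inliers"
    med = sorted(vals)[len(vals) // 2]
    inliers = {p for p, (m, _) in pts.items() if m >= med}

    # link runs of adjacent inlier pairs into multi-word expressions
    mwes = set()
    for d in docs:
        run = []
        for x, y in zip(d, d[1:]):
            if (x, y) in inliers:
                run = run + [y] if run else [x, y]
            else:
                if len(run) >= 2:
                    mwes.add(tuple(run))
                run = []
        if len(run) >= 2:
            mwes.add(tuple(run))
    return mwes
```

Because inlier pairs are linked by overlap, a run of inlier bigrams such as ("new", "york") and ("york", "city") yields the three-word expression ("new", "york", "city") directly, which is the single-pass, length-unbounded behavior the method claims.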
Further, in step (1), all documents of the collected document library are preprocessed with Chinese word segmentation, part-of-speech tagging, named entity recognition, and part-of-speech selection to form an ordered set of candidate words.
Further, step (2) comprises the following steps in order:
(a) computing the mutual information of all adjacent words in the documents;
(b) computing the jump information before and after each point of the mutual information sequence.
Further, in step (3), a two-dimensional mutual information point (MI_i, f_i) is constructed from corresponding positions of the mutual information sequence and the jump information sequence; the points together form the two-dimensional mutual information set.
Further, in step (4), a classifier divides all points of the two-dimensional mutual information set into two classes, multi-word-expression inliers and outliers, and the adjacent words corresponding to inlier points are linked to form multi-word expressions.
Further, in step (a), the mutual information of adjacent words across the documents is computed to form the mutual information sequence MI, where the mutual information MI_i (0 ≤ i < len(MI) − α) of adjacent words x and y is computed as follows:
where x and y denote adjacent words; MI_i denotes the i-th mutual information value, formed by adjacent words x and y; len(MI) denotes the length of the mutual information sequence MI; α denotes a constant; M denotes the total number of words in all documents; p(x, y) denotes the number of co-occurrences of words x and y in all documents; p(x) denotes the number of occurrences of word x in all documents; p(y) denotes the number of occurrences of word y in all documents; N denotes the number of documents in the document set; and N_{x,y} denotes the number of documents in which x and y co-occur.
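The formula itself appears only as an image in the original and does not survive in this text. Given the variables listed above, one plausible reconstruction — offered purely as an assumption, not as the patent's actual formula — is a PMI weighted by the fraction of documents in which the pair co-occurs:

```latex
MI_i = \log \frac{p(x,y)/M}{\bigl(p(x)/M\bigr)\,\bigl(p(y)/M\bigr)} \cdot \frac{N_{x,y}}{N}
```

Here p(·) are raw counts as defined above, each normalized by M; the factor N_{x,y}/N is a guess that would account for the document-level counts N and N_{x,y} in the variable list, which a plain PMI would not use.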
Further, in step (b), the jump information before and after each point of the mutual information sequence is computed to form the jump information sequence f, where the jump information f_i between adjacent mutual information values is computed as follows:
where f_i denotes the jump between the current and the subsequent mutual information value in the sequence, and | · | denotes the absolute value.
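This formula is likewise an image in the original. Its description — the jump between the current and the subsequent mutual information value, under an absolute value — suggests a simple forward difference; this is a reconstruction offered as an assumption (the index bound 0 ≤ i < len(MI) − α with α = 2 may indicate a slightly wider window than shown):

```latex
f_i = \left|\, MI_{i+1} - MI_i \,\right|
```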
Further, α is 2.
Another object of the present invention is to provide a multi-word expression extraction device, comprising:
a candidate word acquisition unit, which preprocesses all documents of the collected document library with Chinese word segmentation, part-of-speech tagging, named entity recognition, and part-of-speech selection to form an ordered set of candidate words;
a mutual information and jump information acquisition unit, which computes the mutual information of adjacent candidate words across the documents and, from adjacent mutual information values, computes the jump information before and after each point of the mutual information sequence;
a two-dimensional mutual information acquisition unit, which, according to corresponding positions of the mutual information sequence and the jump information sequence, combines mutual information and jump information into two-dimensional mutual information;
a classification and screening unit, which uses a classifier to divide all points of the two-dimensional mutual information set into two classes, multi-word-expression inliers and outliers, and links the adjacent words that have inlier points to form multi-word expressions.
As can be seen from the above technical solution, the present invention converts the mutual information between adjacent words into two-dimensional mutual information and clusters it to screen out multi-word expressions. This avoids the manually set threshold that one-dimensional mutual information requires and its poor adaptability to different data. It is not limited to two-word structures: multi-word expressions of arbitrary word combinations are obtained in a single pass, without step-by-step processing, which effectively improves the utilization of multi-word expressions and the accuracy of multi-word expression library construction.
Brief Description of the Drawings
Fig. 1 is a schematic flowchart of the method of the present invention;
Fig. 2 is a structural block diagram of the device of the present invention.
Detailed Description
A multi-word expression extraction method comprises the following steps in order: (1) preprocessing a document library with word segmentation, part-of-speech tagging, and the like to form source-language documents; (2) computing the mutual information of adjacent words across the documents, and further computing the jump information before and after each point of the mutual information sequence; (3) combining the mutual information sequence and the jump information sequence into a two-dimensional mutual information set; (4) using a classifier on the two-dimensional mutual information set to mark multi-word-expression inliers and outliers, and linking runs of consecutive inliers to construct multi-word expressions, as shown in Fig. 1.
The present invention is further described below with reference to Fig. 1.
In step (1), all texts of the collected document library are preprocessed with Chinese word segmentation, part-of-speech tagging, named entity recognition, and part-of-speech selection to form an ordered set of candidate words.
Step (2) comprises the following steps in order: (a) computing the mutual information of all adjacent words in the documents; (b) computing the jump information before and after each point of the mutual information sequence.
In step (a), the mutual information of adjacent words across the documents is computed to form the mutual information sequence MI, where the mutual information MI_i (0 ≤ i < len(MI) − α) of adjacent words x and y is computed as follows:
where x and y denote adjacent words; MI_i denotes the i-th mutual information value, formed by adjacent words x and y; len(MI) denotes the length of the mutual information sequence MI; α denotes a constant; M denotes the total number of words in all documents; p(x, y) denotes the number of co-occurrences of words x and y in all documents; p(x) denotes the number of occurrences of word x in all documents; p(y) denotes the number of occurrences of word y in all documents; N denotes the number of documents in the document set; N_{x,y} denotes the number of documents in which x and y co-occur; and the constant α is 2.
In step (b), the jump information before and after each point of the mutual information sequence is computed to form the jump information sequence f, where the jump information f_i between adjacent mutual information values is computed as follows:
where f_i denotes the jump between the current and the subsequent mutual information value in the sequence, and | · | denotes the absolute value.
In step (3), a two-dimensional mutual information point (MI_i, f_i) is constructed from corresponding positions of the mutual information sequence and the jump information sequence; the points together form the two-dimensional mutual information set.
In step (4), a classifier divides all points of the two-dimensional mutual information set into two classes, multi-word-expression inliers and outliers, and the adjacent words corresponding to inlier points are linked to form multi-word expressions.
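The patent does not name a specific classifier for this step. As one hypothetical stand-in, a two-means split of the 2-D points, taking the higher-MI cluster as the inlier class, could look like the pure-Python sketch below (`split_inliers` and its decision rule are assumptions, not the patented classifier):

```python
import random

def split_inliers(points, iters=50, seed=0):
    """2-means over 2-D points (MI_i, f_i); the cluster with the
    higher mean MI is treated as the inlier class."""
    random.seed(seed)
    c = random.sample(points, 2)  # initial centroids drawn from the data
    for _ in range(iters):
        groups = ([], [])
        for p in points:
            d = [(p[0] - ci[0]) ** 2 + (p[1] - ci[1]) ** 2 for ci in c]
            groups[d[0] > d[1]].append(p)  # assign to nearer centroid
        newc = []
        for g, old in zip(groups, c):
            if g:
                newc.append((sum(x for x, _ in g) / len(g),
                             sum(y for _, y in g) / len(g)))
            else:
                newc.append(old)  # keep an empty cluster's centroid
        if newc == c:
            break
        c = newc
    hi = 0 if c[0][0] >= c[1][0] else 1  # cluster with higher mean MI
    return set(groups[hi])
```

In practice a library clusterer (e.g. k-means or DBSCAN from scikit-learn) would replace this sketch; the key point, matching the text, is that the split adapts to the data with no manually set MI threshold.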
As shown in Fig. 2, the device of the present invention comprises: a candidate word acquisition unit, which preprocesses all texts of the collected document library with Chinese word segmentation, part-of-speech tagging, named entity recognition, part-of-speech selection, and the like to form an ordered set of candidate words; a mutual information and jump information acquisition unit, which computes the mutual information of adjacent candidate words across the documents and, from adjacent mutual information values, computes the jump information before and after each point of the mutual information sequence; a two-dimensional mutual information acquisition unit, which, according to corresponding positions of the mutual information sequence and the jump information sequence, combines mutual information and jump information into two-dimensional mutual information; and a classification and screening unit, which uses a classifier to divide all points of the two-dimensional mutual information set into multi-word-expression inliers and outliers and links the adjacent words that have inlier points to form multi-word expressions.
In summary, the present invention converts the mutual information between adjacent words into two-dimensional mutual information and clusters it to screen out multi-word expressions, avoiding the manually set threshold that one-dimensional mutual information requires and its poor adaptability to different data. It is not limited to two-word structures: multi-word expressions of arbitrary word combinations are obtained in a single pass, without step-by-step processing, which effectively improves the utilization of multi-word expressions and the accuracy of multi-word expression library construction.
Claims (9)
Priority Applications (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610990921.6A CN106649263A (en) | 2016-11-10 | 2016-11-10 | Multi-word expression extraction method and device |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN106649263A true CN106649263A (en) | 2017-05-10 |
Family ID: 58806046
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201610990921.6A Pending CN106649263A (en) | 2016-11-10 | 2016-11-10 | Multi-word expression extraction method and device |
Country Status (1)
| Country | Link |
|---|---|
| CN (1) | CN106649263A (en) |
Cited By (1)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN108549631A (en) * | 2018-03-30 | 2018-09-18 | 北京智慧正安科技有限公司 | Noun dictionary extracting method, electronic device and computer readable storage medium |
Citations (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| US20040044528A1 (en) * | 2002-09-03 | 2004-03-04 | Chelba Ciprian I. | Method and apparatus for generating decision tree questions for speech processing |
| CN1567297A (en) * | 2003-07-03 | 2005-01-19 | 中国科学院声学研究所 | Method for extracting multi-word translation equivalent cells from bilingual corpus automatically |
| US20050216443A1 (en) * | 2000-07-06 | 2005-09-29 | Streamsage, Inc. | Method and system for indexing and searching timed media information based upon relevance intervals |
| JP2006178536A (en) * | 2004-12-20 | 2006-07-06 | Oki Electric Ind Co Ltd | Bilingual expression extraction device |
| CN106095736A (en) * | 2016-06-07 | 2016-11-09 | 华东师范大学 | A kind of method of field neologisms extraction |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| | PB01 | Publication | |
| | SE01 | Entry into force of request for substantive examination | |
| | WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170510 |