CN109977406A - A disease-location-based method for extracting keywords from TCM condition text - Google Patents

Info

Publication number
CN109977406A
CN109977406A CN201910232088.2A CN201910232088A CN109977406A CN 109977406 A CN109977406 A CN 109977406A CN 201910232088 A CN201910232088 A CN 201910232088A CN 109977406 A CN109977406 A CN 109977406A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910232088.2A
Other languages
Chinese (zh)
Inventor
姜晓红
陈广
吴健
吴朝晖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang University ZJU
Original Assignee
Zhejiang University ZJU
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang University ZJU
Priority to CN201910232088.2A
Publication of CN109977406A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a disease-location-based method for extracting keywords from TCM (traditional Chinese medicine) condition text, comprising the following steps: segmenting the TCM condition text into words and generating a TCM condition dictionary from the segmentation result; calculating the IDF value and TF value of each word in the TCM condition dictionary; raising the importance of a word according to its IDF and TF values and whether it contains a disease location; and, according to the importance of each word, selecting the m highest-ranked words as the keywords of the text. Because most keywords in TCM condition text are disease-location words and symptom words, the invention takes the disease location as its basis and weights the TF-IDF value by disease location, thereby improving the accuracy of keyword extraction from TCM condition text.

Description

A disease-location-based method for extracting keywords from TCM condition text

Technical Field

The invention belongs to the technical field of natural language processing, and in particular relates to a disease-location-based method for extracting keywords from TCM condition text.

Background

TCM syndrome and disease differentiation commonly relies on trial and counter-proof, reasoning by analogy, and the combined use of the four diagnostic methods (inspection, listening and smelling, inquiry, and palpation). The practitioner typically asks about the location of the complaint, the severity of the symptoms, whether particular symptoms are present or absent, and the patient's diet and daily routine. As digital laboratory testing has developed, TCM consultations also often include Western-medicine test data such as routine blood and urine results. Compared with general text, such as People's Daily articles or online news, TCM condition text has the following characteristics:

1) The main sentence constituents (subject, predicate, object) are often not explicit, and some may be missing altogether. Coordinate structures are also common: "无压痛、反跳痛" ("no tenderness, rebound tenderness") should be read as "no tenderness" and "no rebound tenderness";

2) TCM condition text often includes Western-medicine test data, such as body temperature, which complicates algorithms based on text analysis;

3) TCM condition text contains many domain-specific terms. For example, the compound term "干湿性罗音" ("dry and moist rales") does not occur in general text;

4) The key semantic information in TCM condition text is carried mainly by words or phrases that denote symptoms, disease locations, the presence or absence of symptoms, and symptom severity.

The commonly used keyword extraction algorithms are TF-IDF and TextRank. TF-IDF measures the importance of a word in a text by the product of its term frequency and its inverse document frequency, sorts the words by TF-IDF value in descending order, and selects the highest-ranked words as the keywords of the text. TextRank borrows the idea of PageRank and finds keywords by building a word graph in which the words of the text are the nodes. Given a text window of size k, an undirected edge connects any two word nodes whose distance in the text is no greater than k. A random walk over this graph yields the importance of each word node, and the most important words are the keywords of the text.
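As a point of reference, the unweighted TF-IDF ranking described above can be sketched in Python as follows. The function name `tfidf_keywords`, the toy English documents, and the unsmoothed form ln(N / df) are assumptions made for illustration; real implementations differ in how they normalise and smooth.

```python
import math
from collections import Counter

def tfidf_keywords(docs, doc_index, top_n=2):
    """Rank the words of docs[doc_index] by TF-IDF and return the top_n.

    docs is a list of token lists. IDF is taken as ln(N / df), an assumed
    unsmoothed variant.
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    counts = Counter(docs[doc_index])
    total = len(docs[doc_index])
    scores = {
        word: (count / total) * math.log(n_docs / df[word])
        for word, count in counts.items()
    }
    # Descending score; ties are broken alphabetically to stay deterministic.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:top_n]]

# Hypothetical toy corpus.
docs = [
    ["cough", "fever", "cough"],
    ["cough", "headache"],
    ["fever", "nausea"],
]
print(tfidf_keywords(docs, 0, top_n=1))
```

Note that every word enters this ranking on equal terms; nothing in the score depends on the characters the word is made of.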

Both TF-IDF and TextRank share a limitation: before the algorithm runs, every word in the text is treated as equally important, and the relationship between the characters inside a word and the content is ignored. In TCM condition text, most keywords should be disease-location words and symptom words; that is, a word that contains a disease-location character is more likely to be a keyword. For example, the word "腹泻" (diarrhea) contains the character "腹" (abdomen) and should therefore be a keyword of a TCM condition text.

Summary of the Invention

The object of the invention is to provide a disease-location-based method for extracting keywords from TCM condition text, which uses the disease location as a feature for keyword extraction and extracts keywords by calculating a weighted TF-IDF value.

To achieve this object, the invention provides the following technical solution:

A disease-location-based method for extracting keywords from TCM condition text, comprising the following steps:

segmenting the TCM condition text into words, and generating a TCM condition dictionary from the segmentation result;

calculating the IDF value and TF value of each word in the TCM condition dictionary;

raising the importance of a word according to its IDF value and TF value and whether the word contains a disease location;

selecting, according to the importance of each word, the m highest-ranked words as the keywords of the text.

The method provided by the invention overcomes a limitation of conventional keyword extraction methods such as TF-IDF and TextRank, namely that every word starts with the same importance. Because most keywords in TCM condition text are disease-location words and symptom words, the invention takes the disease location as its basis and weights the TF-IDF value by disease location, thereby improving the accuracy of keyword extraction from TCM condition text.

Brief Description of the Drawings

To explain the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings used in describing them are briefly introduced below. The drawings described below are merely some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a flow chart of the disease-location-based method for extracting keywords from TCM condition text according to the invention.

Detailed Description

To make the object, technical solution, and advantages of the invention clearer, the invention is described in further detail below with reference to the drawing and the embodiment. The specific embodiment described here only explains the invention and does not limit its scope of protection.

As shown in Fig. 1, this embodiment provides a disease-location-based method for extracting keywords from TCM condition text. The method takes the disease location as its basis and weights the TF-IDF value by disease location, which improves the accuracy of keyword extraction from TCM condition text. It comprises the following steps:

S101: segment the TCM condition text.

Specifically, the set of condition texts to be segmented is segmented according to a medical dictionary and a stop-word dictionary, stopWords, and the stop words are removed, yielding the segmented text set.

The condition text set contains representative TCM diagnostic text cases. The medical dictionary is a dictionary of medicinal materials, prescriptions, and basic medical terms. The stop-word dictionary stopWords is a stop-word list specific to the TCM domain, containing words such as "病人" (patient) and "病史" (medical history).
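The following Python sketch illustrates dictionary-driven segmentation followed by stop-word removal. It uses forward maximum matching as a dependency-free stand-in; this is not the segmenter used in the patent, whose experimental example uses the jieba tool. The function name `segment`, the `max_len` parameter, and the miniature dictionaries are hypothetical.

```python
def segment(text, lexicon, stop_words, max_len=4):
    """Forward maximum matching: at each position take the longest lexicon
    entry that matches, falling back to a single character."""
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in lexicon:
                tokens.append(piece)
                i += size
                break
    # Remove stop words (here also punctuation) after segmentation.
    return [token for token in tokens if token not in stop_words]

# Hypothetical miniature dictionaries, for illustration only.
medical_lexicon = {"病人", "出现", "咳嗽", "压痛", "反跳痛"}
stop_words = {"病人", "、", ",", "。", "等"}

print(segment("病人出现咳嗽,无压痛、反跳痛等。", medical_lexicon, stop_words))
```

Treating punctuation as stop words is a simplification chosen for this sketch; a production pipeline would normally handle it in a separate cleaning step.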

S102: generate the TCM condition dictionary from the segmentation result.

Generating the TCM condition dictionary from the segmentation result comprises:

counting the words in the segmented text set, and adding the words whose frequency of occurrence lies in the interval [α1, α2] to the TCM condition dictionary.
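A minimal Python sketch of this frequency filter is given below. The function name `build_dictionary`, the parameter names `low` and `high` (standing for α1 and α2), and the sample texts are assumptions; the interval is treated as closed, as written above.

```python
from collections import Counter

def build_dictionary(segmented_texts, low, high):
    """Keep words whose total frequency across all texts lies in [low, high]."""
    counts = Counter(word for text in segmented_texts for word in text)
    return {word: n for word, n in counts.items() if low <= n <= high}

# Hypothetical segmented texts.
texts = [["咳嗽", "发热"], ["咳嗽", "头痛"], ["咳嗽"]]
print(build_dictionary(texts, 2, 10))
```

The lower bound discards rare words that are likely noise, and the upper bound discards words so common that they carry little information.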

S103: calculate the IDF value and TF value of each word in the TCM condition dictionary.

The IDF value is the inverse document frequency. For the i-th word in the TCM condition dictionary, the IDF value is:

where idf_i is the IDF value of the i-th word, |D| is the total number of TCM condition texts, and n_i is the number of TCM condition texts that contain the i-th word.
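The IDF formula itself is not reproduced in the text above. The sketch below assumes the standard unsmoothed form idf_i = ln(|D| / n_i), which matches the definitions of |D| and n_i and the natural logarithm used in the experimental example; whether the patent adds a smoothing term is not known. The function name and sample data are hypothetical.

```python
import math

def idf_values(dictionary_words, documents):
    """Assumed standard form: idf_i = ln(|D| / n_i)."""
    total = len(documents)                      # |D|
    result = {}
    for word in dictionary_words:
        containing = sum(1 for doc in documents if word in doc)   # n_i
        # Words absent from every document are skipped to avoid dividing by zero.
        if containing:
            result[word] = math.log(total / containing)
    return result

# Hypothetical segmented documents.
documents = [["咳嗽", "压痛"], ["咳嗽"], ["神志"], ["咳嗽", "神志"]]
idf = idf_values(["咳嗽", "压痛", "神志"], documents)
print(idf)
```

Under this form, a word that appears in fewer texts receives a larger IDF value.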

TF is the term frequency. Before the TF values are computed, the condition texts in the segmented text set that belong to the same department are concatenated into a single text D_all,k, where k is the department index;

for the i-th word in the TCM condition dictionary, the TF value is:

where tf_i is the TF value of the i-th word and n_i is the number of times the i-th word occurs in the text D_all,k.
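The TF formula is likewise not reproduced in the text above, which defines only the count n_i. The sketch below merges texts per department and then assumes tf_i = n_i divided by the number of tokens in D_all,k; that denominator is an assumption, as are the function name and the department labels.

```python
from collections import Counter, defaultdict

def tf_by_department(segmented_texts, departments, dictionary_words):
    """Merge texts per department, then compute tf = count / merged length.

    Dividing by the merged text length is an assumption; the source only
    defines the numerator n_i.
    """
    merged = defaultdict(list)
    for tokens, dept in zip(segmented_texts, departments):
        merged[dept].extend(tokens)             # builds D_all,k
    tf = {}
    for dept, tokens in merged.items():
        counts = Counter(tokens)
        tf[dept] = {
            word: counts[word] / len(tokens)
            for word in dictionary_words
            if counts[word]
        }
    return tf

# Hypothetical texts and department labels.
texts = [["咳嗽", "发热"], ["咳嗽"], ["头痛"]]
tf = tf_by_department(texts, ["k1", "k1", "k2"], ["咳嗽", "头痛"])
print(tf)
```

Merging first means the term frequency describes a whole department rather than a single consultation record.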

S104: raise the importance of each word according to its IDF value and TF value and whether it contains a disease location.

If a word in the TCM condition dictionary contains a disease location, its importance is raised; if it contains none, its importance is left unchanged. The importance is calculated as:

where word_i is the importance of the i-th word in the TCM condition dictionary; α adjusts the importance of the disease location, with α ≥ 0; and w_j is the weight of the j-th disease location, calculated as:

where c_j is the number of occurrences of the j-th disease location and n is the total number of disease locations.

In the invention, disease locations include not only characters for parts of the body but also certain basic symptom characters, such as "痛" (pain) and "虚" (deficiency), which represent the theoretical basis of TCM syndrome differentiation.
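Neither formula for S104 is reproduced in the text above, so the following Python sketch is only one plausible reading and should not be taken as the patent's exact definition. It assumes w_j = c_j / (c_1 + ... + c_n) and word_i = tf_i × idf_i × (1 + α × the sum of w_j over the disease locations contained in word i). This form leaves words without a disease location unchanged and reduces to plain TF-IDF when α = 0. All names and numbers are hypothetical.

```python
def location_weights(location_counts):
    """Assumed normalisation: w_j = c_j / sum of all c."""
    total = sum(location_counts.values())
    return {loc: count / total for loc, count in location_counts.items()}

def importance(word, tf, idf, weights, alpha):
    """Assumed boost: tf * idf * (1 + alpha * sum of weights of the
    disease-location characters that occur in the word)."""
    if alpha < 0:
        raise ValueError("alpha must be non-negative")
    boost = sum(w for loc, w in weights.items() if loc in word)
    return tf * idf * (1 + alpha * boost)

# Hypothetical disease-location counts and TF/IDF values.
weights = location_weights({"痛": 3, "神": 1})
plain = importance("咳嗽", 0.2, 0.5, weights, alpha=10)    # no disease location
boosted = importance("压痛", 0.2, 0.5, weights, alpha=10)  # contains 痛
print(plain, boosted)
```

With equal TF and IDF values, the word containing a disease-location character ends up with the higher importance, which is the behaviour S104 describes.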

S105: according to the importance of each word, select the m highest-ranked words as the keywords of the text.

Specifically, the words can be sorted by importance, and the m highest-ranked words become the keywords W = {W_1, W_2, ..., W_m} of the text D_all.
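The selection step can be sketched in Python as follows. The function name and the scores are hypothetical, and how ties should be broken is not specified in the source.

```python
import heapq

def top_m_keywords(importance_by_word, m):
    """Return the m words with the highest importance, highest first."""
    return heapq.nlargest(m, importance_by_word, key=importance_by_word.get)

# Hypothetical importance scores.
scores = {"压痛": 0.85, "咳嗽": 0.10, "神志": 0.35}
print(top_m_keywords(scores, 2))
```

`heapq.nlargest` avoids sorting the whole dictionary when m is small relative to its size; a full sort gives the same result.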

Experimental Example

Suppose the condition text set consists of texts A, B, C, D, and E:

Text A: 病人1周前无明显诱因下出现咳嗽,神志清,精神可。无压痛、反跳痛等。 (The patient developed a cough one week ago without obvious cause; conscious and in good spirits. No tenderness or rebound tenderness.)

Text B: 病人出现咳嗽,无压痛、反跳痛等。 (The patient has a cough, with no tenderness or rebound tenderness.)

Text C: 病人出现咳嗽。 (The patient has a cough.)

Text D: 病人神志清。 (The patient is conscious.)

Text E: 病人血肌酐升高。 (The patient has elevated serum creatinine.)

Texts A to E are processed according to S101 to S105. Texts A and C belong to one class, texts B and D belong to another, and text E forms a class of its own.

Segmentation and stop-word removal in S101, performed with the jieba segmentation tool, yield the following text set D′:

Text A: 1周前无明显诱因出现咳嗽神志清精神可无压痛 反跳痛 (1 week ago, no obvious cause, appear, cough, consciousness, clear, spirits fair, no, tenderness, rebound tenderness)

Text B: 出现咳嗽无压痛反跳痛 (appear, cough, no, tenderness, rebound tenderness)

Text C: 出现咳嗽 (appear, cough)

Text D: 神志清 (consciousness, clear)

Text E: 血肌酐升高 (serum creatinine, elevated)

Set α1 = 1 and α2 = 10. From the text set D′ obtained in S101, the filtering in S102 generates the TCM condition dictionary X, which contains the words shown in Table 1:

Table 1: Generated dictionary X

No.   Word                         Frequency
1     出现 (appear)                3
2     咳嗽 (cough)                 3
3     无 (no)                      2
4     压痛 (tenderness)            2
5     反跳痛 (rebound tenderness)  2
6     神志 (consciousness)         2
7     清 (clear)                   2
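The counts in Table 1 can be checked with a few lines of Python. The token boundaries for texts A and E below are assumed, because the extracted text has lost the spaces between tokens. Table 1 lists only words that occur at least twice, so the filter here uses a lower bound of 2; that bound is an assumption chosen to match the table, since a closed interval starting at α1 = 1 would also admit words that occur once.

```python
from collections import Counter

# Segmented example texts; token boundaries are assumed.
segmented = {
    "A": ["1周前", "无明显诱因", "出现", "咳嗽", "神志", "清", "精神可",
          "无", "压痛", "反跳痛"],
    "B": ["出现", "咳嗽", "无", "压痛", "反跳痛"],
    "C": ["出现", "咳嗽"],
    "D": ["神志", "清"],
    "E": ["血肌酐", "升高"],
}

counts = Counter(word for tokens in segmented.values() for word in tokens)
# Lower bound 2 is an assumption chosen so that the result matches Table 1.
dictionary_x = {word: n for word, n in counts.items() if 2 <= n <= 10}
print(dictionary_x)
```

Under this reading, none of the words of text E enter the dictionary, which is consistent with D_all-e having no keywords in the result reported for this example.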

The IDF value of each word is computed according to S103, using the natural logarithm (base e). The results are shown in Table 2.

Table 2: IDF results for dictionary X

The TF value of each word is computed according to S103. First, the condition texts in the text set D′ that belong to the same department are concatenated into one text D_all: texts A and C, which are of the same class, are merged into D_all-ac; texts B and D are merged into D_all-bd; and text E forms D_all-e. The TF values of the words in dictionary X are then computed for each merged text. The results are shown in Table 3.

Table 3: TF results per class

The importance of the words is raised according to S104. In this example, "痛" (pain) and "神" (spirit) are chosen as the disease locations. The initial disease-location weights are shown in Table 4.

Table 4: Initial disease-location weights

With α = 10, the importance of each word, taking into account the effect of the disease locations it contains, is computed according to S104 and shown in Table 5.

Table 5: Computed word_i values

According to S105, two words are selected as the keywords of each text. The results are shown in Table 6.

Table 6: Text keyword results

The top-2 keywords of D_all-ac are 压痛 (tenderness) and 反跳痛 (rebound tenderness). The top-2 keywords of D_all-bd may be any two of 压痛, 反跳痛, and 神志 (consciousness). D_all-e has no keywords. This example only illustrates the calculation procedure of the invention and does not represent a real case.

The specific embodiment above describes the technical solution and beneficial effects of the invention in detail. It is only the most preferred embodiment of the invention and does not limit the invention; any modification, addition, or equivalent substitution made within the principles of the invention falls within its scope of protection.

Claims (6)

1. A disease-location-based method for extracting keywords from TCM condition text, comprising the following steps:
segmenting the TCM condition text into words, and generating a TCM condition dictionary from the segmentation result;
calculating the IDF value and TF value of each word in the TCM condition dictionary;
raising the importance of a word according to its IDF value and TF value and whether the word contains a disease location;
selecting, according to the importance of each word, the m highest-ranked words as the keywords of the text.
2. The disease-location-based method for extracting keywords from TCM condition text according to claim 1, characterized in that, when the TCM condition text is segmented, the set of condition texts to be segmented is segmented according to a medical dictionary and a stop-word dictionary, and the stop words are removed, yielding a segmented text set.
3. The disease-location-based method for extracting keywords from TCM condition text according to claim 2, characterized in that generating the TCM condition dictionary from the segmentation result comprises:
counting the words in the segmented text set, and adding the words whose frequency of occurrence lies in the interval [α1, α2] to the TCM condition dictionary.
4. The disease-location-based method for extracting keywords from TCM condition text according to claim 2, characterized in that, for the i-th word in the TCM condition dictionary, the IDF value is:
where idf_i is the IDF value of the i-th word, |D| is the total number of TCM condition texts, and n_i is the number of TCM condition texts that contain the i-th word.
5. The disease-location-based method for extracting keywords from TCM condition text according to claim 2, characterized in that, before the TF values of the words are calculated, the condition texts in the segmented text set that belong to the same department are concatenated into a single text D_all,k, where k is the department index;
for the i-th word in the TCM condition dictionary, the TF value is:
where tf_i is the TF value of the i-th word and n_i is the number of times the i-th word occurs in the text D_all,k.
6. The disease-location-based method for extracting keywords from TCM condition text according to claim 2, characterized in that, when a word in the TCM condition dictionary contains a disease location, the importance of the word is raised, the importance being calculated as:
where word_i is the importance of the i-th word in the TCM condition dictionary, α adjusts the importance of the disease location, with α ≥ 0, and w_j is the weight of the j-th disease location, calculated as:
where c_j is the number of occurrences of the j-th disease location and n is the total number of disease locations.
CN201910232088.2A 2019-03-26 2019-03-26 A disease-location-based method for extracting keywords from TCM condition text Pending CN109977406A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910232088.2A 2019-03-26 A disease-location-based method for extracting keywords from TCM condition text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910232088.2A 2019-03-26 A disease-location-based method for extracting keywords from TCM condition text

Publications (1)

Publication Number Publication Date
CN109977406A (en) 2019-07-05

Family

ID=67080616

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910232088.2A Pending A disease-location-based method for extracting keywords from TCM condition text

Country Status (1)

Country Link
CN (1) CN109977406A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112002415A (en) * 2020-08-23 2020-11-27 吾征智能技术(北京)有限公司 Intelligent cognitive disease system based on human excrement
TWI815411B (en) * 2022-04-22 2023-09-11 臺北醫學大學 Methods and non-transitory computer storage media of extracting linguistic patterns and summarizing pathology report

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104899190A (en) * 2015-06-04 2015-09-09 百度在线网络技术(北京)有限公司 Generation method and device for word segmentation dictionary and word segmentation processing method and device
US20160132648A1 (en) * 2014-11-06 2016-05-12 ezDI, LLC Data Processing System and Method for Computer-Assisted Coding of Natural Language Medical Text
CN108133752A (en) * 2017-12-21 2018-06-08 新博卓畅技术(北京)有限公司 A kind of optimization of medical symptom keyword extraction and recovery method and system based on TFIDF
CN108647203A (en) * 2018-04-20 2018-10-12 浙江大学 A kind of computational methods of Chinese medicine state of an illness text similarity

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112002415A (en) * 2020-08-23 2020-11-27 吾征智能技术(北京)有限公司 Intelligent cognitive disease system based on human excrement
CN112002415B (en) * 2020-08-23 2024-03-01 吾征智能技术(北京)有限公司 Intelligent cognitive disease system based on human excrement
TWI815411B (en) * 2022-04-22 2023-09-11 臺北醫學大學 Methods and non-transitory computer storage media of extracting linguistic patterns and summarizing pathology report

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20190705