
CN1770147A - Method and system for querying data based on keyword segmentation index - Google Patents


Info

Publication number
CN1770147A
Authority
CN
China
Prior art keywords
word
prefix
suffix
keyword
type
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN 200410087137
Other languages
Chinese (zh)
Inventor
邱全成
徐晓燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inventec Corp
Original Assignee
Inventec Corp
Application filed by Inventec Corp
Priority to CN 200410087137
Publication of CN1770147A
Legal status: Pending (current)

Landscapes

  • Machine Translation (AREA)

Abstract

A keyword segmented-index data query method and system, applicable to a computer platform. The basic structure of the system comprises at least a database, a prefix and suffix list module, a stem list module, a keyword input module, a prefix/suffix comparison module, a stem comparison module, and a data retrieval module. The invention provides a keyword segmented-index data query function: the user enters a word of a particular alphabetic (phonetically spelled) language as the query keyword, and the data item corresponding to that keyword is retrieved from the database in segments, according to the prefix/suffix and the stem of the keyword. The advantage of the segmented index is that it reduces the number of string comparisons and thereby speeds up keyword queries.

Description

Method and system for querying data based on keyword segmentation index

Technical Field

The present invention relates to computer information technology, and in particular to a keyword segmented-index data query method and system. Applied on a computer platform, it provides that platform with a keyword segmented-index data query function: the user enters a word of a particular alphabetic (phonetically spelled) language, such as an English word, as the query keyword, and the data item corresponding to the keyword (such as the Chinese definition and usage information of the English word) is retrieved in segments from a database (such as an English-Chinese dictionary database) according to the prefix/suffix and the stem of the keyword.

Background

The electronic English-Chinese dictionary is a widely used application program that runs on computer platforms such as desktop personal computers, notebook computers, tablet computers (Tablet PCs), and personal digital assistants (PDAs), letting users look up and learn the Chinese definitions and usages of English words online. Because a computerized dictionary lets users find this information faster than a printed dictionary does, it can improve students' learning more effectively.

A word lookup method commonly used in current electronic English-Chinese dictionaries first has the user enter the character string of the English word to be looked up as the query keyword, then compares it entry by entry in alphabetical order to find the data item corresponding to the keyword (that is, the Chinese definition and usage information) in the dictionary database.

In practice, however, this method has a drawback: every letter of the input word must be compared, in alphabetical order, step by step against all the English words stored in the dictionary database. This makes the lookup process complex and therefore slow.

Summary of the Invention

To overcome these drawbacks of the prior art, the main object of the present invention is to provide a keyword segmented-index data query method and system that improves the word lookup efficiency of electronic English-Chinese dictionaries, so that users can find the Chinese definitions and usages of English words more quickly.

The keyword segmented-index data query method and system of the present invention are designed to be installed on a computer platform, such as a desktop personal computer, notebook computer, tablet computer (Tablet PC), personal digital assistant (PDA), or electronic dictionary device, to provide that platform with a keyword segmented-index data query function. The user enters a word of a particular alphabetic language (such as an English word) as the query keyword, and the data item corresponding to the keyword (such as the Chinese definition and usage information of the English word) is retrieved in segments from a database (such as an English-Chinese dictionary database) according to the prefix/suffix and the stem of the keyword.

The keyword segmented-index data query method of the present invention comprises at least the following setup steps. First, build a database that stores a plurality of data items, where the query keyword of each data item corresponds to one word in the word set of a particular alphabetic language. Second, build a prefix and suffix list module that pre-stores a list of all prefixes and suffixes occurring in the words of that word set. Third, build a stem list module that pre-stores a set of prefix-removed stem lists and a set of suffix-removed stem lists. Each prefix-removed stem list corresponds to one specific prefix in the prefix and suffix list module and pre-stores the stems that remain when that prefix is removed from the words in the word set that carry it. Each suffix-removed stem list likewise corresponds to one specific suffix in the prefix and suffix list module and pre-stores the stems that remain when that suffix is removed from the words that carry it. Every stem in the prefix-removed and suffix-removed stem lists is preset to correspond one-to-one with a data item in the database.
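The setup steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the affix list is a hypothetical sample copied from the Fig. 2 examples, stems are cut by plain string matching rather than real morphological analysis, and each stem maps to the headword it came from, standing in for the one-to-one link to a database data item.

```python
from collections import defaultdict

# Hypothetical affix list modeled on the Fig. 2 examples:
# "mis-" (trailing hyphen) is a prefix, "-ish" (leading hyphen) is a suffix.
AFFIXES = ["ab-", "annu-", "anti-", "deca-", "-er", "-ish", "mis-"]


def build_stem_lists(words, affixes):
    """Build the prefix-removed and suffix-removed stem lists.

    Returns two dicts: prefix -> {stem: headword} and suffix -> {stem: headword}.
    The headword stands in for the database data item each stem is linked to.
    Stems are cut by plain string matching (an assumption), so a word is filed
    under every listed affix its spelling happens to carry.
    """
    prefix_removed = defaultdict(dict)
    suffix_removed = defaultdict(dict)
    for affix in affixes:
        if affix.endswith("-"):
            prefix = affix[:-1]
            for word in words:
                if len(word) > len(prefix) and word.startswith(prefix):
                    prefix_removed[prefix][word[len(prefix):]] = word
        elif affix.startswith("-"):
            suffix = affix[1:]
            for word in words:
                if len(word) > len(suffix) and word.endswith(suffix):
                    suffix_removed[suffix][word[:-len(suffix)]] = word
    return dict(prefix_removed), dict(suffix_removed)


words = ["misadvice", "misally", "mistake", "childish", "Danish", "foolish"]
prefix_removed, suffix_removed = build_stem_lists(words, AFFIXES)
# prefix_removed["mis"] == {"advice": "misadvice", "ally": "misally", "take": "mistake"}
# suffix_removed["ish"] == {"child": "childish", "Dan": "Danish", "fool": "foolish"}
```

Building the lists is a one-off cost paid when the dictionary is prepared; only the comparisons in the query steps happen per lookup.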

In use, the method proceeds as follows. First, input the keyword corresponding to the data item the user wants to look up. Second, compare each prefix and suffix in the prefix and suffix list module with the prefix and suffix of the keyword; if a prefix or suffix matches, issue a stem comparison activation message. Next, in response to that message, compare the stem that remains after the matching prefix or suffix is removed from the keyword with the stems in the stem list module; if a stem matches, issue a data retrieval activation message. Finally, in response to that message, retrieve the data item corresponding to the matching stem from the database.
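A minimal sketch of these query steps, assuming small hypothetical in-memory tables in place of a real dictionary. Two choices here are assumptions the text does not settle: prefixes are tried before suffixes, and a dict lookup replaces the item-by-item stem comparison (the result is the same; only the cost differs). The function returns `None` when nothing matches, a case the text does not describe.

```python
# Hypothetical pre-built structures, using the Fig. 2 example words.
PREFIXES = ["ab", "anti", "mis"]
SUFFIXES = ["er", "ish"]
PREFIX_REMOVED = {"mis": {"advice": "misadvice", "ally": "misally", "take": "mistake"}}
SUFFIX_REMOVED = {"ish": {"child": "childish", "Dan": "Danish", "fool": "foolish"}}
DATABASE = {w: f"<definition and usage of {w}>" for w in
            ["misadvice", "misally", "mistake", "childish", "Danish", "foolish"]}


def segmented_query(keyword):
    """Two-segment lookup: match an affix first, then the stem left after removing it."""
    for kind, affixes, table in (("prefix", PREFIXES, PREFIX_REMOVED),
                                 ("suffix", SUFFIXES, SUFFIX_REMOVED)):
        # Prefix/suffix comparison step.
        for a in affixes:
            if kind == "prefix" and keyword.startswith(a):
                stem = keyword[len(a):]
            elif kind == "suffix" and keyword.endswith(a):
                stem = keyword[:-len(a)]
            else:
                continue
            # Stem comparison step: only the list filed under affix `a` is consulted.
            headword = table.get(a, {}).get(stem)
            if headword is not None:
                # Data retrieval step.
                return DATABASE[headword]
    return None
```

For example, `segmented_query("misadvice")` strips `mis`, finds `advice` in the `mis` list, and returns the entry stored under `misadvice`.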

The keyword segmented-index data query system of the present invention comprises at least: a database storing a plurality of data items, where the query keyword of each data item corresponds to one word in the word set of a particular alphabetic language; a prefix and suffix list module that pre-stores a list of all prefixes and suffixes occurring in the words of that word set; a stem list module that pre-stores a set of prefix-removed stem lists and a set of suffix-removed stem lists; a keyword input module, operated by the user, for inputting the keyword corresponding to the data item being looked up; a prefix/suffix comparison module that compares each prefix and suffix in the prefix and suffix list module with the prefix and suffix of the keyword from the keyword input module and, if a prefix or suffix matches, issues a stem comparison activation message; a stem comparison module that, in response to the stem comparison activation message, compares the stem remaining after the prefix or suffix is removed from the keyword with the stems in the stem list module and, if a stem matches, issues a data retrieval activation message; and a data retrieval module that, in response to the data retrieval activation message, retrieves the data item corresponding to the matching stem from the database.

In this system, each prefix-removed stem list in the stem list module corresponds to one specific prefix in the prefix and suffix list module and pre-stores the stems that remain when that prefix is removed from the words in the word set that carry it; each suffix-removed stem list corresponds to one specific suffix in the prefix and suffix list module and pre-stores the stems that remain when that suffix is removed from the words that carry it; and every stem in the prefix-removed and suffix-removed stem lists is preset to correspond one-to-one with a data item in the database.

The advantage of the keyword segmented-index data query method and system of the present invention is that they reduce the number of string comparisons, which speeds up queries and lets users find the information they need more quickly.
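To make the claimed saving concrete, the sketch below counts string comparisons for a plain scan over all headwords versus the two-segment lookup. It is a toy: the word list is the six Fig. 2 examples, each `startswith`/`endswith`/equality test is counted as one comparison (a simplification), and the counts say nothing about a real dictionary.

```python
# Toy data: the Fig. 2 example words and their affix/stem split.
HEADWORDS = ["childish", "Danish", "foolish", "misadvice", "misally", "mistake"]
AFFIX_LIST = [("prefix", "mis"), ("suffix", "ish")]
STEM_LISTS = {
    ("prefix", "mis"): [("advice", "misadvice"), ("ally", "misally"), ("take", "mistake")],
    ("suffix", "ish"): [("child", "childish"), ("Dan", "Danish"), ("fool", "foolish")],
}


def linear_scan(keyword, headwords):
    """Baseline: compare the keyword with every headword in order."""
    comparisons = 0
    for word in headwords:
        comparisons += 1
        if word == keyword:
            return word, comparisons
    return None, comparisons


def segmented(keyword, affixes, stem_lists):
    """Compare affixes first, then only the stems filed under a matching affix."""
    comparisons = 0
    for kind, affix in affixes:
        comparisons += 1  # one affix comparison
        if kind == "prefix" and keyword.startswith(affix):
            rest = keyword[len(affix):]
        elif kind == "suffix" and keyword.endswith(affix):
            rest = keyword[:-len(affix)]
        else:
            continue
        for stem, word in stem_lists[(kind, affix)]:
            comparisons += 1  # one stem comparison
            if stem == rest:
                return word, comparisons
    return None, comparisons
```

Here `mistake` costs 6 comparisons by linear scan and 4 by segments, while `childish`, which sits first in the list, costs 1 versus 3. The saving depends on each affix's stem list being much shorter than the full word list.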

Brief Description of the Drawings

Fig. 1 is a system structure diagram showing the application structure of the keyword segmented-index data query system of the present invention and the basic structure of its object-oriented component model;

Fig. 2 is a data structure diagram showing the data structures of the database, the prefix and suffix list module, and the stem list module used by the keyword segmented-index data query system of the present invention, and the relationships among them.

Detailed Description

Embodiment

An embodiment of the keyword segmented-index data query method and system of the present invention is described in detail below with reference to the drawings.

Fig. 1 shows the application structure of the keyword segmented-index data query system of the present invention (the part enclosed by the dashed box labeled 20) and the basic structure of its object-oriented component model. As shown, in practice the system 20 is installed on a computer platform 10, such as a desktop personal computer, notebook computer, tablet computer (Tablet PC), personal digital assistant (PDA), or electronic dictionary device, and provides the platform 10 with a keyword segmented-index data query function, such as English word lookup. The user enters a word of a particular alphabetic language (such as an English word) as the query keyword, and the system retrieves the corresponding data item (such as the Chinese definition and usage information of the English word) in segments from a database (such as an English-Chinese dictionary database) according to the prefix/suffix and stem of the keyword.

For example, in an electronic English-Chinese dictionary, to look up the English word [misadvice] on the computer platform 10, the user simply types the string [misadvice] on the keyboard 11. The system 20 then uses the prefix [mis-] and the stem [advice] of the input word to retrieve the Chinese definition and usage information of [misadvice] from the dictionary in two segments, and displays it on the screen 12. Likewise, to look up [childish], the user types the string [childish], and the system 20 uses the suffix [-ish] and the stem [child] of the input word to retrieve the Chinese definition and usage information of [childish] in two segments and displays it on the screen 12.

In a concrete implementation, the keyword segmented-index data query system 20 can be implemented entirely as a software program whose code is installed on the computer platform 10.

As shown in Fig. 1, the basic structure of the object-oriented component model of the keyword segmented-index data query system 20 comprises at least: (a) a database 100; (b) a prefix and suffix list module 110; (c) a stem list module 120; (d) a keyword input module 210; (e) a prefix/suffix comparison module 220; (f) a stem comparison module 230; and (g) a data retrieval module 240.

The database 100 is, for example, an English-Chinese dictionary database storing a plurality of data items (such as the Chinese definitions and usage information of English words), where the query keyword of each data item corresponds to one word (such as an English word) in the word set of a particular alphabetic language.

The prefix and suffix list module 110 pre-stores a list of all prefixes and suffixes occurring in the words of the word set of the particular alphabetic language (such as English). As shown in Fig. 2, for an electronic English-Chinese dictionary the prefixes and suffixes stored in module 110 include, for example, [ab-], [annu-], [anti-], [deca-], [-er], [-ish], and [mis-].
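The Fig. 2 affix list mixes prefixes and suffixes, distinguished only by where the hyphen sits. A small helper, hypothetical and not taken from the patent, can split such entries into the two groups the comparison step needs:

```python
def split_affixes(entries):
    """Split list entries such as "mis-" (prefix) and "-ish" (suffix)."""
    prefixes, suffixes = [], []
    for entry in entries:
        if entry.endswith("-") and not entry.startswith("-"):
            prefixes.append(entry[:-1])   # "mis-" -> "mis"
        elif entry.startswith("-") and not entry.endswith("-"):
            suffixes.append(entry[1:])    # "-ish" -> "ish"
        else:
            raise ValueError(f"cannot classify affix entry: {entry!r}")
    return prefixes, suffixes


prefixes, suffixes = split_affixes(["ab-", "annu-", "anti-", "deca-", "-er", "-ish", "mis-"])
# prefixes == ['ab', 'annu', 'anti', 'deca', 'mis'], suffixes == ['er', 'ish']
```

When two prefixes overlap (for example, a list holding both `an` and `anti`), an implementation would need to try all matching prefixes or prefer the longer one; the patent does not address this case.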

The stem list module 120 pre-stores a set of prefix-removed stem lists 121 and a set of suffix-removed stem lists 122. Each prefix-removed stem list 121 corresponds to a specific prefix in the prefix and suffix list module 110 and pre-stores the stems that remain when that prefix is removed from the words in the word set that carry it; each suffix-removed stem list 122 corresponds to a specific suffix in module 110 and pre-stores the stems that remain when that suffix is removed from the words that carry it. In addition, every stem in the prefix-removed stem lists 121 and the suffix-removed stem lists 122 is preset to correspond one-to-one with a data item stored in the database 100. For example, as shown in Fig. 2, for an electronic English-Chinese dictionary the prefix-removed stem list 121 for the prefix [mis-] stores stems such as [advice], [ally], and [take], corresponding to the English words [misadvice], [misally], and [mistake]; the suffix-removed stem list 122 for the suffix [-ish] stores stems such as [child], [Dan], and [fool], corresponding to the English words [childish], [Danish], and [foolish].
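The Fig. 2 stem lists can be written out as nested dicts, with each stem linked to the headword it belongs to. The checker below is an illustrative assumption, not part of the patent: it confirms that affix plus stem rebuilds the headword, that the headword exists in the database, and that no two stems point at the same data item, which is one way to enforce the one-to-one correspondence described above.

```python
# Stem lists from the Fig. 2 example; each stem points at its headword (data item key).
PREFIX_REMOVED = {"mis": {"advice": "misadvice", "ally": "misally", "take": "mistake"}}
SUFFIX_REMOVED = {"ish": {"child": "childish", "Dan": "Danish", "fool": "foolish"}}
# Hypothetical stand-in for database 100: headword -> entry.
DATABASE = {w: f"<entry for {w}>" for w in
            ["misadvice", "misally", "mistake", "childish", "Danish", "foolish"]}


def check_stem_lists(prefix_removed, suffix_removed, database):
    """Return True if every stem rebuilds its headword, every headword exists
    in the database, and no two stems point at the same data item."""
    targets = []
    for prefix, stems in prefix_removed.items():
        for stem, headword in stems.items():
            if prefix + stem != headword or headword not in database:
                return False
            targets.append(headword)
    for suffix, stems in suffix_removed.items():
        for stem, headword in stems.items():
            if stem + suffix != headword or headword not in database:
                return False
            targets.append(headword)
    # One-to-one: each data item is reached by exactly one (affix, stem) pair.
    return len(targets) == len(set(targets))


print(check_stem_lists(PREFIX_REMOVED, SUFFIX_REMOVED, DATABASE))  # True
```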

The keyword input module 210 is a user-operated input module that receives the character string of a word (such as an English word) in the particular alphabetic language, entered by the user on the keyboard 11, and uses that string as the query keyword.

The prefix/suffix comparison module 220 compares each prefix and suffix in the prefix and suffix list module 110 with the prefix and suffix of the keyword input by the keyword input module 210, to check whether the keyword's prefix or suffix is any of the prefixes or suffixes in module 110. If a prefix or suffix matches, it sends a stem comparison activation message to the stem comparison module 230.
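A sketch of the prefix/suffix comparison module 220 as a plain function. Returning the list of matches instead of sending a "stem comparison activation message" is an assumption made for illustration; an empty list means the stem comparison is never triggered. The function name and tuple layout are hypothetical.

```python
def match_affixes(keyword, prefixes, suffixes):
    """Compare the keyword's beginning and end with every listed affix.

    Returns (kind, affix, remaining_stem) tuples; a non-empty list plays the
    role of the stem comparison activation message.
    """
    hits = []
    for p in prefixes:
        # Require a non-empty stem so a bare affix is not treated as a match.
        if len(keyword) > len(p) and keyword.startswith(p):
            hits.append(("prefix", p, keyword[len(p):]))
    for s in suffixes:
        if len(keyword) > len(s) and keyword.endswith(s):
            hits.append(("suffix", s, keyword[:-len(s)]))
    return hits


print(match_affixes("misadvice", ["ab", "anti", "mis"], ["er", "ish"]))
# [('prefix', 'mis', 'advice')]
```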

In response to the stem comparison activation message from the prefix/suffix comparison module 220, the stem comparison module 230 compares the stem portion that remains after the prefix or suffix is removed from the keyword input by the keyword input module 210 with each stem in the corresponding prefix-removed stem list 121 or suffix-removed stem list 122 of the stem list module 120. That is, if a prefix matched, the prefix is removed from the keyword and the remaining stem is compared with each stem in the corresponding prefix-removed stem list 121; if a suffix matched, the suffix is removed and the remaining stem is compared with each stem in the corresponding suffix-removed stem list 122. If a stem matches, module 230 sends a data retrieval activation message to the data retrieval module 240.
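The stem comparison module 230 can be sketched the same way. Following the text, it compares the remaining stem item by item with the one list filed under the matched affix; returning the linked headword stands in for the "data retrieval activation message". The function name and the `(kind, affix, stem)` tuple are hypothetical.

```python
def compare_stem(hit, prefix_removed, suffix_removed):
    """Compare the remaining stem with the list filed under the matched affix.

    `hit` is a (kind, affix, stem) tuple. Returns the linked headword, which
    plays the role of the data retrieval activation message, or None.
    """
    kind, affix, stem = hit
    lists = prefix_removed if kind == "prefix" else suffix_removed
    # Item-by-item comparison within one short list, as the text describes.
    for candidate, headword in lists.get(affix, {}).items():
        if candidate == stem:
            return headword
    return None


PREFIX_REMOVED = {"mis": {"advice": "misadvice", "ally": "misally", "take": "mistake"}}
SUFFIX_REMOVED = {"ish": {"child": "childish", "Dan": "Danish", "fool": "foolish"}}
print(compare_stem(("suffix", "ish", "child"), PREFIX_REMOVED, SUFFIX_REMOVED))
# childish
```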

In response to the data retrieval activation message from the stem comparison module 230, the data retrieval module 240 retrieves the data item corresponding to the matching stem from the database 100.
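Assembling the modules in the shape of the Fig. 1 component model, one class per module, could look like the sketch below. Class names are hypothetical, messages between modules are modeled as return values, and the stem comparator uses a dict lookup in place of the item-by-item scan.

```python
from dataclasses import dataclass


@dataclass
class AffixComparator:
    """Stands in for the prefix/suffix comparison module 220."""
    prefixes: list
    suffixes: list

    def compare(self, keyword):
        # Yield (kind, affix, remaining stem) for each affix the keyword carries.
        for p in self.prefixes:
            if len(keyword) > len(p) and keyword.startswith(p):
                yield "prefix", p, keyword[len(p):]
        for s in self.suffixes:
            if len(keyword) > len(s) and keyword.endswith(s):
                yield "suffix", s, keyword[:-len(s)]


@dataclass
class StemComparator:
    """Stands in for the stem comparison module 230 over lists 121 and 122."""
    prefix_removed: dict
    suffix_removed: dict

    def compare(self, kind, affix, stem):
        lists = self.prefix_removed if kind == "prefix" else self.suffix_removed
        return lists.get(affix, {}).get(stem)  # linked headword, or None


@dataclass
class DataRetriever:
    """Stands in for the data retrieval module 240 over database 100."""
    database: dict

    def fetch(self, headword):
        return self.database.get(headword)


@dataclass
class SegmentedQuerySystem:
    """Stands in for the query system 20 (the dashed box in Fig. 1)."""
    affixes: AffixComparator
    stems: StemComparator
    retriever: DataRetriever

    def query(self, keyword):
        for kind, affix, stem in self.affixes.compare(keyword):
            headword = self.stems.compare(kind, affix, stem)
            if headword is not None:
                return self.retriever.fetch(headword)
        return None


system = SegmentedQuerySystem(
    AffixComparator(prefixes=["mis"], suffixes=["ish"]),
    StemComparator({"mis": {"advice": "misadvice"}}, {"ish": {"child": "childish"}}),
    DataRetriever({"misadvice": "entry for misadvice", "childish": "entry for childish"}),
)
```

Keeping each module behind its own small interface mirrors the component split in Fig. 1 and lets, for instance, the dictionary storage change without touching the comparison logic.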

Referring to Fig. 1 and Fig. 2 together, the following uses the English keywords [misadvice] and [childish] to illustrate the steps of the keyword segmented-index data query method of the present invention in practice, as it works in an electronic English-Chinese dictionary.

To look up the Chinese definition of the English word [misadvice], the user first types the string [misadvice] on the keyboard 11. The keyword input module 210 of the system 20 takes the input word [misadvice] as the query keyword, and the prefix/suffix comparison module 220 then compares each prefix and suffix in the prefix and suffix list module 110 with the prefix and suffix portions of [misadvice], to check whether either matches any prefix or suffix in module 110. Because module 110 contains the prefix [mis-], which matches the prefix of [misadvice], module 220 sends a stem comparison activation message to the stem comparison module 230. In response, module 230 compares the stem portion [advice], which remains after [mis-] is removed from [misadvice], with each stem in the corresponding prefix-removed stem list 121 of the stem list module 120. Because list 121 contains the stem [advice], which matches, module 230 sends a data retrieval activation message to the data retrieval module 240, which in response retrieves the data item corresponding to the matching stem [advice] (that is, the Chinese definition and usage information of [misadvice]) from the database 100 and displays it on the screen 12.

Likewise, to look up the Chinese definition of the English word [childish], the user first types the string [childish] on the keyboard 11. The keyword input module 210 of the system 20 takes the input word [childish] as the query keyword, and the prefix/suffix comparison module 220 then compares each prefix and suffix in the prefix and suffix list module 110 with the prefix and suffix portions of [childish], to check whether either matches any prefix or suffix in module 110. Because module 110 contains the suffix [-ish], which matches the suffix of [childish], module 220 sends a stem comparison activation message to the stem comparison module 230. In response, module 230 compares the stem portion [child], which remains after [-ish] is removed from [childish], with each stem in the corresponding suffix-removed stem list 122 of the stem list module 120. Because list 122 contains the stem [child], which matches, module 230 sends a data retrieval activation message to the data retrieval module 240, which in response retrieves the data item corresponding to the matching stem [child] (that is, the Chinese definition and usage information of [childish]) from the database 100 and displays it on the screen 12.
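The two walkthroughs above can be replayed with a small tracing function that records each step as text. The tables hold only the Fig. 2 examples, and the step wording is invented for illustration; it is not output from any real system.

```python
PREFIXES = ["mis"]
SUFFIXES = ["ish"]
PREFIX_REMOVED = {"mis": {"advice": "misadvice", "ally": "misally", "take": "mistake"}}
SUFFIX_REMOVED = {"ish": {"child": "childish", "Dan": "Danish", "fool": "foolish"}}


def trace_query(keyword):
    """Return the sequence of steps taken to look up `keyword`."""
    steps = [f"keyword input: {keyword}"]
    # Prefix/suffix comparison (module 220): collect every affix the keyword carries.
    hits = [("prefix", p, keyword[len(p):]) for p in PREFIXES if keyword.startswith(p)]
    hits += [("suffix", s, keyword[:-len(s)]) for s in SUFFIXES if keyword.endswith(s)]
    for kind, affix, stem in hits:
        steps.append(f"{kind} match: {affix}, remaining stem: {stem}")
        # Stem comparison (module 230) against the list filed under this affix.
        table = PREFIX_REMOVED if kind == "prefix" else SUFFIX_REMOVED
        headword = table[affix].get(stem)
        if headword is not None:
            # Data retrieval (module 240) would fetch this entry from database 100.
            steps.append(f"stem match: {stem}, fetch entry for {headword}")
            return steps
    steps.append("no match")
    return steps


for line in trace_query("misadvice"):
    print(line)
# keyword input: misadvice
# prefix match: mis, remaining stem: advice
# stem match: advice, fetch entry for misadvice
```

Calling `trace_query("childish")` follows the suffix path instead: `ish` matches, `child` is found in the `ish` list, and the entry for `childish` is fetched.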

In summary, the present invention provides a novel keyword segmented-index data query method and system for computer platforms. Its distinguishing feature is a keyword segmented-index data query function: the user enters a word of a particular alphabetic language as the query keyword, and the corresponding data item is retrieved from the database in segments according to the keyword's prefix/suffix and stem. The advantage of this segmented-index approach is that it reduces the number of string comparisons and therefore speeds up queries. The invention is thus more advanced and practical than the prior art.

Claims (12)

1. a keyword segmented indexing type data query method is applied in computer platform, provides the keyword segmented indexing type data query function to this computer platform, it is characterized in that, this keyword segmented indexing type data enquire method comprises at least:
Build and put database, wherein store a plurality of data item, and wherein the inquiry of each data item corresponds to each word in the set of words of particular pinyin formula language respectively with keyword;
Build and put prefix and suffix list block, wherein prestore the tabulation of the general collection of the prefix of all words in the set of words of this particular pinyin formula language and suffix;
Build and put word and do list block, wherein prestore one group of prefix and remove type-word and do tabulation and one group of suffix and remove type-word and do and tabulate; Wherein each prefix is removed type-word and is done tabulation and correspond to a specific prefix in this prefix and the suffix list block, and the group of words that has this specific prefix in the set of words of this specific phoneticizing type language that is used to prestore after removing this prefix the remaining dried general collection of word; Each suffix is removed type-word and is done tabulation and then correspond to a specific suffix in this prefix and the suffix list block, and the group of words that has this specific suffix in the set of words of this specific phoneticizing type language that is used to prestore after removing this suffix the remaining dried general collection of word; And this prefix is removed type-word and is done tabulation and this suffix and remove type-word to do each word in the tabulation dried be to be redefined in mode one to one to correspond to each data item in this database respectively;
In operation:
Inputting the keyword corresponding to the data item the user wants to query;
Comparing each prefix and suffix in the prefix and suffix list module with the prefix and suffix of the keyword, and, if a matching prefix or suffix is found, issuing a stem comparison enable message;
In response to the stem comparison enable message, comparing the stem that remains after the prefix or suffix is removed from the keyword with each stem in the word stem list module, and, if a matching stem is found, issuing a data retrieval enable message; and
In response to the data retrieval enable message, retrieving the data item corresponding to the matching stem from the database.
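A minimal sketch of the query steps of claim 1, assuming prebuilt tables of the shape described above. The table contents and the name `segmented_query` are hypothetical. The stem comparison deliberately scans only the stem list of the matched affix, which is where the reduction in comparisons comes from; in idiomatic Python a dictionary lookup (`stem in stems`) would replace the inner loop.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical prestored tables: database, prefix-removed and suffix-removed stem lists.
DATABASE: Dict[str, str] = {
    "unhappy": "not happy",
    "unkind": "not kind",
    "redo": "to do again",
    "kindness": "the quality of being kind",
}
PREFIX_LISTS: Dict[str, Dict[str, str]] = {
    "un": {"happy": "unhappy", "kind": "unkind"},
    "re": {"do": "redo"},
}
SUFFIX_LISTS: Dict[str, Dict[str, str]] = {
    "ness": {"kind": "kindness"},
}


def segmented_query(keyword: str) -> Optional[str]:
    # Prefix/suffix comparison: record (stem list of the matched affix,
    # remaining stem) for every listed prefix or suffix the keyword carries.
    candidates: List[Tuple[Dict[str, str], str]] = []
    for prefix, stems in PREFIX_LISTS.items():
        if keyword.startswith(prefix) and len(keyword) > len(prefix):
            candidates.append((stems, keyword[len(prefix):]))
    for suffix, stems in SUFFIX_LISTS.items():
        if keyword.endswith(suffix) and len(keyword) > len(suffix):
            candidates.append((stems, keyword[:-len(suffix)]))
    # Stem comparison: only against the stems of the matched affix.
    for stems, stem in candidates:
        for listed_stem, word in stems.items():
            if listed_stem == stem:
                # Data retrieval: follow the one-to-one stem -> data item link.
                return DATABASE[word]
    # No matching affix or stem; the claim does not specify a fallback.
    return None
```

For example, `segmented_query("redo")` matches the prefix `re`, compares the stem `do` against the single entry in that list, and returns `"to do again"`, while `segmented_query("undo")` returns `None` because `do` is not in the `un` list.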
2. The keyword segmented-index data query method of claim 1, wherein the computer platform is a desktop PC.
3. The keyword segmented-index data query method of claim 1, wherein the computer platform is a notebook computer.
4. The keyword segmented-index data query method of claim 1, wherein the computer platform is a tablet PC.
5. The keyword segmented-index data query method of claim 1, wherein the computer platform is a personal digital assistant (PDA).
6. The keyword segmented-index data query method of claim 1, wherein the computer platform is an electronic dictionary device.
7. A keyword segmented-index data query system, carried on a computer platform to provide the computer platform with a keyword segmented-index data query function, characterized in that the system comprises at least:
A database that stores a plurality of data items, wherein the query keyword of each data item corresponds to a respective word in the word set of a specific pinyin-type language;
A prefix and suffix list module that prestores a list of the full set of prefixes and suffixes of all words in the word set of the specific pinyin-type language;
A word stem list module that prestores a group of prefix-removed stem lists and a group of suffix-removed stem lists; wherein each prefix-removed stem list corresponds to a specific prefix in the prefix and suffix list module and prestores the full set of stems that remain after that prefix is removed from the words of the specific pinyin-type language that carry it; each suffix-removed stem list corresponds to a specific suffix in the prefix and suffix list module and prestores the full set of stems that remain after that suffix is removed from the words of the specific pinyin-type language that carry it; and each stem in the prefix-removed and suffix-removed stem lists is preset to correspond, one to one, to a respective data item in the database;
A keyword input module, operated by the user, for inputting the keyword corresponding to the data item the user wants to query;
A prefix/suffix comparison module that compares each prefix and suffix in the prefix and suffix list module with the prefix and suffix of the keyword entered through the keyword input module, and, if a matching prefix or suffix is found, issues a stem comparison enable message;
A stem comparison module that, in response to the stem comparison enable message issued by the prefix/suffix comparison module, compares the stem that remains after the prefix or suffix is removed from the keyword entered through the keyword input module with each stem in the word stem list module, and, if a matching stem is found, issues a data retrieval enable message; and
A data retrieval module that, in response to the data retrieval enable message produced by the stem comparison module, retrieves the data item corresponding to the matching stem from the database.
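The system claim splits the same procedure into modules that trigger one another through enable messages. The sketch below models each message as a small dataclass and each module as a class that reacts to it. All class and field names are hypothetical, and the direct method calls stand in for whatever signalling mechanism an actual implementation would use.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class StemCompareEnable:
    """Hypothetical 'stem comparison enable' message."""
    stems: Dict[str, str]  # stem list of the matched prefix or suffix
    stem: str              # keyword with that prefix or suffix removed


@dataclass
class DataRetrievalEnable:
    """Hypothetical 'data retrieval enable' message."""
    word: str  # database key linked one-to-one to the matched stem


class DataRetrievalModule:
    def __init__(self, database: Dict[str, str]) -> None:
        self.database = database

    def on_enable(self, msg: DataRetrievalEnable) -> str:
        return self.database[msg.word]


class StemCompareModule:
    def __init__(self, retrieval: DataRetrievalModule) -> None:
        self.retrieval = retrieval

    def on_enable(self, msg: StemCompareEnable) -> Optional[str]:
        for listed_stem, word in msg.stems.items():
            if listed_stem == msg.stem:
                return self.retrieval.on_enable(DataRetrievalEnable(word))
        return None


class AffixCompareModule:
    def __init__(self, prefix_lists: Dict[str, Dict[str, str]],
                 suffix_lists: Dict[str, Dict[str, str]],
                 stem_module: StemCompareModule) -> None:
        self.prefix_lists = prefix_lists
        self.suffix_lists = suffix_lists
        self.stem_module = stem_module

    def compare(self, keyword: str) -> Optional[str]:
        # Each matching prefix or suffix sends one stem comparison enable message.
        for prefix, stems in self.prefix_lists.items():
            if keyword.startswith(prefix) and len(keyword) > len(prefix):
                result = self.stem_module.on_enable(
                    StemCompareEnable(stems, keyword[len(prefix):]))
                if result is not None:
                    return result
        for suffix, stems in self.suffix_lists.items():
            if keyword.endswith(suffix) and len(keyword) > len(suffix):
                result = self.stem_module.on_enable(
                    StemCompareEnable(stems, keyword[:-len(suffix)]))
                if result is not None:
                    return result
        return None


class KeywordInputModule:
    def __init__(self, affix_module: AffixCompareModule) -> None:
        self.affix_module = affix_module

    def submit(self, keyword: str) -> Optional[str]:
        # Trimming whitespace is an added assumption, not part of the claim.
        return self.affix_module.compare(keyword.strip())


# Wiring the modules together with toy data (hypothetical).
retrieval = DataRetrievalModule({"unkind": "not kind",
                                 "kindness": "the quality of being kind"})
affix = AffixCompareModule({"un": {"kind": "unkind"}},
                           {"ness": {"kind": "kindness"}},
                           StemCompareModule(retrieval))
keyword_input = KeywordInputModule(affix)
```

Keeping each module behind a single `on_enable` or `compare` entry point mirrors the claim's one-way chain (input, prefix/suffix comparison, stem comparison, data retrieval), so any stage could be swapped, for instance replacing the stem scan with a hash lookup, without touching the others.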
8. The keyword segmented-index data query system of claim 7, wherein the computer platform is a desktop PC.
9. The keyword segmented-index data query system of claim 7, wherein the computer platform is a notebook computer.
10. The keyword segmented-index data query system of claim 7, wherein the computer platform is a tablet PC.
11. The keyword segmented-index data query system of claim 7, wherein the computer platform is a personal digital assistant (PDA).
12. The keyword segmented-index data query system of claim 7, wherein the computer platform is an electronic dictionary device.
CN 200410087137 2004-11-01 2004-11-01 Method and system for querying data based on keyword segmentation index Pending CN1770147A

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 200410087137 CN1770147A 2004-11-01 2004-11-01 Method and system for querying data based on keyword segmentation index

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 200410087137 CN1770147A 2004-11-01 2004-11-01 Method and system for querying data based on keyword segmentation index

Publications (1)

Publication Number Publication Date
CN1770147A 2006-05-10

Family

ID=36751452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 200410087137 Pending CN1770147A 2004-11-01 2004-11-01 Method and system for querying data based on keyword segmentation index

Country Status (1)

Country Link
CN (1) CN1770147A

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010000208A1 * 2008-07-03 2010-01-07 Google Inc. Resource locator suggestions from input character sequence
CN101986308A * 2010-11-16 2011-03-16 传神联合(北京)信息技术有限公司 Quick term marking method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2010000208A1 * 2008-07-03 2010-01-07 Google Inc. Resource locator suggestions from input character sequence
US8745051B2 2008-07-03 2014-06-03 Google Inc. Resource locator suggestions from input character sequence
CN101986308A * 2010-11-16 2011-03-16 传神联合(北京)信息技术有限公司 Quick term marking method
CN101986308B * 2010-11-16 2013-07-31 传神联合(北京)信息技术有限公司 Quick term marking method

Similar Documents

Publication Publication Date Title
Krallinger et al. Information retrieval and text mining technologies for chemistry
US11593439B1 (en) Identifying similar documents in a file repository using unique document signatures
US8812300B2 (en) Identifying related names
US7523102B2 (en) Content search in complex language, such as Japanese
CN104933181A (en) Mathematical formula searching method and device
CN1871607A (en) Identifying related names
JP2022054389A (en) Method and apparatus for training retrieval model, device, computer storage medium, and computer program
US8583415B2 (en) Phonetic search using normalized string
WO2022134355A1 (en) Keyword prompt-based search method and apparatus, and electronic device and storage medium
TWI269193B (en) Keyword sector-index data-searching method and it system
CN1691006A (en) Method and system for inquiring word explanation of literal information
CN1770147A (en) Method and system for querying data based on keyword segmentation index
CN104106064A (en) Video search
CN1845134B (en) Anti-reprinting or/and anti-plagiarism monitoring method based on computer network
US7130470B1 (en) System and method of context-based sorting of character strings for use in data base applications
TW200947241A (en) Database indexing algorithm and method and system for database searching using the same
CN101996202A (en) System and method for quickly searching data by keyword
CN1105985C (en) Device and method for Chinese input by hand writing and speech sound
CN101576897A (en) File content retrieval system and file content retrieval method
CN119226812B (en) A method, device and electronic device for calculating multi-paragraph text similarity
TW201312375A (en) Method and system for inputting Chinese character with unknown pronunciation using pinyin input method
CN1421804A (en) Lexical processing systems and methods for different languages
Liang et al. How to build a DNA search engine like Google?
CN1294364A (en) High-speed text search method
TWI249694B (en) System for expanding words and phrases of local electronic dictionary via inquiring on-line dictionary and method of same

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication