TW420778B - An information retrieval system realized by fuzzy neutral network model - Google Patents

An information retrieval system realized by fuzzy neutral network model

Info

Publication number
TW420778B
Authority
TW
Taiwan
Prior art keywords
information
query word
synonym
similarity
query
Prior art date
Application number
TW87107685A
Other languages
Chinese (zh)
Inventor
Yau-Huang Guo
Jen-Peng Shiu
Original Assignee
Inst Information Industry
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inst Information Industry
Priority to TW87107685A
Application granted
Publication of TW420778B

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An intelligent information retrieval system retrieves the information components that match the query words entered by an operator. The system includes: a synonym processing section, which uses the query word to find index keywords with the same meaning; an information indexing section, which finds the matching information components according to those index keywords; a component ranking and filtering section, which ranks and filters the retrieved information components and outputs them for selection; a personalized synonym adjustment section, which modifies the fuzzy synonym mechanism of the synonym processing section according to the selected information component; and a personalized filtering adjustment section, which modifies the fuzzy ranking and filtering mechanism of the component ranking and filtering section according to the selected information component. The synonym processing section and the information indexing section are implemented with a neural network model to achieve fast parallel processing and automatic learning. In addition, the synonym processing section uses query word encoding and position shift compensation to tolerate input errors.

Description

The present invention relates to an information retrieval system, and in particular to an information retrieval system implemented with a fuzzy neural network model. The system can build a personalized thesaurus, tolerate errors in the operator's input, and perform fuzzy queries. It can also rank the retrieved information according to the operator's retrieval preferences, so that the desired items can be selected quickly.
With the growth of the Internet and of information technology, the ways of obtaining information have become highly diverse. For an operator, finding the items of interest within a large body of information is an important problem. For example, a software programmer who develops software with object-oriented technology and receives the requirements for a large software project would like, once requirement analysis has produced a formal specification, to retrieve suitable, already developed software components directly, so that a software prototype can be assembled quickly for testing. The same need arises in public library query systems, World Wide Web (WWW) search tools, and other information services such as the management and querying of large volumes of news or legal documents: what the operator wants most is to retrieve the required information efficiently to meet the needs of the work at hand.
However, when the amount of information grows rapidly, or the number of software components held in software component libraries keeps increasing, the whole process of retrieving information and software components becomes full of uncertainty. Good retrieval results often depend on the operator's own experience, that is, on whether the query words entered by the operator accurately describe the features of the required information. Information retrieval systems have therefore become an important topic in information management and information reuse.
Conventional information retrieval systems are usually built on Boolean logic. That is, the query words entered by the operator are compared with the keywords that characterize each information item or software component in order to decide whether the item meets the operator's needs. Such conventional systems are very inflexible in use. Their drawbacks are as follows:
1. A typical information retrieval system expects the query words entered by the operator to match exactly the keywords formally provided by the system. These formal keywords are set when the system is built and describe the features of the documents or software components to be retrieved. In practice, however, it is hardly feasible to require the operator to use the system's stored keywords exactly.
To address this drawback, the prior art builds a thesaurus for the system's keywords. Each keyword in the system has a set of synonyms defined in the thesaurus, and when the operator enters any synonym in the set, the system infers that the corresponding keyword was intended. This enlarges the vocabulary that the operator can use in a query, but it does not actually solve the problem.
Such thesauri are generally fixed: they simply tabulate the relationships between keywords and synonyms, so those relationships cannot be adjusted to suit the operator's query habits.
2. Conventional information retrieval systems assume that the query words entered by the operator are correct; that is, they have no ability to tolerate erroneous input. In practice, an operator can hardly be expected to type without any mistakes, and may also enter a word that has the same meaning but a different part of speech. For example, the operator may omit a letter when typing a query word, or the thesaurus may record the verb form of a word while the operator enters the noun. In such situations the system may fail to find the corresponding keyword.
To address this drawback, the prior art generally uses a replacement table. The replacement table lists the erroneous forms that the keywords in the thesaurus may take, together with their possible part-of-speech variations, and is used to correct errors in the operator's input. However, such a replacement table is quite rigid and essentially cannot cover every possible error.
3. Conventional information retrieval systems cannot handle uncertain information; that is, they do not support fuzzy queries. The operator must describe precisely each characteristic of the document or software component being sought in order to find it. In other words, the operator must already know a good deal about that document or software component; otherwise, incorrect information in the query often leads to results that fall short of expectations.
4. Finally, a conventional information retrieval system can usually find a considerable number of matching documents or software components for a given set of keywords. More advanced systems can usually rank their output according to the relevance of each document or software component to the query words. However, this ranking and filtering function is quite rigid: the same ranking and filtering procedure is applied to every operator, which makes it inflexible in use.
As the above shows, existing information retrieval systems can basically achieve the purpose of query and retrieval, but they are not very convenient to use, and suitable documents or software components are hard to find. This is the problem that the present invention sets out to solve.
Accordingly, the main object of the present invention is to provide an intelligent information retrieval system with a thesaurus tailored to the operator's personal preferences, so that the query keywords entered by the operator can be flexibly converted through personalized synonyms before the query is carried out.
Another object of the present invention is to provide an intelligent information retrieval system that tolerates input errors; that is, even if the operator makes some errors when typing, the system can compensate for them and still carry out the query.
Another object of the present invention is to provide an intelligent information retrieval system that can represent and process uncertain information, that is, perform queries in a fuzzy manner, to increase flexibility in use.
A further object of the present invention is to provide an intelligent information retrieval system that can rank the retrieved software components according to the operator's retrieval preferences and filter them according to the operator's settings, making selection more convenient.
To achieve these objects, the present invention provides an intelligent information retrieval system that finds the information components corresponding to an input query word. The system comprises: a synonym processing section, which uses the query word to find at least one index keyword having a synonym relationship with it, according to a fuzzy synonym mapping mechanism; an information indexing section, which finds at least one corresponding information component according to the index keyword; a component ranking and filtering section, which ranks and filters the information components found, according to a fuzzy ranking and filtering mechanism, and outputs them so that the required information component can be selected; a personalized synonym adjustment section, which adjusts the fuzzy synonym mapping mechanism of the synonym processing section according to the selected information component; and a personalized filtering adjustment section, which adjusts the fuzzy ranking and filtering mechanism of the component ranking and filtering section according to the selected information component.
In the embodiment of the present invention, the information components are software components that have been packaged so that each can independently perform a specific task.
In addition, the synonym processing section and the information indexing section are built from neural networks in order to achieve fast parallel processing. The synonym processing section also uses query word encoding and position offset compensation to tolerate input errors.
The synonym processing section operates in two modes, a recall phase and a learning phase. In the recall phase, the synonym processing section uses an encoding section to encode the query word, producing a query word encoding pattern that corresponds to the features of the query word. A similarity calculation section then compares this pattern with the encoding patterns of a number of system-defined synonyms, producing similarity values that indicate how similar the query word is to each system synonym. The system synonyms in turn correspond to a number of system-defined keywords. An association section takes these similarity values together with the association values between the system synonyms and the system keywords, compares the result with a predetermined threshold, and thereby selects from the system keywords those to be output as the index keywords. In implementation, the similarity calculation section may consist of neural network nodes, each corresponding to one system-defined synonym, and the association section may likewise consist of nodes, each corresponding to one system-defined keyword.
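The recall phase described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the encoding or network of the invention: the character-bigram encoding, the Dice similarity, the max-product combination of similarity and association values, the example words and weights, and the names `encode`, `similarity`, `ASSOCIATION`, and `recall` are all hypothetical.

```python
def encode(word):
    """Hypothetical encoding: the set of character bigrams of the padded, lower-cased word."""
    padded = f"^{word.lower()}$"
    return {padded[i:i + 2] for i in range(len(padded) - 1)}


def similarity(pattern_a, pattern_b):
    """Dice coefficient between two encoded patterns, in the range [0, 1] (an assumed measure)."""
    if not pattern_a or not pattern_b:
        return 0.0
    return 2 * len(pattern_a & pattern_b) / (len(pattern_a) + len(pattern_b))


# Hypothetical association values: system synonym -> {system keyword: association strength}.
ASSOCIATION = {
    "sort": {"sorting": 1.0},
    "order": {"sorting": 0.8},
    "compress": {"compression": 1.0},
    "zip": {"compression": 0.7},
}


def recall(query_word, threshold=0.5):
    """Return the system keywords whose combined score reaches the threshold."""
    query_pattern = encode(query_word)
    scores = {}
    for synonym, links in ASSOCIATION.items():
        sim = similarity(query_pattern, encode(synonym))
        for keyword, strength in links.items():
            # Assumed combination rule: keep the best similarity * association product.
            scores[keyword] = max(scores.get(keyword, 0.0), sim * strength)
    return {keyword: score for keyword, score in scores.items() if score >= threshold}
```

With these hypothetical values, a query word that drops a letter, such as `sor`, still maps to the keyword `sorting`, because enough of its encoded pattern overlaps with that of the synonym `sort`.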
In the learning phase, the synonym processing section uses the encoding section to encode both the query word and the retrieved-component keyword, that is, the keyword of the selected information component, producing a query word encoding pattern and a retrieved-component keyword encoding pattern. The similarity calculation section compares each of these two patterns with the system synonym encoding patterns, producing a first set and a second set of similarity values respectively. The association section computes the corresponding output values from the first and second sets of similarity values and from the original association values between the system synonyms and the system keywords. Finally, a judgment section decides, from the first set of similarity values, the second set of similarity values, and the output values, whether the similarity calculation section and the association section should be modified.
The information indexing section comprises an index keyword buffer, which stores the index keywords in order, and a plurality of component index units, each corresponding to one information component and storing the component keywords of that information component together with its information component identification code. When index keywords are received, each unit calculates the similarity between them and its stored component keywords and checks whether the similarity exceeds a predetermined threshold, in order to decide whether to output the corresponding information component identification code.
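The behaviour of the component index units can be sketched as follows. The text does not give the similarity formula at this point, so the sketch assumes a simple one, the fraction of buffered index keywords that a unit also stores; the class name `ComponentIndexUnit`, the function `lookup`, and the sample identification codes are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentIndexUnit:
    """One unit per information component: its identification code and its component keywords."""
    component_id: int
    keywords: frozenset

    def similarity(self, index_keywords):
        # Assumed measure: share of the buffered index keywords stored in this unit.
        if not index_keywords:
            return 0.0
        hits = sum(1 for keyword in index_keywords if keyword in self.keywords)
        return hits / len(index_keywords)


def lookup(units, index_keywords, threshold=0.5):
    """Return (identification code, similarity) for every unit whose similarity exceeds the threshold."""
    results = []
    for unit in units:
        score = unit.similarity(index_keywords)
        if score > threshold:
            results.append((unit.component_id, score))
    return results


# Hypothetical component index units.
UNITS = [
    ComponentIndexUnit(101, frozenset({"sorting", "array"})),
    ComponentIndexUnit(102, frozenset({"compression"})),
    ComponentIndexUnit(103, frozenset({"sorting", "compression"})),
]
```

Each unit evaluates the buffered keywords independently of the others, which is what makes the parallel, node-per-component arrangement described in the text possible.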
Brief description of the drawings:
To make the above objects, features, and advantages of the present invention easier to understand, a preferred embodiment is described in detail below with reference to the accompanying drawings:
Fig. 1 is a block diagram of the functional structure of the intelligent information retrieval system of the present invention.
Fig. 2 is a schematic diagram of an example query screen of the information retrieval system in this embodiment.
Fig. 3 is a block diagram of the functional structure of the synonym processing section and the personalized synonym adjustment section in the embodiment of the present invention.
Fig. 4 is a schematic diagram of an example of the encoding procedure in this embodiment.
Fig. 5 is a detailed structural diagram of the synonym processing section implemented with a fuzzy neural network structure in the embodiment of the present invention.
Fig. 6 is a block diagram of the functional structure of the information indexing section in the embodiment of the present invention.
Fig. 7 is a detailed structural diagram of the information indexing section implemented with a fuzzy neural network structure in the embodiment of the present invention.
Explanation of reference symbols:
1: synonym processing section; 3: information indexing section; 5: component ranking and filtering section; 7: personalized ranking and filtering adjustment section; 9: personalized synonym adjustment section; 10: operator; 11: query word; 13: index keyword; 15: software component identification code; 17: software component identification code after ranking and filtering; 19: retrieved-component keyword; 21: retrieved software component; 62: software component function facet; 64: associated medium facet; 66: software component type facet; 68: system type facet; 101: encoding section; 103: similarity calculation section; 105: association section; 107: judgment section; L1, L2, L3, L4, L5, L6, L7: processing layers; L1a, L1b, L61-L6m: sublayers; 301: index keyword buffer; 303: software component index unit.
Embodiment:
The information retrieval system disclosed in this embodiment is described using a system that retrieves software components as an example. A software component here means a program fragment (source code) for a specific function, together with the private data that the fragment uses, packaged as an independent component that can perform a function on its own, such as the software components used in object-oriented technology. Those skilled in the art can, however, apply the same techniques disclosed in this embodiment to the retrieval of documents, drawings, animations, films, songs, and other multimedia data; the application is not limited to the environment described in this embodiment.
In view of the above discussion of the prior art, the information retrieval system of this embodiment (in which the information items are software components) must achieve the following goals. First, it adjusts the thesaurus flexibly according to the operator's usage habits and personal preferences, to make querying easier. Second, it tolerates input errors. Third, it represents and processes uncertain information, that is, it provides a fuzzy query method. Fourth, according to the operator's retrieval preferences and usage habits, it ranks the retrieved software components and filters out those of low relevance, making selection more convenient for the operator. The present invention achieves these goals using techniques such as fuzzy information retrieval, knowledge engineering, and machine learning. The embodiment of the present invention is described in detail below with reference to the drawings.
Fig. 1 is a block diagram of the functional structure of the intelligent software component retrieval system. As shown in Fig. 1, the whole system consists of a synonym processing section 1, an information indexing section 3, a component ranking and filtering section 5, a personalized ranking and filtering adjustment section 7, and a personalized synonym adjustment section 9.
Functionally the system divides into two parts: the query and retrieval mechanism in the upper half of Fig. 1, formed by the synonym processing section 1, the information indexing section 3, and the component ranking and filtering section 5; and the adjustment mechanism in the lower half, formed by the personalized ranking and filtering adjustment section 7 and the personalized synonym adjustment section 9.
During query and retrieval, the operator 10 first issues a query word 11, which the synonym processing section 1 resolves into the corresponding keywords. These keywords are set by the system and are fixed terms used to describe the features of the software components that can be retrieved. In this embodiment, the query word 11 issued by the operator 10 contains four query facets, which together define the software component that the operator wants to retrieve. The four facets are software component function (function), software component type (file type), associated medium (medium), and system type (system type). The function facet specifies what the software component does; in this embodiment it accepts multiple query words, separated by a delimiter. The component type facet specifies the kind of software component, for example OCX, VBX (Visual Basic), VCL, DLL (dynamic link library), C++, Delphi, and so on.
The associated medium facet specifies the device on which a software component acts, for example disk, keyboard, input/output port, network, screen, interface card, and so on. The system type facet specifies the operating platform for which a software component is suited, for example Windows 3.x/95/NT, DOS, OS/2, UNIX, and so on.
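A query built from the four facets can be represented with a small data structure. The sketch below is an illustration only. It assumes that the function facet is free text split on a comma (the text says only that a delimiter is used) and that the other three facets are chosen from fixed lists, as the embodiment states; the lists contain just the examples named in the text, and reading "Windows 3.x/95/NT" as three separate platforms is an assumption. The names `FacetedQuery` and `build_query` are hypothetical.

```python
from dataclasses import dataclass
from typing import List

# Example values named in the text; treating them as complete, closed lists is an assumption.
COMPONENT_TYPES = ("OCX", "VBX", "VCL", "DLL", "C++", "Delphi")
MEDIA = ("disk", "keyboard", "port", "network", "screen", "interface card")
SYSTEM_TYPES = ("Windows 3.x", "Windows 95", "Windows NT", "DOS", "OS/2", "UNIX")


@dataclass
class FacetedQuery:
    function_words: List[str]  # free text, subject to synonym and error-tolerant processing
    component_type: str        # chosen from a fixed list
    medium: str                # chosen from a fixed list
    system_type: str           # chosen from a fixed list


def build_query(function_text, component_type, medium, system_type, separator=","):
    """Validate the fixed-list facets and split the free-text function facet into query words."""
    choices = (
        ("component type", component_type, COMPONENT_TYPES),
        ("medium", medium, MEDIA),
        ("system type", system_type, SYSTEM_TYPES),
    )
    for name, value, allowed in choices:
        if value not in allowed:
            raise ValueError(f"{name} must be one of {allowed}, got {value!r}")
    words = [word.strip() for word in function_text.split(separator) if word.strip()]
    return FacetedQuery(words, component_type, medium, system_type)
```

Only `function_words` would need to pass through the synonym processing section; the other three fields already hold system keywords.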
In this embodiment, only the software component function facet is typed in by the operator; for the other facets the system provides a fixed keyword list from which the operator chooses.
Fig. 2 is a schematic diagram of an example query screen of the information retrieval system in this embodiment. As shown in the figure, the operator 10 defines the required software component by setting each facet, where 62 denotes the software component function (function), 64 the associated medium (medium), 66 the software component type (file type), and 68 the system type (system type). The figure also makes clear that the operator can freely type the query words that define the software component function, whereas the other facets are selected directly from keywords set by the system. Consequently, the synonym processing and error-tolerant processing that this embodiment applies to the operator's query words refer mainly to the query words entered in the function facet.
After the query word 11 is input to the synonym processing section 1, synonym analysis is performed and the query word is mapped to a number of keywords originally defined by the system, which in this embodiment are called index keywords (indexed keywords) 13. Index keywords are keywords originally defined by the system that have some degree of synonym relationship with the input query word, and this relationship is defined by the synonym processing section 1. In this embodiment, the synonym processing section 1 is built from a fuzzy neural network and uses query word encoding and position offset compensation to tolerate input errors.
As for the detailed structure of the synonym program unit 1 will be described in detail later. In addition, as described above, the synonym program unit in this embodiment is only for software The query words of the functional query items are processed synonymously. As for the query words entered by other query items, they are directly selected from the keywords preset by the system, so there will be no problems with synonyms and input errors. According to the synonyms program 1 The searched index key 13 is used by the information indexing program section 3 to retrieve the corresponding software components. In this embodiment, the 'information indexing program section 3 will identify the software component's identification number 15 In addition, the information indexing program section 3 of this embodiment is also composed of a fuzzy neural network, so the efficiency of parallel processing can also be improved. The details will be detailed later. Finally, the information indexing program section 3 outputs Identification of software components is exhausted. This paper size is applicable to China National Standard (CNS) A4 specifications. (Please read the precautions on the back before filling This page} · The consumer cooperation of the Central Bureau of Standards of the Ministry of Economic Affairs, printed by DU 420 420778 V. Description of the invention (12) One *-15 is sent to the component sorting program section 5, sorted ⑽㈣) and After filtering, the output is selected on the display device. General query words can be found in a considerable number of software components after processing by the synonym program unit i and the information index program unit 3 described above. Query terms, and as the number and scope of software components that can be retrieved increase, the number of matches will increase. 
It is inappropriate for the operator to freely find the required software components from so many messages, so It is necessary to sort out and filter out the soft components. The purpose of the sorting process is to require all extracted software components to be displayed in order for the operator to choose according to the relevance to the query word and the operator's past choice of software components, thereby allowing the operator to find The required plutonium element. The purpose of material processing is to reduce the number of software components with low relevance, so that the operator is more tolerant and finds the required software components. Therefore, a critical value 可由, which can be adjusted by the operator, is set in the component ordering program section 5 so as to transition software components with low relevance. The operator 10 can select the actual software component 21 according to the displayed software. Another feature of the software component extraction system printed by the Central Bureau of Standards, the Ministry of Economic Affairs, Shellfish Consumer Cooperative, is that it has an automatic feedback adjustment mechanism, which can choose the actual needs and software according to the operator 10; The retrieved component 21 and its keyword 19, the displayed operating habits and retrieval preferences, respectively, adjust the component sorting and filtering program section 5 and the synonym program section 丨. The personalized sorting and filtering adjustment section 7 adjusts the sorting mechanism and filtering mechanism in the component sorting and filtering program section 5 according to the capturing component 21 so that it has a learning function to match the actual work. Paper ruler 15 (210X297 mm) Λ7 B7 423778 V. Description of Invention (13) Need. Personalized synonym adjustments. 
For free 0ρ9, according to the software component keyword 19, adjust the synonymous relationship in the synonym program 1 to make it have a learning function to meet the needs of actual work. I-I n- In Mil ------ Order (please read the notes on the back before filling this page) The following diagrams are used for the synonym program section, the information index program section, and the component sorting and filtering program. Section 5 explains.
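As a rough illustration of the ranking and filtering step described above, the following Python sketch orders retrieved components by a relevance score and drops those below an operator-adjustable threshold. The component IDs, the scores, and the function name `rank_and_filter` are hypothetical; the text does not specify how relevance and operator preference are combined into a score, so a single precomputed score is assumed.

```python
def rank_and_filter(scores, threshold):
    """Return component IDs whose score is >= threshold, highest score first.

    scores: dict mapping a component ID to a relevance score in [0, 1].
            (Hypothetical values; how the score is derived is not shown here.)
    """
    kept = [(cid, s) for cid, s in scores.items() if s >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [cid for cid, _ in kept]


# Hypothetical retrieval result: component ID -> relevance score.
retrieved = {"C101": 0.82, "C205": 0.35, "C317": 0.64}
print(rank_and_filter(retrieved, threshold=0.5))
```

Raising the threshold shortens the list shown to the operator; lowering it lets less relevant components through.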

Synonym program unit (1)

Fig. 3 is a functional block diagram of the synonym program unit 1 and the personalized synonym adjustment unit 9 of this embodiment. As shown, the synonym program unit 1 comprises an encoding unit 101, a similarity calculation unit 103, an association unit 105, and a judgment unit 107.

Operationally, the synonym program unit 1 has two phases: the recall phase and the learning phase.

Recall phase:

In the recall phase, the query word 11 entered by the operator is first encoded by the encoding unit 101. In this embodiment, the encoded format represents, in parallel, the count of each letter of the word to be encoded (here, the query word) and the position at which each letter first occurs. The encoding used in this embodiment proceeds as follows:

(a) First convert all letters of the word to be encoded to lowercase, and convert every character other than a to z to a blank.

(b) Set up 27 registers, corresponding to the letters a to z and the blank.

(c) Count the occurrences of each letter in the converted word and store each count in the corresponding register.

(d) At this point many registers are still unfilled. These unfilled registers are used to store the first-occurrence position of each letter (including the blank) of the converted word. The relative position of the letter within the word is computed, negated, and stored in an empty register adjacent to the letter's own register. The left neighbour is preferred; if it is already filled, the right neighbour is used; if both neighbours are filled, the position value is discarded.

(e) Even after the position data are stored, some of the 27 registers may remain unfilled. All the characters of the converted word (which now consists only of a to z and blanks) are then written, in order, into the remaining unfilled registers. This completes the encoding of the word.

Fig. 4 illustrates the encoding of an example word in this embodiment. Parts (a), (b), (c) and (d) of Fig. 4 show the register contents after successive steps of the above procedure, where a value of "0" denotes a register that has not been filled.
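Steps (a) to (e) of the encoding can be sketched in Python as follows. This is only an illustration under stated assumptions: the registers are laid out in plain alphabetical order followed by the blank (the patent uses its own fixed order), positions are counted from 1, and letters are processed in order of first occurrence. The names `REGISTERS` and `encode` are not from the source.

```python
import string

# Assumed register order: a..z followed by the blank (the patent uses its own order).
REGISTERS = string.ascii_lowercase + " "


def encode(word):
    """Encode a word into 27 registers following steps (a) to (e)."""
    # (a) lowercase; anything outside a..z becomes a blank
    text = "".join(c if c in string.ascii_lowercase else " " for c in word.lower())
    regs = [0] * len(REGISTERS)              # (b) 27 registers, 0 = unfilled
    for c in text:                           # (c) letter counts
        regs[REGISTERS.index(c)] += 1
    seen = set()
    for pos, c in enumerate(text, start=1):  # (d) negated first-occurrence positions
        if c in seen:
            continue
        seen.add(c)
        i = REGISTERS.index(c)
        for j in (i - 1, i + 1):             # left neighbour first, then right
            if 0 <= j < len(regs) and regs[j] == 0:
                regs[j] = -pos
                break                        # if neither is free, the position is dropped
    chars = iter(text)                       # (e) remaining registers get the characters
    for j, value in enumerate(regs):
        if value == 0:
            c = next(chars, None)
            if c is None:
                break
            regs[j] = c
    return regs


print(encode("fuzzy"))
```

Under these assumptions each register ends up holding one of three kinds of data: a positive letter count, a negative first-occurrence position, or a character of the word.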

In Fig. 4, the first 26 registers correspond to the letters of the alphabet in a fixed, non-alphabetical order, and the 27th register corresponds to the blank. This assignment of letter positions takes the characteristics of the code into account and follows a rule governing which letters are placed beside the most frequently occurring ones, so as to increase the amount of information recorded in the encoded format.

The encoded query word produced by the encoding unit 101 is sent to the similarity calculation unit 103, where its similarity to the system-defined synonyms is computed. The result is then sent to the association unit 105, which determines which system keywords are actually related to the query word and outputs them as index keywords 13.

The similarity calculation unit 103 stores in advance the encoded formats of the system-defined synonyms, encoded in the same way as above. Each system-defined synonym corresponds to a system-defined keyword, and it is the keywords that directly describe the software component function (in this embodiment). The similarity calculation unit 103 therefore first compares the query word 11 with the system synonyms and produces a similarity value for each, indicating how alike the two are.

The association unit 105 stores the association values between the system-defined synonyms and the system-defined keywords. For example, suppose the system defines the keyword "about dialog box" for a software component function, describing a function relating to dialog boxes. Its synonyms might be "interface", "dialog", "help box", and so on. Experience shows that these three synonyms are not equally related to the actual keyword. "dialog", for instance, is closely related to "about dialog box" and can be given an association value of 0.7; "help box" is less closely related and is given 0.5; and "interface", whose meaning is too broad, is still less closely related and is given 0.3.

When the similarity values computed by the similarity calculation unit 103 reach the association unit 105, the degree of dependence between the query word and each keyword can be determined from the association values between the synonyms and the keywords. In this embodiment, the degree of dependence between the query word and a keyword is obtained by taking the minimum of the corresponding similarity value and association value; it is then compared with a predetermined threshold to decide whether the keyword is output as an index keyword 13.

In the recall phase the synonym program unit 1 uses neither the judgment unit 107 nor the personalized synonym adjustment unit 9. The synonym program unit 1 also includes a position offset calculation unit (not shown in Fig. 3; its function is performed by the L2-layer nodes of Fig. 5), which provides input fault tolerance on the basis of the encoded format of the query word 11, as detailed later.

Learning phase:

In the learning phase, the synonym program unit 1 uses the personalized synonym adjustment unit 9 to revise the similarity calculation parameters of the similarity calculation unit 103 (described later) and the association values of the association unit 105. Whether to revise is decided from the retrieved-component keyword 19 corresponding to the software component the operator selected.

First, the query word 11 and the retrieved-component keyword 19 are input to the encoding unit 101 and encoded, giving the encoded query word and the encoded retrieved-component keyword. Both are sent to the similarity calculation unit 103, which computes the similarity of each to every system synonym. Two sets of similarity values are thereby produced, one for each input word.

The two sets of similarity values, corresponding to the query word 11 and the retrieved-component keyword 19, are sent not only to the association unit 105 but also to the judgment unit 107. In this phase, however, the association unit 105 performs an operation different from that of the recall phase and produces a further set of output values for the judgment unit 107. From the two sets of similarity values and these output values, the judgment unit 107 decides whether a revision is needed and how to carry it out; the personalized synonym adjustment unit 9 then performs the actual adjustment.

The encoding method of the encoding unit 101 and the similarity calculation of the similarity calculation unit 103 are the same as in the recall phase.

Fig. 5 is a detailed structural diagram showing how this embodiment implements the synonym program unit 1 as a fuzzy neural network. The advantage of a neural network is its capacity for parallel processing and learning. As shown in Fig. 5, the fuzzy neural network has five layers of nodes, denoted L1, L2, L3, L4 and L5. The function of each layer is described below.

L1 layer

The L1 layer contains two sublayers, L1a and L1b, each with n nodes. Sublayer L1a contains nodes $q_1, q_2, \ldots, q_n$, and sublayer L1b contains nodes $k_1, k_2, \ldots, k_n$.

Sublayers L1a and L1b store the encoded formats of the query word 11 and the retrieved-component keyword 19, respectively. In this embodiment n is therefore 27, and the nodes $q_1, \ldots, q_n$ and $k_1, \ldots, k_n$ act like the registers of the encoding procedure above, holding the encoded data of the query word 11 and of the retrieved-component keyword 19, respectively.

These encoded data comprise the count of each letter, the first-occurrence positions, and the actual letters. Apart from the module that actually performs the encoding, which is not shown, the L1-layer nodes hold the encoding result of the encoding unit 101 of Fig. 3.

In the recall phase only the data in the nodes of sublayer L1a are used, whereas in the learning phase the data in the nodes of both sublayers L1a and L1b are used.

L2 layer

The L2 layer contains nodes $P_1, P_2, \ldots, P_d$.

Each L2-layer node corresponds to one of the system-defined synonyms and therefore also stores the encoded format of that synonym. Each node receives from the L1 layer the encoded formats of the query word 11 and the retrieved-component keyword 19. In the recall phase it computes the similarity value between the encoded query word 11 and its stored synonym; in the learning phase it computes similarity values for both the query word 11 and the retrieved-component keyword 19 against its stored synonym. The similarity value expresses how alike the input word (query word 11 or retrieved-component keyword 19) and the system synonym are. In addition, the position offset computed by the L2-layer nodes reveals input errors, so that the encoded format of the input query word 11 can be compensated.

In this embodiment each L2-layer node performs the calculations of equations (1) to (3), which give, respectively, the position offset, the average Hamming distance, and the similarity value. [Equations (1) to (3) are illegible in the source; their terms are defined in the text.]

Equation (1) gives, at a given L2-layer node ($P_i$), the position offset $s(q, P_i)$ between the stored synonym ($P_i$) and the input query word 11 ($q$), where $q_j$ and $P_{ij}$ are the individual encoded data of the encoded query word $q$ and the encoded synonym $P_i$, with j running from 1 to 27. $w_{ij}$ is the weight between the j-th node ($q_j$) of sublayer L1a and the i-th node $P_i$ of the L2 layer; it can also be regarded as the importance of the j-th encoded datum of the query word 11. In general this importance can be expressed by the probability with which a letter appears in ordinary query words. For example, "z" appears relatively rarely in query words and is therefore more significant when it does appear, so the weight for "z" should be larger.

Note that equation (1) is computed only over the negative values of the encoded formats, that is, only where $q_j$ and $P_{ij}$ are both negative. Accordingly, in equation (1), $q_j'$ equals 1 if $q_j$ is negative and 0 otherwise.

Likewise, in equation (1), $P_{ij}'$ equals 1 if $P_{ij}$ is negative and 0 otherwise, and the position-difference term equals the difference between $q_j$ and $P_{ij}$ if both are negative and 0 otherwise. When the computed position offset $s(q, P_i)$ is greater than or equal to 1, or less than or equal to -1, all the negative-valued position data in the encoded query word $q$ must be compensated by this offset (by subtracting it or adding it, respectively).

The encoded format, together with the position offset $s(q, P_i)$ of equation (1), makes it possible to detect omitted keystrokes in the input query word. For example, suppose the operator intends to enter the query word "fuzzy" but omits the first letter and types "uzzy".
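The "fuzzy" / "uzzy" case can be made concrete with a small Python sketch. Because equation (1) is not legible in the source, the sketch assumes one plausible reading: the offset is the weighted mean of $q_j - P_{ij}$ over the registers in which both encodings hold a negative position value. The normalisation, the unit default weights, the five-register excerpt, and the names `position_offset` and `compensate` are all assumptions made for illustration.

```python
def position_offset(q, p, weights=None):
    """Weighted mean of q_j - p_j over registers where both hold a negative position.

    Assumed form of equation (1); the normalisation is a guess.
    """
    weights = weights or [1.0] * len(q)
    num = den = 0.0
    for qj, pj, wj in zip(q, p, weights):
        if isinstance(qj, int) and isinstance(pj, int) and qj < 0 and pj < 0:
            num += wj * (qj - pj)
            den += wj
    return num / den if den else 0.0


def compensate(q, offset):
    """Shift every position value in q by the offset when |offset| >= 1."""
    if abs(offset) < 1:
        return list(q)
    shift = round(offset)
    # Subtracting a positive shift / adding the magnitude of a negative one.
    return [v - shift if isinstance(v, int) and v < 0 else v for v in q]


# Five-register excerpts (hypothetical layout): negative = position, positive = count.
stored = [-1, -2, -5, 1, -3]   # stored synonym "fuzzy"
typed = [0, -1, -4, 1, -2]     # mistyped query "uzzy": every position is one too early
s = position_offset(typed, stored)
print(s, compensate(typed, s))
```

After compensation the position values of the mistyped query line up with those of the stored synonym, which is the effect the text attributes to position offset compensation.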
For an input such as "uzzy", the computed position offset is found to be greater than or equal to 1, which indicates that letters were omitted at the front. Applying the offset to the position values of the encoded format then resolves the omission of leading letters when a query word is entered. Beyond this, the fuzzy mechanism and the encoded format of this embodiment also lessen the effect of erroneous input.

Equation (2) expresses the average Hamming distance between the query word $q$ and the synonym $P_i$. The terms $d_{ij}$, $P_{ij}'$ and $q_j'$ in it need explanation. Three sets are defined first. $S_n$ is the set of values that can record the count of a letter in the encoded format of a word, namely the positive integers less than 27. $S_p$ is the set of negative values that record the first-occurrence position of a letter, namely the negative integers greater than -27. $S_c$ is the set of letters recorded in the encoded format, namely the characters whose ASCII value is 32 or greater. The term $d_{ij}$ then takes one of four possible values: 1, the absolute value of $q_j$, the absolute value of $P_{ij}$, or the absolute value of $q_j - P_{ij}$, as follows:

$$
d_{ij} =
\begin{cases}
1 & \text{if } (q_j \in S_p \text{ and } (P_{ij} = 0 \text{ or } P_{ij} \in S_c)) \\
  & \text{or } (q_j = 0 \text{ and } (P_{ij} \in S_p \text{ or } P_{ij} \in S_c)) \\
  & \text{or } (q_j \in S_c \text{ and } (P_{ij} = 0 \text{ or } P_{ij} \in S_p)) \\
  & \text{or } (q_j \in S_p \text{ and } P_{ij} \in S_p \text{ and } q_j \neq P_{ij}) \\
  & \text{or } (q_j \in S_c \text{ and } P_{ij} \in S_c \text{ and } q_j \neq P_{ij}) \\
|q_j| & \text{if } q_j \in S_n \text{ and } (P_{ij} \in S_p \text{ or } P_{ij} \in S_c) \\
|P_{ij}| & \text{if } P_{ij} \in S_n \text{ and } (q_j \in S_p \text{ or } q_j \in S_c) \\
|q_j - P_{ij}| & \text{otherwise}
\end{cases}
$$

In equation (2), $P_{ij}'$ takes one of three possible values:

$$
P_{ij}' =
\begin{cases}
P_{ij} & \text{if } P_{ij} \in S_n \\
1 & \text{if } P_{ij} \in S_p \text{ or } P_{ij} \in S_c \\
0 & \text{otherwise}
\end{cases}
$$


Similarly, in equation (2), $q_j'$ takes one of three possible values:

$$
q_j' =
\begin{cases}
q_j & \text{if } q_j \in S_n \\
1 & \text{if } q_j \in S_p \text{ or } q_j \in S_c \\
0 & \text{otherwise}
\end{cases}
$$

Equation (3) then computes the similarity of the two words from the average Hamming distance. Two fuzzifiers appear in equation (3), among them $F_e$; they define the range covered by the membership function of the average Hamming distance between encoded formats, and the sensitivity to that distance. In this way the similarity value of a query word with respect to a synonym is obtained.

The above describes the computation of the similarity value for the query word 11, but it applies equally to the retrieved-component keyword 19. The function of the L2-layer nodes corresponds to the similarity calculation unit 103 of Fig. 3.

L3 layer

The L3 layer contains one node, SA. Node SA finds, among the similarity values sent by the L2-layer nodes, the one belonging to the synonym most similar to the input word. In this embodiment node SA operates mainly in the learning phase, and its operation is:

$$\mu(K, P^*) = \bigoplus_{i=1}^{d} \mu(K, P_i) \qquad (4)$$

where $K$ denotes the query word 11 or the retrieved-component keyword 19, and $\oplus$ denotes an aggregation function, which in this embodiment is the maximum (max) function. $\mu(K, P^*)$ is thus the similarity output of the L2-layer node of the synonym found to be closest to $K$. In this embodiment the output SA0 of node SA is a vector that contains the pair of maximum similarity values for the query word 11 and the retrieved-component keyword 19.

The output vector of node SA also includes the pair of L2-layer nodes found to be most similar to the query word 11 and to the retrieved-component keyword 19, respectively. Their role is explained later together with the L5 layer; the function of node SA corresponds to the judgment unit 107 of Fig. 3.

L4 layer

The L4 layer contains nodes $R_1, R_2, \ldots, R_c$.

C 分別對應於系統中所設定的關鍵字(keyw〇rd)。如前所 述,某一系統關鍵字可能對應數個系統設定的同義字’ 但各同義字與系統關鍵字之間的關聯性可能並不相同。 L4層中節點的動作,根據上述L2層所計算出的相似度 值(分別對應於一系統同義字),以及對應於各L2層節點 =關聯性值,決定輸入查詢字u與系統所設定的各關鍵 子之間的依存關係。在本實施例中,查詢字丨丨和關鍵字 之間依存關係可以利用相似度值中最大者(即最相似的 同義字)和其對應的關聯性度取最小值來決定。接著,每 個L4層節點均會設定一臨界值响表示L4節點的指 標)’用來韻㈣字U_L4節賴㈣之系統關鍵 字間是否符合要求。如果符合此條件,該L4節點便可以 輸出所對應的關鍵字,做為索弓I關鍵字13,送到資訊索 引程序部3 » 每L4節點在回憶模式和學習模式下,具有不同的 運算特性。在回憶模式下所執行的是—次_運算,在 子I模式下所執行的是二次min運算。每個節點所執 行運算,可以利用下式來表示。 本紙張尺度適用中國國 -m - - Inhr I I K^. ml I in m s- I (請先閱讀背面之注意事項再填寫本頁) 經濟部中央標準局貝工消費合作社印¾. J778 A7 五、發明説明(23 ) ~~~ '—- [Null^othenvise (6) °l(^) = min(min^,^)) ⑺ -----------篆------- ? ^-tl (請先閱讀背面之注意事項再填寫本頁) 其中,第(5)式和第⑹式是在回憶模式下執行 式會在學習模式下執行。在第⑺式中’免是表示與第】 個^層節點相關之同義字之關聯度值。&指第1個Η f節點,。和以分別表示與此L4層節點相關之同義 字^即與第i個L4;f節點相連接的L2層節點^和&。 q是查詢字11編碼樣式,k是擷取元件關鍵字Μ編碼樣 式。利用第(5)式’可以計算出查詢字與相關關鍵字之間 的依存程度。 在第(6)式中,LR代表示系統關鍵字的辨識碼指標’ 用來指到真正的字,而非其編碼形式。另外,利用臨界 值Θ?的設定,可以讓操作者來設定是採用廣義仆⑺以以 term)、狹義(narrower term)或正常(normai term)來處理。 亦即,當臨界值愈小時’表示是採用廣義來處理;當 臨界值Θ/愈大時,表示是採用狹義來處理。在第(?)式 中’ 〇(Ρ!)和0(Ρ2)是分別代表與此L4層節點相關之 層節點所輸出之值。 經濟部中决標準局員工消費合作社印裝 最後’ 〇R(Ri)和Ol^)是分別表示此L4層節點在回 憶模式和學習模式下的輸出,分別被送到資訊索引程序 部3和L5層中。L4層則是對應於第3圖的關聯部1〇5。 L5層 L5層只有包含一個節點RA,並在學習模式下接收來 26 本紙張尺度適用中國國冬標準(CNS ) Λ4規格(2丨0χ29?公麓) 420778 經濟部中央樣隼局員工消費合作社印製 Λ7 五、發明説明(24 ) 一'-- 自L4層各節點的輸入。節點RA的作用是在^層 點所輸出的相關性(依存程序)中,選擇最大者 即 其動作可描述如下: 〇’ (5) L5層和L3層在本實施例是在學習模式中,判斷是否 要進行修正,以及如何修正。如前所述,在本實施例^, ^3層會輸出至少兩個最大的相似度值,分別對應於查詢 字11(其編碼樣式為q)和擷取元件關鍵字19(其編碼樣式 為k)。L5層則會輸出一個最大的輸出值。利用這三^ 值,以及兩個獨立設定之臨界值1和1,可以決定個人 化同義字調整部109之調整動作。 (情況一)當節點RA(L5)的輸出值大於臨界值化,或者 是節點SA(L3)所輸出的兩個相似度值皆大於臨界值匕 時,執行下列之學習程序: 1 (a) 針對L2層中對應於此兩個最大相似度值之節點 (設為沢〗和W2),其中所儲存之系統同義字編碼樣式pW1 和pW2進行修改。修正方式如下所示,其中π和π分別 代表被異動的次數。 /f = W(炉1+1)<+1/(,丨+1)·% pr1 = Nv2 /(JVW2 +1) · +1 /(ATW2 +1). 'kiC corresponds to the keywords (keyword) set in the system. As mentioned above, a system keyword may correspond to several synonymous words set by the system, but the correlation between each synonym and the system keyword may be different. 
The operation of the nodes in the L4 layer determines the dependency between the input query word 11 and each keyword set in the system, based on the similarity values calculated by the L2 layer (each corresponding to one system synonym) and on the relevance value associated with each L2 layer node. In this embodiment, the dependency between the query word 11 and a keyword is obtained by taking the largest of the similarity values (that is, the most similar synonym) together with its corresponding relevance value, and taking the minimum of the two. Each L4 layer node is also assigned its own threshold $\theta_2$, which is used to judge whether the query word 11 and the system keyword corresponding to that L4 node meet the requirement. If the condition is met, the L4 node outputs its corresponding keyword as an index keyword 13, which is sent to the information indexing program section 3.

Each L4 node has different operating characteristics in recall mode and in learning mode. In recall mode the operation is applied once; in learning mode a double min operation is performed. The operations performed by each node are expressed by equations (5) to (7). Equation (5) gives the degree of dependency between the query word and the keyword of the node, as described above. Equation (6) is the thresholded recall-mode output:

$$O^{R}(R_i)=\begin{cases}L_R, & \text{if the dependency value of equation (5) reaches } \theta_2\\ \text{Null}, & \text{otherwise}\end{cases}\tag{6}$$

Equation (7) is a double min operation over the outputs $O(P_1)$ and $O(P_2)$ of the related L2 layer nodes and their relevance values; its result is the learning-mode output $O^{L}(R_i)$.

Equations (5) and (6) are executed in recall mode, and equation (7) is executed in learning mode. In these equations, $W$ denotes the relevance value of a synonym related to the $i$-th L4 layer node, $R_i$ denotes the $i$-th L4 layer node, and $P_1$ and $P_2$ denote the synonyms related to this L4 layer node, that is, the L2 layer nodes $P_1$ and $P_2$ connected to the $i$-th L4 layer node. $q$ is the encoding pattern of the query word 11, and $k$ is the encoding pattern of the retrieved component keyword 19. Equation (5) yields the degree of dependency between the query word and the related keyword.

In equation (6), $L_R$ is the identification index of the system keyword; it points to the actual word rather than to its encoded form. In addition, by setting the threshold $\theta_2$ the operator can choose whether the query is handled as a broader term, a narrower term, or a normal term. The smaller the threshold $\theta_2$, the broader the handling; the larger the threshold $\theta_2$, the narrower the handling. In equation (7), $O(P_1)$ and $O(P_2)$ are the values output by the L2 layer nodes related to this L4 layer node.

Finally, $O^{R}(R_i)$ and $O^{L}(R_i)$ are the outputs of this L4 layer node in recall mode and in learning mode; they are sent to the information indexing program section 3 and to the L5 layer, respectively. The L4 layer corresponds to the association section 105 in Figure 3.

L5 layer

The L5 layer contains only one node, RA, which receives the inputs from the L4 layer nodes in learning mode. Node RA selects the largest of the dependency values output by the L4 layer nodes. Its operation can be described as follows:

$$O(RA)=\max_i O^{L}(R_i)\tag{8}$$

In this embodiment, the L5 layer and the L3 layer are used in learning mode to decide whether a correction is needed and how it is to be made.
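The recall-mode rule for an L4 node and the selection performed by node RA can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the function names `l4_recall` and `l5_select` are hypothetical, similarities and relevance values are taken to be plain floats in [0, 1], and "meets the requirement" is read as "greater than or equal to the threshold".

```python
from typing import Optional, Sequence


def l4_recall(similarities: Sequence[float],
              relevances: Sequence[float],
              keyword: str,
              theta2: float) -> Optional[str]:
    """Recall-mode output of one L4 node (hypothetical helper).

    similarities[j] is the L2 similarity of the j-th synonym linked to this
    keyword; relevances[j] is the relevance value stored for that link.
    """
    # Pick the most similar synonym, then cap it by its relevance value.
    best = max(range(len(similarities)), key=lambda j: similarities[j])
    dependency = min(similarities[best], relevances[best])
    # Assumption: the keyword is emitted when the dependency reaches theta2.
    return keyword if dependency >= theta2 else None


def l5_select(l4_learning_outputs: Sequence[float]) -> float:
    """Node RA: keep the largest learning-mode output among the L4 nodes."""
    return max(l4_learning_outputs)
```

Lowering `theta2` lets weaker synonym links through (broader term handling), while raising it keeps only strong links (narrower term handling).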
As mentioned above, in this embodiment the L3 layer outputs at least two maximum similarity values, corresponding respectively to the query word 11 (whose encoding pattern is $q$) and to the retrieved component keyword 19 (whose encoding pattern is $k$). The L5 layer outputs one maximum output value. These three values, together with two independently set thresholds $\theta_1$ and $\theta_3$, determine the adjustment performed by the personalized synonym adjustment section 109.

(Case 1) When the output value of node RA (L5) is greater than the threshold $\theta_3$, or when both similarity values output by node SA (L3) are greater than the threshold $\theta_1$, the following learning procedure is executed:

(a) For the L2 layer nodes corresponding to the two maximum similarity values (denoted $w_1$ and $w_2$), the stored system synonym encoding patterns $p^{w_1}$ and $p^{w_2}$ are modified as shown below, where $N^{w_1}$ and $N^{w_2}$ are the numbers of times each node has been modified.

$$\begin{aligned}
p_i^{w_1} &= \frac{N^{w_1}}{N^{w_1}+1}\,p_i^{w_1} + \frac{1}{N^{w_1}+1}\,q_i\\
p_i^{w_2} &= \frac{N^{w_2}}{N^{w_2}+1}\,p_i^{w_2} + \frac{1}{N^{w_2}+1}\,k_i\\
N^{w_1} &= N^{w_1}+1\\
N^{w_2} &= N^{w_2}+1
\end{aligned}\tag{9}$$

where $i = 1, \dots, n$.

(b) If the output value of node RA (L5) is less than the threshold $\theta_3$, a new L4 node $w_3$ is created, and the initial values of its stored relevance value and modification count are set to:

$$W^{w_3} = 1.0,\qquad N^{w_3} = 1\tag{10}$$

Otherwise, the relevance value $W^{w_3}$ stored in the L4 layer node that outputs the maximum value under the min operation is modified; this is positive learning. The modification is as follows, where $N^{w_3}$ is the number of times the node has been modified.

$$\begin{aligned}
W^{w_3} &= \frac{N^{w_3}}{N^{w_3}+1}\,W^{w_3} + \frac{1}{N^{w_3}+1}\cdot 1.0\\
N^{w_3} &= N^{w_3}+1
\end{aligned}\tag{11}$$

(c) For each L4 layer node that does not hold the maximum value but that would send an output value in recall mode, let the relevance value stored in the node be $W^{r}$ and let $N^{r}$ be the number of times it has been modified. The modification is as follows; this is negative learning.

$$\begin{aligned}
W^{r} &= \frac{N^{r}+\delta}{N^{r}+\delta+1}\,W^{r} + \frac{1}{N^{r}+\delta+1}\cdot 0.0\\
N^{r} &= N^{r}+1
\end{aligned}\tag{12}$$

where $\delta$ is a bias that prevents a relevance value from dropping too sharply when it meets negative learning during its first few learning steps.

(Case 2) When the output value of node RA (L5) is less than the threshold $\theta_3$, the following learning procedure is executed:

(a) If the maximum similarity value output by node SA (L3) for the query word 11 (encoding pattern $q$) is smaller than the threshold $\theta_1$, a new L2 node $w_1$ is created, and the initial values of its stored encoding pattern and modification count are set to:

$$p_i^{w_1} = q_i,\quad i = 1, \dots, n,\qquad N^{w_1} = 1\tag{13}$$

(b) If the maximum similarity value output by node SA (L3) for the retrieved component keyword 19 (encoding pattern $k$) is smaller than the threshold $\theta_1$, a new L2 node $w_2$ is created, and the initial values of its stored encoding pattern and modification count are set to:

$$p_i^{w_2} = k_i,\quad i = 1, \dots, n,\qquad N^{w_2} = 1\tag{14}$$

(c) A new L4 node $w_3$ is created, and the initial values of its stored relevance value and modification count are set to:

$$W^{w_3} = 1.0,\qquad N^{w_3} = 1\tag{15}$$

From the above description of the synonym program section 1 and the personalized synonym adjustment section 9, it can be seen that the information (software component) retrieval system of this embodiment can not only adjust the structure of the synonym program dynamically but also tolerate input errors.

Information indexing program section (3)

Figure 6 is a functional block diagram of the information indexing program section 3 of this embodiment.
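Looking back at the learning rules of the synonym part, the updates in equations (9), (11) and (12) are all count-weighted running averages, and a short sketch may make that easier to see. The helper names `update_pattern` and `update_relevance` are hypothetical, patterns are assumed to be lists of floats, and the negative-learning target of 0.0 is an assumption inferred from the statement that the bias keeps a relevance value from dropping too quickly.

```python
from typing import List, Sequence, Tuple


def update_pattern(stored: Sequence[float], sample: Sequence[float],
                   count: int) -> Tuple[List[float], int]:
    """Equation (9): pull a stored encoding pattern toward a new sample.

    `count` is the number of times the node has already been modified.
    """
    new = [count / (count + 1) * p + 1 / (count + 1) * s
           for p, s in zip(stored, sample)]
    return new, count + 1


def update_relevance(value: float, count: int, target: float,
                     bias: float = 0.0) -> Tuple[float, int]:
    """Equations (11) and (12): move a relevance value toward `target`.

    Positive learning uses target=1.0 and bias=0.0. Negative learning is
    assumed here to use target=0.0 with a positive bias, which gives the old
    value extra weight so that early negative steps stay small.
    """
    weight = count + bias
    new = weight / (weight + 1) * value + 1 / (weight + 1) * target
    return new, count + 1
```

With `value=1.0` and `count=1`, one negative step without bias halves the relevance, whereas the same step with `bias=2.0` only lowers it to 0.75.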
As shown in the figure, the information indexing program section 3 includes an index keyword temporary storage section 301 and software component indexing units 303. Based on the index keywords 13 sent by the synonym program section 1, it produces the IDs 15 of the software components that actually satisfy the conditions.

As mentioned above, in this embodiment the operator enters four query items: software component function (function), associated medium (medium), software component type (file type), and system type (system type), as shown in Figure 3. Only the software component function is entered freely; after processing by the synonym program section 1 it yields several related index keywords 13. Each of the other query items is fixed to one of the keywords provided by the system. The index keywords 13 found by the synonym program section 1 (which correspond only to the software component function item) are selected one at a time, and the keywords of the four query items are combined into one set of actual query data, which is stored in the index keyword temporary storage section 301. The keywords stored in the index keyword temporary storage section 301 are then sent simultaneously to the software component indexing units 303 for comparison, the software components that meet a certain degree of relevance are found, and their identification codes 15 are sent out, which completes the software component retrieval. This processing continues until every index keyword that was found has been searched.

Figure 7 is a detailed structural diagram of the information indexing program section 3 of this embodiment implemented with a fuzzy neural network. The fuzzy neural network that implements the information indexing program section 3 includes two layers, L6 and L7, which correspond respectively to the index keyword temporary storage section 301 and the software component indexing units 303 in Figure 6. The node functions of the L6 layer and the L7 layer are described in detail below.

L6 layer

The L6 layer consists of $m$ sub-layers, labeled L61, L62, ..., L6m, and each sub-layer has $n$ nodes. For example, sub-layer L61 includes the $n$ nodes $Q_1^1$ to $Q_n^1$, and sub-layer L6m includes the $n$ nodes $Q_1^m$ to $Q_n^m$. Each sub-layer stores one keyword of a query keyword group, so $m$ is the number of keywords in each query keyword group; in this embodiment $m$ is 4. $n$ is the maximum possible number of letters in a keyword; in this embodiment $n$ is 21.

In other words, in this embodiment the nodes $Q_1^1$ to $Q_{21}^1$ in sub-layer L61 store the keyword for the software component function, and this keyword is set in turn from the index keywords 13 found by the synonym program section 1. The nodes $Q_1^2$ to $Q_{21}^2$ in sub-layer L62 store the keyword for the associated medium. The nodes $Q_1^3$ to $Q_{21}^3$ in sub-layer L63 store the keyword for the software component type. The nodes $Q_1^4$ to $Q_{21}^4$ in sub-layer L64 store the keyword for the system type. In this embodiment, the latter three keywords are selected from the keywords preset by the system.

L7 layer

The L7 layer includes $u$ nodes, labeled $C_1$, $C_2$, $C_3$, ..., $C_u$. Each L7 layer node corresponds to one software component set in the system. Each L7 layer node stores the software component keywords that describe the corresponding software component, together with the identification code of that software component.
The software component keywords comprise one keyword for each of the four query items, and their values are input from the L6 layer nodes when the L7 layer node is created. The identification code (ID) indicates the location of the software component in the database or its access location on the Internet.

Each L7 layer node receives the keywords passed from the L6 layer nodes and compares them with the software component keywords it stores. If the condition is met, the node outputs the identification code of its software component. The operation of each L7 layer node is described by equations (16) to (19). Equation (16) sums the per-vector distances, and equation (19) is the thresholded output:

$$d(\bar{Q},\bar{C}_i)=\sum_{m} A_m\tag{16}$$

$$O(C_i)=\begin{cases}L_C(C_i), & \text{if } \mu(\bar{Q},\bar{C}_i) > \theta_4\\ \text{Null}, & \text{otherwise}\end{cases}\tag{19}$$

Equation (17) defines the per-vector distance $A_m$ and equation (18) defines the similarity value $\mu(\bar{Q},\bar{C}_i)$; both are explained below.

Equation (16) calculates the distance between the input keyword group $\bar{Q}$ (composed of four kinds of keywords) and the software component keyword group $\bar{C}_i$ that is stored in the $i$-th node $C_i$ and describes the $i$-th software component. In this embodiment, $\bar{Q}$ includes 4 vectors ($1$ to $m$), each with 21 elements ($1$ to $n$); likewise, $\bar{C}_i$ includes 4 vectors, each with 21 elements. $A_m$ is the distance for one of these vectors (one of $1$ to $m$) and is calculated by equation (17).

Equation (17) calculates the distance $A_m$ between one vector of the input keyword group $\bar{Q}$ and the corresponding vector of the software component keyword encoding pattern $\bar{C}_i$. Only the query items (vectors) for which a keyword was actually entered are included in the calculation; for a query item with no keyword entered, the distance $A_m$ is set to 0. With this treatment, a query can still be processed smoothly even when the operator cannot decide on a certain keyword (for example, when the system type is unknown), so that retrieval from fuzzy input is achieved. Equation (17) also uses the weight of the $m$-th vector (that is, query item) in the input keyword group $\bar{Q}$, which expresses the importance of that vector among all the vectors of the input keyword group.

Equation (18) calculates the similarity value between the input keyword group $\bar{Q}$ and the software component keyword group $\bar{C}_i$ stored in node $C_i$. In it, $F_d$ and $F_e$ are fuzzy operators that describe, for the distance between the input keyword group $\bar{Q}$ and the stored software component keyword group $\bar{C}_i$, the range covered by the membership function and the sensitivity to distance. The similarity value calculated by equation (18) is used by the L7 node to decide whether to output the identification code of its software component, and it is also supplied to the component sorting and filtering program section 5 to determine the order of the retrieved software components.

Equation (19) makes the actual decision on whether to retrieve the software component corresponding to this L7 node. That is, if the similarity value $\mu(\bar{Q},\bar{C}_i)$ between the input keyword group $\bar{Q}$ and the software component keyword group $\bar{C}_i$ describing the software component exceeds a predetermined threshold $\theta_4$, the identification code $L_C(C_i)$ of the software component is output, and the corresponding similarity value is also output to the component sorting and filtering program section 5 for subsequent processing.

The setting of the threshold $\theta_4$ in each L7 node is explained next.

When starting to use the information retrieval system of the present invention, the operator first enters the query word for each query item on the query screen shown in Figure 2, in order to index the software component database. In this embodiment, some query words are typed in freely by the operator, such as the software component function 62, while others are chosen from a pull-down (pop-up) list, such as the associated medium 64, the software component type 66, and the system type 68. For the directly typed software component function 62, the operator may enter several query words, separated from one another by a separator. When the system processes this query item, it separates the query words, processes them one at a time, and finally combines the individual results. For the items entered through pull-down lists, the options are almost independent and mutually exclusive, so only one option can be selected at a time.
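The matching performed by an L7 node, as laid out in equations (16) to (19) above, can be sketched roughly as follows. The overall flow (sum of per-item distances, zero distance for an empty query item, threshold on the similarity) follows the text; the concrete per-item distance and the bell-shaped membership function are stand-in assumptions, because the exact forms of equations (17) and (18) are not legible in this copy. All function names are hypothetical, keywords are treated as plain strings rather than 21-element encoded vectors, and per-item weights are omitted.

```python
from typing import Optional, Sequence, Tuple


def slot_distance(query: str, stored: str) -> float:
    """Stand-in for equation (17).

    Returns 0 when the query item was left empty; otherwise the fraction of
    character positions that differ (an assumed distance, not the patent's).
    """
    if not query:
        return 0.0
    length = max(len(query), len(stored))
    padded_q, padded_s = query.ljust(length), stored.ljust(length)
    return sum(a != b for a, b in zip(padded_q, padded_s)) / length


def similarity(distance: float, f_d: float, f_e: float) -> float:
    """Stand-in for equation (18): an assumed bell-shaped membership.

    f_d sets the range covered, f_e the sensitivity to distance.
    """
    return 1.0 / (1.0 + (distance / f_d) ** f_e)


def l7_match(query: Sequence[str], stored: Sequence[str], component_id: str,
             theta4: float, f_d: float = 1.0, f_e: float = 2.0
             ) -> Optional[Tuple[str, float]]:
    """Equations (16) and (19): sum the item distances, gate on theta4."""
    distance = sum(slot_distance(q, s) for q, s in zip(query, stored))
    mu = similarity(distance, f_d, f_e)
    # The ID is output together with mu so that results can be ranked later.
    return (component_id, mu) if mu > theta4 else None
```

Because an empty query item contributes no distance, a query such as `["sort", "", "exe", "unix"]` still matches a component stored as `["sort", "disk", "exe", "unix"]` with full similarity.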
In addition, only the query words of the software component function 62 are processed by the synonym program section 1, which yields a number of keywords; for the other query items, which need no synonym processing, the query word selected by the operator is used directly as the keyword. In the information indexing program section 3, one keyword is taken from each query item at a time and combined into a keyword group, which is then used for component retrieval.

If all the software components found by the information indexing program section 3 were handed to the operator for selection, three situations could arise. First, the number of retrieved software components is moderate, so the operator can find a suitable software component right away. Second, too many software components are retrieved, and the operator has difficulty choosing. Third, too few software components are retrieved, and the operator cannot find a suitable one. The component sorting and filtering program section 5, described below, can adjust the threshold $\theta_4$ of the L7 layer nodes to solve the problems of the latter two situations. The threshold $\theta_4$ controls whether a software component is output: when too many software components are output, $\theta_4$ is raised to reduce their number; conversely, when too few are output, $\theta_4$ is lowered so that the operator can choose from more software components.

From the above description of the information indexing program section 3, it can be seen that the information (software component) retrieval system of this embodiment retrieves software components by fuzzy retrieval based on the query words entered by the operator. That is, even if the operator cannot determine the correct content of a certain query item, the information retrieval system can still find all possible software components by the closest match.

Component sorting and filtering program section (5)

Besides using the threshold $\theta_4$ to filter the number of retrieved software components as described above, the component sorting and filtering program section 5 can also determine the order of the software components from the corresponding similarity values sent by the information indexing program section 3. A larger similarity value means that the input query keyword group is closer to the keywords describing the software component, so its priority is higher; a smaller similarity value means that the two differ more, so its priority is lower. When the retrieved software components are output for the operator to choose from, the output order can be decided by these similarity values, so that the operator can select the required software component more quickly.

The personalized sorting and filtering adjustment section 7 adjusts the sorting mechanism and the filtering mechanism of the component sorting and filtering program section 5 according to the software components the operator actually selects, so that they adapt to the operator's usage habits and retrieval preferences.

As described above, the software component retrieval system of this embodiment has the following advantages:

1. The synonym dictionary can be adjusted flexibly according to the operator's usage habits and personal preferences, which makes querying easier.
2. It tolerates input errors.
3. It represents and processes uncertain information, that is, it provides a method of fuzzy querying.
4. It sorts the retrieved software components according to the operator's retrieval preferences and usage habits and filters out the less relevant ones, which makes selection more convenient for the operator.

Although the present invention has been disclosed above by way of a preferred embodiment, the embodiment is not intended to limit the invention. Anyone skilled in the art may make minor changes and refinements without departing from the spirit and scope of the invention, and the scope of protection of the invention is therefore defined by the appended claims.
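The ordering and threshold adjustment attributed to the component sorting and filtering program section 5 in the description can be illustrated with a small sketch. It is only an assumed reading: the description states the direction in which $\theta_4$ moves but gives no step size or result-count limits, so `step`, `too_few`, and `too_many` are hypothetical parameters, as are the function names.

```python
from typing import List, Tuple


def rank_components(matches: List[Tuple[str, float]]) -> List[str]:
    """Order retrieved component IDs by descending similarity value."""
    ordered = sorted(matches, key=lambda m: m[1], reverse=True)
    return [component_id for component_id, _ in ordered]


def adjust_threshold(theta4: float, n_results: int, too_few: int,
                     too_many: int, step: float = 0.05) -> float:
    """Raise theta4 when too many components come back, lower it when too few.

    The result is clamped to [0, 1]; the limits and step are assumptions.
    """
    if n_results > too_many:
        return min(1.0, theta4 + step)
    if n_results < too_few:
        return max(0.0, theta4 - step)
    return theta4
```

Calling `adjust_threshold` after each query and reusing the returned value for the next retrieval gives a simple feedback loop on the size of the result list.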

Claims (1)

1. An information retrieval system for retrieving corresponding information components according to an input query word, the information retrieval system including a synonym program for searching out, according to the query word, at least one index keyword having a synonymous relationship; an information indexing program for searching out at least one corresponding information component according to the index keywords; and a component sorting and filtering program for sorting and filtering the retrieved information components and outputting them so that the required information component can be selected, characterized in that:
the synonym program includes:
an encoding section for encoding the query word to generate a query word encoding pattern, the query word encoding pattern corresponding to the features of the query word;
a similarity calculation section for comparing the query word encoding pattern with system synonym encoding patterns generated by encoding a plurality of system synonyms, thereby generating corresponding similarity values, the similarity values representing the degree of similarity between the query word and the system synonyms, and the system synonyms corresponding respectively to a plurality of system keywords; and
an association section which, based on the similarity values generated by the similarity calculation section and on the relevance values between the system synonyms and the system keywords, performs a comparison with a predetermined threshold, thereby selecting from the system keywords those to be output as the index keywords.

2. The information retrieval system according to claim 1, wherein the information components are software components of various types, functions, and associated media that are applicable to various operating platforms.

3. The information retrieval system according to claim 1, wherein the similarity calculation section includes:
a plurality of similarity calculation units, each of which stores the system synonym encoding pattern corresponding to one of the system synonyms, receives the query word encoding pattern, and calculates the corresponding similarity value according to a predetermined similarity calculation rule.

4. The information retrieval system according to claim 1, wherein the association section includes:
a plurality of association units, each of which receives the similarity values generated by the similarity calculation section and, according to the relevance values, performs a comparison with the predetermined threshold to decide whether the corresponding system keyword is used as an index keyword.

5. The information retrieval system according to claim 1, wherein the encoding section records, according to the query word, the number of each letter, the position at which each letter first appears, and all the letters, to generate the corresponding query word encoding format.

6. The information retrieval system according to claim 5, wherein the synonym program section further includes a position offset calculation section, which uses the position at which each letter first appears to calculate its position offset, thereby judging whether there is an input error, and which compensates the query word when an input error exists.

7. An information retrieval system for retrieving corresponding information components according to an input query word, the information retrieval system including a synonym program for searching out, according to the query word, at least one index keyword having a synonymous relationship; an information indexing program for searching out at least one corresponding information component according to the index keywords; and a component sorting and filtering program for sorting and filtering the retrieved information components and outputting them so that the required information component can be selected, characterized in that:
the synonym program includes:
an encoding section, used to encode the query word and the corresponding information component.
Extracting a component keyword to generate a query word encoding style and a retrieval component keyword encoding style, the query word encoding style corresponding to the characteristics of the query word, and the retrieval component key The word encoding style corresponds to the characteristics of the keyword of the retrieval component; a similarity calculation unit for separately identifying the query word encoding style and the retrieval component keyword encoding style with the system synonym generated by the plural system synonym encoding The coding styles are compared to generate the corresponding first set of similarity values and the second set of similarity values, which respectively represent the similarity between the query word and the retrieval component keyword and the synonymous words of these systems. The equivalent system synonyms are respectively corresponding to plural system keywords; an association unit 'according to the first group of similarity values and the second group of similarity values generated by the similarity calculation unit, and the system synonyms and the Calculate the corresponding output value based on the original correlation value between system keywords; and a judgment unit, based on the first group similarity value, the second group similarity value, and the output values, determine whether to modify the similarity calculation Department and the associated department. 8_ The information retrieval system described in item 7 of the scope of the patent application, wherein the information retrieval system further includes a synonym adjustment unit, and adjusts the similarity calculation unit and the associated unit according to the judgment result of the judgment unit. . 
9. The information retrieval system of claim 7 or 8, wherein the information components are software components of various types, functions, and associated media, applicable to various operating platforms.

10. The information retrieval system of claim 7 or 8, wherein the similarity calculation part comprises:
a plurality of similarity calculation units, respectively storing the system synonym encoding patterns corresponding to the system synonyms, and respectively receiving the query word encoding pattern and the retrieved-component keyword encoding pattern, for calculating the first set of similarity values and the second set of similarity values.

11. The information retrieval system of claim 8, wherein the judgment part comprises a first selection part for selecting the largest value in the first set of similarity values and the largest value in the second set of similarity values and comparing them with a first predetermined threshold, thereby determining whether to modify the similarity calculation part and the association part.

12. The information retrieval system of claim 8, wherein the judgment part comprises a second selection part for selecting the largest of the output values and comparing it with a second predetermined threshold, thereby determining whether to modify the similarity calculation part and the association part.

13. The information retrieval system of claim 7 or 8, wherein the encoding part records, for the query word and the retrieved-component keyword, the count of each letter, the position at which each letter first appears, and all the letters themselves.

14. The information retrieval system of claim g, wherein the synonym procedure part further comprises a position offset calculation part, which uses the relative position of the first occurrence of each letter to calculate a position offset and thereby determine whether there is an input error, and which compensates the query word when an input error exists.

15. An information retrieval system for querying corresponding information components according to an input query word, the information retrieval system including a synonym procedure for searching out, according to the query word, at least one index keyword having a synonymous relationship; an information indexing procedure for searching out, according to the index keywords, at least one corresponding information component; and a component sorting and filtering procedure for sorting and filtering the retrieved information components and outputting them so that the desired information component can be selected, characterized in that the information indexing procedure comprises:
an index keyword buffer part, which stores the index keywords in sequence; and
a plurality of component index units, respectively corresponding to the information components, each storing the component keyword set of its information component and the corresponding information component identifier, and each, on receiving an input index keyword set, calculating the similarity between that set and the stored component keyword set and determining whether the similarity exceeds a predetermined threshold, so as to decide whether to output the corresponding information component identifier.

16. The information retrieval system of claim 15, wherein the information components are software components of various types, functions, and associated media, applicable to various operating platforms.

17. The information retrieval system of claim 15, wherein the predetermined threshold is adjustable, thereby determining the number of information component identifiers that the corresponding component index units can output.

18. An information retrieval system for querying corresponding information components according to an input query word, comprising:
a synonym procedure part which, according to a synonym fuzzy mechanism, uses the query word to search out at least one index keyword having a synonymous relationship;
an information indexing procedure part for searching out, according to the index keywords, at least one corresponding information component;
a component sorting and filtering procedure part which, according to a sorting and filtering fuzzy mechanism, sorts and filters the retrieved information components and outputs them so that the desired information component can be selected;
a personalized synonym adjustment part which, according to the selected information component, adjusts the synonym fuzzy mechanism of the synonym procedure part; and
a personalized sorting and filtering adjustment part which, according to the selected information component, adjusts the sorting and filtering fuzzy mechanism of the component sorting and filtering procedure part.

19. The information retrieval system of claim 18, wherein the information components are software components of various types, functions, and associated media, applicable to various operating platforms.

20. The information retrieval system of claim 18, wherein the synonym procedure part is constituted by a neural network so as to achieve fast parallel processing.

21. The information retrieval system of claim 18, wherein the information indexing procedure part is constituted by a neural network so as to achieve fast parallel processing.

22. The information retrieval system of claim 18, wherein the synonym procedure part comprises:
an encoding part for encoding the query word, the encoding format of the query word including the count of each letter in the query word, the position at which each letter first appears, and all the letters themselves; and
a position offset calculation part, which uses the position of the first occurrence of each letter to calculate a position offset and thereby determine whether there is an input error, and which compensates the query word when an input error exists.
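The query word encoding recited in claims 5, 6, 13, 14 and 22 (letter counts, first-occurrence positions, and the letters themselves, plus a position offset used to detect input errors) can be illustrated with a minimal Python sketch. The claims do not fix a data layout or an error rule, so the function names `encode`, `position_offsets` and `looks_like_input_error`, and the rule "same letters and counts but shifted first positions", are assumptions made only for illustration.

```python
from collections import Counter


def encode(word):
    """Map each letter to (count, position of first occurrence)."""
    word = word.lower()
    counts = Counter(word)
    first = {}
    for position, letter in enumerate(word):
        first.setdefault(letter, position)  # keep only the first position
    return {letter: (counts[letter], first[letter]) for letter in first}


def position_offsets(query, reference):
    """First-occurrence offsets (query minus reference) for shared letters."""
    q, r = encode(query), encode(reference)
    return {letter: q[letter][1] - r[letter][1] for letter in q if letter in r}


def looks_like_input_error(query, reference):
    """Hypothetical rule: identical letter counts but shifted first positions."""
    same_letters = Counter(query.lower()) == Counter(reference.lower())
    shifted = any(offset != 0 for offset in position_offsets(query, reference).values())
    return same_letters and shifted


if __name__ == "__main__":
    print(encode("retrieve"))
    print(position_offsets("retreive", "retrieve"))
    print(looks_like_input_error("retreive", "retrieve"))
```

Under this assumed rule, a transposed query such as "retreive" would be compensated by substituting the stored word "retrieve"; the actual compensation method of the patent is not specified in the claims.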
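The synonym procedure of claims 1, 3 and 4 compares the query word with stored system synonyms, weights the resulting similarity values by the association values between synonyms and system keywords, and outputs the keywords that pass a predetermined threshold. The sketch below shows one way this flow could look. The letter-overlap `similarity` function, the max-of-products combination, the sample vocabulary and the default threshold of 0.6 are all stand-ins chosen for illustration; the claims only require "a predetermined similarity calculation rule" and do not name one.

```python
from collections import Counter


def similarity(a, b):
    """Letter-count overlap in [0, 1]; a stand-in for the similarity rule."""
    total = len(a) + len(b)
    if total == 0:
        return 0.0
    shared = sum((Counter(a.lower()) & Counter(b.lower())).values())
    return 2 * shared / total


def index_keywords(query, associations, threshold=0.6):
    """Select system keywords whose best weighted similarity reaches the threshold.

    associations: {system_synonym: {system_keyword: association_value}}
    """
    scores = {}
    for synonym, keywords in associations.items():
        s = similarity(query, synonym)  # one "similarity calculation unit"
        for keyword, weight in keywords.items():
            # one "association unit": similarity weighted by association value
            scores[keyword] = max(scores.get(keyword, 0.0), s * weight)
    return sorted(k for k, v in scores.items() if v >= threshold)


if __name__ == "__main__":
    # Hypothetical vocabulary, not taken from the patent.
    associations = {
        "sort": {"sorting": 1.0},
        "order": {"sorting": 0.8},
        "search": {"searching": 1.0},
    }
    print(index_keywords("sort", associations))
    print(index_keywords("sorrt", associations))  # misspelled query
```

Because each synonym is scored independently of the others, the loop over `associations` could run in parallel, which is consistent with the parallel processing mentioned in claims 20 and 21.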
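Claims 15 and 17 describe component index units that each store a component keyword set and a component identifier, compare an incoming index keyword set against the stored set, and output the identifier only when the similarity exceeds an adjustable threshold. A minimal sketch follows, assuming Jaccard overlap as the similarity measure (the claims do not name one); the class `ComponentIndexUnit`, the function `retrieve` and the sample components are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ComponentIndexUnit:
    component_id: str
    keywords: frozenset

    def match(self, index_keywords, threshold):
        """Return the component ID if similarity exceeds the threshold, else None."""
        query = frozenset(index_keywords)
        union = self.keywords | query
        # Jaccard overlap: an assumed stand-in for the unspecified similarity
        score = len(self.keywords & query) / len(union) if union else 0.0
        return self.component_id if score > threshold else None


def retrieve(units, index_keywords, threshold=0.3):
    """Collect the identifiers output by all units for one keyword set."""
    hits = (unit.match(index_keywords, threshold) for unit in units)
    return [hit for hit in hits if hit is not None]


if __name__ == "__main__":
    units = [
        ComponentIndexUnit("C1", frozenset({"sorting", "array"})),
        ComponentIndexUnit("C2", frozenset({"sorting", "list", "search"})),
        ComponentIndexUnit("C3", frozenset({"network"})),
    ]
    print(retrieve(units, {"sorting", "array"}, threshold=0.3))
    print(retrieve(units, {"sorting", "array"}, threshold=0.2))
```

Lowering the threshold lets more units output their identifiers, which corresponds to the adjustable threshold of claim 17 controlling how many component identifiers are returned.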
Priority Applications (1)

Application Number Priority Date Filing Date Title
TW87107685A TW420778B (en) 1998-05-18 1998-05-18 An information retrieval system realized by fuzzy neutral network model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
TW87107685A TW420778B (en) 1998-05-18 1998-05-18 An information retrieval system realized by fuzzy neutral network model

Publications (1)

Publication Number Publication Date
TW420778B true TW420778B (en) 2001-02-01

Family

ID=21630127

Family Applications (1)

Application Number Title Priority Date Filing Date
TW87107685A TW420778B (en) 1998-05-18 1998-05-18 An information retrieval system realized by fuzzy neutral network model

Country Status (1)

Country Link
TW (1) TW420778B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI732271B (en) * 2018-08-29 2021-07-01 大陸商騰訊科技(深圳)有限公司 Human-machine dialog method, device, electronic apparatus and computer readable medium
US11775760B2 (en) 2018-08-29 2023-10-03 Tencent Technology (Shenzhen) Company Limited Man-machine conversation method, electronic device, and computer-readable medium

Legal Events

Date Code Title Description
GD4A Issue of patent certificate for granted invention patent
MK4A Expiration of patent term of an invention patent