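201102842
VI. Description of the Invention:
[Technical Field]
The present application relates to data processing technology, and in particular to a word matching method and apparatus and an information query method and apparatus.
[Prior Art]
The latent meaning of a word (including a phrase) usually refers to the meaning implied by that word, which can usually be expressed by one or more other words (including phrases). For example, what is commonly called "冰箱" (fridge) generally has the latent meaning "電冰箱" (electric refrigerator), and "棉拖" (cotton slippers, abbreviated) generally has the latent meaning "全棉拖鞋" (all-cotton slippers).
Automatic discovery of latent meanings is a basic problem in natural language processing. Solving it can improve the effectiveness and performance of text understanding, machine translation, and search engines.
Word segmentation is a technique commonly used in natural language processing: it divides an input string into a number of words or phrases. For example, after segmentation, the string "曾經有一段誠摯的感情擺在我的面前" ("once a sincere love was placed before me") usually yields "曾經|有|一段|誠摯|的|感情|擺在|我|的|面前".
A user feedback log records the query results (document or web page IDs, etc.) corresponding to a query word, together with the click frequency, exposure frequency, and similar statistics of those results. Click-through rate, exposure frequency, and similar information reflect how strongly users endorse a query result; in general, documents that meet user needs have a higher click-through rate than documents that do not match user intent. For example, when searching for "西藥" (Western medicine), the results "批發西藥" (wholesale Western medicine) and "江西藥廠" (Jiangxi pharmaceutical factory) match the query equally well character by character, but the first result usually has a higher click-through rate than the second.
By analyzing the user feedback log, one can find words that closely match the query word at the character level but are expressed differently. For example, a search for the word "冰箱" returns many results containing "電冰箱", such as "雙開門電冰箱" (double-door refrigerator), "發明了冰箱" (invented the fridge), "電冰箱廠家" (refrigerator manufacturer), "銷售電冰箱" (refrigerator sales), and "存冰箱子". The results with relatively high click-through rates are collected, the sentences containing 冰箱 are segmented, and the frequency of each segment is counted. If one or more segments occur more often than a set threshold, the following processing is performed: if the query word is contained in a single high-frequency segment, e.g. "冰箱" is contained in "電冰箱", then "電冰箱" is regarded as a latent meaning of "冰箱"; if the query word is contained in two adjacent high-frequency segments, e.g. the query word "玻璃瓶" (glass bottle) spans the two high-frequency segments "玻璃" (glass) and "瓶子" (bottle), then "玻璃瓶子" is likewise usually regarded as a latent meaning of "玻璃瓶".
Considerable research has already been done on the automatic discovery of latent semantics, mostly finding near-synonyms through word co-occurrence or link relations. For example, Lu Yong and Hou Hanqing, in the article "基於PageRank演算法的漢語同義詞自動識別" ("Automatic Recognition of Chinese Synonyms Based on the PageRank Algorithm"), describe a method for discovering synonyms automatically: the article treats the relation between an explaining word and the word it explains as a link, takes the PageRank value as a measure of the semantic similarity between words, and then identifies synonyms according to the magnitude of that similarity. The drawback of this method is that, because it is based on a manually annotated corpus, the number of entries it can mine is rather limited. If it is instead based on the link relations between Internet web pages, those links are sometimes very unreliable, and the quality of automatic synonym discovery is hard to guarantee.
Search engine indexing methods include single-character indexing, word segmentation indexing, and hybrid indexing. A single-character index must compute the distances between characters within a document, which is inefficient and imprecise; for example, when searching for "農藥" (pesticide), a single-character index cannot distinguish "神農藥廠" (Shennong Pharmaceutical Factory) from "神農農藥廠" (Shennong Pesticide Factory). Word segmentation indexing is precise and fast, but its recall is sometimes low; for example, when searching for "冰箱", the segmentation index finds only results containing the word "冰箱" and misses results containing "電冰箱". A hybrid method combining the two usually queries the segmentation index first and then the single-character index; for example, when searching for "玻璃瓶", it first finds results for "玻璃瓶" through the segmentation index and then finds further results through the single-character index. This compensates for the weaknesses of both methods, but because "玻璃瓶子" is found only through the single-character index, the search engine cannot distinguish "玻璃瓶子" from "生產玻璃瓶頸在於" ("the bottleneck in producing glass lies in"), which hurts search accuracy.
The foregoing methods lack sufficient data or lack user feedback, so the latent semantics they extract are too few or quite possibly wrong. The automatic word-sense discovery method of Lu Yong and Hou Hanqing, for example, mainly uses existing dictionary data as its extraction source, with a sample size of around a few thousand entries. Mining methods based on large data volumes such as Internet web pages, on the other hand, lack accuracy.
The shortcoming of the prior art is therefore that, when facing large volumes of data such as the Internet, there is still no good query scheme that can accurately predict what the user really wants to look up, and consequently the query results the user actually needs cannot be returned.
[Summary of the Invention]
The present application provides a word matching method and apparatus, offering a scheme for accurately determining the intrinsic relationships between words in the presence of massive data and matching the words accordingly.
An embodiment of the present application provides a word matching method, comprising the following steps:
obtaining a word to be matched;
obtaining a user feedback log according to the word to be matched;
determining, according to the user feedback log, words that match the word to be matched.
Preferably, the user feedback log comprises the historical query results of queries that targeted the word to be matched and users' selection frequency of those historical query results.
Preferably, the words matching the word to be matched are determined according to the historical query results and the selection frequency in the user feedback log.
Preferably, the selection frequency comprises: the selection frequency of the historical query results and/or the selection frequency of the content of the historical query results.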
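As an illustration only, the user feedback log described above could be modelled in Python as follows. The record fields (`query`, `doc_id`, `text`, `clicks`, `exposures`) and all helper names are hypothetical and not part of the application. The sketch selects records by the historically queried word (not by user) and shows two ways of turning click and exposure frequencies into a per-document weight: a 0/1 threshold, or division by the maximum so that weights fall in [0, 1]. These are the two example weightings the detailed description gives for the weight coefficient.

```python
from dataclasses import dataclass


@dataclass
class FeedbackRecord:
    """One historical query result in a user feedback log (hypothetical schema)."""
    query: str      # query word that produced this result
    doc_id: str     # web page or document ID
    text: str       # content (e.g. title) of the result
    clicks: int     # click frequency
    exposures: int  # exposure frequency


def feedback_log_for(word, log):
    """Select records by the historically queried word, not by user."""
    return [r for r in log if r.query == word]


def binary_weights(records, min_clicks, min_exposures):
    """Weight 1 if both frequencies exceed their thresholds, otherwise 0."""
    return {
        r.doc_id: 1.0 if r.clicks > min_clicks and r.exposures > min_exposures else 0.0
        for r in records
    }


def max_normalized_weights(records):
    """Divide each click frequency by the largest one, giving weights in [0, 1]."""
    top = max((r.clicks for r in records), default=0)
    if top == 0:
        return {r.doc_id: 0.0 for r in records}
    return {r.doc_id: r.clicks / top for r in records}


if __name__ == "__main__":
    log = [
        FeedbackRecord("西藥", "d1", "批發西藥", clicks=80, exposures=100),
        FeedbackRecord("西藥", "d2", "江西藥廠", clicks=20, exposures=100),
        FeedbackRecord("冰箱", "d3", "電冰箱廠家", clicks=50, exposures=90),
    ]
    records = feedback_log_for("西藥", log)
    print(max_normalized_weights(records))  # {'d1': 1.0, 'd2': 0.25}
    print(binary_weights(records, min_clicks=50, min_exposures=10))  # {'d1': 1.0, 'd2': 0.0}
```

With these made-up click counts for the 西藥 example, the wholesale result receives the larger weight even though both results match the query character by character.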
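Preferably, determining the words matching the word to be matched according to the selection frequency of the content of the historical query results comprises:
obtaining the content of the historical query results of the word to be matched;
performing word segmentation on the content of the historical query results to obtain segmented words;
determining the words matching the word to be matched according to the selection frequency of the segmented words.
Preferably, the segmented words comprise words of the following kinds, or a combination thereof:
words adjacent to the word to be matched after segmentation;
words containing the word to be matched after segmentation;
words comprising components of the word to be matched after segmentation.
Preferably, when the words matching the word to be matched are determined according to the query results and the selection frequency, the selection frequency is greater than a set threshold.
Preferably, obtaining the word to be matched comprises:
obtaining information content input by a user;
performing word segmentation on the information content to obtain segmented words, and/or decomposing the information content into single characters;
using the segmented words and/or characters as words to be matched.
Preferably, the selection frequency comprises one of, or a combination of, the click frequency of the historical query results, the exposure frequency of the historical query results, the reading time of the historical query results, and the importance of the historical query results.
Preferably, the method further comprises: obtaining user features of the user when the user inputs the word to be matched; and, when obtaining the user feedback log, obtaining the user feedback log according to the user features of the user.
Preferably, the method further comprises: obtaining user features of the user when the user inputs the word to be matched; and, when obtaining the user feedback log, obtaining from it the historical query results of queries that targeted the word to be matched and users' selection frequency of those historical query results, the historical query results including the user features.
Preferably, the method further comprises: obtaining user features of the user when the user inputs the word to be matched; and, when determining the words matching the word to be matched according to the user feedback log, determining those words according to the user features.
An embodiment of the present application further provides a word matching apparatus, comprising:
a to-be-matched word acquisition module, configured to obtain a word to be matched;
a user feedback log acquisition module, configured to obtain a user feedback log according to the word to be matched;
a matching module, configured to determine, according to the user feedback log and the selection frequency, words that match the word to be matched.
Preferably, the user feedback log acquisition module is further configured to obtain a user feedback log comprising the historical query results of queries that targeted the word to be matched and users' selection frequency of those historical query results.
Preferably, the matching module is further configured to determine the words matching the word to be matched according to the historical query results and the selection frequency in the user feedback log.
Preferably, the user feedback log acquisition module is further configured to obtain, as the selection frequency, the selection frequency of the historical query results and/or the selection frequency of the content of the historical query results.
Preferably, the matching module comprises:
a content acquisition unit, configured to obtain the content of the historical query results of the word to be matched;
a word segmentation unit, configured to perform word segmentation on the content of the historical query results to obtain segmented words;
a matching unit, configured to determine the words matching the word to be matched according to the selection frequency of the segmented words.
Preferably, the word segmentation unit is further configured to obtain, after segmentation, words of the following kinds or a combination thereof:
words adjacent to the word to be matched after segmentation;
words containing the word to be matched after segmentation;
words comprising components of the word to be matched after segmentation.
Preferably, the matching module is further configured such that, when the words matching the word to be matched are determined according to the historical query results and the selection frequency, the selection frequency is greater than a set threshold.
Preferably, the to-be-matched word acquisition module comprises:
an information content acquisition unit, configured to obtain information content input by a user;
a segmentation/decomposition unit, configured to perform word segmentation on the information content to obtain segmented words, and/or to decompose the information content into characters;
a to-be-matched word determining unit, configured to use the segmented words and/or characters as words to be matched.
Preferably, the user feedback log acquisition module is further configured to obtain, as the selection frequency, a parameter comprising one of, or a combination of, the click frequency, the exposure frequency, the reading time, and the importance of the historical query results.
Preferably, the to-be-matched word acquisition module is further configured to obtain user features of the user when the user inputs the word to be matched; the user feedback log acquisition module is further configured to obtain the user feedback log according to the user features.
Preferably, the to-be-matched word acquisition module is further configured to obtain user features of the user when the user inputs the word to be matched; the user feedback log acquisition module is further configured, when obtaining the user feedback log, to obtain from it the historical query results of queries that targeted the word to be matched and users' selection frequency of those historical query results, the historical query results including the user features.
Preferably, the to-be-matched word acquisition module is further configured to obtain user features of the user when the user inputs the word to be matched; the matching module is further configured, when determining the words matching the word to be matched according to the user feedback log, to determine those words according to the user features.
Based on the same concept, the present application provides an information query method and apparatus which, in the presence of massive data, use the above matching relationship between words to accurately determine the user's real information need and return the query results the user actually needs.
An embodiment of the present application provides an information query method, comprising the following steps:
obtaining an input first query keyword;
obtaining a user feedback log according to the first query keyword;
determining, according to the user feedback log, a second query keyword that matches the first query keyword;
returning the query results of a query that targets the second query keyword.
Preferably, the user feedback log comprises the historical query results of queries that targeted the first query keyword and users' selection frequency of those historical query results.
Preferably, the second query keyword matching the first query keyword is determined according to the historical query results and the selection frequency in the user feedback log.
Preferably, the selection frequency comprises: the selection frequency of the historical query results and/or the selection frequency of the content of the historical query results.
Preferably, determining the second query keyword matching the first query keyword according to the selection frequency of the content of the historical query results comprises:
obtaining the content of the historical query results of the first keyword;
performing word segmentation on the content of the historical query results to obtain segmented words;
determining the second query keyword matching the first query keyword according to the selection frequency of the segmented words.
Preferably, the segmented words are words of the following kinds, or a combination thereof:
words adjacent to the first query keyword after segmentation;
words containing the first query keyword after segmentation;
words comprising components of the first query keyword after segmentation.
Preferably, when the second query keyword matching the first query keyword is determined according to the historical query results and the selection frequency, the selection frequency is greater than a set threshold.
Preferably, obtaining the input first query keyword comprises:
obtaining information content input by a user;
performing word segmentation on the information content to obtain segmented words, and/or decomposing the information content into characters;
using the segmented words and/or characters as first query keywords.
Preferably, the selection frequency comprises one of, or a combination of, the click frequency, the exposure frequency, the reading time, and the importance of the historical query results.
Preferably, the method further comprises: obtaining user features of the user when the user inputs the first query keyword; and, when obtaining the user feedback log, obtaining the user feedback log according to the user features of the user.
Preferably, the method further comprises: obtaining user features of the user when the user inputs the first query keyword; and, when obtaining the user feedback log, obtaining from it the historical query results of queries that targeted the first query keyword and users' selection frequency of those historical query results, the historical query results including the user features.
Preferably, the method further comprises: obtaining user features of the user when the user inputs the first query keyword; and, when determining the second query keyword according to the user feedback log, determining the second query keyword according to the user features.
An embodiment of the present application further provides an information query apparatus, comprising:
a first query keyword acquisition module, configured to obtain an input first query keyword;
a user feedback log acquisition module, configured to obtain a user feedback log according to the first query keyword;
a matching module, configured to determine, according to the user feedback log, a second query keyword that matches the first query keyword;
a query result return module, configured to return the query results of a query that targets the second query keyword.
Preferably, the user feedback log acquisition module is further configured to obtain a user feedback log comprising the historical query results of queries that targeted the first query keyword and users' selection frequency of those historical query results.
Preferably, the matching module is further configured to determine the second query keyword matching the first query keyword according to the historical query results and the selection frequency in the user feedback log.
Preferably, the user feedback log acquisition module is further configured to obtain, as the selection frequency, the selection frequency of the historical query results and/or the selection frequency of the content of the historical query results.
Preferably, the matching module comprises:
a content acquisition unit, configured to obtain the content of the historical query results of the first keyword;
a word segmentation unit, configured to perform word segmentation on the content of the historical query results to obtain segmented words;
a matching unit, configured to determine the second query keyword matching the first query keyword according to the selection frequency of the segmented words.
Preferably, the word segmentation unit is further configured to obtain, after segmentation, words of the following kinds or a combination thereof:
words adjacent to the first query keyword after segmentation;
words containing the first query keyword after segmentation;
words comprising components of the first query keyword after segmentation.
Preferably, the matching module is further configured such that, when the second query keyword matching the first query keyword is determined according to the historical query results and the selection frequency, the selection frequency is greater than a set threshold.
Preferably, the first query keyword acquisition module comprises:
an information content acquisition unit, configured to obtain information content input by a user;
a segmentation/decomposition unit, configured to perform word segmentation on the information content to obtain segmented words, and/or to decompose the information content into characters;
a first query keyword determining unit, configured to use the segmented words and/or characters as first query keywords.
Preferably, the user feedback log acquisition module is further configured to obtain, as the selection frequency, a parameter comprising one of, or a combination of, the click frequency, the exposure frequency, the reading time, and the importance of the historical query results.
Preferably, the first query keyword acquisition module is further configured to obtain user features of the user when the user inputs the first query keyword; the user feedback log acquisition module is further configured to obtain the user feedback log according to the user features.
Preferably, the first query keyword acquisition module is further configured to obtain user features of the user when the user inputs the first query keyword; the user feedback log acquisition module is further configured, when obtaining the user feedback log, to obtain from it the historical query results of queries that targeted the first query keyword and users' selection frequency of those historical query results, the historical query results including the user features.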
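The three kinds of segmented words listed above (adjacent, component-covering, and containing) could be scored roughly as in the following Python sketch, which follows the P1/P2/P3 procedure given in the detailed description. It is a minimal sketch under assumptions that are not taken from the application: each historical result is already segmented into a list of strings and carries one weight coefficient; a segment equal to the query is not counted as a "containing" word; and a run of adjacent segments counts toward P2 only when the query can be split, in order, into non-empty pieces lying in consecutive segments and no single segment of the run already contains the whole query. All function names are hypothetical.

```python
from collections import defaultdict


def covers(query, segs):
    """True if query splits, in order, into len(segs) non-empty pieces,
    piece i being a substring of segs[i] (e.g. 美女 over 美麗的|女人)."""
    if len(segs) == 1:
        return query != "" and query in segs[0]
    for cut in range(1, len(query)):
        if query[:cut] in segs[0] and covers(query[cut:], segs[1:]):
            return True
    return False


def latent_stats(query, docs, window=1):
    """docs: iterable of (segments, weight). Returns the weighted counts P1, P2, P3."""
    p1, p2, p3 = defaultdict(float), defaultdict(float), defaultdict(float)
    for segs, weight in docs:
        n = len(segs)
        for i, seg in enumerate(segs):
            if seg == query:
                # First kind: the `window` (T) segments before and after the query.
                for j in range(max(0, i - window), min(n, i + window + 1)):
                    if j != i:
                        p1[segs[j]] += weight
            elif query in seg:
                # Third kind: a segment that contains the query, e.g. 電冰箱 for 冰箱.
                p3[seg] += weight
            # Second kind: k >= 2 adjacent segments that together hold the query.
            for k in range(2, min(len(query), n - i) + 1):
                run = segs[i:i + k]
                if not any(query in s for s in run) and covers(query, run):
                    p2["|".join(run)] += weight
    return p1, p2, p3


def select(stats, threshold):
    """Keep the entries whose weighted sum exceeds the (first/second/third) threshold."""
    return sorted(word for word, total in stats.items() if total > threshold)


def dynamic_threshold(query, docs, coefficient):
    """Sum the weights of documents whose text contains the query, times a coefficient."""
    return coefficient * sum(w for segs, w in docs if query in "".join(segs))


if __name__ == "__main__":
    docs = [
        ("中國|古代|美女|西施|名|夷光|春秋|戰國|時期|出生".split("|"), 0.5),
        ("西施|是|個|美麗的|女人".split("|"), 0.3),
    ]
    p1, p2, p3 = latent_stats("美女", docs)
    print(dict(p1))  # {'古代': 0.5, '西施': 0.5}
    print(dict(p2))  # {'美麗的|女人': 0.3}
```

With document weights of 0.5 and 0.3, as in the examples of the detailed description, this reproduces the described increments to P1 and P2. Note that a run such as 玻璃|瓶頸 (from 生產玻璃瓶頸在於) would also be counted in P2 for 玻璃瓶; in this scheme it is the low click weight of such results, together with the threshold, that keeps it out.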
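Preferably, the first query keyword acquisition module is further configured to obtain user features of the user when the user inputs the first query keyword; the matching module is further configured, when determining the second query keyword according to the user feedback log, to determine the second query keyword according to the user features.
The beneficial effects of the present application are as follows:
In the implementation of the present application, after the input first query keyword is obtained, the user feedback log of the first query keyword is fetched. This log contains the historical query results of queries that targeted the first query keyword and users' selection frequency of those results. The second query keyword matching the first query keyword is then determined from the historical query results and the selection frequency, and finally the query results of a query targeting the matched second query keyword are returned. Because the user feedback log serves as the basis for discovering the latent meanings of the user's query, past user feedback can be used to determine those latent meanings accurately even when very large amounts of data are involved, thereby improving the accuracy of information queries.
[Detailed Description]
Specific embodiments of the present application are described below with reference to the accompanying drawings.
Figure 1 is a schematic flowchart of the information query method. As shown in the figure, the method may include the following steps:
Step 101: obtain the input first query keyword;
Step 102: obtain the user feedback log according to the first query keyword; the user feedback log includes the historical query results of queries that targeted the first query keyword and users' selection frequency of those historical query results;
Step 103: determine, according to the historical query results and the selection frequency, a second query keyword matching the first query keyword;
Step 104: return the query results of a query that targets the second query keyword.
The implementation of each step is described below.
In step 101, the first query keyword may be obtained by:
obtaining the information content input by the user;
performing word segmentation on the information content to obtain segmented words, and/or decomposing the information content into characters;
using the segmented words and/or characters as first query keywords.
It can be seen that the keywords used for querying in this application may be either words or single characters; when they are characters, this corresponds to the usual single-character query. For the information content a user wants to look up, querying with different units, such as characters or words, or with a combination of them, clearly makes the query results more precise and accurate.
In step 102, the user feedback log generally refers to the log a search engine uses to collect the keywords entered by users, the historical query results (usually web page or document IDs), and the click frequency, exposure rate, etc. of those historical query results.
In implementation, the user feedback log may include the historical query results of every past query that targeted the first query keyword, together with users' selection frequency of those results in each past query. As the sample from which latent meanings are established, the user feedback log may use all past records. However, the purpose of the user feedback log is to determine the intrinsic relationships between words from past records and thereby establish latent meanings; as long as this purpose is achieved, a subset of the historical query results, a random sample, or similar may clearly also be used to collect the samples for determining latent meanings. By the same reasoning, the user feedback log is selected not per user but per historically queried word: for example, to obtain the user feedback log for the first query keyword "西藥", the feedback logs of all or some of the users who have historically used "西藥" as a query word are obtained.
Automatic discovery of latent meanings specifically means finding, for one word (phrase), one or more other words (phrases) with related or similar meanings. The essence of the embodiments of the present application is to use user feedback logs, which reflect user participation, to discover very reliably and automatically the latent meaning relationships between query words and historical query results that reflect user intent, and to use these relationships to improve the accuracy and intelligence of the search engine. The user feedback log may therefore include the historical query results of past queries that targeted the first query keyword and users' selection frequency of those results, and in step 103 the latent meanings of the first query keyword are sought on the basis of the historical query results and the selection frequency. That is, step 102 obtains the user feedback log, which is then used to determine the latent meanings of the first query keyword, so that step 103 can output a second query keyword having a latent meaning relationship with the first query keyword of step 101.
Here, the selection frequency may include the selection frequency of the historical query results and/or the selection frequency of the content of the historical query results.
The implementation of step 103 is described below, starting with determining the second query keyword matching the first query keyword according to the selection frequency of the content of the historical query results:
obtain the content of the historical query results of the first keyword;
perform word segmentation on the content of the historical query results to obtain segmented words;
determine the second query keyword matching the first query keyword according to the selection frequency of the segmented words.
In implementation, the segmented words are words of the following kinds, or a combination thereof:
First kind: words adjacent to the first query keyword after segmentation. For convenience of description, the selection-frequency statistics in this case are denoted P1 in the embodiments;
Second kind: words that, after segmentation, comprise components of the first query keyword. The corresponding selection-frequency statistics are denoted P2;
Third kind: words that, after segmentation, contain the first query keyword. The corresponding selection-frequency statistics are denoted P3.
The principle of step 103 is explained first.
The user feedback log records, for each query word, the historical query results (for example web pages) and their click-through rate, exposure frequency, and similar information. The inventors noticed that, for a given query word, the higher a web page's click-through rate, the more relevant it is to the query word. The latent meaning of a word refers to words that are synonymous, near-synonymous, or partially synonymous with it, such as "玻璃瓶" and "玻璃瓶子"; likewise, "雙人床" (double bed), "單人床" (single bed), and "彈簧床" (spring bed) all carry the latent meaning of "床" (bed), whereas "機床" (machine tool) and the like do not.
The embodiments of the present application define three kinds of latent meaning. The first kind consists of words that often appear in pairs, such as "摩托羅拉" (Motorola) and "公司" (company), or "摩托羅拉" and "手機" (mobile phone); here one word is closely related to another, i.e. some of the segmented words are adjacent to the query word. The second kind relates one word to several other words appearing in a certain order, such as "玻璃瓶" and "玻璃" "瓶子", or "美女" (beautiful woman) and "美麗的" (beautiful) "女人" (woman); i.e. after segmentation these words contain the components of the query word. In the third kind, one word is a component of another, such as "蝦" (shrimp) and "對蝦" (prawn), or "酒" (alcohol) and "啤酒" (beer); i.e. the segmented word contains the query word. Latent meanings discovered automatically from user feedback such as click-through rates often represent the latent intent of the search keywords users enter, and can be used to improve search engine accuracy. For example, when users search for "床", most of them actually mean a bed for sleeping, such as "單人床", "雙人床", or "木板床" (plank bed), rather than machinery such as "機床" or "車床" (lathe). Feedback such as user clicks reveals that the former carry the latent meaning of "床" while the latter (machine tools and the like) do not.
In a specific implementation of the present application, the first query keyword, the historical query results (web pages, document IDs, etc.), and the click-through rate, exposure rate, and similar information of the historical query results (or one of these) are first input, i.e. the first query keyword of step 101 and the user feedback log obtained in step 102. The first query keyword is then segmented. If the first query keyword comprises several words, the historical query results and related information in the user feedback log for this query are attached to each of its segmented words, so that every segmented word of the query has its own historical query results; after this processing, each query in the user feedback log is a single segmented word. The P1-, P2-, and P3-related processing is then performed for each segmented word, or for some of them, until all or some of the segmented words have been processed. The historical query results to use may be selected according to their total query count, click count, exposure count, and similar information, or one of these. The historical query results of each segmented word are processed in turn until all have been processed: all strings that fully match the segmented word are found in the historical query results of the user feedback log (a full match here means the segmented word is a substring of the string). The span of a string may be the length of the sentence containing the segmented word, or M times the length of the segmented word, where M may be any number greater than 1. Each string is then segmented and the P1-, P2-, and P3-related processing is applied. Note that, for ease of description, the following embodiments take documents as the query results and consider both the selection frequency of the query results and the selection frequency of their content; clearly, considering only one of the two also achieves the purpose of the application.
In a specific implementation, when the first query keyword, the historical query results (web pages, document IDs, etc.), and their click-through rate, exposure rate, etc. (or one of these) are input, a query dictionary may be set up into which the historical query results and their click-through rate, exposure rate, etc. (or one of these) are entered in advance, so that when the first query keyword is input, the second query keyword can be obtained quickly by looking it up in the query dictionary. In other words, the content of past user feedback logs is stored in advance for querying, and the query dictionary can be updated at any time from new user feedback logs; alternatively, the user feedback log may of course be retrieved after the first query keyword has been input.
First kind: words adjacent to the first query keyword after segmentation.
If the first query keyword is one segment of a string, for example the first query keyword is "美女" and a historical query result in the user feedback log is "中國|古代|美女|西施|名|夷光|春秋|戰國|時期|出生" (here "|" marks segment boundaries), then the number of occurrences in the string of each of the T segments before and after the query word is multiplied by a weight coefficient derived from the document's click frequency and exposure frequency (or one of them), giving weighted count (1), which is added to the overall statistic P1. P1 holds the weighted count (1) of every word that appears before or after the first query keyword. In this example, if the weight of the document is 0.5, the entries for "古代" and "西施" in P1 (this is the case T = 1) are each increased by 0.5.
Second kind: words that, after segmentation, comprise components of the first query keyword.
If the first query keyword is contained in several adjacent segments of a string, for example the first query keyword is "美女" and a historical query result in the user feedback log is "西施|是|個|美麗的|女人" (here "|" marks segment boundaries), then the number of occurrences of the segments that together contain the first query keyword is multiplied by a weight coefficient derived from the document's click frequency and/or exposure frequency, giving weighted count (2), which is added to the overall statistic P2. P2 holds the weighted count (2) of occurrences, in the same order, of groups of segments that together contain the first query keyword. In this example, if the weight of the document is 0.3, the entry "|美麗的|女人|" in P2 is increased by 0.3.
Third kind: words that, after segmentation, contain the first query keyword.
If the first query keyword is a substring of one segment of a string, for example the query word is "冰箱" and a historical query result in the user feedback log is "電冰箱|空調器|原理|與|維修" (here "|" marks segment boundaries), then the number of occurrences of the segment containing the first query keyword is multiplied by a weight coefficient derived from the document's click frequency and exposure frequency (or one of them), giving weighted count (3), which is added to the overall statistic P3. P3 holds the weighted count (3) of the segments containing the first query keyword. In this example, if the weight of the document is 0.8, the entry for "電冰箱" in P3 is increased by 0.8.
This is repeated until all historical query results in the user feedback log for a single segmented word have been processed. Then, based on the weighted sums of segment occurrences in P1, the segments whose weighted sum exceeds a set first threshold are taken as the first kind of latent meaning relationship of the query word; likewise, based on the weighted sums of segment occurrences in P2 and P3, the segments whose weighted sums exceed set second and third thresholds are taken as the second and third kinds of latent meaning relationship of the word.
Those skilled in the art will readily understand that an implementation may use any one of the three kinds of latent meaning, or any combination of two or all three.
Likewise, in implementation, the first, second, and third thresholds may be fixed, or may be set dynamically according to the overall query results for the query word; for example, the weights of all documents containing a matching string may be summed and then multiplied by a coefficient, so that the threshold adapts to the query results. The purpose of the thresholds is to select only some words as latent meanings of the query word, rather than returning every word unconditionally.
In a specific implementation, when the second query keyword matching the first query keyword is determined according to the historical query results and the selection frequency, the selection frequency may be required to exceed a set threshold; the selection frequency may be users' selection frequency of the historical query results or of their content. The aim is to use the click frequency and/or exposure frequency of a document or its content as a weight coefficient. The coefficient may be based on the click-through rate, the exposure rate, or both, and its relationship to the click and exposure frequencies may be linear or nonlinear: for example (without limitation), the coefficient is 1 for every document whose two frequencies are both above a set threshold and 0 for the others; or the document with the highest click-through and exposure rates gets 1 and the others are divided by that maximum, normalizing them to [0, 1]. Since the selection frequency is used to discover latent meanings, a threshold can filter out information with a low selection frequency, which speeds up the discovery of latent meanings and also avoids interference from some of that information.
In implementation, the selection frequency includes one of, or a combination of, the click frequency, the exposure frequency, the reading time, and the importance of the historical query results. Those skilled in the art will readily understand that, besides the document's click frequency and/or exposure frequency, the weight coefficient may also be based on other information about the document, such as reading time or importance, alone or combined with the click-through and exposure rates.
In implementation, the latent meaning relationship holds not only from the query word to its latent meaning but also in reverse. For example, "玻璃瓶" having the latent meaning "玻璃|瓶子" is equivalent to "玻璃|瓶子" having the latent meaning "玻璃瓶"; likewise, "冰箱" having the latent meaning "電冰箱" is equivalent to "電冰箱" having the latent meaning "冰箱".
Once the latent meanings of the first query keyword have been determined, step 104 can be performed: return the query results of a query targeting the latent meaning, i.e. the second query keyword.
In implementation, when the input first query keyword is obtained in step 101, the following processing may be performed:
obtain the information content input by the user;
perform word segmentation on the information content to obtain segmented words, and/or decompose the information content into characters;
use the segmented words and/or characters as first query keywords.
When determining the first query keyword, two sources may be used: one is to segment the information content input by the user first and then query with the segmentation results; the other is to decompose the information content into single characters and then perform a single-character query. Clearly these two approaches can be carried out simultaneously or in combination. When combined, the order may be: first segment the query word input by the user and query with the segmentation results, then query with the latent semantics of the query word's segments, and finally perform a single-character query. Querying with the segmentation results means retrieving relevant results from the word segmentation index using the segmentation results of the query word; a single-character query means retrieving results from the single-character index; a latent semantic query means obtaining query results by using the latent meanings of the query word. Each of the three kinds of latent meaning mentioned in the embodiments above (or any one of them) is processed separately as follows:
For words of the first kind, relevant results are obtained by querying "query word + first-kind latent word". For example, if the query word is "摩托羅拉", the corresponding first-kind queries are "摩托羅拉公司" and "摩托羅拉手機", assuming here that the first-kind latent words of "摩托羅拉" are "公司" and "手機". For words of the second kind, query results are obtained through the "adjacent query words" of the second kind; for example, the second-kind latent word for "玻璃瓶" is "玻璃|瓶子". For words of the third kind, query results are obtained through the third-kind latent word; for example, for the query "電冰箱" the third-kind latent word is "冰箱".
Clearly, when the relevance between the query word and a document is computed, results obtained through latent-meaning queries should be scored as more relevant than results obtained through single-character queries. This relevance score affects the ranking of the query results (which depends on relevance, web page importance such as PageRank, and so on).
Further, in implementation, when the first query keyword is obtained in step 101, the user features of the user who input the first query keyword may also be obtained; that is, the user features of the user may be obtained when the user inputs the first query keyword.
In that case, when the user feedback log is obtained in step 102, it may also be obtained according to the user features.
Alternatively, when the user feedback log is obtained, the historical query results of queries that targeted the first query keyword and users' selection frequency of those results are obtained from it, and these historical query results include the user features.
Alternatively, when the second query keyword is determined according to the user feedback log, it is determined according to the user features.
That is, when the second query keyword is matched according to the user feedback log, different second query keywords may also be matched according to the features of the user who input the first query keyword. Screening the user feedback log by user features helps to go further in discovering the latent meanings of the first query keyword. For example, as in the embodiment above, when users search for "床", most of them actually mean a bed for sleeping, such as "單人床", "雙人床", or "木板床", rather than machinery such as "機床" or "車床"; feedback such as user clicks then shows that the former carry the latent meaning of "床" and that the latent meanings do not include "機床" and the like. For the same query keyword "床", however, if the user works in the field of machinery, the latent meaning should be "機床" rather than "單人床", "雙人床", "木板床", etc. In this embodiment, "a person working in the field of machinery" is a user feature; its role is to classify the user feedback log so that the latent meanings of words can be discovered better.
As another example, if the first query keyword input by the user is "蘋果" (apple), a computer-related second query keyword is matched if the user feature is "computer professional", and a fruit-related second keyword is matched if the user feature is "agricultural scientist". In a specific implementation, the user features may include the user's region (e.g. country, area, city), web pages the user has frequently browsed in the past, web pages the user browsed recently, search keywords the user has entered before, and the user's gender, age, occupation, hobbies, and so on. To analyze and classify user features, technical means such as IP address analysis, analysis of client browser history, analysis of client cookie data, and analysis of users' online registration information may be used as needed, which will be readily understood by those skilled in the art.
Based on the same inventive concept, the present application further provides a word matching method and apparatus and an information query apparatus. Because they are based on the same inventive concept as the information query method and work on similar principles, their implementation may refer to that of the information query method, and the repeated parts are not described again.
Figure 2 is a schematic structural diagram of the information query apparatus. As shown in the figure, the apparatus may include:
a first query keyword acquisition module 201, configured to obtain the input first query keyword;
a user feedback log acquisition module 202, configured to obtain the user feedback log of the first query keyword;
a matching module 203, configured to determine, according to the user feedback log, a second query keyword matching the first query keyword;
a query result return module 204, configured to return the query results of a query that targets the second query keyword.
In implementation, the user feedback log acquisition module may further be configured to obtain a user feedback log comprising the historical query results of every past query that targeted the first query keyword and users' selection frequency of those historical query results in each past query;
the matching module may further be configured to determine the second query keyword matching the first query keyword according to the historical query results and the selection frequency in the user feedback log.
In implementation, the user feedback log acquisition module may further be configured to obtain, as the selection frequency, the selection frequency of the historical query results and/or the selection frequency of the content of the historical query results.
Figure 3 is a schematic structural diagram of the matching module. As shown in the figure, the matching module may include:
a content acquisition unit 2031, configured to obtain the content of the historical query results of the first keyword;
a word segmentation unit 2032, configured to perform word segmentation on the content of the historical query results to obtain segmented words;
a matching unit 2033, configured to determine the second query keyword matching the first query keyword according to the selection frequency of the segmented words.
In implementation, the word segmentation unit may further be configured to obtain, after segmentation, words of the following kinds or a combination thereof:
words adjacent to the first query keyword after segmentation;
words containing the first query keyword after segmentation;
words comprising components of the first query keyword after segmentation.
In implementation, the matching module may further be configured such that, when the second query keyword matching the first query keyword is determined according to the historical query results and the selection frequency, the selection frequency is greater than a set threshold.
Figure 4 is a schematic structural diagram of the first query keyword acquisition module. As shown in the figure, the first query keyword acquisition module may include:
an information content acquisition unit 2011, configured to obtain the information content input by the user;
a segmentation/decomposition unit 2012, configured to perform word segmentation on the information content to obtain segmented words, and/or to decompose the information content into characters;
a first query keyword determining unit 2013, configured to use the segmented words and/or characters as first query keywords.
In implementation, the user feedback log acquisition module may further be configured to obtain, as the selection frequency, a parameter comprising one of, or a combination of, the click frequency, the exposure frequency, the reading time, and the importance of the historical query results.
In implementation, the first query keyword acquisition module may further be configured to obtain user features of the user when the user inputs the first query keyword; the user feedback log acquisition module may further be configured to obtain the user feedback log according to the user features.
In implementation, the first query keyword acquisition module may further be configured to obtain user features of the user when the user inputs the first query keyword; the user feedback log acquisition module may further be configured, when obtaining the user feedback log, to obtain from it the historical query results of queries that targeted the first query keyword and users' selection frequency of those historical query results, the historical query results including the user features.
In implementation, the first query keyword acquisition module may further be configured to obtain user features of the user when the user inputs the first query keyword; the matching module may further be configured, when determining the second query keyword according to the user feedback log, to determine the second query keyword according to the user features.
Figure 5 is a schematic flowchart of the word matching method. As shown in the figure, word matching may include the following steps:
Step 501: obtain a word to be matched;
Step 502: obtain a user feedback log according to the word to be matched, the user feedback log including the historical query results of every past query that targeted the word to be matched and users' selection frequency of those historical query results in each past query;
Step 503: determine, according to the historical query results and the selection frequency, words matching the word to be matched.
In implementation, the selection frequency may include the selection frequency of the historical query results and/or the selection frequency of the content of the historical query results.
In implementation, determining the words matching the word to be matched according to the selection frequency of the content of the historical query results may comprise:
obtaining the content of the historical query results of the word to be matched;
performing word segmentation on the content of the historical query results to obtain segmented words.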
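The combined query order described in the detailed description (segmentation-index query, then latent-meaning query, then single-character query, with latent-meaning hits scored above single-character hits), together with the choice of latent meanings by user feature, could be sketched in Python as below. This is a toy under stated assumptions. Documents are pre-segmented lists held in memory instead of real indexes. The three kinds of latent meaning are passed as a hypothetical dict with keys "first", "second", and "third". Ranking segmentation hits above latent hits is an assumption drawn from the query order. Latent meanings are assumed to have been mined separately for each user-feature group and stored in a lookup table, in the spirit of the pre-built query dictionary. All names are hypothetical.

```python
def has_run(segs, run):
    """True if the tuple `run` occurs as consecutive segments of `segs`."""
    k = len(run)
    return any(tuple(segs[i:i + k]) == run for i in range(len(segs) - k + 1))


def latent_runs(query, latent):
    """Turn the three kinds of latent meaning into segment sequences to look for."""
    runs = [(query, w) for w in latent.get("first", [])]             # query + paired word
    runs += [tuple(p.split("|")) for p in latent.get("second", [])]  # adjacent segments
    runs += [(w,) for w in latent.get("third", [])]                  # containing word
    return runs


def latent_for_user(query, feature, latent_by_feature):
    """Pick latent meanings mined from users sharing `feature`; fall back to 'default'."""
    table = latent_by_feature.get(feature, latent_by_feature["default"])
    return table.get(query, {})


def tiered_search(query, latent, docs):
    """Rank doc IDs: segmentation hit (3) > latent-meaning hit (2) > single-character hit (1)."""
    runs = latent_runs(query, latent)
    scored = []
    for doc_id, segs in docs.items():
        if query in segs:
            tier = 3
        elif any(has_run(segs, r) for r in runs):
            tier = 2
        elif all(ch in "".join(segs) for ch in query):
            tier = 1
        else:
            continue  # not retrieved at all
        scored.append((-tier, doc_id))
    return [doc_id for _, doc_id in sorted(scored)]


if __name__ == "__main__":
    docs = {
        "a": ["生產", "玻璃", "瓶頸", "在於"],
        "b": ["透明", "玻璃", "瓶子"],
        "c": ["玻璃瓶", "批發"],
    }
    print(tiered_search("玻璃瓶", {"second": ["玻璃|瓶子"]}, docs))  # ['c', 'b', 'a']

    latent_by_feature = {
        "default": {"床": {"third": ["單人床", "雙人床"]}},
        "machinery": {"床": {"third": ["機床", "車床"]}},
    }
    beds = {"x": ["單人床", "價格"], "y": ["數控", "機床"]}
    for feature in ("student", "machinery"):
        latent = latent_for_user("床", feature, latent_by_feature)
        print(feature, tiered_search("床", latent, beds))
```

In the first call, 生產玻璃瓶頸在於 is still retrieved through the single-character path but ranks below 玻璃瓶子, which is the distinction the prior-art discussion says a plain hybrid index cannot make. In the second, the same query 床 ranks 機床 first only for the hypothetical "machinery" feature group.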
[Prior Art] The potential meaning of a word usually refers to the potential meaning of a word (including a phrase), which can usually be expressed by another word or phrases (including phrases), such as the so-called "refrigerator", which is generally called a potential The meaning of the word refers to "refrigerator", while the "cotton drag" in its general case is the meaning of "cotton slippers". Automatic discovery of potential meanings is a fundamental problem in natural language processing. Its solution can improve the understanding of the effects and performance of machine translation and search engines. Word segmentation technology is a commonly used technique in natural language processing. The word segmentation is to divide an input string into several words or phrases. For example, "There is a sincere @情情在我", after the word segmentation, usually The result of the word segmentation of Mai J is "I used to have I - I I I have a feeling I was in front of I I." The user feedback records the query results (text file or web page ID, etc.) corresponding to the query word and the query result point frequency "exposure frequency." Clicking on the _ rate, exposure frequency and other information reflects the user's recognition of the results of the query. In general, the click rate of the text file that meets the user's needs is higher than the click rate that does not meet the user's intention. For example, check the "Western Medicine" and the result "Wholesale The matching degree of the words of Western Medicine-5-201102842" and "Jiangxi Pharmaceutical Factory" is the same, but usually the click rate of the first result will be higher than the second result." By analyzing the user feedback log, you can find the word with the query word. 
Words with a high degree of meta-matching and different expressions, such as searching for "refrigerator-words, will find many results with "refrigerators", such as "double-door refrigerators", "invented refrigerators", "refrigerator manufacturers" ", "selling refrigerators", "storing refrigerators", etc., collecting results with relatively high click-through rates, and counting the frequency of each word segmentation for the sentence segmentation of the refrigerator, if one or more word segmentation results are greater than the setting The threshold is as follows: the query word is included in a high-frequency word segmentation result, such as "refrigerator" is included in the "refrigerator", The meaning of "refrigerator" is the meaning of "refrigerator"; the query word is included in two adjacent high-frequency participles. For example, the query word "glass bottle" is included in the two high-frequency participles of "glass" and "bottle". This is also often referred to as the "glass bottle" is the potential meaning of "glass bottle". At present, there have been many studies on the automatic discovery of potential semantics, mostly through the co-occurrence or linkage of words to find synonyms. For example, Lu Yong and Hou Hanqing are in the article "based on P a g e R a n k . "Automatic Synonym Recognition of Algorithms" introduces an automatic discovery method for synonyms. The article regards the relationship between interpretation and interpretation between lexicons as a link, and PageRank値 as the semantics between vocabulary The measure of similarity, and then identify the synonym according to the size of the semantic similarity. The disadvantage of this method is: based on the artificially annotated corpus, the number of entries obtained by the control will be limited. 
If it is based on the Internet-based web pages The chain relationship 'this chain relationship is sometimes very unreliable, the effect of synonym automatic discovery is difficult to get -6 - 201102842 to the protection. Search engine indexing methods include single word search, word segmentation index and mixed index single sub-index need to calculate text The distance between the words in the file is not efficient, and the accuracy is low. For example, when searching for “pesticide,” the word index cannot distinguish the difference between “God Pesticide Factory” and “Shen Nong Pesticide Factory”: Fast, but the index index recall rate is sometimes low, such as search "refrigerator, word segmentation index method Can find the "refrigerator, the result of 'can not find the "refrigerator" results; single-index index and word segmentation index combined with the hybrid index method is usually based on the word segmentation query, and then according to the word index query 'such as $ glass bottle "When you first find the "glass bottle" by the word segment index, and then find other results according to the word index, this makes up for the shortcomings of the two methods, but "glass bottle, which is found according to the word index" Search engines can't distinguish between "glass bottles, and 'the bottleneck of production glass", which affects the accuracy of the search; the previous method lacks sufficient data, or lacks user feedback, and the potential semantics extracted are too little or very It may be wrong. For example, Lu Yong and Hou Hanqing mentioned that the automatic discovery method of word meaning is mainly through the existing sll code data as the source of extraction. The sample size is about several thousand. If it is based on large data volumes such as Internet pages, it is lack of accuracy. 
Therefore, the shortcoming of the prior art is that when there is a large amount of data such as the Internet, there is no good query solution that can accurately predict the content that the user really needs to check, and therefore cannot give back to the user. The result of the query that the user really needs. 201102842 SUMMARY OF THE INVENTION The present application provides a word matching method and apparatus for providing a scheme for accurately determining the intrinsic relationship between words and words in the presence of massive data and matching them. The embodiment of the present application provides a word matching method, including the following steps: acquiring a to-be-matched word: acquiring a user feedback log according to the to-be-matched word; and determining a word matching the to-be-matched word according to the user feedback log. Preferably, the user feedback log includes a historical query result for querying the target to be matched, and a frequency of clicking on the historical query result by the user. Preferably, the words matching the words to be matched are determined based on the historical query results in the user feedback log and the click frequency. Preferably, the selection frequency includes: a frequency of selection of historical query results and/or a frequency of selection of content of historical query results. 
Preferably, the determining the word matching the to-be-matched word according to the click frequency of the content of the historical query result comprises: obtaining the content of the historical query result of the to-be-matched word; performing word segmentation on the content of the historical query result to obtain the word segmentation The word » according to the frequency of the word after the word segmentation to determine the word matching the word to be matched - 8 - 201102842 Preferably, the word after the word segment includes the following words or their combined word segmentation and adjacent to the word to be matched The word after the word segmentation contains the words to be matched; the word segmentation includes the words of the part to be matched. Preferably, when the word matching the to-be-matched word is determined according to the query result and the click frequency, the click frequency is greater than the set threshold. Preferably, the obtaining the to-be-matched word comprises: obtaining the information content input by the user; performing word segmentation on the information content to obtain the word after the word segmentation, and/or decomposing the information content into words; And / or word as a word to be matched. Preferably, the selection frequency includes one of a click frequency of the historical query result, an exposure frequency of the historical query result, a reading time of the historical query result, and an importance of the historical query result or a combination thereof. Preferably, the method further includes: acquiring a user feature of the user when the user inputs the word to be matched; and acquiring the user feedback message according to the user feature of the user when the user returns the message. 
Preferably, the method further includes: acquiring a user feature of the user when the user inputs the word to be matched; and obtaining the historical query result of the user feedback log with the target to be matched as the target when the user feedback log is obtained. And the frequency of the user's selection of the historical query result, the historical query result includes the -9 - 201102842 household feature. Preferably, the method further includes: when the user inputs a word to be matched, acquiring a user feature of the user; and determining, according to the user feedback message, a word that matches the to-be-matched word, determining, according to the user feature, matching the word to be matched word. The embodiment of the present application further provides a word matching device, including: a to-be-matched word acquisition module, configured to acquire a to-be-matched word; a user feedback log acquisition module, configured to obtain a user feedback log according to the to-be-matched word; It is used to determine a word that matches the word to be matched according to the user feedback log and the click frequency. Preferably, the user feedback log obtaining module is further configured to obtain a historical query result including the query for the to-be-matched word, and a user feedback log of the user's click frequency for the historical query result. Preferably, the 'matching module is further configured to determine a word that matches the word to be matched according to the historical query result in the user feedback log and the click frequency. Preferably, the user feedback log obtaining module is further configured to obtain: a click frequency of the historical query result and/or a click frequency of the content of the historical query result as the click frequency. 
Preferably, the matching module includes: a content acquisition unit, configured to obtain content of a historical query result of the word to be matched; a word segmentation unit, configured to perform word segmentation on the content of the historical query result to obtain a word after the word segmentation; 201102842 A matching unit, configured to determine a word matching the word to be matched according to the frequency of the word selection after the word segmentation. Preferably, the word segmentation unit is further configured to obtain a word or a combination thereof in the following manner after the word segmentation: a word adjacent to the word to be matched after the word segmentation; a word containing the word to be matched after the word segmentation; and a word to be matched after the word segmentation Part of the word. Preferably, the matching module is further configured to: when the word matching the to-be-matched word is determined according to the historical query result and the click frequency, the click frequency is greater than a set threshold. Preferably, the to-be-matched word acquisition module comprises: an information content acquisition unit, configured to acquire information content input by the user; a word segmentation/decomposition unit, configured to perform segmentation processing on the information content, and obtain the word after the word segmentation, and/ Or, the information content is decomposed into words; the to-be-matched word determining unit is configured to use the words and/or words after the word segmentation as the to-be-matched words. Preferably, the user feedback log obtaining module is further configured to obtain one of a click frequency including a historical query result, an exposure frequency of a historical query result, a reading time of a historical query result, and an importance of a historical query result, or a combination thereof. The parameters are used as the selection frequency. 
Preferably, the to-be-matched word acquisition module is further configured to acquire the user feature of the user when the user inputs the word to be matched; the user feedback log obtaining module is further configured to obtain the user feedback log according to the user feature. -11 - 201102842 Preferably, the to-be-matched word acquisition module is further configured to acquire a user feature of the user when the user inputs the to-be-matched word; the user feedback log obtaining module is further configured to obtain the user feedback The obtaining user feedback log includes a historical query result that is queried with the target to be matched, and a frequency of the user's selection of the historical query result, and the historical query result includes the user feature. Preferably, the to-be-matched word is obtained. The module is further configured to: when the user inputs the word to be matched, obtain the user feature of the user; the matching module is further configured to determine and wait according to the user feature when determining a word that matches the to-be-matched word according to the user feedback log. Match the words that match the word. Based on the same structure, the present application provides an information query method and apparatus for providing a method for accurately determining the true need of a user to query information by using the foregoing matching relationship between words and words in the presence of a large amount of data, and giving feedback. The result of the query that the user really needs. An embodiment of the present application provides an information query method, including the following steps: acquiring an input first query keyword; acquiring a user feedback log according to the first query keyword; and determining, according to the user feed log, that the first query keyword is matched The second query keyword; the feedback result of the query with the second query keyword as the target. 
Preferably, the user feedback log includes a historical query result for querying the first query keyword and a user's click frequency for the historical query result -12-201102842. Preferably, the second query keyword matching the first query keyword is determined based on the historical query result in the user feedback log and the click frequency. Preferably, the selection frequency includes: a frequency of selection of historical query results and/or a frequency of selection of content of historical query results. Preferably, the determining, by the click frequency of the content of the historical query result, the second query keyword that matches the first query keyword comprises: obtaining the content of the historical query result of the first keyword; The word after the word segmentation process obtains the word segmentation, and the word corresponding to the first query keyword is determined according to the click frequency of the word after the word segmentation. Preferably, the word after the word segmentation refers to a word in the following manner or a word adjacent to the first query keyword after the word segmentation; a word containing the first query keyword after the word segmentation; Query the words of the keyword component. Preferably, when the second query keyword matching the first query keyword is determined according to the historical query result and the click frequency, the click frequency is greater than the set threshold. _Family good, the first query keyword obtained by the input, including: obtaining the information content input by the user; performing the word segmentation processing on the information content to obtain the stomach after the word segmentation, & / or, decomposing the information content into words ; -13 - 201102842 Use the word and / or word after the word segment as the first query keyword. 
Preferably, the click frequency includes one of a click frequency of a historical query result, an exposure frequency of a historical query result, a reading time for a historical query result, and an importance of a historical query result, or a combination thereof. Preferably, the method further includes: when the user inputs the query keyword, the user feature of the user is obtained; when the user feedback log is obtained, the user feedback message is obtained according to the user feature of the user. Preferably, the method further includes: acquiring a user feature of the user when the user inputs the first query keyword; and acquiring the user feedback log includes obtaining a history of querying the first query keyword The result of the query, and the frequency of the user's selection of the historical query result, the historical query result including the user feature. Preferably, the method further includes: when the user inputs the first query keyword, acquiring the user feature of the user; and determining, according to the user feedback log, the second query keyword according to the user feature. An information query device is further provided in the embodiment of the present application, comprising: a first query keyword acquisition module, configured to obtain an input first query keyword; -14- 201102842 user feedback feed module, for The first query keyword obtains a user feedback log; the matching module is configured to determine a second query keyword that matches the first query keyword according to the user feedback log; and the query result feedback module is configured to feed back the second query key The result of the query for the query for the target. Preferably, the user feedback log obtaining module is further configured to obtain a historical query result including the query of the first query keyword and a user feedback log of the user's click frequency of the historical query result. 
Preferably, the matching module is further configured to determine the second query keyword matching the first query keyword according to the historical query results in the user feedback log and the selection frequency.

Preferably, the user feedback log acquisition module is further configured to obtain, as the selection frequency, the selection frequency of the historical query results and/or the selection frequency of the content of the historical query results.

Preferably, the matching module includes:
a content acquisition unit, configured to obtain the content of the historical query results of the first query keyword;
a word segmentation unit, configured to segment the content of the historical query results to obtain segmented words;
a matching unit, configured to determine, according to the selection frequency of the segmented words, the second query keyword matching the first query keyword.

Preferably, the word segmentation unit is further configured to obtain, after segmentation, words of the following kinds or a combination thereof: words adjacent to the first query keyword after segmentation; words containing the first query keyword after segmentation; and words that include components of the first query keyword after segmentation.

Preferably, the matching module is further configured such that, when the second query keyword matching the first query keyword is determined according to the historical query results and the selection frequency, the selection frequency is greater than a set threshold.
Preferably, the first query keyword acquisition module includes:
an information content acquisition unit, configured to obtain the information content input by the user;
a word segmentation/decomposition unit, configured to segment the information content to obtain segmented words, and/or decompose the information content into single characters;
a first query keyword determination unit, configured to use the segmented words and/or characters as the first query keyword.

Preferably, the user feedback log acquisition module is further configured to obtain, as the selection frequency, a parameter comprising one of, or a combination of, the click frequency of the historical query results, the exposure frequency of the historical query results, the reading time of the historical query results, and the importance of the historical query results.

Preferably, the first query keyword acquisition module is further configured to acquire the user feature of the user when the user inputs the first query keyword; and the user feedback log acquisition module is further configured to obtain the user feedback log according to the user feature.

Preferably, the first query keyword acquisition module is further configured to acquire the user feature of the user when the user inputs the first query keyword; and the user feedback log acquisition module is further configured, when obtaining the user feedback log, to obtain a log that includes the historical query results of queries targeting the first query keyword and users' selection frequencies on those results, the historical query results including the user feature.

Preferably, the first query keyword acquisition module is further configured to acquire the user feature of the user when the user inputs the first query keyword; and the matching module is further configured, when determining the second query keyword according to the user feedback log, to determine the second query keyword according to the user feature.

The present application has the following beneficial effects: in the implementation of the present application, after the input first query keyword is obtained, the user feedback log of the first query keyword is obtained, the log including the historical query results of queries targeting the first query keyword and users' selection frequencies on those results; the second query keyword matching the first query keyword is then determined according to the historical query results and the selection frequency; finally, the query results of a query targeting the matched second query keyword are returned. In this process, user feedback serves as the basis for discovering the latent meaning of the user's query. Therefore, even with massive amounts of data, user feedback can be used to determine the latent meaning of the query accurately, which improves the accuracy of information queries.

[Embodiment]

Specific embodiments of the present application are described below with reference to the accompanying drawings.

FIG. 1 is a schematic flowchart of the information query method. As shown, the method may include the following steps:
Step 101: obtain the input first query keyword;
Step 102: obtain a user feedback log according to the first query keyword, the log including the historical query results of queries targeting the first query keyword and users' selection frequencies on those results;
Step 103: determine, according to the historical query results and the selection frequency, a second query keyword matching the first query keyword;
Step 104: return the query results of a query targeting the second query keyword.
The specific implementation of each step is described below.

In step 101, the first query keyword may be obtained as follows: obtain the information content input by the user; segment the information content to obtain segmented words, and/or decompose the information content into single characters; and use the segmented words and/or characters as the first query keyword.

It can be seen that the keyword used for querying in the implementation of the present application may be a word or a single character; when it is a character, this amounts to what is commonly called a single-character query. Querying with different query units such as words and characters, or with a combination of them, evidently makes the query results more complete and more accurate.

In step 102, the user feedback log generally refers to the record, collected by the search engine, of the keywords input by users, the historical query results (usually web page or text file IDs, etc.), and the click frequency, exposure frequency, etc. of those results. In implementation, the user feedback log may include the historical query results of queries targeting the first query keyword and users' selection frequencies on those results. The user feedback log serves as the sample from which latent meanings are discovered. All previous records may be used; however, since the purpose of the user feedback log is to determine the intrinsic relationships between words from previous records and thereby establish latent word meanings, any approach that achieves this purpose, such as selecting only some historical query results or sampling them randomly, can evidently also be used to collect the samples. By the same token, the user feedback log is selected not by user but by the query target recorded in the history. For example, when the user feedback log for the first query keyword "Western medicine" (西藥) is needed, the logs are collected by query word rather than by user.
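The step-101 decomposition above (segmented words and/or single characters) can be sketched as follows. The text does not say which segmenter is used, so forward maximum matching over a small hypothetical vocabulary stands in for it here; the names `fmm_segment` and `query_units` are illustrative only.

```python
def fmm_segment(text, vocab, max_len=4):
    """Forward maximum matching (a stand-in segmenter): at each position take
    the longest vocabulary word of up to max_len characters, else one character."""
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in vocab:
                tokens.append(piece)
                i += size
                break
    return tokens


def query_units(text, vocab, use_words=True, use_chars=True):
    """Query units for step 101: segmented words and/or single characters."""
    words = fmm_segment(text, vocab) if use_words else []
    chars = list(text) if use_chars else []
    return words, chars


# Hypothetical vocabulary used only for this illustration.
vocab = {"批發", "西藥", "玻璃", "玻璃瓶", "瓶子"}
print(query_units("批發西藥", vocab))
```

Greedy matching has known weaknesses: with this vocabulary it splits 玻璃瓶子 into 玻璃瓶 | 子, which is one reason to keep character-level units alongside the words.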
That is, the user feedback logs of all or some of the users who queried with "Western medicine" are obtained.

Automatic discovery of latent meanings specifically means finding, for a word (or phrase), one or more other words (or phrases) that are related or similar to it. The essence of the embodiments of the present application is that the recorded user feedback logs can be used to automatically discover the latent semantic relationships between query words and historical query results, and these relationships are then used to improve the accuracy and intelligence of the search engine. Therefore, the user feedback log may include the historical query results of previous queries targeting the first query keyword, and previous users' selection frequencies on those results. In step 103, the latent meaning of the first query keyword is discovered from the historical query results and the selection frequency. That is, the user feedback log obtained in step 102 is used to determine the latent meaning of the first query keyword, so that, through steps 101 to 103, a second query keyword having a latent semantic relationship with the first query keyword can be output.

The selection frequency may include the selection frequency of the historical query results and/or the selection frequency of their content.

The specific implementation of step 103 is described below. First, the second query keyword matching the first query keyword is determined from the selection frequency of the content of the historical query results:
Obtain the content of the historical query results of the first query keyword; segment the content of the historical query results to obtain segmented words; and determine, according to the selection frequency of the segmented words, the second query keyword matching the first query keyword.

In implementation, the segmented words are words of the following kinds, or a combination thereof:
Type 1: words adjacent to the first query keyword after segmentation; for ease of description, the statistics related to the selection frequency in this case are denoted P1;
Type 2: words that include components of the first query keyword after segmentation; the corresponding statistics are denoted P2;
Type 3: words containing the first query keyword after segmentation; the corresponding statistics are denoted P3.

The principle of step 103 is described first. The user feedback log records the historical query results corresponding to a query word (for example, web pages) together with their click-through rate and exposure frequency. During the making of the invention, the inventors noticed that, for a given query word, the higher the click-through rate of a page, the more relevant the page is to the query word. The latent meaning of a word refers to words that are synonymous, nearly synonymous, or partially synonymous with it, such as "glass bottle" (玻璃瓶) and "glass | bottle" (玻璃|瓶子). Likewise, "double bed", "single bed", "spring bed", and so on carry the latent meaning of "bed", whereas "machine tool" and the like do not, even though in Chinese the words for "machine tool" and "lathe" also contain the character 床 ("bed").
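To make the contents of the user feedback log concrete, here is a minimal sketch that assumes, hypothetically, that raw logs are impression events of the form (query, result_id, clicked). Aggregating them per query yields the click and exposure counts from which click-through rates are derived. The event format and the names `build_feedback_table` and `click_through_rate` are assumptions for illustration, not part of the described method.

```python
from collections import defaultdict


def build_feedback_table(events):
    """Aggregate impression events into {query: {result_id: (clicks, exposures)}}.

    Each event is (query, result_id, clicked): every event counts as one
    exposure, and a clicked event also counts as one click.
    """
    table = defaultdict(lambda: defaultdict(lambda: [0, 0]))
    for query, result_id, clicked in events:
        counts = table[query][result_id]
        counts[1] += 1          # exposure
        if clicked:
            counts[0] += 1      # click
    return {q: {r: tuple(c) for r, c in results.items()}
            for q, results in table.items()}


def click_through_rate(clicks, exposures):
    """Clicks per exposure; 0.0 for a result that was never shown."""
    return clicks / exposures if exposures else 0.0
```

A table of this shape, built ahead of time and refreshed as new logs arrive, could also play the role of the precomputed query dictionary mentioned later in the text.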
Three kinds of latent meaning are defined in the embodiments of the present application. In the first kind, words frequently appear in pairs, such as "Motorola" and "company", or "Motorola" and "mobile phone"; one word is usually closely related to the other, that is, some segmented words are adjacent to the query word. In the second kind, a word corresponds to several other words appearing in a certain order, such as "glass bottle" and "glass" "bottle", or "beauty" and "beautiful" "woman"; that is, the segmented words contain components of the query word. In the third kind, one word is a component of another, such as "shrimp" and "prawn", or "wine" and "beer"; that is, the segmented word contains the query word.

These latent meanings, discovered automatically from user feedback such as click-through rates, often represent the latent intent behind the search keyword entered by the user and can be used to improve the accuracy of the search engine. For example, when users search for "bed", most of them actually mean a bed for sleeping, such as a "single bed", "double bed" or "wooden bed", rather than mechanical equipment such as a "machine tool" or "lathe". User clicks and other feedback reveal that the former carry the latent meaning of "bed", while the latter (machine tools and the like) do not.

In a specific implementation, the following are input first: the first query keyword, the historical query results (web pages, text file IDs, etc.), and their click-through rate, exposure rate, and so on (or one of these); alternatively, the first query keyword obtained in step 101 and the user feedback log obtained in step 102 are input. Next, the first query keyword is segmented. If the first query keyword contains several words, the historical query results and related information recorded for the query word in the user feedback log are attached to each segmented part, so that each segmented query word has its own historical query results; each query in the user feedback log is treated as a separate segmented word. The P1, P2 and P3 processing is then performed for each segmented word (or for some of them) until all or the selected segmented words have been processed. The historical query results to be processed can be selected according to their total numbers of queries, clicks, exposures, and so on, and the historical query results of each segmented word are processed until all of them have been handled. From the historical query results in the user feedback log, all strings that exactly match the segmented word are found; here an exact match means that the segmented word is a substring of the string. The scale of the string may be the length of the sentence containing the segmented word, or the length of the segmented word multiplied by any number greater than 1. The P1, P2 and P3 processing described below is then applied to the segmented word. Note that, for convenience in the following examples, a text file is used as the query result, and both the selection frequency of the query result and that of its content are considered; evidently, using only one of them also achieves the purpose of the present application.
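The exact-match step above, which collects every string of a historical query result that contains the segmented word, can be sketched as follows. The scale is either the enclosing sentence or a window of about `factor` times the word length; the sentence delimiters, the centering of the window, and the name `context_windows` are assumptions for illustration.

```python
import re

# Assumed sentence delimiters: full-width and ASCII punctuation, plus newlines.
SENTENCE_END = re.compile(r"[。！？!?；;\n]")


def context_windows(text, word, factor=None):
    """Return the strings of a result text that contain `word`.

    factor=None: the whole sentence containing the word.
    factor > 1:  a window of about factor * len(word) characters centred on
                 each occurrence of the word.
    """
    if factor is None:
        return [s for s in SENTENCE_END.split(text) if word in s]
    span = max(len(word), int(len(word) * factor))
    pad = (span - len(word)) // 2
    windows, start = [], text.find(word)
    while start != -1:
        windows.append(text[max(0, start - pad):start + len(word) + pad])
        start = text.find(word, start + 1)
    return windows
```

For example, with the text 雙開門電冰箱促銷。電冰箱廠家直銷 and the word 冰箱, the sentence scale yields both sentences, while `factor=2` yields the four-character windows 電冰箱促 and 電冰箱廠.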
In a specific implementation, when the first query keyword, the historical query results (web pages, text file IDs, etc.), and their click-through rate, exposure rate, etc. (or one of these) are input, a query dictionary can be set up into which the historical query results and their click-through rate, exposure rate, etc. are entered in advance, so that the second query keyword can be obtained quickly from the dictionary when the first query keyword is input. That is, the contents of previous user feedback logs are stored in advance for querying, and the query dictionary can be updated at any time from new user feedback logs. Of course, the user feedback log can also be retrieved after the first query keyword has been input.

Type 1: words adjacent to the first query keyword after segmentation.

Suppose the first query keyword is one of the segmented words of a string. For example, the first query keyword is "beauty" and a historical query result fed back by users is "China | ancient | beauty | Xi Shi | ... | Spring and Autumn | ..." (where "|" marks the segmentation result). The words within τ positions before and after the first query keyword are then counted: the number of occurrences of each such word is multiplied by a weight coefficient derived from the click frequency/exposure frequency (or one of them) of the text file, recorded as weighted count (1), and added to the overall query-result statistics P1. P1 holds weighted count (1) for every word appearing before and after the first query keyword. In this example, if the weight of the text file is 0.5, the entries for "ancient" and "Xi Shi" in P1 (for the case τ = 1) are each increased by 0.5.

Type 2: segmented words that include components of the first query keyword.
Suppose the first query keyword is spread over several adjacent segmented words of a string. For example, the first query keyword is "beauty" and the historical query result in the user feedback log is "Xi Shi | is | beautiful | woman" (where "|" marks the segmentation result). The number of occurrences of the segmented words that contain the components of the first query keyword is then multiplied by the click frequency/exposure frequency (or one of them) of the text file as a weight coefficient, recorded as weighted count (2), and added to the overall query-result statistics P2. P2 holds weighted count (2) for each sequence of segmented words that contains the components of the first query keyword in the same order. In this example, if the weight of the text file is 0.3, the entry for "beautiful | woman" in P2 is increased by 0.3.

Type 3: segmented words containing the first query keyword.

Suppose the first query keyword is contained in a single segmented word of a string. For example, the query word is "refrigerator" (冰箱) and the historical query result in the user feedback log is "electric refrigerator (電冰箱) | air conditioner | principle | and | repair" (where "|" marks the segmentation result). The number of occurrences of the segmented word containing the first query keyword is then multiplied by the click frequency and exposure frequency (or one of them) of the text file as a weight coefficient, recorded as weighted count (3), and added to the overall query-result statistics P3. P3 holds weighted count (3) for each segmented word containing the first query keyword. In this example, if the weight of the text file is 0.8, the entry for "electric refrigerator" in P3 is increased by 0.8.
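The three counting rules (P1, P2, P3) can be sketched together as below. Each historical query result is assumed to be already segmented into a token list and paired with its weight coefficient (for instance a normalized click or exposure frequency). Reading type 2 as "the query splits into consecutive pieces, each found inside one of a run of adjacent words" is one interpretation of the text, and the function names are hypothetical.

```python
from collections import defaultdict


def split_across(query, run):
    """True if `query` can be cut into len(run) non-empty consecutive pieces
    such that the m-th piece occurs inside the m-th word of `run`."""
    if len(run) == 1:
        return query in run[0]
    for cut in range(1, len(query) - len(run) + 2):
        if query[:cut] in run[0] and split_across(query[cut:], run[1:]):
            return True
    return False


def latent_counts(query, results, tau=1):
    """Weighted statistics P1, P2, P3 for one segmented query word.

    results: (tokens, weight) pairs, one per historical query result.
    P1: words within tau positions of an exact occurrence of the query.
    P2: runs of 2+ adjacent words that jointly hold the query's components.
    P3: single words that strictly contain the query.
    """
    p1, p2, p3 = defaultdict(float), defaultdict(float), defaultdict(float)
    for tokens, weight in results:
        n = len(tokens)
        for i, tok in enumerate(tokens):
            if tok == query:                          # type 1: neighbours
                for j in range(max(0, i - tau), min(n, i + tau + 1)):
                    if j != i:
                        p1[tokens[j]] += weight
            elif query in tok:                        # type 3: containing word
                p3[tok] += weight
            for k in range(2, len(query) + 1):        # type 2: split components
                run = tokens[i:i + k]
                if len(run) < k or any(query in t for t in run):
                    continue
                if split_across(query, run):
                    p2["|".join(run)] += weight
    return dict(p1), dict(p2), dict(p3)
```

With 中國|古代|美女|西施|是 at weight 0.5 and query 美女, P1 gets 古代 and 西施 at 0.5 each; with 電冰箱|空調|原理|及|維修 at weight 0.8 and query 冰箱, P3 gets 電冰箱 at 0.8; with 西施|是|美麗|女子 at weight 0.3, P2 gets 美麗|女子 at 0.3.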
This is repeated until all user feedback for a single segmented word has been processed. Among the segmented words in P1, those whose weighted sum of occurrences exceeds a set first threshold are taken as the first kind of latent meaning of the query word; similarly, the segmented words in P2 and P3 whose weighted sums exceed the set second and third thresholds are taken as the second and third kinds of latent meaning of the word.

Those skilled in the art will readily appreciate that, in implementation, any one of the three kinds of latent meaning may be used, or any two or all three may be combined. Also, the first, second and third thresholds may be fixed, or may be set dynamically from the overall query results, for example by summing over all text files that contain the matching string and multiplying the sum by a coefficient. The purpose of the thresholds is to select only some words as latent meanings of the query word rather than unconditionally returning all of them.

In a specific implementation, when the second query keyword matching the first query keyword is determined according to the historical query results and the selection frequency, the selection frequency may be required to exceed a set threshold, where the selection frequency may be users' selection frequency of the historical query results or of their content.
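The threshold selection described above can be sketched as a filter over one of the weighted statistics (P1, P2 or P3), applied separately to each with its own first, second or third threshold. The dynamic variant here, a `ratio` times the total weight in the statistics, is only an assumed reading of "a sum multiplied by a coefficient"; `select_latent` is a hypothetical name.

```python
def select_latent(scores, threshold=None, ratio=None):
    """Return the candidates whose weighted sum exceeds the threshold,
    highest score first.

    threshold: a fixed threshold value.
    ratio:     dynamic variant (an assumption): the threshold becomes
               `ratio` times the total weight in `scores`.
    """
    if threshold is None:
        if ratio is None:
            raise ValueError("give either threshold or ratio")
        threshold = ratio * sum(scores.values())
    kept = [w for w, s in scores.items() if s > threshold]
    return sorted(kept, key=lambda w: scores[w], reverse=True)
```

For instance, with P3 = {電冰箱: 2.4, 冰箱門: 0.9, 冰箱貼: 0.3}, a fixed threshold of 1.0 keeps only 電冰箱, while `ratio=0.2` (threshold about 0.72) keeps 電冰箱 and 冰箱門.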
When the click frequency and exposure frequency (or one of them) of a text file or its content are used as a weight coefficient, the click-through rate and the exposure rate can be used singly or combined, and the coefficient may depend linearly or nonlinearly on the click and exposure frequencies. For example (without limitation), the coefficient may be 1 when both frequencies exceed a certain threshold and 0 otherwise; or the highest click-through rate and exposure rate may be set to 1 and all others divided by the maximum, normalizing them to [0, 1]. The selection frequency exists to discover latent word meanings, so filtering out low-frequency information with a threshold speeds up discovery and avoids interference from some of that information.

In implementation, the selection frequency includes one of, or a combination of, the click frequency of the historical query results, the exposure frequency of the historical query results, the reading time of the historical query results, and the importance of the historical query results. Those skilled in the art will readily appreciate that, besides the click frequency and exposure frequency (or one of them), the weight coefficient of a text file may also be based on other attributes of the text file, such as reading time and importance, or on their combination with the click-through rate and exposure rate.

In implementation, the latent-meaning relationship holds not only from the query word to its latent meaning but also in reverse. For example, "glass | bottle" (玻璃|瓶子) is a latent meaning of "glass bottle" (玻璃瓶), and "glass bottle" is likewise a latent meaning of "glass | bottle"; "electric refrigerator" (電冰箱) is a latent meaning of "refrigerator" (冰箱), and "refrigerator" is likewise a latent meaning of "electric refrigerator".

Once the latent meaning of the first query keyword, that is, the second query keyword, has been determined, step 104 can be executed.

Step 104: return the query results of a query targeting the latent meaning.
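Step 104, together with the query forms for the three kinds of latent meaning given in the following paragraphs, can be sketched over pre-segmented documents. The two-tier relevance (a word or latent-meaning hit outranks a hit on single characters only) reflects just the ordering the text states. The tier values, the token-scanning approach (a real engine would consult its word and single-character indexes), and all function names are hypothetical.

```python
def build_phrases(query, first=(), second=(), third=()):
    """Phrase queries for the three kinds of latent meaning.

    first:  words that occur next to the query ("query word + latent word").
    second: runs of component words written as "w1|w2".
    third:  longer words that contain the query.
    """
    phrases = [(query, w) for w in first]
    phrases += [tuple(run.split("|")) for run in second]
    phrases += [(w,) for w in third]
    return phrases


def phrase_hit(tokens, phrase):
    """True if `phrase` occurs in `tokens` as consecutive words, in order."""
    n = len(phrase)
    return any(tuple(tokens[i:i + n]) == phrase
               for i in range(len(tokens) - n + 1))


def relevance(tokens, query, phrases):
    """2: word or latent-meaning hit; 1: single-character hit only; 0: no hit."""
    if query in tokens or any(phrase_hit(tokens, p) for p in phrases):
        return 2
    text = "".join(tokens)
    return 1 if all(ch in text for ch in query) else 0
```

For the query 玻璃瓶 with second latent meaning 玻璃|瓶子, the documents 雙層|玻璃|瓶子 and 生產|玻璃|瓶頸|在於 both contain every character of the query, so single-character matching alone cannot separate them; the adjacent-word phrase scores the first 2 and the second only 1.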
In implementation, when the first query keyword is obtained in step 101, the following processing may be performed: obtain the information content input by the user; segment the information content to obtain segmented words, and/or decompose the information content into single characters; and use the segmented words and/or characters as the first query keyword. That is, when the first query keyword is determined, the information content input by the user may be segmented into words or decomposed into single characters for querying; evidently, both can be done at the same time and combined. When they are combined, a segmented-word query can be performed first, then a query based on the latent meaning of the query word, and finally a single-character query. A segmented-word query finds the query results in the word segmentation index; a single-character query finds them in the single-character index; a latent-meaning query obtains them using the latent meaning of the query word. The three kinds of latent meaning in the above embodiment (or any of them) are handled as follows. For the first kind, the query results are obtained by querying "query word + first latent meaning"; for example, if the query word is "Motorola", the corresponding first-latent-meaning queries are "Motorola company" and "Motorola mobile phone".
Here it is assumed that the first latent meanings of "Motorola" are "company" and "mobile phone". For the second kind, the query results are obtained by querying the component words of the second latent meaning as adjacent words; for example, the second-latent-meaning query for "glass bottle" is "glass | bottle". For the third kind, the query results are obtained with the third latent meaning itself; for example, for the query "refrigerator", the third-latent-meaning query is "electric refrigerator". Evidently, when the relevance between the query word and a text file is calculated, results obtained through latent-meaning queries should be rated more relevant than results of single-character queries, and this relevance affects the ranking of the query results (which are ranked by relevance, web page importance such as PageRank, and so on).

Further, in implementation, when the first query keyword is obtained in step 101, the user feature of the user who inputs it is also acquired; that is, the user feature can be acquired when the user inputs the first query keyword. The user feedback log can then be obtained in step 102 according to the user feature. Alternatively, when the user feedback log is obtained, the acquired log includes the historical query results of queries targeting the first query keyword and users' selection frequencies on those results, and these historical query results include the user feature. Alternatively, when the second query keyword is determined according to the user feedback log, it is determined according to the user feature.
That is, when the second query keyword is matched according to the user feedback log, it may be matched according to the user feature of the user who input the first query keyword. Using user features to select the user feedback log helps to discover the latent meaning of the first query keyword more precisely. For example, as in the foregoing embodiment, when users search for "bed", most of them actually mean a bed for sleeping, such as a "single bed", "double bed" or "wooden bed", rather than mechanical equipment such as a "machine tool" or "lathe"; user clicks and other feedback reveal that the former carry the latent meaning of "bed" and that the latent meanings do not include "machine tool" and the like. For the same query keyword "bed", however, if the user is a technician in the field of mechanical equipment, the latent meaning should be "machine tool" rather than "single bed", "double bed", "wooden bed", and so on. In this example, "technician in the field of mechanical equipment" is the user feature, and its role is to classify the user feedback log so that latent word meanings are discovered more accurately. As another example, if the first query keyword input by the user is "Apple", second query keywords from the computer domain are matched when the user feature is computer professional, and second query keywords from the fruit domain are matched when the user feature is agricultural scientist.

In a specific implementation, the user features may include the user's location (such as country, region and city), web pages the user frequently visits, web pages the user has recently browsed, search keywords the user has previously entered, and the user's gender, age, occupation, hobbies, and so on.
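Selecting the user feedback log by user feature, as in the "bed" example above, can be sketched as a filter over log records. The record layout, the use of `occupation` as the matching feature, and the fallback to the unfiltered log when no record matches are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class FeedbackRecord:
    query: str
    result: str
    clicks: int
    user: dict = field(default_factory=dict)   # e.g. {"occupation": ...}


def select_log(records, query, user, keys=("occupation",)):
    """Records for `query` whose users share the listed features with `user`;
    falls back to all records for `query` when none match."""
    same_query = [r for r in records if r.query == query]
    matched = [r for r in same_query
               if all(r.user.get(k) == user.get(k) for k in keys)]
    return matched or same_query
```

The selected records would then feed the P1/P2/P3 counting in place of the full log, so a mechanical-equipment technician searching 床 would mainly see the latent meanings derived from colleagues' clicks (for example 機床).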
User features can be analyzed as needed by technical means such as analyzing the IP address, the browser history data on the user side, client cookie data, or the user's online registration information, as those skilled in the art will readily understand.

Based on the same inventive concept, the present application further provides a word matching method and device, and an information query device. Since the word matching method and device, the information query device, and the information query method are based on the same inventive concept and work on similar principles, the implementation of the word matching method and device and of the information query device may refer to that of the information query method, and repeated descriptions are omitted.

FIG. 2 is a schematic structural diagram of the information query device. As shown, the device may include:
a first query keyword acquisition module 201, configured to obtain the input first query keyword;
a user feedback log acquisition module 202, configured to obtain the user feedback log of the first query keyword;
a matching module 203, configured to determine, according to the user feedback log, a second query keyword matching the first query keyword;
a query result feedback module 204, configured to return the query results of a query targeting the second query keyword.

In implementation, the user feedback log acquisition module may further be configured to obtain a user feedback log that includes the historical query results of previous queries targeting the first query keyword and previous users' selection frequencies on those results; the matching module may further be configured to determine the second query keyword matching the first query keyword according to the historical query results in the user feedback log and the selection frequency.
In implementation, the user feedback log acquisition module may further be configured to obtain, as the selection frequency, the selection frequency of the historical query results and/or the selection frequency of their content.

FIG. 3 is a schematic structural diagram of the matching module. As shown, the matching module may include:
a content acquisition unit 2031, configured to obtain the content of the historical query results of the first query keyword;
a word segmentation unit 2032, configured to segment the content of the historical query results to obtain segmented words;
a matching unit 2033, configured to determine, according to the selection frequency of the segmented words, the second query keyword matching the first query keyword.

In implementation, the word segmentation unit may further be configured to obtain, after segmentation, words of the following kinds or a combination thereof:
words adjacent to the first query keyword after segmentation;
words containing the first query keyword after segmentation;
words that include components of the first query keyword after segmentation.

In implementation, the matching module may further be configured such that, when the second query keyword matching the first query keyword is determined according to the historical query results and the selection frequency, the selection frequency is greater than a set threshold.

FIG. 4 is a schematic structural diagram of the first query keyword acquisition module.
As shown, the first query keyword acquisition module may include:
an information content acquisition unit 2011, configured to obtain the information content input by the user;
a word segmentation/decomposition unit 2012, configured to segment the information content to obtain segmented words, and/or decompose the information content into single characters;
a first query keyword determination unit 2013, configured to use the segmented words and/or characters as the first query keyword.

In implementation, the user feedback log acquisition module may further be configured to obtain, as the selection frequency, a parameter comprising one of, or a combination of, the click frequency of the historical query results, the exposure frequency of the historical query results, the reading time of the historical query results, and the importance of the historical query results.

In implementation, the first query keyword acquisition module may further be configured to acquire the user feature of the user when the user inputs the first query keyword; the user feedback log acquisition module may further be configured to obtain the user feedback log according to the user feature.

In implementation, the first query keyword acquisition module may further be configured to acquire the user feature of the user when the user inputs the first query keyword; the user feedback log acquisition module may further be configured, when obtaining the user feedback log, to obtain a log that includes the historical query results of queries targeting the first query keyword and users' selection frequencies on those results, the historical query results including the user feature.
In an implementation, the first query keyword acquisition module may be further configured to acquire the user features of the user when the user inputs the first query keyword, and the matching module may be further configured, when determining the second query keyword according to the user feedback log, to determine the second query keyword according to those user features.

FIG. 5 is a schematic flowchart of an implementation of the word matching method. As shown in the figure, word matching may include the following steps:

Step 501: Acquire a word to be matched.

Step 502: Obtain a user feedback log according to the word to be matched, where the user feedback log includes the historical query results of all previous queries targeting the word to be matched and the users' selection frequencies for those historical query results.

Step 503: Determine a word matching the word to be matched according to the historical query results and the selection frequency.

In an implementation, the selection frequency may include the selection frequency of the historical query results and/or the selection frequency of the content of the historical query results.

In an implementation, determining the word matching the word to be matched according to the selection frequency of the content of the historical query results may include: obtaining the content of the historical query results of the word to be matched; performing word segmentation on that content to obtain segmented words; and determining the word matching the word to be matched according to the selection frequency of the segmented words.

In an implementation, the segmented words are words of the following kinds or a combination thereof: segmented words adjacent to the word to be matched; segmented words containing the word to be matched; and segmented words containing a component of the word to be matched.
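Steps 502 and 503, together with the three kinds of segmented words, can be sketched as follows. This is a minimal illustration, not the application's implementation: it assumes each log entry already stores its result content as segmented words, it scores a candidate by summing the selection frequencies of the results it appears in, and it reads "containing a component" as two neighbouring words that together cover the word to be matched, following the background section's 玻璃瓶 ("glass bottle") and 玻璃瓶子 example. `LogEntry`, `candidate_words`, and `latent_meanings` are hypothetical names.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass


@dataclass
class LogEntry:
    """One historical query result from a hypothetical user feedback log."""
    query: str                    # the word that was searched for
    segmented_content: list[str]  # result content, already word-segmented
    selection_frequency: float    # how often users selected this result


def candidate_words(query: str, words: list[str]) -> set[str]:
    """Collect the three kinds of segmented words named in the text."""
    found: set[str] = set()
    for i, word in enumerate(words):
        nxt = words[i + 1] if i + 1 < len(words) else None
        if word == query:
            # Kind 1: words adjacent to the word to be matched.
            if i > 0:
                found.add(words[i - 1])
            if nxt is not None:
                found.add(nxt)
        elif query in word:
            # Kind 2: words containing the word to be matched (電冰箱 for 冰箱).
            found.add(word)
        elif nxt is not None and query not in nxt and query in word + nxt:
            # Kind 3: two neighbours that each hold a component of the word
            # and together cover it (玻璃|瓶子 gives 玻璃瓶子 for 玻璃瓶).
            found.add(word + nxt)
    return found


def latent_meanings(query: str, log: list[LogEntry],
                    threshold: float) -> dict[str, float]:
    """Sum the selection frequency of every result a candidate appears in
    and keep the candidates whose total exceeds the threshold."""
    scores: Counter[str] = Counter()
    for entry in log:
        if entry.query != query:
            continue
        for word in candidate_words(query, entry.segmented_content):
            scores[word] += entry.selection_frequency
    return {word: score for word, score in scores.items() if score > threshold}
```

With made-up entries for 冰箱 ("refrigerator") such as 雙開門|電冰箱 (selection frequency 30), 電冰箱|廠家 (25) and 存|冰箱|子 (2), a threshold of 10 keeps only 電冰箱 with a score of 55, while the low-frequency neighbours 存 and 子 are discarded.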
In an implementation, when the word matching the word to be matched is determined according to the historical query results and the selection frequency, the selection frequency is required to be greater than a set threshold.

Acquiring the word to be matched may include: acquiring the information content input by the user; performing word segmentation on the information content to obtain segmented words and/or decomposing the information content into individual characters; and using the segmented words and/or characters as the words to be matched.

In an implementation, the selection frequency may include one of, or a combination of, the click frequency of the historical query results, the exposure frequency of the historical query results, the reading time of the historical query results, and the importance of the historical query results.

In an implementation, the method may further include: acquiring the user features of the user when the user inputs the word to be matched; and, when obtaining the user feedback log, obtaining it according to those user features.

In an implementation, the method may further include: acquiring the user features of the user when the user inputs the word to be matched; and, when obtaining the user feedback log, obtaining from it the historical query results of queries targeting the word to be matched together with the users' selection frequencies for those results, where the historical query results include the user features.

In an implementation, the method may further include: acquiring the user features of the user when the user inputs the word to be matched; and, when determining the word matching the word to be matched according to the user feedback log, determining it according to those user features.
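The text lists click frequency, exposure frequency, reading time, and importance as possible ingredients of the selection frequency but does not say how they are combined. One plausible sketch, under purely illustrative assumptions, divides clicks by exposures (a click-through rate), caps and scales reading time, and takes a weighted sum. The `ResultStats` fields, the weights, and the 60-second cap are hypothetical and would need tuning against real logs.

```python
from dataclasses import dataclass


@dataclass
class ResultStats:
    """Signals recorded for one historical query result (hypothetical fields)."""
    clicks: int                 # click frequency
    exposures: int              # exposure frequency (times the result was shown)
    avg_reading_seconds: float  # reading time
    importance: float           # importance, assumed already scaled to [0, 1]


def selection_frequency(stats: ResultStats,
                        w_ctr: float = 0.6,
                        w_read: float = 0.3,
                        w_importance: float = 0.1,
                        reading_cap_seconds: float = 60.0) -> float:
    """Blend the four signals into one score.

    Clicks are divided by exposures (a click-through rate), reading time is
    capped and scaled to [0, 1], and importance is used as given. With
    non-negative inputs and weights summing to 1, the score stays in [0, 1]."""
    ctr = min(1.0, stats.clicks / stats.exposures) if stats.exposures > 0 else 0.0
    reading = min(stats.avg_reading_seconds, reading_cap_seconds) / reading_cap_seconds
    return w_ctr * ctr + w_read * reading + w_importance * stats.importance
```

For 20 clicks on 100 exposures, 30 seconds of reading, and importance 0.5, the score is 0.6 × 0.2 + 0.3 × 0.5 + 0.1 × 0.5 = 0.32. Normalising clicks by exposures keeps results that were simply shown more often from dominating the ranking.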
FIG. 6 is a schematic structural diagram of the word matching device. As shown in the figure, the device may include: a to-be-matched word acquisition module 601, configured to acquire a word to be matched; a user feedback log obtaining module 602, configured to obtain a user feedback log according to the word to be matched; and a matching module 603, configured to determine, according to the user feedback log, a word matching the word to be matched.

In an implementation, the user feedback log obtaining module may be further configured to obtain a user feedback log that includes the historical query results of all previous queries targeting the word to be matched and the users' selection frequencies for those historical query results. The matching module may be further configured to determine the word matching the word to be matched according to the historical query results and the selection frequencies in the user feedback log. The user feedback log obtaining module may be further configured to obtain, as the selection frequency, the selection frequency of the historical query results and/or the selection frequency of the content of the historical query results.

In an implementation, the matching module may include: a content acquiring unit, configured to acquire the content of the historical query results of the word to be matched; a word segmentation unit, configured to perform word segmentation on the content of the historical query results to obtain segmented words; and a matching unit, configured to determine the word matching the word to be matched according to the selection frequency of the segmented words. The word segmentation unit may be further configured to obtain, after word segmentation, words of the following kinds or a combination thereof: segmented words adjacent to the word to be matched; segmented words containing the word to be matched; and segmented words containing a component of the word to be matched.
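The division of the FIG. 6 device into modules 601, 602, and 603 can be sketched as three pluggable callables. This is an architectural illustration under a simplifying assumption: module 602 is taken to return already aggregated (candidate word, selection frequency) pairs, so module 603 reduces to the threshold check. `WordMatchingDevice` and its field names are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable


@dataclass
class WordMatchingDevice:
    """Modules 601-603 of FIG. 6 wired together as pluggable callables."""
    # Module 601: turns the user's input into words to be matched.
    acquire_words: Callable[[str], list[str]]
    # Module 602: returns (candidate word, selection frequency) pairs drawn
    # from the user feedback log for one word to be matched.
    fetch_feedback_log: Callable[[str], list[tuple[str, float]]]
    # Threshold applied by module 603.
    threshold: float

    def match(self, user_input: str) -> dict[str, list[str]]:
        """Module 603: keep candidates whose selection frequency exceeds the threshold."""
        matches: dict[str, list[str]] = {}
        for word in self.acquire_words(user_input):
            pairs = self.fetch_feedback_log(word)
            matches[word] = [cand for cand, freq in pairs if freq > self.threshold]
        return matches
```

Keeping the modules as injected callables lets the segmenter, the log store, and the scoring be swapped independently, for example `str.split` for whitespace-separated test input and a dictionary lookup standing in for the log, with 棉拖 mapping to 全棉拖鞋 as in the background section.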
The matching module may be further configured such that, when the word matching the word to be matched is determined according to the historical query results and the selection frequency, the selection frequency is required to be greater than a set threshold.

The to-be-matched word acquisition module may include: an information content acquisition unit, configured to acquire the information content input by the user; a word segmentation/decomposition unit, configured to perform word segmentation on the information content to obtain segmented words and/or to decompose the information content into individual characters; and a to-be-matched word determining unit, configured to use the segmented words and/or characters as the words to be matched.

The user feedback log obtaining module may be further configured to obtain, as the selection frequency, a parameter comprising one of, or a combination of, the click frequency of the historical query results, the exposure frequency of the historical query results, the reading time of the historical query results, and the importance of the historical query results.

In an implementation, the to-be-matched word acquisition module is further configured to acquire the user features of the user when the user inputs the word to be matched, and the user feedback log obtaining module is further configured to obtain the user feedback log according to those user features.

In an implementation, the to-be-matched word acquisition module may be further configured to acquire the user features of the user when the user inputs the word to be matched, and the user feedback log obtaining module may be further configured, when obtaining the user feedback log, to obtain from it the historical query results of queries targeting the word to be matched together with the users' selection frequencies for those results, where the historical query results include the user features.
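The variant in which the historical query results must include the user's features can be read as a filter over log records. The sketch assumes that each record stores the features of the user who produced it as key/value pairs and that "include" means every feature of the current user appears in the record with the same value. `FeedbackRecord`, `feedback_for_user`, and the `region` feature are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class FeedbackRecord:
    """One historical query result with the features of the user who issued it."""
    query: str
    result_id: str
    selection_frequency: float
    user_features: dict[str, str] = field(default_factory=dict)


def feedback_for_user(log: list[FeedbackRecord], query: str,
                      user_features: dict[str, str]) -> list[FeedbackRecord]:
    """Return the records for `query` whose user features include every
    feature of the current user (same key, same value)."""
    def includes_user(record: FeedbackRecord) -> bool:
        return all(record.user_features.get(key) == value
                   for key, value in user_features.items())

    return [r for r in log if r.query == query and includes_user(r)]
```

A user with no recorded features falls back to every record for the query, because `all()` over an empty set of features is true. The filtered records can then be scored exactly as in the unfiltered case.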
In an implementation, the to-be-matched word acquisition module may be further configured to acquire the user features of the user when the user inputs the word to be matched, and the matching module may be further configured, when determining the word matching the word to be matched according to the user feedback log, to determine it according to those user features.

As the above embodiments show, the implementation of the present application is based on analysis of user feedback logs, so the latent meanings of words can be discovered automatically and the intrinsic relationships between words can be identified accurately. Further, the automatically discovered latent meanings of words, and the related meanings of query words, are used to improve the effectiveness of the search engine. Further, when the latent meaning of a query word is discovered automatically, a similar effect can also be achieved by using the frequencies of the single characters immediately before and after the query word, rather than relying solely on word segmentation results. Therefore, by automatically discovering the latent meanings of words, the embodiments of the present application improve search engine performance and, compared with conventional approaches, can improve search precision and efficiency.

For example, the automatic word-meaning discovery method of Lu Yong and Hou Hanqing in the prior art mainly uses existing dictionary data as its extraction source, with a sample size of only a few thousand entries; if that method instead extracts from large data volumes such as Internet web pages, it lacks accuracy.
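The character-frequency alternative mentioned in the summary, using the single characters before and after the query word instead of segmentation results, might look like the following sketch. It assumes the clicked result texts are available as raw strings paired with their selection frequencies, and it only extends the query by one character on either side; `context_char_candidates` and the sample data are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def context_char_candidates(query: str, results: list[tuple[str, float]],
                            threshold: float) -> dict[str, float]:
    """Extend `query` by one character using the characters seen directly
    before and after it in result texts, weighted by selection frequency."""
    if not query:
        return {}
    scores: Counter[str] = Counter()
    for text, weight in results:
        start = text.find(query)
        while start != -1:
            end = start + len(query)
            if start > 0:
                scores[text[start - 1] + query] += weight  # preceding character
            if end < len(text):
                scores[query + text[end]] += weight        # following character
            start = text.find(query, start + 1)
    return {cand: score for cand, score in scores.items() if score > threshold}
```

On made-up results 雙開門電冰箱 (30), 電冰箱廠家 (25), and 存冰箱子 (2), a threshold of 30 keeps only 電冰箱 (score 55.0), while lowering it to 20 also admits 冰箱廠 (25.0). The threshold therefore governs how much noise gets through, and no segmenter is needed at all.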
By contrast, the implementation of the present application uses user feedback logs that reflect user participation, and can therefore discover, very reliably and automatically, the latent-meaning relationships between query words and query results that reflect user intent; this makes it particularly suitable for improving the accuracy and intelligence of a search engine.

For convenience of description, the above system is described as being divided by function into various modules or units. Of course, when implementing the present invention, the functions of the modules or units may be implemented in one or more pieces of software and/or hardware.

Those skilled in the art should understand that the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, the present application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical memory) containing computer-usable program code.

The present application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the present application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor, or another programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or another programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means, the instruction means implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

These computer program instructions may also be loaded onto a computer or another programmable data processing device, such that a series of operational steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Although preferred embodiments of the present application have been described, those skilled in the art, once aware of the basic inventive concept, may make further changes and modifications to these embodiments. Therefore, the appended claims are intended to be construed as covering the preferred embodiments and all changes and modifications falling within the scope of the present application.

Obviously, those skilled in the art can make various changes and variations to the present application without departing from the spirit and scope of the present application. Thus, if these modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include these changes and variations.

[Brief Description of the Drawings]

FIG. 1 is a schematic flowchart of an implementation of the information query method in an embodiment of the present application;
FIG. 2 is a schematic structural diagram of the information query device in an embodiment of the present application;
FIG. 3 is a schematic structural diagram of the matching module in an embodiment of the present application;
FIG. 4 is a schematic structural diagram of the first query keyword acquisition module in an embodiment of the present application;
FIG. 5 is a schematic flowchart of an implementation of the word matching method in an embodiment of the present application;
FIG. 6 is a schematic structural diagram of the word matching device in an embodiment of the present application.

[Description of Main Component Symbols]

201: first query keyword acquisition module
202: user feedback log obtaining module
203: matching module
204: query result feedback module
2031: content acquiring unit
2032: word segmentation unit
2033: matching unit
2011: information content acquisition unit
2012: word segmentation/decomposition unit
2013: first query keyword determining unit
601: to-be-matched word acquisition module
602: user feedback log obtaining module
603: matching module