TWI486797B

TWI486797B - Methods and devices for sorting search results

Info

Publication number: TWI486797B
Application number: TW099106782A
Authority: TW
Original assignee: Alibaba Group Holding Ltd
Priority date: 2010-03-09
Filing date: 2010-03-09
Publication date: 2015-06-01
Also published as: TW201131395A

Description

Method and apparatus for sorting search results

This application relates to the field of computer data processing, and in particular to a method and an apparatus for sorting search results.

A search engine needs to estimate how well a search result (the target string) matches the query string from the positional distance between the query words as they appear in the result. Results in which the query words lie closer together are generally considered better matches and are therefore ranked higher. For example, for the query string "消毒機" ("disinfection machine"), a result containing "消毒機" is usually closer to the user's intent than "消毒工業洗衣機" ("disinfecting industrial washing machine"), which in turn is closer than "消毒設備、脫水器、烘乾機" ("disinfection equipment, dehydrator, dryer"). These differences affect the ranking of the search results.

One conventional way to compute the distance between the query words within the target string is the minimum sliding window: find the shortest possible interval of the target string that contains every word of the query string, and use the length of that interval to describe how close together the query words are in the target string. For example, if the query string is "我|看|風景" ("I | watch | scenery") and the target string is "我|在|橋|上|看|風景|，|看|風景|的|人|在|橋|下|看|我|。" (vertical bars mark the word segmentation), the minimum sliding window is "我|在|橋|上|看|風景", with a length of 6 words.
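The minimum sliding window described above can be sketched as follows. This is a minimal illustration using the common two-pointer technique, not an implementation taken from the application; the function name `min_window_length` is hypothetical, and the tokens are assumed to be already segmented.

```python
from collections import Counter

def min_window_length(query_tokens, target_tokens):
    """Return the token length of the shortest span of target_tokens that
    contains every distinct query token, or None if no such span exists."""
    needed = set(query_tokens)
    if not needed:
        return 0
    counts = Counter()
    covered = 0   # number of distinct query tokens inside the current window
    best = None
    left = 0
    for right, token in enumerate(target_tokens):
        if token in needed:
            counts[token] += 1
            if counts[token] == 1:
                covered += 1
        # Shrink from the left while the window still covers every query token.
        while covered == len(needed):
            length = right - left + 1
            if best is None or length < best:
                best = length
            dropped = target_tokens[left]
            if dropped in needed:
                counts[dropped] -= 1
                if counts[dropped] == 0:
                    covered -= 1
            left += 1
    return best

query = ["我", "看", "風景"]
target = ["我", "在", "橋", "上", "看", "風景", "，", "看", "風景",
          "的", "人", "在", "橋", "下", "看", "我", "。"]
print(min_window_length(query, target))  # 6
```

The result agrees with the 6-word window "我|在|橋|上|看|風景" given in the example.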

Another measure is the edit distance. Unlike the minimum sliding window, it does not measure a length within a single string; it is the sum of the lengths of the parts in which two strings differ. For example, "我和你" ("me and you") and "大和小" ("big and small") differ in two words (the first and the third), so their edit distance is 2.
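As a rough illustration, the edit distance in this example can be computed with a word-level Levenshtein distance. The text only describes the measure informally, so using Levenshtein with a unit cost for every insertion, deletion, and replacement is an assumption here, and `edit_distance` is a hypothetical name.

```python
def edit_distance(a, b):
    """Word-level Levenshtein distance with unit cost for insert, delete, replace."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            cost = 0 if x == y else 1
            curr.append(min(prev[j] + 1,           # delete x
                            curr[j - 1] + 1,       # insert y
                            prev[j - 1] + cost))   # keep or replace
        prev = curr
    return prev[-1]

print(edit_distance(["我", "和", "你"], ["大", "和", "小"]))  # 2
```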

At present, the degree of matching between the query string and the target string is usually determined directly from this length or distance: the smaller the minimum sliding window length or the edit distance, the higher the degree of matching, and the larger it is, the lower the degree of matching.

In some cases, however, a plain length or distance does not accurately reflect the degree of matching. Suppose the query string is "Nokia battery", search result A is "Nokia battery", B is "Nokia mobile phone, free battery", and C is "Nokia n73 mobile phone original battery". By a plain distance calculation, the distance between "Nokia" and "battery" in A is 0, the best possible match, while in both B and C the distance is 3 words, so neither matches well. In fact, "n73 mobile phone" in C is strongly related to "Nokia", and "original" is strongly related to "battery". Although three words separate the query words in both results, C matches far better than B.

Earlier work has considered that different words affect the distance calculation differently; for example, word weights can be set according to part of speech (POS). Setting weights by part of speech is still too simple, however, because it does not address the essential question of whether the query string and the target string are semantically related. The resulting length or distance therefore cannot accurately reflect the degree of matching, and there is no guarantee that target strings semantically related to the query string are ranked first.

This application provides a method and an apparatus for sorting search results that use the degree of semantic association between the query string and the target strings to sort the target strings more accurately, so that the order reflects how well each target string matches the query string.

This application provides a method for sorting search results in which a server calculates in advance the semantic association weight between every two words in a statistical sample, and obtains and stores a word weight table. The method further includes: the server receives a query string entered at a user terminal, searches according to the query string, and obtains target strings; the server segments the query string and each target string into words and pairs each word of the query string in turn with each word of the target string; the server looks up the word weight table to obtain the weight value of each word pair; and the server obtains a weighted word length from the weight values, sorts the target strings by weighted word length, and returns the result to the user terminal.

The step in which the server calculates in advance the semantic association weight between every two words in the statistical sample and obtains the word weight table includes: the server obtains a statistical sample; selects a first word and a second word from the statistical sample and counts the number of times they co-occur in the sample, C(first word, second word); counts the number of times the second word occurs in the sample, ΣC(Yi, second word), where Yi denotes each word that co-occurs with the second word; calculates the probability of the first word given that the second word occurs, P(first word | second word) = C(first word, second word) / ΣC(Yi, second word); when the second word is queried, takes the semantic association weight of the first word with respect to the second word as W = 1 - P, where W is the weight and P is the probability of the first word given that the second word occurs; and repeats these steps to obtain, in turn, the semantic association weight of every word in the sample with respect to every other word, yielding the word weight table.

The statistical sample may come from text or symbols in any form; the text includes web page text, user search logs, and user click logs.

Where the weighted word length is a minimum sliding window weighted length, the step of obtaining the weighted word length from the weight values and sorting the target strings includes: for each word of the target string, taking the minimum of its weights over the words of the query string, or, alternatively, for each word of the query string, taking the minimum of its weights over the words of the target string; for each target string, calculating the minimum sliding window weighted length from these minimum weights; and comparing the minimum sliding window weighted lengths of the target strings, ranking a target string with a smaller length earlier and one with a larger length later.

Specifically, the minimum sliding window weighted length of each target string is calculated as:

$$\text{minimum sliding window weighted length} = \sum_{i=k}^{h} \min_{1 \le j \le m} W(T_i, Q_j)$$

where W denotes the weight, Ti denotes the i-th word of the target string, k and h denote the start and end positions of the minimum sliding window in the target string, Qj denotes the j-th word of the query string, and m denotes the number of words in the query string.
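The following is a minimal sketch of a minimum sliding window weighted length. It assumes the length is the sum, over window positions k to h, of each target word's smallest weight against the query words, that a word that is itself a query word weighs 0, and that the window must contain every query word. The table values, the `default` weight for unseen pairs, and the function names are all hypothetical.

```python
def token_weight(token, query_tokens, table, default=1.0):
    """Smallest weight of `token` against any query token; 0 for a query token itself."""
    if token in query_tokens:
        return 0.0
    return min(table.get((token, q), default) for q in query_tokens)

def min_weighted_window(query_tokens, target_tokens, table):
    """Smallest weighted length over all windows that contain every query token."""
    weights = [token_weight(t, query_tokens, table) for t in target_tokens]
    needed = set(query_tokens)
    best = None
    for k in range(len(target_tokens)):
        seen = set()
        total = 0.0
        for h in range(k, len(target_tokens)):
            total += weights[h]
            if target_tokens[h] in needed:
                seen.add(target_tokens[h])
            if seen == needed:
                if best is None or total < best:
                    best = total
                break  # weights are non-negative, so a longer window cannot be cheaper
    return best

# Hypothetical weights, keyed as (target word, query word).
table = {("n73", "Nokia"): 0.2, ("mobile phone", "Nokia"): 0.3,
         ("original", "battery"): 0.1}
query = ["Nokia", "battery"]
result_b = ["Nokia", "mobile phone", ",", "free", "battery"]
result_c = ["Nokia", "n73", "mobile phone", "original", "battery"]
score_b = min_weighted_window(query, result_b, table)
score_c = min_weighted_window(query, result_c, table)
print(score_c < score_b)  # True
```

With these illustrative weights, result C receives a smaller weighted length than result B even though both have three words between "Nokia" and "battery".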

This application also provides a method for sorting search results in which a server calculates in advance the semantic association weight between every two words in a statistical sample, and obtains and stores a word weight table. The method further includes: the server receives a query string entered at a user terminal, searches according to the query string, and obtains target strings; the server segments the query string and each target string into words; the server calculates, from the stored word weight table, the minimum weight of the inserted words with respect to the words of the query string; the server calculates, from the stored word weight table, the minimum weight of the deleted words with respect to the words of the target string; and the server calculates a total edit distance from these minimum weights, sorts the target strings by total edit distance, and returns the result to the user terminal.

The step of calculating, from the word weight table, the minimum weight of the inserted words with respect to the words of the query string includes: obtaining from the word weight table the weight of each inserted word with respect to each word of the query string; and calculating the minimum weight of the inserted words with respect to the words of the query string as

$$W_I = \sum_{t=1}^{n} \min_{1 \le j \le m} W(I_t, Q_j)$$

where W denotes the weight, I_t denotes the t-th inserted word, n denotes the number of inserted words, Qj denotes the j-th word of the query string, and m denotes the number of words in the query string.

The step of calculating, from the word weight table, the minimum weight of the deleted words with respect to the words of the target string includes: obtaining from the word weight table the weight of each deleted word with respect to each word of the target string; and calculating the minimum weight of the deleted words with respect to the words of the target string as

$$W_D = \sum_{d=1}^{p} \min_{1 \le i \le q} W(D_d, T_i)$$

where W denotes the weight, Ti denotes the i-th word of the target string, q denotes the number of words in the target string, D_d denotes the d-th deleted word, and p denotes the number of deleted words.

The step of calculating the total edit distance from the minimum weights and sorting the target strings includes: determining, for each target string, a total edit distance given by

W_total = W_I + W_D

where W_total denotes the total edit distance, W_I denotes the minimum weight of the inserted words with respect to the words of the query string, and W_D denotes the minimum weight of the deleted words with respect to the words of the target string; and comparing the total edit distances of the target strings, ranking a target string with a smaller total edit distance earlier and one with a larger total edit distance later.

Before the total edit distance is calculated, the method may further include calculating the minimum weight of the edit distance of the replaced words. In that case, the step of calculating the total edit distance from the minimum weights and determining the degree of matching between the query string and each target string includes: determining, for each target string, a total edit distance given by W_total = W_I + W_D + W_C

where W_total denotes the total edit distance, W_I denotes the minimum weight of the inserted words with respect to the words of the query string, W_D denotes the minimum weight of the deleted words with respect to the words of the target string, and W_C denotes the minimum weight of the replaced words with respect to the words of the query string and/or the target string; and comparing the total edit distances of the target strings, ranking a target string with a smaller total edit distance earlier and one with a larger total edit distance later.

The minimum weight of the edit distance of the replaced words may be obtained in either of the following ways: set it equal to a preset fixed value; or set it equal to the sum, the average, or the larger of the minimum weight of the inserted words with respect to the words of the query string and the minimum weight of the deleted words with respect to the words of the target string.
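The total edit distance can be sketched as follows. This is one possible reading rather than the application's implementation: it assumes the inserted, deleted, and replaced words have already been identified by some alignment, that W_I and W_D sum each word's smallest table weight, and that each replacement is scored from the insert weight of the new word and the delete weight of the old word. All names, the `default` weight, and the table values are hypothetical.

```python
def min_weight(word, reference_tokens, table, default=1.0):
    """Smallest table weight of `word` against any reference token."""
    return min(table.get((word, r), default) for r in reference_tokens)

def replacement_weight(w_insert, w_delete, mode="fixed", fixed=1.0):
    """The options the text lists for the weight of one replacement."""
    if mode == "fixed":
        return fixed
    if mode == "sum":
        return w_insert + w_delete
    if mode == "mean":
        return (w_insert + w_delete) / 2
    if mode == "max":
        return max(w_insert, w_delete)
    raise ValueError(f"unknown mode: {mode}")

def total_edit_distance(inserted, deleted, replaced, query_tokens,
                        target_tokens, table, mode="fixed"):
    """W_total = W_I + W_D + W_C for one target string.

    `replaced` is a list of (old_word, new_word) pairs.
    """
    w_i = sum(min_weight(w, query_tokens, table) for w in inserted)
    w_d = sum(min_weight(w, target_tokens, table) for w in deleted)
    w_c = sum(
        replacement_weight(min_weight(new, query_tokens, table),
                           min_weight(old, target_tokens, table), mode)
        for old, new in replaced
    )
    return w_i + w_d + w_c

# Hypothetical weights, keyed as (word, reference word).
table = {("n73", "Nokia"): 0.2, ("mobile phone", "Nokia"): 0.3,
         ("original", "battery"): 0.1}
query = ["Nokia", "battery"]
result_b = ["Nokia", "mobile phone", ",", "free", "battery"]
result_c = ["Nokia", "n73", "mobile phone", "original", "battery"]
dist_b = total_edit_distance(["mobile phone", ",", "free"], [], [],
                             query, result_b, table)
dist_c = total_edit_distance(["n73", "mobile phone", "original"], [], [],
                             query, result_c, table)
ranked = sorted([("B", dist_b), ("C", dist_c)], key=lambda item: item[1])
print([name for name, _ in ranked])  # ['C', 'B']
```

Sorting by ascending total edit distance places the semantically closer result first, which is the ordering rule stated in the text.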

This application also provides an apparatus for sorting search results, including: a word weight table obtaining module, configured to calculate the semantic association weight between every two words in a statistical sample and to obtain and store a word weight table; a word obtaining module, configured to receive a query string entered at a user terminal, search according to the query string, and obtain target strings; a word segmentation module, configured to segment the query string and each target string into words after the server has obtained them; a combination module, configured to pair each word of the query string in turn with each word of the target string; a query module, configured to look up the word weight table and obtain the weight value of each word pair; and a matching module, configured to obtain a weighted word length from the weight values, sort the target strings, and return the result to the user terminal.

The word weight table obtaining module includes: a sample obtaining module, configured to obtain a statistical sample; a first statistics module, configured to select a first word and a second word from the statistical sample and count the number of times they co-occur in the sample, C(first word, second word); a second statistics module, configured to count the number of times the second word occurs in the sample, ΣC(Yi, second word), where Yi denotes each word that co-occurs with the second word; a probability calculation module, configured to calculate the probability of the first word given that the second word occurs, P(first word | second word) = C(first word, second word) / ΣC(Yi, second word); a weight calculation module, configured to take, when the second word is queried, the semantic association weight of the first word with respect to the second word as W = 1 - P, where W is the weight and P is the probability of the first word given that the second word occurs; and a generation module, configured to generate the word weight table once the semantic association weight of every word in the sample with respect to every other word has been obtained.

Where the weighted word length is a minimum sliding window weighted length, the matching module includes: a minimum weight obtaining module, configured to take, for each word of the target string, the minimum of its weights over the words of the query string, or, alternatively, to take, for each word of the query string, the minimum of its weights over the words of the target string; a first calculation module, configured to calculate, for each target string, the minimum sliding window weighted length from these minimum weights; and a sorting module, configured to compare the minimum sliding window weighted lengths of the target strings, ranking a target string with a smaller length earlier and one with a larger length later.

This application also provides an apparatus for sorting search results, including: a word weight table obtaining module, configured to calculate the semantic association weight between every two words in a statistical sample and to obtain and store a word weight table; a word obtaining module, configured to receive a query string entered at a user terminal, search according to the query string, and obtain target strings; a word segmentation module, configured to segment the query string and each target string into words after the server has obtained them; a first minimum weight calculation module, configured to calculate the minimum weight of the inserted words with respect to the words of the query string; a second minimum weight calculation module, configured to calculate the minimum weight of the deleted words with respect to the words of the target string; and a matching module, configured to calculate a total edit distance from these minimum weights, sort the target strings, and return the result to the user terminal.

The matching module includes: a first total edit distance calculation module, configured to determine, for each target string, a total edit distance given by W_total = W_I + W_D, where W_total denotes the total edit distance, W_I denotes the minimum weight of the inserted words with respect to the words of the query string, and W_D denotes the minimum weight of the deleted words with respect to the words of the target string; and a sorting module, configured to compare the total edit distances of the target strings, ranking a target string with a smaller total edit distance earlier and one with a larger total edit distance later.

The apparatus may further include a third minimum weight calculation module, configured to obtain the minimum weight of the edit distance of the replaced words before the total edit distance is calculated. In that case the matching module includes: a second total edit distance calculation module, configured to determine, for each target string, a total edit distance given by W_total = W_I + W_D + W_C, where W_total denotes the total edit distance, W_I denotes the minimum weight of the inserted words with respect to the words of the query string, W_D denotes the minimum weight of the deleted words with respect to the words of the target string, and W_C denotes the minimum weight of the replaced words with respect to the words of the query string and/or the target string; and a sorting module, configured to compare the total edit distances of the target strings, ranking a target string with a smaller total edit distance earlier and one with a larger total edit distance later.

Conventional calculations of word length or distance ignore how closely the words in the target string are semantically associated with the query words. By introducing word weights that express the degree of semantic association between the query string and the target string, this application sorts the target strings more accurately, ranks target strings that are semantically related to the query string first, and so reflects how well each target string matches the query string. It is simple to apply in practice and gives good results.

The technical solutions in the embodiments of this application are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of this application without creative effort fall within the scope of protection of this application.

This application adds a semantic factor to the calculation of word distance or word length. By taking into account the semantic association between the query string and the target string, it measures the degree of matching between them better, so that the search results of a search engine can be ranked more reasonably. The application can of course be used wherever the degree of matching between strings is calculated and is not limited to search engines.

Because this application takes the semantics between strings into account, it needs the semantic association weight between every two words. The following first describes how these weights are obtained to build the word weight table. Referring to FIG. 1, the procedure includes the following steps:

Step 101: the server obtains a statistical sample. The statistical sample may come from text or symbols in any form; the text includes web page text, user search logs, user click logs, and the like.

Generally, the more often a first word and a second word co-occur in the statistical sample, the more closely related the two words are. For example, if "Nokia" and "mobile phone" often appear together in text, or users often search for "Nokia" and then click a result containing "mobile phone", this indicates to some extent that "Nokia" and "mobile phone" are highly related. It is therefore no surprise when a search for "Nokia" returns a result containing "mobile phone".

Step 102: select a first word and a second word from the statistical sample and count the number of times they co-occur in the sample, C(first word, second word). For example, count the co-occurrences of "mobile phone" and "Nokia", C(mobile phone, Nokia). The weights are derived from such counts, and the weights of all words (for a search on each word) are eventually output.

Step 103: count the number of times the second word occurs in the statistical sample, ΣC(Yi, second word), where Yi denotes each word that co-occurs with the second word. For example, count the total number of co-occurrences of "Nokia" with other words (that is, the total number of occurrences of "Nokia"), ΣC(Yi, Nokia), where Yi denotes each word that co-occurs with "Nokia".

Step 104: calculate the probability of the first word given that the second word occurs, P(first word | second word) = C(first word, second word) / ΣC(Yi, second word). For example, the probability of "mobile phone" given that "Nokia" occurs is P(mobile phone | Nokia) = C(mobile phone, Nokia) / ΣC(Yi, Nokia).

Step 105: when the second word is queried, take the semantic association weight of the first word with respect to the second word as W = 1 - P, where W is the weight and P is the probability of the first word given that the second word occurs.

For example, W = 1 - P is taken as the semantic association weight of "mobile phone" with respect to "Nokia" when "Nokia" is queried.

In this example the weight is 1 minus the conditional probability of the first word given the second word. Other embodiments may express the weight differently, for example by using P directly as the weight.

Step 106: determine whether all words in the statistical sample have been processed. If so, go to step 107; otherwise repeat the steps above to obtain, in turn, the semantic association weight of each word in the sample with respect to every other word.

Step 107: output the semantic association weights of every word in the sample with respect to every other word, which together form the word weight table.
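Steps 101 to 107 can be sketched as follows. The sketch assumes that two words "co-occur" when both appear in the same sample entry (for example one page or one log line), which the text does not define precisely; `build_weight_table` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import permutations

def build_weight_table(samples):
    """samples: iterable of token lists. Returns {(first, second): W}, W = 1 - P."""
    pair_counts = Counter()                       # C(first, second)
    for tokens in samples:
        for first, second in permutations(set(tokens), 2):
            pair_counts[(first, second)] += 1
    totals = Counter()                            # sum over Yi of C(Yi, second)
    for (first, second), count in pair_counts.items():
        totals[second] += count
    return {
        (first, second): 1 - count / totals[second]
        for (first, second), count in pair_counts.items()
    }

samples = [
    ["Nokia", "mobile phone"],
    ["Nokia", "mobile phone", "battery"],
    ["Nokia", "n73"],
    ["battery", "original"],
]
table = build_weight_table(samples)
print(table[("mobile phone", "Nokia")])  # 0.5
print(table[("n73", "Nokia")])           # 0.75
```

The table is not symmetric: the weight of "mobile phone" when "Nokia" is queried generally differs from the weight of "Nokia" when "mobile phone" is queried, because the denominators differ.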

For example, one possible form of the word weight table is shown in Table 1:

Note that the word weight table shown in Table 1 is only one specific example. In practice the word weight table may take other forms, and its form is not limited here.

At this point the word weight table has been obtained, that is, the weight of the first word when the second word is queried.

Note that the word weights may be obtained in any manner. FIG. 1 shows only one specific example, in which statistical probabilities are obtained from a statistical language model. In practice other approaches may be used, such as any automatic calculation or manual setting, and the way the word weight table is obtained is not limited here.

FIG. 2 is a flowchart of a method for sorting search results according to an embodiment of this application. The method includes the following steps:

Step 201: the server obtains a query string and target strings.

The query string is usually entered by the user, and a target string is usually a string related to the query string that the server obtains by searching. For example, if the query string entered by the user is "Nokia battery" and the server's search returns A "Nokia battery", B "Nokia mobile phone, free battery", and C "Nokia n73 mobile phone original battery", then A, B, and C are all target strings. The purpose of this embodiment is to determine how well each target string (such as search results A, B, and C) matches the query string. In other words, the server receives the query string entered at the user terminal, searches according to the query string, and obtains the target strings.

This embodiment is described using the query string "Nokia battery" and target string C, "Nokia n73 mobile phone original battery", as an example. Target strings A, "Nokia battery", and B, "Nokia mobile phone, free battery", are processed in essentially the same way as target string C and are not described in detail.

Step 202: the server segments the query string and the target string into words, obtaining the words that make up the query string and the words that make up the target string.

Let the query string be Q and the target string be T. Segmenting the query string gives Q1Q2...Qm, and segmenting the target string gives T1T2...Tn. In this embodiment, segmenting the query string gives Q1Q2 = Nokia | battery, and segmenting the target string gives T1T2T3T4T5 = Nokia | n73 | mobile phone | original | battery.

Segmentation in this application may split a string by any method: into words in the linguistic sense, or into single characters, letters, symbols, and so on.

Step 203: Each segment of the query string is paired in turn with each segment of the target string, yielding multiple segment pairs, each consisting of one query-string segment and one target-string segment. Specifically, for each target segment Ti this yields (Ti, Q1), (Ti, Q2), ..., (Ti, Qm).

The segment pairs obtained in this embodiment are: (T1, Q1), (T1, Q2), (T2, Q1), (T2, Q2), (T3, Q1), (T3, Q2), (T4, Q1), (T4, Q2), (T5, Q1), (T5, Q2).
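The pairing in step 203 amounts to a Cartesian product of the two segment lists. A minimal sketch, assuming the segments are already available as Python lists (the function name `token_pairs` is hypothetical and not part of the application):

```python
from itertools import product

query_tokens = ["Nokia", "battery"]                            # Q1, Q2
target_tokens = ["Nokia", "n73", "phone", "original", "battery"]  # T1..T5

def token_pairs(target, query):
    """Return every (Ti, Qj) pair, ordered by target segment, then query segment."""
    return list(product(target, query))

pairs = token_pairs(target_tokens, query_tokens)
print(len(pairs))  # 10 pairs: 5 target segments x 2 query segments
```

The order of the result matches the listing above: (T1, Q1), (T1, Q2), (T2, Q1), and so on.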

Step 204: The word weight table is queried to obtain a weight for each segment pair. Let W denote weight; the weights obtained from the weight table for the segment pairs are then W(T1, Q1), W(T1, Q2), W(T2, Q1), W(T2, Q2), W(T3, Q1), W(T3, Q2), W(T4, Q1), W(T4, Q2), W(T5, Q1), W(T5, Q2).

Let:

W(T1, Q1) = W1, W(T1, Q2) = W1'

W(T2, Q1) = W2, W(T2, Q2) = W2'

W(T3, Q1) = W3, W(T3, Q2) = W3'

W(T4, Q1) = W4, W(T4, Q2) = W4'

W(T5, Q1) = W5, W(T5, Q2) = W5'

If Ti appears in Q, Wi is taken to be 0. For example, T1 is "Nokia" and Q1 is also "Nokia", so W(T1, Q1) = W1 = 0; likewise, W(T5, Q2) = W5' = 0.

Step 205: A weighted word length is obtained from the weights. In this embodiment the weighted word length is the minimum sliding window weighted length, and step 205 comprises the following sub-steps:

i) Obtain, for each segment of the target string, the minimum of its weights against the segments of the query string; or obtain, for each segment of the query string, the minimum of its weights against the segments of the target string. Because the two procedures are very similar, only the first (each target segment against the query segments) is described below.

In this embodiment, that means obtaining the smaller of T1's two weights against Q1 and Q2, the smaller of T2's two weights against Q1 and Q2, and so on. Assume here that the minimum of W(T1, Q1) and W(T1, Q2) is W1, the minimum of W(T2, Q1) and W(T2, Q2) is W2, the minimum of W(T3, Q1) and W(T3, Q2) is W3, the minimum of W(T4, Q1) and W(T4, Q2) is W4, and the minimum of W(T5, Q1) and W(T5, Q2) is W5'.

ii) For each target string, the minimum sliding window weighted length is calculated from these minimum weights. The minimum sliding window weighted length of a target string is determined as:

$$\text{minimum sliding window weighted length} = \sum_{i=k}^{h} \min_{1 \le j \le m} W(T_i, Q_j)$$

where W denotes the weight, Ti denotes the i-th segment of the target string, k and h denote the start and end positions of the minimum sliding window in the target string, Qj denotes the j-th segment of the query string, and m denotes the number of segments in the query string.

For the above embodiment, the minimum sliding window weighted length is ΣWi = W1 + W2 + W3 + W4 + W5'. Repeating steps 202 to 205 yields the minimum sliding window weighted length of the query string relative to each target string.
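Steps 204 and 205 can be sketched as follows. This is a minimal illustration under stated assumptions, not the application's implementation: the weight values in `WEIGHTS` are invented, the fallback weight of 1.0 for unknown pairs is an assumption, and the window is taken to be the shortest span (by segment count) that contains every query segment.

```python
# Hypothetical weights W(Ti, Qj); every number here is invented for illustration.
WEIGHTS = {
    ("n73", "Nokia"): 0.2, ("n73", "battery"): 0.6,
    ("phone", "Nokia"): 0.1, ("phone", "battery"): 0.3,
    ("original", "Nokia"): 0.5, ("original", "battery"): 0.4,
}

def weight(target_token, query_token):
    """W(Ti, Qj): 0 for identical segments, else the table value (1.0 if missing)."""
    if target_token == query_token:
        return 0.0
    return WEIGHTS.get((target_token, query_token), 1.0)

def min_window(target, query):
    """Shortest inclusive span (k, h) of target containing every query segment."""
    need, best = set(query), None
    for k in range(len(target)):
        seen = set()
        for h in range(k, len(target)):
            if target[h] in need:
                seen.add(target[h])
            if seen == need:
                if best is None or h - k < best[1] - best[0]:
                    best = (k, h)
                break
    return best

def weighted_window_length(target, query):
    """Sum, over the window, of each target segment's minimum weight against the query."""
    span = min_window(target, query)
    if span is None:  # some query segment never occurs in the target
        return None
    k, h = span
    return sum(min(weight(t, q) for q in query) for t in target[k:h + 1])

target = ["Nokia", "n73", "phone", "original", "battery"]
query = ["Nokia", "battery"]
length = weighted_window_length(target, query)  # W1 + W2 + W3 + W4 + W5'
```

With these invented weights the window covers T1 to T5, the two exact matches contribute 0, and the three intervening segments contribute their smaller weight each.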

Step 206: The degree of matching between the query string and each target string is determined from the weighted word length; that is, the target strings are sorted by weighted word length and the result is returned to the user terminal.

Specifically, the minimum sliding window weighted lengths of the target strings are compared. The smaller the length, the higher the degree of matching and the earlier the target string is ranked; the larger the length, the lower the degree of matching and the later it is ranked.
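The ranking rule in step 206 is an ascending sort on the weighted length. A minimal sketch, assuming the lengths have already been computed (the values below for target strings A, B, and C are invented for illustration):

```python
def rank_targets(weighted_lengths):
    """Return target strings ordered by ascending weighted length (best match first)."""
    return sorted(weighted_lengths, key=weighted_lengths.get)

# Hypothetical weighted lengths for target strings A, B and C.
lengths = {"C": 0.7, "A": 0.0, "B": 0.3}
ranking = rank_targets(lengths)
print(ranking)  # ['A', 'B', 'C']
```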

At this point the degree of matching between the query string and each target string has been determined. A conventional, unweighted word-length calculation does not consider how closely the words in the target string are semantically related to the query words, so the resulting length cannot accurately reflect how well the target matches the query. For example, "Nokia battery" and "Nokia n73 mobile phone original battery" differ greatly in length, but when the query is "Nokia battery" there is little substantive difference between them. By introducing word weights that express the semantic relevance between the query string and the target string, this application sorts target strings more accurately, ranks target strings that are semantically related to the query string first, and so reflects the degree of matching between each target string and the query string. It is simple to apply in practice and gives good results.

FIG. 3 is a flowchart of another method for sorting search results according to an embodiment of this application. This embodiment calculates the difference between the query string and the target string based on edit distance. Edit distance is the minimum number of basic operations needed to change one string into another, and can also be understood as the sum of the lengths of the differing parts of the two strings. The usual basic operations are inserting a character or word, deleting a character or word, and replacing a character or word; other operations may be defined as needed. For example, changing "我愛你" ("I love you") into "我不愛她" ("I do not love her") requires at least two basic operations, inserting "不" and replacing "你" with "她", so the edit distance between the two is 2. Likewise, the edit distance between "隱形的翅膀" ("invisible wings") and "好吃的雞翅膀" ("delicious chicken wings") is 3. The flow shown in FIG. 3 comprises the following steps:

Step 301: The server obtains the query string and the target strings.

The query string is usually entered by the user, and the target strings are usually the strings related to the query string that the server obtains by retrieval. For example, the query string is "Nokia mobile phone battery" and the target strings are "original Nokia mobile phone battery" and "Nokia mobile phone, free battery". In other words, the server receives the query string entered at the user terminal, searches according to it, and obtains the target strings.

The purpose of this embodiment is to determine how well each target string matches the query string.

This embodiment is described using the query string "Nokia mobile phone battery" and the target string "original Nokia mobile phone battery" as an example. The target string "Nokia mobile phone, free battery" is processed in essentially the same way as "original Nokia mobile phone battery" and is not described in detail.
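The plain (unweighted) edit distance that this embodiment starts from can be checked with the classic Levenshtein dynamic program. The sketch below is a standard textbook formulation, not code from the application; it works on any sequence, so it applies equally to characters or to word segments.

```python
def edit_distance(a, b):
    """Levenshtein distance with unit-cost insert, delete and replace."""
    prev = list(range(len(b) + 1))          # distance from empty prefix of a
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,             # delete x
                           cur[j - 1] + 1,          # insert y
                           prev[j - 1] + (x != y)))  # replace, or match at no cost
        prev = cur
    return prev[-1]

print(edit_distance("我愛你", "我不愛她"))          # 2
print(edit_distance("隱形的翅膀", "好吃的雞翅膀"))  # 3
```

Both results agree with the two character-level examples given for this embodiment.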

Step 302: The server segments the query string and the target string separately, obtaining the segments that make up the query string and the segments that make up the target string.

Let the query string be Q and the target string be T. Segmenting the query string yields Q1Q2...Qm, and segmenting the target string yields T1T2...Tn. In this embodiment, segmenting the query string yields Q1Q2Q3 = Nokia|phone|battery, and segmenting the target string yields T1T2T3 = original|Nokia|battery.

Step 303: Using the word weight table, the server calculates the minimum weight of each inserted word relative to the segments of the query string. Specifically, the weights of the inserted word relative to each query segment are obtained from the word weight table. In this example the word "original" is inserted; denoting the inserted word by I, its weights relative to the query segments are W(I1, Q1), W(I1, Q2), and W(I1, Q3), and the minimum weight of the inserted word relative to the query segments is:

$$W_I = \min\bigl(W(I_1, Q_1),\ W(I_1, Q_2),\ W(I_1, Q_3)\bigr)$$

Step 304: Using the word weight table, the minimum weight of each deleted word relative to the segments of the target string is calculated. Specifically, the weights of the deleted word relative to each target segment are obtained from the word weight table. In this example the word "phone" is deleted; denoting the deleted word by D, its weights relative to the target segments are W(D1, T1), W(D1, T2), and W(D1, T3), and the minimum weight of the deleted word relative to the target segments is:

$$W_D = \min\bigl(W(D_1, T_1),\ W(D_1, T_2),\ W(D_1, T_3)\bigr)$$

Step 305: The total edit distance is calculated from these minimum weights and used to determine the degree of matching between the query string and the target string; that is, the target strings are sorted by total edit distance and the result is returned to the user terminal.

Specifically, a total edit distance is determined for each target string. The total edit distance for one target string is:

$$W_{\text{total}} = W_I + W_D$$

where W_total denotes the total edit distance, W_I denotes the minimum weight of the inserted word relative to the segments of the query string, and W_D denotes the minimum weight of the deleted word relative to the segments of the target string. The total edit distances of the target strings are compared: the smaller the total edit distance, the higher the degree of matching and the earlier the target string is ranked; the larger the total edit distance, the lower the degree of matching and the later it is ranked.
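Steps 303 to 305 can be sketched as follows. This is an illustration under stated assumptions rather than the application's implementation: the weight values are invented, unknown pairs fall back to 1.0, inserted and deleted words are found with the standard library's `difflib.SequenceMatcher`, and when several words are inserted or deleted their minimum weights are simply summed.

```python
from difflib import SequenceMatcher

# Hypothetical weights between words; every number is invented for illustration.
WEIGHTS = {
    ("original", "Nokia"): 0.5, ("original", "phone"): 0.6, ("original", "battery"): 0.4,
    ("phone", "original"): 0.7, ("phone", "Nokia"): 0.1, ("phone", "battery"): 0.3,
}

def min_weight(word, others):
    """Smallest weight of `word` against any segment in `others` (1.0 if unknown)."""
    return min(WEIGHTS.get((word, o), 1.0) for o in others)

def weighted_edit_distance(query, target):
    """W_total = W_I + W_D, treating a replacement as a delete plus an insert."""
    w_insert = w_delete = 0.0
    ops = SequenceMatcher(None, query, target, autojunk=False).get_opcodes()
    for tag, i1, i2, j1, j2 in ops:
        if tag in ("insert", "replace"):   # words present only in the target
            w_insert += sum(min_weight(w, query) for w in target[j1:j2])
        if tag in ("delete", "replace"):   # words present only in the query
            w_delete += sum(min_weight(w, target) for w in query[i1:i2])
    return w_insert + w_delete

query = ["Nokia", "phone", "battery"]
target = ["original", "Nokia", "battery"]
total = weighted_edit_distance(query, target)  # W_I for "original" + W_D for "phone"
```

Here "original" is scored against the query segments and "phone" against the target segments, matching the direction of comparison used in steps 303 and 304.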

At this point the degree of matching between the query string and each target string has been determined. A conventional, unweighted word-distance calculation does not consider how closely the words in the target string are semantically related to the query words, so the resulting distance cannot accurately reflect how well the target matches the query. By introducing word weights that express the semantic relevance between the query string and the target string, this application sorts target strings more accurately, ranks target strings that are semantically related to the query string first, and so reflects the degree of matching between each target string and the query string. It is simple to apply in practice and gives good results.

Note that the embodiment shown in FIG. 3 may also involve word replacement. For example, when "me and you" is changed to "me and him", "you" can be regarded as having been replaced by "him". Word replacement can be handled in either of the following ways. Method one: treat a replacement as a combination of an insertion and a deletion, that is, treat replacement as not existing as an operation of its own. For example, when "me and you" is changed to "me and him", "you" is regarded as deleted and "him" as inserted. Every transformation is then an insertion or a deletion, so the embodiment shown in FIG. 3 handles it directly.

Method two: treat replacement as a third operation in addition to insertion and deletion. For example, when "me and you" is changed to "me and him", "you" is regarded as replaced by "him". In this case the minimum weight for the edit distance of the replaced word must be calculated, which can be done in either of two ways: a) the minimum weight for the edit distance of the replaced word equals a preset fixed value, for example a fixed value of 1; or b) the edit distance of the replaced word equals the sum of the minimum weight of the inserted word relative to the segments of the query string and the minimum weight of the deleted word relative to the segments of the target string, or the average of those two minimum weights, or the larger of the two, or any other combination of them.

For example, the edit distance of the replacement word "him" = the minimum weight of the inserted "him" relative to the segments of the query string "me and you" + the minimum weight of the deleted "you" relative to the segments of the target string "me and him"; or the edit distance of the replacement word "him" = (the minimum weight of the inserted "him" relative to the segments of the query string "me and you" + the minimum weight of the deleted "you" relative to the segments of the target string "me and him") / 2; and so on.

Under method two, step 305 specifically comprises: determining a total edit distance for each target string, where the total edit distance is:

$$W_{\text{total}} = W_I + W_D + W_C$$

where W_total denotes the total edit distance, W_I denotes the minimum weight of the inserted word relative to the segments of the query string, W_D denotes the minimum weight of the deleted word relative to the segments of the target string, and W_C denotes the minimum weight of the replaced word relative to the segments of the query string and/or the target string. The total edit distances of the target strings are compared: the smaller the total edit distance, the higher the degree of matching and the earlier the target string is ranked; the larger the total edit distance, the lower the degree of matching and the later it is ranked.
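The alternatives for the replacement term W_C under method two can be written as one small function. This is a sketch only: the function names, the `mode` keyword, and the sample weights are hypothetical, and the application also allows other combinations that are not shown here.

```python
def substitution_weight(w_insert, w_delete, mode="sum", fixed=1.0):
    """Weight W_C of one replacement, built from the weights of the
    inserted word (w_insert) and of the deleted word (w_delete)."""
    if mode == "fixed":   # option a): a preset constant
        return fixed
    if mode == "sum":     # option b): sum of the two minimum weights
        return w_insert + w_delete
    if mode == "mean":    # option b): average of the two
        return (w_insert + w_delete) / 2
    if mode == "max":     # option b): larger of the two
        return max(w_insert, w_delete)
    raise ValueError(f"unknown mode: {mode}")

def total_edit_distance(w_i, w_d, w_c=0.0):
    """W_total = W_I + W_D + W_C (W_C is 0 when no word is replaced)."""
    return w_i + w_d + w_c

# Invented weights for replacing "you" with "him".
w_c = substitution_weight(0.25, 0.75, mode="mean")
print(total_edit_distance(0.0, 0.0, w_c))  # 0.5
```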

Note that weights may be calculated against the query string and the target string in alternation. In the embodiment shown in FIG. 3, the weight of an inserted word is calculated against the query string, and the weight of a deleted word is calculated against the target string.

Note that in the embodiments shown in FIG. 2 and FIG. 3, segmentation may split a string by any method: into words in the linguistic sense, or into single characters, letters, or symbols.

Note that in the embodiments shown in FIG. 2 and FIG. 3, the weights may be subjected to any form of calculation or transformation, such as taking a logarithm. The maximum, the average, or some other function of a target word's weights against the individual query words may also be used as that word's weight (weighted length).
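Swapping the minimum for another aggregate is a one-line change if the aggregate is passed in by name. The sketch below is illustrative only; the `AGGREGATORS` table and the function name are hypothetical, and the sample weights are invented.

```python
from statistics import mean

# Ways of collapsing one target word's weights against all query words.
AGGREGATORS = {"min": min, "max": max, "mean": mean}

def word_weight(weights_against_query, how="min"):
    """Weight (weighted length) of one target word under the chosen aggregate."""
    return AGGREGATORS[how](weights_against_query)

weights = [0.25, 0.75]  # invented weights of one target word against Q1 and Q2
print(word_weight(weights, "min"), word_weight(weights, "max"), word_weight(weights, "mean"))
```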

Note that in the embodiments shown in FIG. 2 and FIG. 3, the roles may be reversed, with the target string treated as the query string and the query string treated as the target string, without any essential difference.

Note that in the embodiments shown in FIG. 2 and FIG. 3, the interval over which word distance or length is calculated may be the whole string or any interval selected by the algorithm, for example the part of one string that differs from the other string.

Note that the matching method need not use the minimum sliding window or the edit distance; any calculation based on weighted word distance or weighted word length may be used.

Note that this application is not limited to retrieval systems such as search engines; it can be applied to any system that calculates the degree of matching between two strings.

This application also discloses an apparatus for sorting search results. Referring to FIG. 4, it comprises: a word weight table acquisition module 401, configured to calculate the semantic relevance weight between every two words in the statistical samples and to obtain and store the word weight table; a word acquisition module 402, configured to receive the query string entered at the user terminal, search according to the query string, and obtain the target strings; a word segmentation module 403, configured to segment the query string and the target string separately after the server has obtained them; a combination module 404, configured to pair each segment of the query string in turn with each segment of the target string; a query module 405, configured to query the word weight table and obtain a weight for each segment pair; and a matching module 406, configured to obtain the weighted word length from the weights, sort the target strings, and return the result to the user terminal.

The word weight table acquisition module 401 may specifically comprise: a sample acquisition module, configured to acquire statistical samples;

a weight calculation module, configured to take, when the second word is queried, the semantic relevance weight of a first word with respect to a second word as W = 1 - P, where W is the weight and P is the probability of the first word occurring given that the second word occurs; and a generation module, configured to generate the word weight table once the semantic relevance weight of each word in the statistical samples relative to the other words has been obtained.
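The rule W = 1 - P can be sketched as a small table builder. The estimator used here is an assumption: P is approximated by the share of samples containing the second word that also contain the first word. The sample data and the function name are invented for illustration.

```python
from collections import Counter
from itertools import permutations

def build_weight_table(samples):
    """samples: iterable of segment lists.
    Returns {(first, second): 1 - P(first | second)} for co-occurring words."""
    occurrences = Counter()      # number of samples containing each word
    co_occurrences = Counter()   # number of samples containing both words
    for tokens in samples:
        unique = set(tokens)
        occurrences.update(unique)
        co_occurrences.update(permutations(unique, 2))
    return {(first, second): 1 - count / occurrences[second]
            for (first, second), count in co_occurrences.items()}

# Invented statistical samples.
samples = [
    ["Nokia", "battery"],
    ["Nokia", "phone"],
    ["Nokia", "phone", "battery"],
    ["battery", "charger"],
]
table = build_weight_table(samples)
print(table[("Nokia", "phone")])  # 0.0: every sample with "phone" also has "Nokia"
```

Note that the table is not symmetric: "phone" given "Nokia" has a non-zero weight, because one of the three samples containing "Nokia" does not contain "phone".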

When the weighted word length is the minimum sliding window weighted length, the matching module 406 may specifically comprise: a minimum weight acquisition module, configured to take, for each segment of the target string, the minimum of its weights against the segments of the query string, or to take, for each segment of the query string, the minimum of its weights against the segments of the target string; a first calculation module, configured to calculate, for each target string, the minimum sliding window weighted length from the minimum weights; and a sorting module, configured to compare the minimum sliding window weighted lengths of the target strings, ranking shorter lengths earlier and longer lengths later, that is, judging the degree of matching to be higher the smaller the length and lower the larger the length.

By introducing word weights that express the semantic relevance between the query string and the target string, the embodiment shown in FIG. 4 reflects the degree of matching between each target string and the query string more accurately. It is simple to apply in practice and gives good results.

An embodiment of this application further provides an apparatus for sorting search results. Referring to FIG. 5, it comprises: a word weight table acquisition module 501, configured to calculate the semantic relevance weight between every two words in the statistical samples and to obtain and store the word weight table; a word acquisition module 502, configured to receive the query string entered at the user terminal, search according to the query string, and obtain the target strings; a word segmentation module 503, configured to segment the query string and the target string separately after the server has obtained them; a first minimum weight calculation module 504, configured to calculate the minimum weight of the inserted word relative to the segments of the query string; a second minimum weight calculation module 505, configured to calculate the minimum weight of the deleted word relative to the segments of the target string; and a matching module 506, configured to calculate the total edit distance from the minimum weights, sort the target strings, and return the result to the user terminal.

The matching module 506 may specifically comprise: a first total edit distance calculation module, configured to determine a total edit distance for each target string, where the total edit distance is $W_{\text{total}} = W_I + W_D$, in which W_total denotes the total edit distance, W_I denotes the minimum weight of the inserted word relative to the segments of the query string, and W_D denotes the minimum weight of the deleted word relative to the segments of the target string; and a sorting module, configured to compare the total edit distances of the target strings, ranking smaller total edit distances earlier and larger ones later, that is, judging the degree of matching to be higher the smaller the total edit distance and lower the larger it is.

The apparatus of FIG. 5 may further comprise a third minimum weight calculation module, configured to obtain the minimum weight for the edit distance of the replaced word before the total edit distance is calculated. In this case the matching module 506 may specifically comprise: a second total edit distance calculation module, configured to determine a total edit distance for each target string, where the total edit distance is $W_{\text{total}} = W_I + W_D + W_C$, in which W_total denotes the total edit distance, W_I denotes the minimum weight of the inserted word relative to the segments of the query string, W_D denotes the minimum weight of the deleted word relative to the segments of the target string, and W_C denotes the minimum weight of the replaced word relative to the segments of the query string and/or the target string; and a sorting module, configured to compare the total edit distances of the target strings, ranking smaller total edit distances earlier and larger ones later, that is, judging the degree of matching to be higher the smaller the total edit distance and lower the larger it is.

By introducing word weights that express the semantic relevance between the query string and the target string, the apparatus shown in FIG. 5 reflects the degree of matching between each target string and the query string more accurately. It is simple to apply in practice and gives good results.

Note that, for convenience of description, the above apparatus is described as separate modules divided by function. When this application is implemented, the functions of the modules may of course be implemented in one or more pieces of software and/or hardware.

Note that because the system embodiments are essentially similar to the method embodiments, they are described only briefly; for related details, see the corresponding parts of the method embodiments.

Note that in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any variants of them are intended to cover non-exclusive inclusion, so that a process, method, article, or device comprising a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to that process, method, article, or device. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that comprises it.

From the above description of the embodiments, a person skilled in the art will clearly understand that this application can be implemented by software together with the necessary general-purpose hardware platform. On that understanding, the technical solution of this application, in essence or in the part that contributes over the prior art, may be embodied as a software product. The computer software product may be stored in a storage medium such as ROM/RAM, a magnetic disk, or an optical disc, and includes instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application or in parts of those embodiments.

This application can be used in many general-purpose or special-purpose computing system environments or configurations, for example: personal computers, server computers, handheld or portable devices, tablet devices, multiprocessor systems, microprocessor-based systems, set-top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, and distributed computing environments that include any of the above systems or devices.

This application may be described in the general context of computer-executable instructions executed by a computer, such as program modules. In general, program modules include routines, programs, objects, components, data structures, and the like that perform particular tasks or implement particular abstract data types. This application may also be practiced in distributed computing environments, in which tasks are performed by remote processing devices connected through a communication network. In a distributed computing environment, program modules may be located in both local and remote computer storage media, including storage devices.

The above description covers only the preferred embodiments of the present application and is not intended to limit its scope of protection. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present application is included in the scope of protection of the present application.

401: Word weight table acquisition module

402: Word acquisition module

403: Word segmentation module

404: Combination module

405: Query module

406: Matching module

501: Word weight table acquisition module

502: Word acquisition module

503: Word segmentation module

504: First weight minimum calculation module

505: Second weight minimum calculation module

506: Matching module

In order to describe the technical solutions in the embodiments of the present application more clearly, the drawings required by the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.

FIG. 1 is a flowchart of obtaining a word weight table according to an embodiment of the present application;

FIG. 2 is a flowchart of a method for sorting search results according to an embodiment of the present application;

FIG. 3 is a flowchart of another method for sorting search results according to an embodiment of the present application;

FIG. 4 is a schematic diagram of an apparatus for sorting search results according to an embodiment of the present application;

FIG. 5 is a schematic diagram of another apparatus for sorting search results according to an embodiment of the present application.

Claims

A method for sorting search results, wherein a server pre-calculates a semantic association weight between every two words in a statistical sample to obtain and save a word weight table, the method further comprising: the server receiving a query string input at a user terminal, searching according to the query string, and obtaining target strings; the server segmenting the query string and each target string into words, and combining each segmented word of the query string in turn with each segmented word of the target string; querying the word weight table to obtain a weight value for each word combination; and obtaining a weighted word length according to the weight values, sorting the target strings according to the weighted word length, and feeding the result back to the user terminal, wherein the weighted word length is a minimum sliding window weighted length; obtaining the weighted word length according to the weight values and sorting the target strings comprises: taking, for each segmented word of the target string, the minimum of its weights relative to the segmented words of the query string, or taking, for each segmented word of the query string, the minimum of its weights relative to the segmented words of the target string; calculating, for each target string, the minimum sliding window weighted length according to the weight minimums; and comparing the minimum sliding window weighted lengths of the target strings, a target string with a smaller length being ranked earlier; and the minimum sliding window weighted length of each target string is calculated as $\sum_{i=k}^{h} \min_{j=1}^{m} W(T_i \mid Q_j)$, where W denotes the weight, $T_i$ denotes the i-th segmented word of the target string, k and h denote the start position and the end position of the minimum sliding window of the target string, respectively, $Q_j$ denotes the j-th segmented word of the query string, and m denotes the number of segmented words of the query string.
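The minimum sliding window weighted length recited in the claim above can be illustrated with a minimal Python sketch. It is not the claimed implementation. The names `min_window` and `weighted_window_length`, the dictionary layout `weights[(target_word, query_word)]`, the default weight of 1.0 for pairs missing from the table, and the choice to give a target word that equals a query word a weight of 0 are all assumptions made for illustration.

```python
def min_window(query, target):
    """Return inclusive indices (k, h) of the shortest span of `target`
    that contains every distinct query word, or None if no span does."""
    need = set(query)
    have = {}                      # occurrences of needed words in the current window
    missing = len(need)
    best = None
    left = 0
    for right, word in enumerate(target):
        if word in need:
            have[word] = have.get(word, 0) + 1
            if have[word] == 1:
                missing -= 1
        # Shrink from the left while the window still covers the query.
        while missing == 0:
            if best is None or right - left < best[1] - best[0]:
                best = (left, right)
            out = target[left]
            if out in need:
                have[out] -= 1
                if have[out] == 0:
                    missing += 1
            left += 1
    return best


def weighted_window_length(query, target, weights, default=1.0):
    """Sum, over the minimum window, of each target word's smallest weight
    relative to any query word. Smaller results would rank earlier."""
    span = min_window(query, target)
    if span is None:
        return float("inf")
    k, h = span

    def w(t, q):
        # Assumption: an exact match costs nothing; unknown pairs cost `default`.
        return 0.0 if t == q else weights.get((t, q), default)

    return sum(min(w(t, q) for q in query) for t in target[k:h + 1])


query = ["disinfect", "machine"]
weights = {("industrial", "machine"): 0.4, ("washing", "machine"): 0.2}
print(weighted_window_length(query, ["disinfect", "machine"], weights))
print(weighted_window_length(
    query, ["disinfect", "industrial", "washing", "machine"], weights))
```

In this sketch, words inside the window that are semantically close to a query word add little to the length, so a target string whose intervening words relate to the query is penalized less than one padded with unrelated words.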

The method of claim 1, wherein the server pre-calculating a semantic association weight between every two words in the statistical sample to obtain the word weight table comprises: the server obtaining a statistical sample; selecting a first word and a second word from the statistical sample, and counting the number of times C(first word, second word) that the first word and the second word co-occur in the statistical sample; counting the number of occurrences ΣC(Yi, second word) of the second word in the statistical sample, where Yi denotes each word that appears together with the second word; calculating the probability of the first word under the condition that the second word occurs, P(first word | second word) = C(first word, second word) / ΣC(Yi, second word); when the second word is queried, taking the semantic association weight of the first word and the second word as W = 1 - P, where W is the weight and P is the probability that the first word appears under the condition of the second word; and repeating the above steps to obtain in turn the semantic association weight of each word in the statistical sample relative to the other words, thereby obtaining the word weight table.
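The weight table computation in the claim above (P = C / ΣC, W = 1 - P) can be sketched in Python as follows. This is an illustrative sketch only. The function name `build_weight_table` is hypothetical, and the definition of co-occurrence used here (two distinct words appearing in the same sample, counted once per sample) is an assumption, since the claim does not fix the co-occurrence window.

```python
from collections import Counter
from itertools import permutations


def build_weight_table(samples):
    """Map (first, second) -> W = 1 - P(first | second).

    `samples` is a list of already segmented texts (lists of words).
    """
    pair_counts = Counter()
    for words in samples:
        # Assumption: two distinct words co-occur when both appear in the
        # same sample; each ordered pair is counted once per sample.
        for first, second in permutations(set(words), 2):
            pair_counts[(first, second)] += 1

    # Denominator for each second word: sum of C(Y_i, second) over every
    # word Y_i that appears together with it.
    totals = Counter()
    for (first, second), count in pair_counts.items():
        totals[second] += count

    return {
        (first, second): 1.0 - count / totals[second]
        for (first, second), count in pair_counts.items()
    }


table = build_weight_table([["a", "b"], ["a", "c"], ["a", "b"]])
print(table[("b", "a")])  # weight of "b" when "a" is queried
```

A pair that co-occurs often relative to the second word's other companions gets a weight near 0, and a rare pairing gets a weight near 1, which matches the direction of W = 1 - P in the claim.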

The method of claim 2, wherein the source of the statistical sample includes text or symbols in any form, including web page text, user search logs, and user click logs.

A method for sorting search results, wherein a server pre-calculates a semantic association weight between every two words in a statistical sample to obtain and save a word weight table, the method further comprising: the server receiving a query string input at a user terminal, searching according to the query string, and obtaining target strings; the server segmenting the query string and each target string into words; the server calculating, according to the stored word weight table, a weight minimum of the inserted words relative to the segmented words of the query string; the server calculating, according to the word weight table, a weight minimum of the deleted words relative to the segmented words of the target string; and calculating a total edit distance according to the weight minimums, sorting the target strings according to the total edit distance, and feeding the result back to the user terminal, wherein the step of calculating, according to the word weight table, the weight minimum of the inserted words relative to the segmented words of the query string comprises: obtaining, according to the word weight table, the weight value of each inserted word relative to each segmented word of the query string; and calculating the weight minimum of the inserted words relative to the segmented words of the query string as $W_I = \sum_{t=1}^{n} \min_{j=1}^{m} W(I_t \mid Q_j)$, where W denotes the weight, $I_t$ denotes the t-th inserted word, n denotes the number of inserted words, $Q_j$ denotes the j-th segmented word of the query string, and m denotes the number of segmented words of the query string.

The method of claim 4, wherein the step of calculating, according to the word weight table, the weight minimum of the deleted words relative to the segmented words of the target string comprises: obtaining, according to the word weight table, the weight value of each deleted word relative to each segmented word of the target string; and calculating the weight minimum of the deleted words relative to the segmented words of the target string as $W_D = \sum_{d=1}^{p} \min_{i=1}^{q} W(D_d \mid T_i)$, where W denotes the weight, $T_i$ denotes the i-th segmented word of the target string, q denotes the number of segmented words of the target string, $D_d$ denotes the d-th deleted word, and p denotes the number of deleted words.

The method of claim 4, wherein the step of calculating the total edit distance according to the weight minimums and sorting the target strings comprises: determining the total edit distance for each target string separately, the total edit distance being $W_{total} = W_I + W_D$, where $W_{total}$ denotes the total edit distance, $W_I$ denotes the weight minimum of the inserted words relative to the segmented words of the query string, and $W_D$ denotes the weight minimum of the deleted words relative to the segmented words of the target string; and comparing the total edit distances of the target strings, a target string with a smaller total edit distance being ranked earlier and a target string with a larger total edit distance being ranked later.
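The ranking by total edit distance described in the claims above can be sketched in Python under stated assumptions. The function names, the `weights[(word, reference_word)]` layout, and the default weight of 1.0 are hypothetical. How the inserted and deleted words are identified is simplified here to a membership test (target words absent from the query count as inserted, query words absent from the target count as deleted), which is an assumption and not taken from the claims.

```python
def min_weight_sum(words, reference, weights, default=1.0):
    """For each word, take its smallest weight against any reference word,
    then add those minimums together."""
    if not reference:
        return default * len(words)
    return sum(
        min(weights.get((w, r), default) for r in reference) for w in words
    )


def total_edit_distance(query, target, weights, default=1.0):
    """W_total = W_I + W_D for one target string."""
    # Assumption: a simple membership test stands in for a real edit alignment.
    inserted = [t for t in target if t not in query]
    deleted = [q for q in query if q not in target]
    w_i = min_weight_sum(inserted, query, weights, default)   # vs. query words
    w_d = min_weight_sum(deleted, target, weights, default)   # vs. target words
    return w_i + w_d


def rank(query, targets, weights):
    """Order targets so that a smaller total edit distance comes first."""
    return sorted(targets, key=lambda t: total_edit_distance(query, t, weights))


query = ["disinfect", "machine"]
weights = {
    ("washing", "machine"): 0.2,
    ("industrial", "machine"): 0.4,
    ("dryer", "machine"): 0.3,
}
targets = [
    ["disinfect", "equipment", "dryer"],
    ["disinfect", "industrial", "washing", "machine"],
    ["disinfect", "machine"],
]
for t in rank(query, targets, weights):
    print(t, total_edit_distance(query, t, weights))
```

With these made-up weights the exact match ranks first, the string with related inserted words second, and the string that both inserts unrelated words and drops a query word last.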

The method of claim 4, further comprising, before calculating the total edit distance: calculating a weight minimum of the edit distance of the replacement words; wherein the step of calculating the total edit distance according to the weight minimums and determining the degree of matching between the query string and the target strings comprises: determining the total edit distance for each target string separately, the total edit distance being $W_{total} = W_I + W_D + W_C$, where $W_{total}$ denotes the total edit distance, $W_I$ denotes the weight minimum of the inserted words relative to the segmented words of the query string, $W_D$ denotes the weight minimum of the deleted words relative to the segmented words of the target string, and $W_C$ denotes the weight minimum of the replacement words relative to the segmented words of the query string and/or the target string; and comparing the total edit distances of the target strings, a target string with a smaller total edit distance being ranked earlier and a target string with a larger total edit distance being ranked later.

The method of claim 7, wherein calculating the weight minimum of the edit distance of the replacement words comprises: setting the weight minimum of the edit distance of the replacement words equal to a preset fixed value; or setting the edit distance of the replacement words equal to the sum, the average, or the maximum of the weight minimum of the inserted words relative to the segmented words of the query string and the weight minimum of the deleted words relative to the segmented words of the target string.
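The four alternatives for the replacement weight listed in the claim above, and the resulting three-term total, can be written out as a short Python sketch. The function names and the `mode` and `fixed` parameters are hypothetical and serve only to make the alternatives concrete.

```python
def replacement_weight(w_i, w_d, mode="sum", fixed=1.0):
    """Return W_C from the insertion minimum w_i and deletion minimum w_d."""
    if mode == "fixed":
        return fixed                  # preset fixed value
    if mode == "sum":
        return w_i + w_d              # sum of the two minimums
    if mode == "average":
        return (w_i + w_d) / 2.0      # average of the two minimums
    if mode == "max":
        return max(w_i, w_d)          # larger of the two minimums
    raise ValueError(f"unknown mode: {mode}")


def total_with_replacement(w_i, w_d, mode="sum", fixed=1.0):
    """W_total = W_I + W_D + W_C."""
    return w_i + w_d + replacement_weight(w_i, w_d, mode, fixed)


for mode in ("fixed", "sum", "average", "max"):
    print(mode, total_with_replacement(0.4, 0.2, mode))
```

Treating a replacement as an insertion plus a deletion corresponds to the "sum" option, while the "average" and "max" options charge less for a replacement than for the two separate edits.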

An apparatus for sorting search results, comprising: a word weight table acquisition module, configured to calculate a semantic association weight between every two words in a statistical sample, and to obtain and save a word weight table; a word acquisition module, configured to receive a query string input at a user terminal, perform a search according to the query string, and obtain target strings; a word segmentation module, configured to segment the query string and the target string respectively after the server obtains the query string and the target string; a combination module, configured to combine each segmented word of the query string in turn with each segmented word of the target string; a query module, configured to query the word weight table to obtain a weight value for each word combination; and a matching module, configured to obtain a weighted word length according to the weight values, sort the target strings, and feed the result back to the user terminal, wherein, when the weighted word length is a minimum sliding window weighted length, the matching module comprises: a weight minimum acquisition module, configured to take, for each segmented word of the target string, the minimum of its weights relative to the segmented words of the query string, or to take, for each segmented word of the query string, the minimum of its weights relative to the segmented words of the target string; a first calculation module, configured to calculate, for each target string, the minimum sliding window weighted length according to the weight minimums; and a sorting module, configured to compare the minimum sliding window weighted lengths of the target strings, a target string with a smaller length being ranked earlier and a target string with a larger length being ranked later, wherein the minimum sliding window weighted length of each target string is calculated as $\sum_{i=k}^{h} \min_{j=1}^{m} W(T_i \mid Q_j)$, where W denotes the weight, $T_i$ denotes the i-th segmented word of the target string, k and h denote the start position and the end position of the minimum sliding window of the target string, respectively, $Q_j$ denotes the j-th segmented word of the query string, and m denotes the number of segmented words of the query string.

The device of claim 9, wherein the word weight table acquisition module comprises: a sample acquisition module, configured to acquire the statistical sample; a first statistical module, configured to select a first word and a second word from the statistical sample and to count the number of times C(first word, second word) that the first word and the second word co-occur in the statistical sample; a second statistical module, configured to count the number of occurrences ΣC(Yi, second word) of the second word in the statistical sample, where Yi denotes each word that appears together with the second word; a probability calculation module, configured to calculate the probability of the first word under the condition that the second word occurs, P(first word | second word) = C(first word, second word) / ΣC(Yi, second word); a weight calculation module, configured to take, when the second word is queried, the semantic association weight of the first word and the second word as W = 1 - P, where W is the weight and P is the probability that the first word appears under the condition of the second word; and a generation module, configured to generate the word weight table after the semantic association weight of each word in the statistical sample relative to the other words has been obtained.

An apparatus for sorting search results, comprising: a word weight table acquisition module, configured to calculate a semantic association weight between every two words in a statistical sample, and to obtain and save a word weight table; a word acquisition module, configured to receive a query string input at a user terminal, perform a search according to the query string, and obtain target strings; a word segmentation module, configured to segment the query string and the target string respectively after the server obtains the query string and the target string; a first weight minimum calculation module, configured to calculate a weight minimum of the inserted words relative to the segmented words of the query string; a second weight minimum calculation module, configured to calculate a weight minimum of the deleted words relative to the segmented words of the target string; and a matching module, configured to calculate a total edit distance according to the weight minimums, sort the target strings, and feed the result back to the user terminal, wherein calculating, according to the word weight table, the weight minimum of the inserted words relative to the segmented words of the query string comprises: obtaining, according to the word weight table, the weight value of each inserted word relative to each segmented word of the query string; and calculating the weight minimum of the inserted words relative to the segmented words of the query string as $W_I = \sum_{t=1}^{n} \min_{j=1}^{m} W(I_t \mid Q_j)$, where W denotes the weight, $I_t$ denotes the t-th inserted word, n denotes the number of inserted words, $Q_j$ denotes the j-th segmented word of the query string, and m denotes the number of segmented words of the query string.

The device of claim 11, wherein the matching module comprises: a first total edit distance calculation module, configured to determine the total edit distance for each target string separately, the total edit distance being $W_{total} = W_I + W_D$, where $W_{total}$ denotes the total edit distance, $W_I$ denotes the weight minimum of the inserted words relative to the segmented words of the query string, and $W_D$ denotes the weight minimum of the deleted words relative to the segmented words of the target string; and a sorting module, configured to compare the total edit distances of the target strings, a target string with a smaller total edit distance being ranked earlier and a target string with a larger total edit distance being ranked later.

The device of claim 11, wherein the device further comprises: a third weight minimum calculation module, configured to obtain a weight minimum of the edit distance of the replacement words before the total edit distance is calculated; and the matching module comprises: a second total edit distance calculation module, configured to determine the total edit distance for each target string separately, the total edit distance being $W_{total} = W_I + W_D + W_C$, where $W_{total}$ denotes the total edit distance, $W_I$ denotes the weight minimum of the inserted words relative to the segmented words of the query string, $W_D$ denotes the weight minimum of the deleted words relative to the segmented words of the target string, and $W_C$ denotes the weight minimum of the replacement words relative to the segmented words of the query string and/or the target string; and a sorting module, configured to compare the total edit distances of the target strings, a target string with a smaller total edit distance being ranked earlier and a target string with a larger total edit distance being ranked later.