TWM523902U

TWM523902U - Search engine device for collecting keyword

Info

Publication number: TWM523902U
Application number: TW105200646U
Authority: TW
Inventors: 蕭政華
Original assignee: 信義房屋仲介股份有限公司
Priority date: 2016-01-15
Filing date: 2016-01-15
Publication date: 2016-06-11

Description

Search engine device that can collect keywords

The present creation relates to a search engine device, and in particular to a search engine device that collects keywords and provides a recommendation list.

With the spread of the Internet, search engines have become an indispensable technology in daily work and life, and the "Google Suggest" technique has attracted particularly wide attention. With "Google Suggest", after a user types a query keyword and before the search is run, the search engine automatically generates a box showing other commonly used related keywords as a recommendation list from which the user can select. This technique not only reminds users of other highly relevant keywords but also helps them find the information they want quickly, improving search efficiency.

Because online search is so convenient, sizable real estate brokerages have also begun developing real estate search engines so that users can quickly find what they need among thousands of real estate records, and the "Google Suggest" technique has become an important part of these engines. Real estate objects are numerous, their content is complex, and the popularity of specific information changes over time. A recommendation list generated purely from keyword relevance therefore tends to place keywords that mean little to most users at the top of the box, so the list fails to surface the currently most popular keywords that users need. For example, when "Nangang" (南港) is entered, the recommendation list shows, from top to bottom, "Nangang Public Housing", "Nangang Park", "Nangang Software Park", "Nangang Academia Sinica", "Nangang Station", "Nangang Road Section 1", and so on, yet "Nangang Road Section 1", which most users currently search for most often and care about most, is listed near the bottom. Because the topmost keywords are usually the ones users notice and select most readily, a popular keyword that is not placed prominently in the list is easily overlooked or takes longer to find, which makes the recommendation list less convenient.

Therefore, how to collect the popular keywords that users frequently use, so as to produce a more accurate and convenient recommendation list, has become a pressing problem in this field.

In view of the shortcomings of the prior art, the applicant, after careful experimentation and research and with persistent effort, conceived the present creation, a "search engine device that can collect keywords", which overcomes those shortcomings. A brief description of the creation follows.

The purpose of the present creation is, for real estate object information, to record the strings that users select or search for together with attribute data such as the corresponding cumulative click count, cumulative search count, and number of associated real estate objects. The magnitude of the attribute data and its changes are used to identify how popular and how representative a keyword is. Different weights can accordingly be applied to the keywords in the recommendation list to determine, and update in real time, the order of the keywords in the list, producing a more accurate and convenient recommendation list and improving search efficiency.

The present creation provides a search engine device that can collect keywords, comprising: a database configured to store a plurality of first keywords, each first keyword having attribute data that includes at least a cumulative click count and a cumulative search count; an analysis module configured to analyze a string from a data source and, when the string is found to be real-estate-related, to define the string as a first keyword and store it in the database; an association module configured to use an inter-term relevance analysis technique to establish associations among the plurality of first keywords, so that each first keyword is related to a plurality of second keywords, and to record these in association data, the second keywords being drawn from the first keywords and inheriting the attribute data of those first keywords; a sorting module configured to determine an initial ranking order of the second keywords related to each first keyword according to the cumulative click count and/or the cumulative search count of each second keyword; a communication module that receives a search string entered by a user; and a search module configured to receive the search string, find the first keyword that matches the search string and the second keywords that the association module has defined as related to it, and display those second keywords as a recommendation list in the initial ranking order determined by the sorting module.

1‧‧‧Search engine device

10‧‧‧Database

11‧‧‧Real estate object database

12‧‧‧Real-estate-related vocabulary database

15‧‧‧Field

16‧‧‧Search box

17‧‧‧String

18‧‧‧Recommendation list

20‧‧‧Analysis module

30‧‧‧Association module

35‧‧‧Sorting module

40‧‧‧Communication module

50‧‧‧Search module

X₀~X_n, Y₀~Y₂‧‧‧Keywords

D‧‧‧Semantic distance

Figure 1 is a block diagram of the search engine device according to an embodiment of the present creation.

Figure 2 is a schematic diagram of the search state according to an embodiment of the present creation.

Figure 3 is a schematic diagram of the state in which an item in the recommendation list is selected, according to an embodiment of the present creation.

Figure 4 is a schematic diagram of the tentative keyword order according to an embodiment of the present creation.

Figure 5 is a schematic diagram of the recommendation list according to an embodiment of the present creation.

The technical content, features, and effects of the present creation will be made clear by the following detailed description of the preferred embodiments.

Please refer to Figure 1, a schematic diagram of the search engine device 1 according to an embodiment of the present creation. The search engine device 1 includes a database 10, an analysis module 20, an association module 30, a sorting module 35, a communication module 40, and a search module 50.

The database 10 further includes a real estate object database 11 and a real-estate-related vocabulary database 12. The real estate object database 11 records a plurality of real estate object records; more specifically, each record includes an object description, an address, a community name, and/or an object short name. The real-estate-related vocabulary database 12 records a plurality of real-estate-related terms. These terms are determined mainly by rules of thumb, with statistical analysis used to learn which nouns count as real-estate-related terms, for example county names, city names, township names, town names, district names, village (里) names, street and road names, well-known landmark names, public facility names, known community names, and known object short names. They serve as the basis for the analysis performed by the analysis module 20. More specifically, when a particular noun appears with especially high frequency in the real estate object database 11, the system marks it as a real-estate-related term; otherwise it would not have been entered into the real estate object database 11 in the first place.

The analysis module 20 analyzes strings from a data source. The data source may be the real estate object records stored in the real estate object database 11, or strings that the communication module 40 receives from users. The analysis module 20 compares a string from either source against the real-estate-related terms recorded in the real-estate-related vocabulary database 12 to determine whether the string is real-estate-related. When it is, the analysis module 20 defines the string as a first keyword and stores it in the database 10. First keywords include addresses, community names, or object short names.
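The behaviour of the analysis module described above can be sketched as follows. This is only an illustration: the class and function names are hypothetical, and treating "comparison" as substring containment is an assumption, since the description does not state the exact matching rule.

```python
from dataclasses import dataclass, field


@dataclass
class KeywordDatabase:
    """Hypothetical stand-in for database 10 and vocabulary database 12."""
    vocabulary: set                                    # real-estate-related terms
    first_keywords: dict = field(default_factory=dict)  # keyword -> attribute data


def analyze_string(db: KeywordDatabase, text: str) -> bool:
    """Store `text` as a first keyword if it contains a real-estate-related term.

    Assumption: a string counts as real-estate-related when any vocabulary
    term occurs inside it.
    """
    relevant = any(term in text for term in db.vocabulary)
    if relevant and text not in db.first_keywords:
        # New first keywords start with empty attribute data.
        db.first_keywords[text] = {"clicks": 0, "searches": 0}
    return relevant


db = KeywordDatabase(vocabulary={"Nangang", "Daan"})
analyze_string(db, "Nangang Park")   # stored as a first keyword
analyze_string(db, "weather today")  # not real-estate-related, ignored
```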

The database 10 stores a plurality of first keywords, each of which has attribute data. The attribute data includes at least a cumulative click count and a cumulative search count. Please refer to Figure 2, a schematic diagram of the search state according to an embodiment of the present creation. In Figure 2, when the string 17 "Daan" (大安) is entered in the field 15 and the search box 16 is then selected, the cumulative search count of "Daan" as a first keyword increases by one and is recorded in the database 10. Please refer to Figure 3, a schematic diagram of the state in which an item in the recommendation list is selected. In Figure 3, when "Daan" is entered in the field 15, a recommendation list 18 listing a plurality of recommended strings is generated. After the user selects "Daan Public Housing", the cumulative click count of "Daan Public Housing" as a first keyword increases by one and is recorded in the database 10.
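The two counters in the attribute data can be pictured with a small sketch. The store layout and function names here are assumptions made for illustration, not part of the description.

```python
from collections import defaultdict


def new_store():
    """Attribute data per first keyword: cumulative clicks and searches."""
    return defaultdict(lambda: {"clicks": 0, "searches": 0})


def record_search(store, keyword):
    # Called when the user submits `keyword` through the search box.
    store[keyword]["searches"] += 1


def record_click(store, keyword):
    # Called when the user selects `keyword` from the recommendation list.
    store[keyword]["clicks"] += 1


store = new_store()
record_search(store, "Daan")               # Figure 2 scenario
record_click(store, "Daan Public Housing")  # Figure 3 scenario
```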

The association module 30 uses an inter-term relevance analysis technique to establish the associations (semantic distances) among the first keywords. The technique may use a known semantic analysis algorithm such as the Kullback-Leibler divergence (KL divergence) or the normalized Google distance. The normalized Google distance is used as the example here; its semantic distance formula is as follows.
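The formula itself does not appear in this text. As a reconstruction rather than a copy of the original, the standard definition of the normalized Google distance can be written with the symbols the description uses (X for the co-occurrence count of keywords A and B, Y and Z for their individual counts, N for the total number of records). This form reproduces the value of 0.44305 given in the worked example.

```latex
D(A, B) = \frac{\max(\log Y, \log Z) - \log X}{\log N - \min(\log Y, \log Z)}
```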

In the semantic distance formula above, the co-occurrence count X is the number of results in which keywords A and B appear together in the same real estate object record, obtained by searching for both keywords among a total of N real estate object records. The individual counts Y and Z are the numbers of results obtained by searching those N records for A and for B separately. For example, to compute the semantic distance D between the keywords "Nangang" (南港, A) and "Nanzhai" (南宅, B), the association module 30 uses the records in the real estate object database 11: among a total of 8,058,044,651 (N) real-estate-related texts, "Nangang" appears 46,700,000 times (Y) and "Nanzhai" appears 12,200,000 times (Z), and the two appear together 2,630,000 times (X). Applying the formula gives a semantic distance D of 0.44305. The larger D is, the weaker the association between the two keywords; the closer D is to zero, the stronger the association, meaning that the two keywords almost always appear together.

In addition, if a search result count (Y or Z) is close to the total N, the keyword is not representative, so the association module 30 removes in advance any keyword whose result count is close to the total and does not compute its semantic distance.
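The worked example can be checked numerically. The sketch below assumes the standard normalized Google distance formula, which matches the stated result. The function names and the 0.9 cutoff used for "close to the total" are hypothetical, because the description gives no numeric cutoff.

```python
import math


def semantic_distance(x, y, z, n):
    """Normalized Google distance.

    x: co-occurrence count of A and B (must be > 0)
    y, z: individual counts of A and B
    n: total number of records searched
    """
    log_y, log_z = math.log(y), math.log(z)
    return (max(log_y, log_z) - math.log(x)) / (math.log(n) - min(log_y, log_z))


def is_representative(count, n, ratio=0.9):
    """Assumed rule: a keyword found in nearly all records is skipped."""
    return count < ratio * n


d = semantic_distance(2_630_000, 46_700_000, 12_200_000, 8_058_044_651)
print(round(d, 3))  # 0.443
```

A distance near zero means the two keywords almost always occur together, so identical counts for X, Y, and Z give a distance of exactly zero.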

Once the associations among the first keywords have been computed, the association module 30 defines the keywords related to a given first keyword as its second keywords and records them in association data. The second keywords are drawn from the first keywords and inherit their attribute data, so that the second keywords can later be compared and weighted.

The association module 30 repeats the above procedure to build, from the semantic distances, the association data between each first keyword and its second keywords. The sorting module 35 then sorts each first keyword and its second keywords (usually several) by semantic distance into a tentative order. Please refer to Figure 4, a schematic diagram of the tentative keyword order according to an embodiment of the present creation. Figure 4 shows two tentative orders, X₀~X_n and Y₀~Y₂, headed by the first keywords X₀ and Y₀ respectively, that are not connected to each other. This shows that the keyword groups handled by the association module 30 may be entirely unrelated to one another and therefore belong to different tentative orders. Within a tentative order, every pair of keywords has a semantic distance: the distance between the first keyword X₀ and the second keyword X₁ is D(X₀,X₁), the distance between X₀ and X₂ is D(X₀,X₂), the distance between X₀ and X_n is D(X₀,X_n), and so on. To keep the figure simple, not all semantic distances are listed.

Although the tentative order is drawn as a single column in Figure 4, this is only one way of displaying it; any arrangement may be used as long as it expresses the semantic distances between keywords. Within the keyword association sequence X₀~X_n, the sorting module 35 places the second keywords (X₁~X_n) whose semantic distance to the first keyword is "relatively close" (that is, whose association is relatively strong) in the same tentative order. "Relatively close" here means that the semantic distance is less than a preset distance threshold, so a tentative order usually contains several second keywords whose distance is below that threshold. The preset distance threshold can be set appropriately by a person skilled in the art.
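Building a tentative order amounts to filtering by the distance threshold and sorting by distance. A minimal sketch, with a hypothetical function name and made-up distances:

```python
def tentative_order(distances, max_distance):
    """Second keywords closer than `max_distance`, nearest first.

    distances: mapping of candidate keyword -> semantic distance to the
               first keyword heading this order
    """
    close = [(kw, d) for kw, d in distances.items() if d < max_distance]
    return sorted(close, key=lambda pair: pair[1])


# Illustrative distances from a first keyword X0; the values are invented.
order = tentative_order({"X2": 0.4, "X1": 0.3, "X9": 0.9}, max_distance=0.5)
print(order)  # [('X1', 0.3), ('X2', 0.4)]
```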

After the tentative order is generated, the sorting module 35 applies a first weight to the semantic distances in the tentative order according to the cumulative click count and/or cumulative search count of each second keyword, and produces an initial ranking order by changing the order of the second keywords in the tentative order. For example, in the tentative order X₀~X_n, the semantic distances from the second keywords X₁ and X₂ to the first keyword X₀ are D(X₀,X₁)=0.3 and D(X₀,X₂)=0.4, so X₁ is ranked ahead of X₂. Suppose, however, that the cumulative click count (and/or cumulative search count) of X₁ is 10 and that of X₂ is 1000. If the count threshold preset in the search engine device is 100 and the first weight applied when the threshold is exceeded is -0.15 added to the semantic distance, then D(X₀,X₂) becomes 0.25, X₁ and X₂ swap places, and the initial ranking order results.

The cumulative click count may be combined with the cumulative search count or used on its own; either choice can be set appropriately by a person skilled in the art, as can the count threshold and the first weight. Several count thresholds may also be set, each corresponding to its own first weight: for example, a first weight of -0.2 when the count exceeds 500, and a first weight of -0.25 when the count exceeds 1000. If a weighted semantic distance falls below zero, it is treated as zero and is not changed further. If several second keywords end up at zero after weighting, they are ranked in their original tentative order to produce the initial ranking order. The initial ranking order is updated in real time as the cumulative click counts and cumulative search counts increase, so that the recommendation list stays current. Users thus receive recommendations of the currently most popular keywords from the list.
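The first-weight step can be sketched as below, using the numbers from the example. The function name and data layout are hypothetical; the assumption that only the highest exceeded tier applies (rather than tiers accumulating) follows the 500/1000 example but is not stated outright.

```python
def apply_first_weight(tentative, counts, tiers):
    """Re-rank a tentative order using popularity counts.

    tentative: list of (keyword, distance), nearest first
    counts:    keyword -> cumulative click and/or search count
    tiers:     list of (count_threshold, weight); weights are negative
    """
    adjusted = []
    for kw, dist in tentative:
        weight = 0.0
        for threshold, w in sorted(tiers):
            if counts.get(kw, 0) > threshold:
                weight = w  # highest exceeded tier wins (assumption)
        # Distances below zero are treated as zero.
        adjusted.append((kw, max(0.0, dist + weight)))
    # sorted() is stable, so ties keep their tentative order.
    return sorted(adjusted, key=lambda pair: pair[1])


initial = apply_first_weight(
    [("X1", 0.3), ("X2", 0.4)],
    counts={"X1": 10, "X2": 1000},
    tiers=[(100, -0.15)],
)
print([kw for kw, _ in initial])  # ['X2', 'X1']
```

Relying on a stable sort is one simple way to honour the rule that keywords clamped to zero keep their original tentative order.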

The communication module 40 receives the search string entered by the user and passes it to the search module 50. On receiving the search string, the search module 50 finds the first keyword that matches it and the second keywords that the association module 30 has defined as related, and displays those second keywords as the recommendation list in the initial ranking order determined by the sorting module 35; the list is provided to the user through the communication module 40 for selection. When the search module 50 determines that no first keyword matches the search string, it stores the search string and its cumulative search count in the database 10.
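The lookup flow of the search module can be sketched as follows. The names are hypothetical, and two simplifications are assumed: "matching" is treated as exact equality, and the count update for matched keywords is left to the separate search-box handling.

```python
def handle_search(query, initial_order, search_counts):
    """Return the recommendation list for `query`.

    initial_order: first keyword -> second keywords in initial ranking order
    search_counts: string -> cumulative search count (updated in place)
    """
    if query in initial_order:
        return list(initial_order[query])
    # No matching first keyword: keep the string and its search count so it
    # can be analyzed later as a candidate keyword.
    search_counts[query] = search_counts.get(query, 0) + 1
    return []


orders = {"Nangang": ["Nangang Road Section 1", "Nangang Park"]}
counts = {}
handle_search("Nangang", orders, counts)  # returns the two suggestions
handle_search("Xizhi", orders, counts)    # returns [], counts["Xizhi"] == 1
```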

Besides generating the recommendation list from the cumulative click count and cumulative search count as described above, the present creation can further generate the list from an association count. Unlike the cumulative click and search counts, the association count avoids the problem that a newly created keyword, which has not yet been clicked, is unlikely to be surfaced with priority even when it relates to a relatively large number of objects. That is, the keyword attribute data of the present creation further includes an association count: the number of records to which a given keyword relates, namely the number of real estate objects that a search for that keyword returns.

When the search module 50 determines that a first keyword matches the search string, it removes from the recommendation list any second keyword whose association count exceeds a preset number, so that unrepresentative keywords are not recommended and search efficiency is not reduced. An excessively large association count means that a search on that keyword returns so many records that the desired real estate object is hard to find. The preset number can be set appropriately by a person skilled in the art. The sorting module 35 may also apply a second weight to the semantic distance according to the association count, producing the initial ranking order by changing the order of the second keywords in the tentative order. For example, in the tentative order X₀~X_n, the semantic distances from the second keywords X₁ and X₂ to the first keyword X₀ are D(X₀,X₁)=0.2 and D(X₀,X₂)=0.3, so X₁ is ranked ahead of X₂. Suppose, however, that the association count of X₁ is 300 and that of X₂ is 20. If the association count threshold preset in the search engine device is 100 and the second weight applied when the threshold is exceeded is +0.15 added to the semantic distance, then D(X₀,X₁) becomes 0.35 after weighting, X₁ and X₂ swap places, and the initial ranking order results. The association count threshold and the second weight can likewise be set appropriately by a person skilled in the art.
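Both uses of the association count can be sketched together, with the numbers from the example. The function and parameter names are hypothetical, and so is the removal limit of 10,000. The description presents removal and weighting as separate options, so applying them in one pass is an assumption.

```python
def apply_association_rules(tentative, assoc_counts, drop_above,
                            weight_threshold, second_weight):
    """Filter and re-rank second keywords by association count.

    tentative:    list of (keyword, distance), nearest first
    assoc_counts: keyword -> number of real estate objects it returns
    drop_above:   keywords with more associations than this are removed
    weight_threshold, second_weight: penalty added to the distance when
                  the association count exceeds the threshold
    """
    kept = [(kw, d) for kw, d in tentative
            if assoc_counts.get(kw, 0) <= drop_above]
    adjusted = [
        (kw, d + second_weight if assoc_counts.get(kw, 0) > weight_threshold else d)
        for kw, d in kept
    ]
    return sorted(adjusted, key=lambda pair: pair[1])  # stable for ties


ranked = apply_association_rules(
    [("X1", 0.2), ("X2", 0.3)],
    assoc_counts={"X1": 300, "X2": 20},
    drop_above=10_000,
    weight_threshold=100,
    second_weight=0.15,
)
print([kw for kw, _ in ranked])  # ['X2', 'X1']
```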

In addition, when the search module 50 determines that a first keyword matches the search string, it may list the second keywords together with their corresponding association counts in the recommendation list and present them directly to the user, which helps the user decide what to select. Please refer to Figure 5, a schematic diagram of the recommendation list according to an embodiment of the present creation. In Figure 5, each keyword in the recommendation list is shown with its association count. Besides the association count, the list may also show the cumulative click count, the cumulative search count, or the sum of the two from the attribute data.

In summary, for real estate object data, the present creation records the strings that users select or search for together with attribute data such as the corresponding cumulative click count, cumulative search count, and number of associated real estate objects. The magnitude of the attribute data and its changes are used to identify how popular and how representative a keyword is, and different weights can accordingly be applied to the keywords in the recommendation list to determine, and update in real time, their order in the list. This produces a more accurate and convenient recommendation list, improves search efficiency, and helps users who are unfamiliar with real-estate-related terms find the object they want to view as quickly as possible (for example, a user who does not know that "Nanzhai" (南宅) is in fact Nangang Public Housing (南港國宅)).

The above is only a preferred embodiment of the present creation and does not limit the scope of its implementation. Simple equivalent changes and modifications made according to the claims and the description remain within the scope covered by the present creation.

Claims

A search engine device capable of collecting keywords, the device comprising: a database configured to store a plurality of first keywords, each first keyword having attribute data that includes at least a cumulative click count and a cumulative search count; an analysis module configured to analyze a string from a data source and, when the string is found to be real-estate-related, to define the string as a first keyword and store it in the database; an association module configured to establish associations among the plurality of first keywords by using an inter-term relevance analysis technique, so that each first keyword is related to a plurality of second keywords, and to record them in association data, the plurality of second keywords being drawn from the first keywords and inheriting the attribute data of those first keywords; a sorting module configured to determine an initial ranking order of the plurality of second keywords related to each first keyword according to the cumulative click count and/or the cumulative search count of each second keyword; a communication module that receives a search string entered by a user; and a search module configured to receive the search string, find the first keyword that matches the search string and the plurality of second keywords defined as related by the association module, and display the plurality of second keywords as a recommendation list in the initial ranking order determined by the sorting module.

The search engine device of claim 1, wherein the database further comprises a real-estate-related vocabulary database recording a plurality of real-estate-related terms, the terms including a county name, a city name, a township name, a town name, a district name, a village (li) name, a street or road name, a well-known landmark name, a public facility name, a known community name, and a known object short name, which serve as the basis for the analysis performed by the analysis module.

The search engine device of claim 1, wherein the data source is a real estate object database or the communication module; the real estate object database records real estate object data comprising at least two items from the group consisting of an object description, an address, a community name, and an object short name; and the plurality of first keywords include at least an address, a community name, or an object short name.

The search engine device of claim 1, wherein the association module calculates a plurality of semantic distances among the plurality of first keywords by a semantic analysis algorithm and establishes, according to the plurality of semantic distances, the association between each first keyword and the plurality of second keywords.

The search engine device of claim 4, wherein the sorting module sorts each first keyword and the plurality of second keywords into a tentative order according to the plurality of semantic distances.

The search engine device of claim 5, wherein the sorting module applies a first weight to each of the semantic distances according to the cumulative click count and/or the cumulative search count of the plurality of second keywords, and generates the initial ranking order by changing the order of the plurality of second keywords in the tentative order.

The search engine device of claim 1, wherein, when the search module determines that no first keyword matches the search string, the search string and the cumulative search count are stored in the database.

The search engine device of claim 1, wherein the search module further records a specific second keyword selected from the recommendation list and records the cumulative click count of that specific second keyword, so that the sorting module updates the initial ranking order.

The search engine device of claim 1, wherein the attribute data further includes an association count, which is the number of real estate objects that can be found when searching for the keyword.

The search engine device of claim 9, wherein: when the search module determines that a first keyword matches the search string, the search module removes from the recommendation list any second keyword whose association count is greater than a preset number; the sorting module applies a second weight to each of the semantic distances according to the association count and generates the initial ranking order by changing the order of the plurality of second keywords in the tentative order; when the search module determines that a first keyword matches the search string, the search module lists the plurality of second keywords together with their corresponding association counts in the recommendation list.