HK1185681A

HK1185681A - Trading platform based method and device for structured information search

Info

Publication number: HK1185681A
Application number: HK13112987.5A
Authority: HK
Inventors: 陈旭; 陈智强; 顾海杰; 王德胜; 何亮
Original assignee: 阿里巴巴集团控股有限公司
Filing date: 2013-11-21
Publication date: 2014-02-21

Description

Structured information searching method and device based on transaction platform

Technical Field

The present application relates to the technical field of transaction platform data processing, and in particular, to a structured information search method and a structured information search apparatus based on a transaction platform.

Background

In the open network environment of the Internet, and on the basis of network communication technology, buyers and sellers can carry out various trade activities through a transaction platform without meeting in person, realizing online shopping by consumers, online transactions and online electronic payment between merchants, and various business, transaction, financial and related comprehensive service activities. Currently, transaction platforms can generally be divided into business-to-business (B2B), business-to-consumer (B2C), consumer-to-consumer (C2C), and so on. In recent years, domestic transaction platforms have developed rapidly, and transaction platforms (commonly called shopping websites) in various B2B, C2C and B2C modes, such as Taobao, Dangdang, Joyo Amazon, Paipai, Jingdong Mall, and the like, have been recognized and accepted by users.

The supplier publishes the product information it supplies on the trading platform to form a product information set; if a large amount of product information is published, the supplier can further group it according to certain rules. Shop window information for the corresponding products may also be set for buyers to browse.

When a buyer purchases a product, the buyer can search and screen satisfactory suppliers at a transaction platform to conduct a transaction. By adopting the prior art, a buyer can search in a search bar based on the supplier ID under the condition of knowing the supplier ID in advance to obtain the related information of the product operated by the corresponding supplier. However, in most cases, the buyer can only use the customized query term to search the product concerned or desired to be purchased by himself on the transaction platform, and then find the information of the corresponding supplier through the product information to obtain the related information of the product operated by the supplier. Moreover, the suppliers obtained in this way are often not the best suppliers, and further manual screening by buyers is required.

Therefore, a technical problem that urgently needs to be solved by those skilled in the art is how to provide a novel structured information search mechanism based on a transaction platform that gives buyers a personalized search function, so that buyers can quickly and simply find the best suppliers they need.

Disclosure of Invention

The application aims to provide a structured information searching method and device based on a transaction platform, which are used for providing a personalized searching function for buyers, so that the buyers can quickly and simply search the optimal suppliers needed by the buyers.

In order to solve the above problem, the present application discloses a structured information search method based on a trading platform, which includes:

receiving search requirement information;

segmenting the search requirement information to obtain a keyword field;

searching candidate structured information matched with the keyword field in a preset structured information base;

calculating text similarity in the candidate structured information by adopting the keyword field, and acquiring characteristic attribute parameters of the candidate structured information;

calculating the score of the candidate structured information according to the text similarity and the characteristic attribute parameters;

and sorting according to the scores of the candidate structured information.

Preferably, the search requirement information includes a product information keyword, and the product information keyword includes: product information keywords submitted by a user;

or the product information keywords submitted by the user and the product information keywords generated by the background.

Preferably, the keyword field obtained after word segmentation includes: a keyword phrase field and a keyword word field.

Preferably, the preset structured information library includes a plurality of structured information index files, and the step of searching the candidate structured information matched with the keyword field in the preset structured information library includes:

querying a structured information index file by using the keyword phrase field, and extracting K pieces of most similar structured information as candidate structured information; wherein K is a preset number threshold;

and if the number of pieces of structured information returned by querying the structured information index file with the keyword phrase field is L, and L is less than K, further querying the index file with the keyword word field, extracting the K-L most similar pieces of structured information, and forming the candidate structured information from the L pieces and the K-L pieces of structured information.

Preferably, the keyword field obtained after word segmentation only comprises a keyword word field, and the preset structured information base comprises a plurality of structured information index files; the step of searching candidate structured information matched with the keyword field in a preset structured information base comprises the following steps:

querying a structured information index file by adopting the keyword word field, and extracting K pieces of most similar structured information as candidate structured information; and K is a preset number threshold.

Preferably, the structured information is product grouping information of a supplier, the product grouping information of the supplier includes main keywords grouped by the supplier, and the step of calculating text similarity in the candidate structured information by using keyword fields includes:

calculating a first text similarity, wherein the first text similarity is the text similarity of the keyword word field and a main keyword of a supplier group in the product group information of candidate suppliers;

calculating a second text similarity, wherein the second text similarity is the text similarity of the keyword phrase field and main keywords grouped by suppliers in the product grouping information of corresponding candidate suppliers;

and generating the text similarity of the product grouping information of the corresponding candidate supplier according to the first text similarity and the second text similarity.

Preferably, the structured information is product grouping information of a supplier, the product grouping information of the supplier includes a main keyword grouped by the supplier and an auxiliary keyword grouped by the supplier, and the step of calculating text similarity in the candidate structured information by using a keyword field includes:

calculating a third text similarity, wherein the third text similarity is the text similarity between the keyword word field and auxiliary keywords grouped by suppliers in the product grouping information of corresponding candidate suppliers;

calculating a fourth text similarity, wherein the fourth text similarity is the text similarity between the keyword phrase field and auxiliary keywords grouped by suppliers in the product grouping information of corresponding candidate suppliers;

and generating the text similarity of the product grouping information of the corresponding candidate supplier according to the first text similarity, the second text similarity, the third text similarity and the fourth text similarity.

Preferably, the first text similarity is calculated using the following formula:

wherein weight(W_i) denotes the weight corresponding to the attribute of the keyword word field;

percent(W_i) denotes the percentage of the supplier's product grouping information occupied by the keyword word field;

NameOrComment(W_i) denotes the weight determined by which product information keywords the keyword word field matching the main keyword of the supplier group belongs to; its value is either the weight Name(W_i), used when that keyword word field belongs to the product information keywords submitted by the user, or the weight Comment(W_i), used when it belongs to the product information keywords generated by the background;

And/or, calculating the second text similarity by adopting the following formula:

wherein weight(PH_i) denotes the weight corresponding to the attribute of the keyword phrase field;

percent(PH_i) denotes the percentage of the supplier's product grouping information occupied by the keyword phrase field;

NameOrComment(PH_i) denotes the weight determined by which product information keywords the keyword phrase field matching the main keyword of the supplier group belongs to; its value is either the weight Name(PH_i), used when that keyword phrase field belongs to the product information keywords submitted by the user, or the weight Comment(PH_i), used when it belongs to the product information keywords generated by the background;

And/or, calculating the third text similarity by adopting the following formula:

KeywordsSimilarity＝ProductServiceWordSimilarity*W_p+(1-W_p)*Similarity

wherein W_p is the configured weight of the auxiliary keywords of the supplier group, and Similarity is the first text similarity;

and/or, calculating the fourth text similarity by adopting the following formula:

PhrasesSimilarity＝ProductServicePhraseSimilarity*W_p+(1-W_p)*Similarity

wherein W_p is the configured weight of the auxiliary keywords of the supplier group, and Similarity is the second text similarity.

Preferably, the following formula is adopted to generate the text similarity of the product grouping information of the corresponding candidate supplier according to the first text similarity, the second text similarity, the third text similarity and the fourth text similarity:

TextSimilarity＝KeywordsSimilarity*W₁+PhrasesSimilarity*W₂

wherein W₁ and W₂ are the respectively configured weights.
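As a hedged illustration of how the three formulas above compose, the following Python sketch evaluates KeywordsSimilarity, PhrasesSimilarity and TextSimilarity from already-computed inputs. The function names and all numeric values are hypothetical; the formulas for the first and second text similarity are not reproduced in this text, so they appear here only as given numbers.

```python
def blend_with_auxiliary(aux_similarity: float, main_similarity: float, w_p: float) -> float:
    """Shared shape of KeywordsSimilarity and PhrasesSimilarity:
    auxiliary-keyword similarity * W_p + (1 - W_p) * main-keyword similarity."""
    return aux_similarity * w_p + (1.0 - w_p) * main_similarity


def text_similarity(keywords_similarity: float, phrases_similarity: float,
                    w1: float, w2: float) -> float:
    """TextSimilarity = KeywordsSimilarity * W1 + PhrasesSimilarity * W2."""
    return keywords_similarity * w1 + phrases_similarity * w2


# Illustrative weights and similarities only (assumptions, not values from the source).
W_P, W_1, W_2 = 0.25, 0.5, 0.5
keywords_sim = blend_with_auxiliary(aux_similarity=0.8, main_similarity=0.4, w_p=W_P)
phrases_sim = blend_with_auxiliary(aux_similarity=1.0, main_similarity=0.0, w_p=W_P)
total = text_similarity(keywords_sim, phrases_sim, W_1, W_2)
print(keywords_sim, phrases_sim, total)
```

Keeping the auxiliary-keyword blend in one helper reflects that the third and fourth formulas differ only in whether word-level or phrase-level similarities are supplied.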

Preferably, the characteristic attribute parameters include a dominance coefficient, which is a score of a dominance degree of the product grouping information of each supplier; the step of calculating the score of the candidate structured information according to the text similarity and the characteristic attribute parameters is calculated by adopting the following formula:

Similarity＝TextSimilarity*(1-W_m)+DominanceCoefficient*W_m

wherein W_m is the weight of the dominance coefficient.

Preferably, the feature attribute parameters further include a common index score of the candidate structured information;

the common index score is calculated by adopting the following formula:

Score_public＝∑_i P_i*W_i

wherein P₁ is the activity level of the supplier, P₂ is the quotation responsiveness of the supplier, and W₁ and W₂ are their respective weights.

Preferably, the search requirement information further includes: user requirement information for the supplier type, user requirement information for the supplier size, user requirement information for the main market of the supplier, user requirement information for the product category, and/or, user requirement information for the minimum quantity of the supplier;

the characteristic attribute parameters further comprise personalized index scores of the candidate structured information;

the personalized index score is calculated by the following formula:

Score_personalized＝∑_iScore_i

wherein Score_i includes the degree of matching between the supplier's type and the corresponding user requirement information, the degree of matching between the supplier's scale and the corresponding user requirement information, the degree of matching between the supplier's main market and the corresponding user requirement information, the degree of matching between the supplier's product category and the corresponding user requirement information, and/or the degree of matching between the supplier's minimum order quantity and the corresponding user requirement information.
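The two index scores can be sketched as plain sums. This is a minimal Python illustration under stated assumptions: the dictionary keys, the weights and the matching degrees are hypothetical, and how each matching degree is derived from the RFQ and the supplier data is not specified in this text, so the values are simply supplied as inputs.

```python
from typing import Mapping


def public_score(indicators: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Score_public = sum over i of P_i * W_i."""
    return sum(value * weights[name] for name, value in indicators.items())


def personalized_score(match_degrees: Mapping[str, float]) -> float:
    """Score_personalized = sum over i of Score_i, one term per requirement the buyer stated."""
    return sum(match_degrees.values())


# Hypothetical inputs: P_1 is supplier activity, P_2 is quotation responsiveness.
indicators = {"activity": 0.5, "quote_responsiveness": 1.0}
weights = {"activity": 0.5, "quote_responsiveness": 0.25}

# Only the requirements present in the RFQ contribute a term (the "and/or" in the text).
match_degrees = {"supplier_type": 1.0, "main_market": 0.5, "min_order_quantity": 0.0}

score_custom = public_score(indicators, weights) + personalized_score(match_degrees)
print(score_custom)
```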

Preferably, the step of calculating the score of the candidate structured information according to the text similarity and the characteristic attribute parameter further includes:

normalizing the score Similarity＝TextSimilarity*(1-W_m)+DominanceCoefficient*W_m, calculated from the text similarity and the dominance coefficient, to obtain Score'_match;

normalizing the characteristic attribute parameter Score_custom to obtain Score'_custom, wherein Score_custom＝Score_public+Score_personalized, Score_public is the common index score, and Score_personalized is the personalized index score;

and calculating the final score of the candidate structured information by adopting the following formula according to the text similarity and the characteristic attribute parameters after the normalization processing:

Score_total＝λScore’_custom+(1-λ)Score’_match，

where λ is a configured parameter.
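The final scoring step can be sketched as follows. Two points are assumptions rather than statements from the source: the dominance-coefficient term is read as the dominance coefficient multiplied by W_m, and min-max scaling over the candidate set stands in for the normalization formula, which is not reproduced in this text. All names and numbers are illustrative.

```python
from typing import List


def match_score(text_similarity: float, dominance: float, w_m: float) -> float:
    """Similarity = TextSimilarity * (1 - W_m) + dominance coefficient * W_m (assumed reading)."""
    return text_similarity * (1.0 - w_m) + dominance * w_m


def min_max_normalize(values: List[float]) -> List[float]:
    """ASSUMPTION: min-max scaling to [0, 1]; the source's normalization formula is not shown."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(value - low) / (high - low) for value in values]


def total_scores(match: List[float], custom: List[float], lam: float) -> List[float]:
    """Score_total = lambda * Score'_custom + (1 - lambda) * Score'_match, per candidate."""
    norm_match = min_max_normalize(match)
    norm_custom = min_max_normalize(custom)
    return [lam * c + (1.0 - lam) * m for c, m in zip(norm_custom, norm_match)]


# Three hypothetical candidates.
match = [match_score(0.2, 0.2, 0.5), match_score(0.6, 0.6, 0.5), match_score(1.0, 1.0, 0.5)]
custom = [3.0, 1.0, 2.0]
print(total_scores(match, custom, lam=0.5))
```

Normalizing both components before mixing keeps λ meaningful even when Score_custom and the match score live on different scales.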

Preferably, the normalization process is performed by using the following formula:

wherein X' is the result of normalizing X.

Preferably, the attributes of the keyword word field include core word attributes and product word attributes, and the step of sorting according to the score of the candidate structured information includes:

(1) extracting the M candidate structured information items with the highest scores, placing the N items that meet the automatic recommendation condition in the first N positions, and placing the remaining M-N items after them as non-automatic recommendations; M and N are positive integers;

wherein the automatic recommendation condition is as follows: if the matched keyword fields contain a phrase, candidate structured information whose dominance coefficient is greater than a first threshold is set as an automatic recommendation; if the matched keyword fields contain only words, candidate structured information that has the core word attribute, in which the proportion of the product word attribute in the supplier's product grouping information is greater than a second threshold, and whose dominance coefficient is greater than the first threshold, is set as an automatic recommendation;

the first N items and the following M-N items are each sorted according to the following rules:

(2) if the matched keyword fields contain both words and phrases, go to (3); if the matched keyword fields contain no phrases, go to (8);

(3) where (2) is the same, the item with more phrases takes precedence;

(4) where (3) is the same, the item with the longest phrase takes precedence;

(5) where (4) is the same, the item with more words takes precedence;

(6) where (5) is the same, the item with the higher score takes precedence;

(7) where (6) is the same, alphabetical order decides, and the process proceeds to (11);

(8) where (2) is the same, the item with more words takes precedence;

(9) where (8) is the same, the item with the higher score takes precedence;

(10) where (9) is the same, alphabetical order decides.
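One way to read rules (1) to (10) is as a single composite sort key applied separately to the automatically recommended items and to the rest. The Python sketch below is an interpretation under stated assumptions: the `Candidate` fields and the `rank` function are hypothetical, phrase length is measured in characters, N is taken to be the number of top-M items that meet the automatic recommendation condition, and candidates with matched phrases are placed ahead of those without because their phrase count is larger.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Candidate:
    name: str
    score: float
    dominance: float
    matched_phrases: List[str] = field(default_factory=list)
    matched_words: List[str] = field(default_factory=list)
    has_core_word: bool = False
    product_word_ratio: float = 0.0


def is_auto_recommended(c: Candidate, first_threshold: float, second_threshold: float) -> bool:
    """Automatic recommendation condition from rule (1)."""
    if c.matched_phrases:
        return c.dominance > first_threshold
    return (c.has_core_word
            and c.product_word_ratio > second_threshold
            and c.dominance > first_threshold)


def tie_break_key(c: Candidate) -> Tuple[int, int, int, float, str]:
    """Rules (3)-(7); with no phrases the first two parts are 0, leaving rules (8)-(10)."""
    longest = max((len(p) for p in c.matched_phrases), default=0)
    return (-len(c.matched_phrases), -longest, -len(c.matched_words), -c.score, c.name)


def rank(candidates: List[Candidate], m: int,
         first_threshold: float, second_threshold: float) -> List[Candidate]:
    top_m = sorted(candidates, key=lambda c: -c.score)[:m]
    auto = [c for c in top_m if is_auto_recommended(c, first_threshold, second_threshold)]
    rest = [c for c in top_m if not is_auto_recommended(c, first_threshold, second_threshold)]
    return sorted(auto, key=tie_break_key) + sorted(rest, key=tie_break_key)
```

Negating the numeric parts of the key lets one ascending sort express "more is better" for counts and scores while keeping names in alphabetical order.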

Preferably, the method further includes:

displaying the candidate structured information to a user in the sorted order.

The embodiment of the present application further discloses a structured information search device based on the trading platform, which includes:

the requirement receiving module is used for receiving search requirement information;

the word segmentation module is used for segmenting the search requirement information to obtain a keyword field;

the candidate structured information searching module is used for searching candidate structured information matched with the keyword field in a preset structured information base;

the similarity calculation module is used for calculating text similarity in the candidate structured information by adopting the keyword fields;

a characteristic attribute parameter obtaining module, configured to obtain a characteristic attribute parameter of the candidate structured information;

the score calculation module is used for calculating the score of the candidate structured information according to the text similarity and the characteristic attribute parameters;

and the sorting module is used for sorting according to the score of the candidate structural information.

Preferably, the preset structured information library includes a plurality of structured information index files, and the candidate structured information search module includes:

the phrase searching submodule is used for inquiring a structured information index file by adopting the keyword phrase field and extracting the most similar K pieces of structured information as candidate structured information; wherein K is a preset number threshold;

and the word searching submodule is used for, when the number of pieces of structured information returned by querying the index file with the keyword phrase field is L and L is less than K, further querying the index file with the keyword word field, extracting the K-L most similar pieces of structured information, and forming the candidate structured information from the L pieces and the K-L pieces of structured information.

Preferably, the keyword field obtained after word segmentation only comprises a keyword word field, and the preset structured information base comprises a plurality of structured information index files; the candidate structured information search module comprises:

the word matching sub-module is used for inquiring the structured information index file by adopting the keyword word field and extracting the most similar K pieces of structured information as candidate structured information; and K is a preset number threshold.

Preferably, the structured information is product grouping information of a supplier, the product grouping information of the supplier includes a main keyword grouped by the supplier, and the similarity calculation module includes:

the first text similarity calculation submodule is used for calculating a first text similarity, and the first text similarity is the text similarity between the keyword word field and the main keywords of a supplier group in the product grouping information of candidate suppliers;

the second text similarity calculation submodule is used for calculating a second text similarity, and the second text similarity is the text similarity between the keyword phrase field and the main keywords of the supplier group in the product grouping information of the corresponding candidate suppliers;

and the first comprehensive submodule is used for generating the text similarity of the product grouping information of the corresponding candidate supplier according to the first text similarity and the second text similarity.

Preferably, the structured information is product grouping information of a supplier, the product grouping information of the supplier includes a primary keyword of the supplier grouping and a secondary keyword of the supplier grouping, and the similarity calculation module includes:

the third text similarity calculation submodule is used for calculating a third text similarity, and the third text similarity is the text similarity between the keyword word field and the auxiliary keywords of the supplier group in the product grouping information of the corresponding candidate suppliers;

the fourth text similarity calculation submodule is used for calculating a fourth text similarity, and the fourth text similarity is the text similarity between the keyword phrase field and the auxiliary keywords of the supplier group in the product grouping information of the corresponding candidate suppliers;

and the second comprehensive submodule is used for generating the text similarity of the product grouping information of the corresponding candidate supplier according to the first text similarity, the second text similarity, the third text similarity and the fourth text similarity.

Preferably, the characteristic attribute parameters include a dominance coefficient, and the dominance coefficient is a score of a dominance degree of the product grouping information of each supplier.

Preferably, the feature attribute parameters further include a common index score and a personalized index score of the candidate structured information.

Preferably, the device further comprises:

and the display module is used for displaying the candidate structural information to a user according to the sequence.

Compared with the prior art, the method has the following advantages:

the application receives search requirement information (an RFQ, Request For Quotation: product information that a buyer fills in on an e-commerce website about the products the buyer wants to purchase, including product information keywords, personalized indexes and the like), calculates a comprehensive score of the similarity between the user's search requirement information and the supplier information, sorts by that score, and computes in the background a batch of optimal suppliers meeting the buyer's requirements. This realizes a personalized search function for the buyer and enables the buyer to quickly and simply find the required optimal suppliers.

Drawings

FIG. 1 is a flow chart of steps of an embodiment of a structured information search method based on a trading platform according to the present application;

fig. 2 is a block diagram of a structured information search device based on a trading platform according to an embodiment of the present application.

Detailed Description

In order to make the aforementioned objects, features and advantages of the present application more comprehensible, the present application is described in further detail with reference to the accompanying drawings and the detailed description.

One of the core concepts of the embodiments of the present application is to receive search requirement information that is richer than that of a general search (an RFQ, Request For Quotation: product information that a buyer fills in on an e-commerce website about the products to be purchased, including product information keywords, personalized indexes, etc.), to calculate and sort a comprehensive score of the similarity between the user's search requirement information and the supplier information, and to compute in the background a batch of best suppliers meeting the buyer's requirements.

Referring to fig. 1, a flowchart illustrating steps of an embodiment of a structured information search method based on a trading platform according to the present application is shown, which may specifically include the following steps:

step 101, receiving search requirement information;

in a specific implementation, a user (buyer) may submit product information (RFQ) to be purchased in the foreground, including product information keywords, personalized indicators, and the like. In a preferred embodiment of the present application, the search requirement information may include a product information keyword, and the product information keyword may include: product information keywords submitted by a user; or the product information keywords submitted by the user and the product information keywords generated by the background.

As an example of a specific application of the embodiment of the present application in a trading platform, the search requirement information may include the contents of the RFQ field as shown in the following table:

in the embodiment of the application, a user may submit only basic product information keywords as search requirement information; based on the product information keywords submitted by the user, the background can process them into more standardized product information keywords according to certain rules, and the two types of product information keywords are then processed together. The user may also submit further personalized indexes as search requirement information, as in the above example; in the subsequent search processing, the personalized indexes serve as conditions for further screening suppliers.

102, segmenting the search requirement information to obtain a keyword field;

as is well known, English text takes the word as its unit, with words separated by spaces, whereas Chinese text takes the character as its unit, and the characters of a sentence run together without delimiters to express a meaning. For example, the English sentence "I am a student" is, in Chinese, "我是一个学生". A computer can easily tell from the spaces that "student" is a word, but cannot easily tell that the two characters "学" and "生" together form one word. Cutting a sequence of Chinese characters into meaningful words is Chinese word segmentation. For example, the segmentation result of "我是一个学生" is: 我 / 是 / 一个 / 学生 (I / am / a / student).

Some common word segmentation methods are presented below:

1. the word segmentation method based on character string matching comprises the following steps: the method is characterized in that a Chinese character string to be analyzed is matched with a vocabulary entry in a preset machine dictionary according to a certain strategy, and if a certain character string is found in the dictionary, the matching is successful (a word is identified). In the actually used word segmentation system, mechanical word segmentation is used as an initial segmentation means, and various other language information is used to further improve the accuracy of segmentation.

2. The word segmentation method based on feature scanning or mark segmentation comprises the following steps: the method is characterized in that some words with obvious characteristics are preferentially identified and segmented in a character string to be analyzed, the words are used as breakpoints, an original character string can be segmented into smaller strings, and then mechanical segmentation is carried out, so that the matching error rate is reduced; or combining word segmentation and part of speech tagging, providing help for word decision by utilizing rich part of speech information, and detecting and adjusting word segmentation results in the tagging process, thereby improving the segmentation accuracy.

3. Understanding-based word segmentation method: the method is to enable a computer to simulate the understanding of sentences by a human so as to achieve the effect of recognizing words. The basic idea is to analyze syntax and semantics while segmenting words, and to process ambiguity phenomenon by using syntax information and semantic information. It generally comprises three parts: word segmentation subsystem, syntax semantic subsystem, and master control part. Under the coordination of the master control part, the word segmentation subsystem can obtain syntactic and semantic information of related words, sentences and the like to judge word segmentation ambiguity, namely the word segmentation subsystem simulates the process of understanding sentences by people. This word segmentation method requires the use of a large amount of linguistic knowledge and information.

4. Statistics-based word segmentation method: the frequency or probability with which characters co-occur adjacently in Chinese text reflects well how credible it is that they form a word. The frequency of each combination of adjacently co-occurring characters in a Chinese corpus can therefore be counted and their mutual information calculated, that is, the adjacent co-occurrence probability of two Chinese characters X and Y. The mutual information reflects how tightly the characters are bound together. When this tightness is above a certain threshold, the character group is considered likely to constitute a word. The method only needs to count character-group frequencies in the corpus and does not need a segmentation dictionary.
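The statistics-based method in item 4 can be illustrated with a short Python sketch. It is an assumption-laden example, not the embodiment's implementation: pointwise mutual information, log(P(XY) / (P(X)·P(Y))), is used as the concrete mutual-information measure, and the three-sentence corpus and the threshold are made up for illustration.

```python
import math
from collections import Counter
from typing import Dict, List, Set


def adjacent_pair_pmi(corpus: List[str]) -> Dict[str, float]:
    """Pointwise mutual information for every adjacent character pair in the corpus."""
    char_counts: Counter = Counter()
    pair_counts: Counter = Counter()
    for sentence in corpus:
        char_counts.update(sentence)
        pair_counts.update(zip(sentence, sentence[1:]))
    total_chars = sum(char_counts.values())
    total_pairs = sum(pair_counts.values())
    pmi: Dict[str, float] = {}
    for (x, y), n_xy in pair_counts.items():
        p_xy = n_xy / total_pairs
        p_x = char_counts[x] / total_chars
        p_y = char_counts[y] / total_chars
        pmi[x + y] = math.log(p_xy / (p_x * p_y))
    return pmi


def likely_words(corpus: List[str], threshold: float) -> Set[str]:
    """Character pairs whose binding strength exceeds the threshold."""
    return {pair for pair, value in adjacent_pair_pmi(corpus).items() if value > threshold}


# Tiny hypothetical corpus; real statistics need far more text.
CORPUS = ["学生学习", "学生生活", "我是学生"]
print(likely_words(CORPUS, threshold=0.5))
```

On such a small corpus rare pairs also score highly, which is one reason practical systems combine these statistics with a dictionary or a minimum frequency cutoff.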

In the embodiment of the application, word segmentation can be performed on the key fields of the RFQ information entered in the foreground. If the category information indicated in the RFQ contains a chemical category or a medical category, special processing is applied: word segmentation is performed separately from other industries, mainly using a chemical dictionary; if the category information in the RFQ is empty, the buyer is assumed by default to have no requirement for the category. The word segmentation result contains a series of words and a series of phrases, i.e. the keyword field obtained after the word segmentation may include: a keyword phrase field and a keyword word field. Of course, in practice, the keyword field obtained after the word segmentation may also include only the keyword word field.
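As a hedged sketch of dictionary-based segmentation with a category-dependent dictionary, the following Python example uses forward maximum matching, which is one common string-matching strategy and is an assumption here, since the text does not name the strategy used. The dictionaries, category labels and function names are hypothetical, and the sketch produces words only, not the phrase field.

```python
from typing import List, Optional, Set

# Hypothetical dictionaries; a real deployment would load much larger word lists.
GENERAL_DICTIONARY: Set[str] = {"我", "是", "一个", "学生"}
CHEMICAL_DICTIONARY: Set[str] = {"氯化钠", "溶液"}
SPECIAL_CATEGORIES = {"chemical", "medical"}


def pick_dictionary(category: Optional[str]) -> Set[str]:
    """Chemical or medical RFQs use the chemical dictionary; an empty category
    means no category requirement, so the general dictionary is used."""
    if category in SPECIAL_CATEGORIES:
        return CHEMICAL_DICTIONARY
    return GENERAL_DICTIONARY


def forward_max_match(text: str, dictionary: Set[str]) -> List[str]:
    """Greedy left-to-right segmentation: take the longest dictionary word at each position."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            # A single character is always accepted so the scan cannot stall.
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


print(forward_max_match("我是一个学生", pick_dictionary(None)))
print(forward_max_match("氯化钠溶液", pick_dictionary("chemical")))
```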

103, searching candidate structured information matched with the keyword field in a preset structured information base;

in a specific implementation, a supplier publishes the product information it supplies on a trading platform to form a product information set; if a large amount of product information is published, the supplier can further group it according to certain rules, forming the supplier's product grouping information. Generally, one supplier has multiple pieces of product grouping information. The product grouping information of a supplier is structured information, which differs from the data sources of general retrieval (such as the general data sources used by search engines like Google and Baidu). It describes the supplier and the products the supplier sells, and generally mainly comprises: the keywords of the supplier group, the type of the supplier, the scale of the supplier, the category of the products provided by the supplier, the product keywords provided by the supplier, and the like.

The preset structured information library may include a plurality of structured information index files, and the index files may be generated by using inverted indexes. It is well known that inverted indexing results from the need to look up records based on the values of attributes in practical applications. Each entry in such an index table includes an attribute value and the address of the record having the attribute value. Since the attribute value is not determined by the record but the position of the record is determined by the attribute value, it is called an inverted index. In a specific application of the trading platform, the inverted index can be used for storing product grouping information (structured information) of a supplier in a license form.
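A minimal Python sketch of an inverted index over supplier product groups follows. The group identifiers and keywords are hypothetical, and an in-memory dictionary stands in for the index files described above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_inverted_index(groups: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each keyword (the attribute value) to the product groups (the records) containing it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for group_id, keywords in groups.items():
        for keyword in keywords:
            index[keyword].add(group_id)
    return dict(index)


# Hypothetical product grouping information: group id -> group keywords.
GROUPS = {
    "supplier_a/pipes": ["steel", "pipe", "steel pipe"],
    "supplier_b/sheets": ["steel", "sheet"],
}
INDEX = build_inverted_index(GROUPS)
print(sorted(INDEX["steel"]))
```

Looking up a keyword is then a single dictionary access, which is the property that makes the attribute value, rather than the record, the entry point.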

In a preferred embodiment of the present application, when the keyword field obtained after the word segmentation includes a keyword phrase field and a keyword word field, the step 103 may specifically include the following sub-steps:

substep S11, adopting the keyword phrase field to inquire a structured information index file, and extracting K pieces of most similar structured information as candidate structured information; wherein K is a preset number threshold;

substep S12, if the number of pieces of structured information returned by querying the structured information index file with the keyword phrase field is L, and L is less than K, further querying the index file with the keyword word field, extracting the K-L most similar pieces of structured information, and forming the candidate structured information from the L pieces and the K-L pieces of structured information.

For example, the supplier information file is queried according to the word segmentation result to obtain the product grouping information of a batch of suppliers. The supplier information is organized by group, each product group contains some text information, and the supplier information is stored on disk in a license form. In the present embodiment, the query process can be divided into two steps:

1) first, querying the index file with the input keyword phrase field, and returning the product grouping information of the TOP-K most similar suppliers, wherein TOP-K is a preset threshold;

2) if the number of suppliers returned by the keyword phrase field query is less than TOP-K, querying the index file with the keyword word field, and finally returning the product grouping information of up to TOP-K suppliers (including the product grouping information of the suppliers from step 1). After this query is finished, the search terminates regardless of whether the number of results is less than TOP-K.
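The two-step query above can be sketched as follows. This is only an illustration under stated assumptions: `phrase_hits` and `word_hits` are hypothetical lists of group ids already ranked by the index from most to least similar, and skipping ids that were already returned by the phrase query is an assumption rather than something the embodiment specifies.

```python
def query_candidates(phrase_hits, word_hits, k):
    """Return at most k candidate ids: phrase matches first, word matches as back-fill."""
    candidates = list(phrase_hits[:k])
    if len(candidates) < k:
        seen = set(candidates)
        for group_id in word_hits:
            if len(candidates) == k:
                break
            if group_id not in seen:  # assumption: do not return a group twice
                candidates.append(group_id)
                seen.add(group_id)
    # The query ends here even if fewer than k results were found.
    return candidates
```

For instance, with two phrase hits and K = 3, one further result is taken from the word-field query.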

As another preferred embodiment of the present application, if the keyword field obtained after the word segmentation only includes a keyword word field, the step 103 may include the following sub-steps:

substep S13, adopting the keyword word field to inquire the structured information index file, and extracting the most similar K pieces of structured information as candidate structured information; and K is a preset number threshold.

Of course, the above form of storing the structured information in the preset structured information library and the corresponding manner of searching for candidate structured information matching the keyword field are only examples; those skilled in the art may choose any suitable approach according to the actual situation, and this application does not limit this.

Step 104, calculating text similarity in the candidate structured information by adopting the keyword field;

in a preferred embodiment of the present application, the structured information is product grouping information of a supplier, the product grouping information of the supplier includes a primary keyword of a supplier grouping, and the step 104 may include the following sub-steps:

substep S21, calculating a first text similarity, wherein the first text similarity is the text similarity between the keyword word field and the main keywords grouped by suppliers in the product grouping information of the candidate suppliers;

substep S22, calculating a second text similarity, wherein the second text similarity is the text similarity between the keyword phrase field and the main keywords grouped by suppliers in the product grouping information of corresponding candidate suppliers;

substep S23, generating the text similarity of the product grouping information of the corresponding candidate supplier according to the first text similarity and the second text similarity.

In another preferred embodiment of the present application, the structured information is product grouping information of a vendor, the product grouping information of the vendor may include a primary keyword of a vendor group and a secondary keyword of the vendor group, and the step 104 may specifically include the following sub-steps:

substep S31, calculating a first text similarity, wherein the first text similarity is the text similarity between the keyword word field and the main keywords grouped by suppliers in the product grouping information of the candidate suppliers;

substep S32, calculating a second text similarity, wherein the second text similarity is the text similarity between the keyword phrase field and the main keywords grouped by suppliers in the product grouping information of corresponding candidate suppliers;

substep S33, calculating a third text similarity, wherein the third text similarity is the text similarity between the keyword word field and auxiliary keywords grouped by suppliers in the product grouping information of corresponding candidate suppliers;

substep S34, calculating a fourth text similarity, wherein the fourth text similarity is the text similarity between the keyword phrase field and auxiliary keywords grouped by suppliers in the product grouping information of corresponding candidate suppliers;

and a substep S35, generating the text similarity of the product grouping information of the corresponding candidate supplier according to the first text similarity, the second text similarity, the third text similarity and the fourth text similarity.

As a specific application example of the embodiment of the present application, the following calculation method may be adopted to calculate the text similarity:

First, the RFQ_Name and RFQ_Comment contained in the RFQ are subjected to word segmentation and plural-to-singular conversion to obtain a series of word fields and phrase fields. The text similarity is divided into two parts, namely keyword word field similarity and keyword phrase field similarity. The calculation methods for these two parts are described separately below.

(I) Keyword word field similarity

The keyword word field similarity mainly comprises two parts: one is the similarity between the primary keywords (word field) of the supplier group and the RFQ, and the other is the similarity between the auxiliary keywords (word field) of the supplier group, such as the supplier's product/service keywords, and the RFQ.

1) Similarity of primary keywords and RFQ of vendor group (first text similarity):

the calculation formula is as follows:

percent(W_i) represents the percentage of the keyword word field in the product grouping information of the supplier;

NameOrComment(W_i) represents the weight given to a keyword word field that matches the primary keywords of the supplier group, according to which product information keywords it belongs to. Its value is either Name(W_i), the weight used when the matching keyword word field belongs to the product information keywords submitted by the user, or Comment(W_i), the weight used when it belongs to the product information keywords generated in the background. Name indicates the weight of the title (subject) of the RFQ, and Comment indicates the weight of a field added manually by an operator; different weights can be configured for these two cases.

2) Similarity weighting of product/service auxiliary keywords of the provider group (third text similarity):

If the supplier provides auxiliary keywords, such as product/service information, this information can be used to correct the similarity between the primary keywords of the supplier group and the RFQ. That is, the similarity between the product/service information and the keyword word fields of RFQ_Name and RFQ_Comment is calculated and denoted ProductServiceWordSimilarity, using the same method as for the similarity between the primary keywords of the supplier group and the RFQ;

the calculation formula is as follows:

KeywordsSimilarity＝ProductServiceWordSimilarity*W_p+(1-W_p)*Similarity

wherein W_p is the configured weight of the product/service information, which can be changed in a configuration file, and Similarity is the first text similarity. For the chemical and medical industries, the similarity of the product/service keyword information match may be set to 0 by default.

The KeywordsSimilarity calculated above is the keyword word field similarity score between the RFQ and the current product group of the current supplier.

(II) Keyword phrase field similarity

The similarity of the keyword phrase field is the same as that of the keyword word field, and mainly comprises two parts: one part is the similarity between the primary keyword set (phrase field) of the supplier group and the RFQ, and the other part is the similarity between the auxiliary keyword set (phrase field, such as product/service keyword set) of the supplier group and the RFQ.

3) Similarity of primary keyword group and RFQ of vendor group (second text similarity):

the calculation formula is as follows:

NameOrComment(PH_i) represents the weight given to a keyword phrase field that matches the primary keywords of the supplier group, according to which product information keywords it belongs to. Its value is either Name(PH_i), the weight used when the matching keyword phrase field belongs to the product information keywords submitted by the user, or Comment(PH_i), the weight used when it belongs to the product information keywords generated in the background. Name indicates the weight of the title (subject) of the RFQ, and Comment indicates the weight of a field added manually by an operator; different weights can be configured for these two cases.

4) Similarity weighting of product/service auxiliary keywords of the provider group (fourth text similarity):

If the supplier provides product/service information, this information can be used to correct the similarity between the primary keyword group of the supplier group and the RFQ. That is, the similarity between the product/service information and the keyword phrase fields of RFQ_Name and RFQ_Comment is calculated and denoted ProductServicePhraseSimilarity, using the same method as for the similarity between the primary keyword group of the supplier group and the RFQ;

the calculation formula is as follows:

PhrasesSimilarity＝ProductServicePhraseSimilarity*W_p+(1-W_p)*Similarity

wherein W_p is the configured weight of the product/service information, which can be changed in the configuration file, and Similarity is the second text similarity. For the chemical and medical industries, the similarity of the product/service phrase information match defaults to 0.

The PhrasesSimilarity calculated above is the keyword phrase field similarity score between the RFQ and the current product group of the current supplier.

In summary, the total text similarity is:

TextSimilarity＝KeywordsSimilarity*W₁+PhrasesSimilarity*W₂

wherein W₁ and W₂ are the respectively configured weights.
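The combination of the word-field and phrase-field similarities can be sketched as follows. The function and parameter names are hypothetical, the four input similarities are taken as already computed, and the numeric values are made up purely for illustration.

```python
def blend(product_service_similarity, primary_similarity, w_p):
    """ProductService...Similarity * W_p + (1 - W_p) * Similarity."""
    return product_service_similarity * w_p + (1.0 - w_p) * primary_similarity


def text_similarity(word_primary, word_product_service,
                    phrase_primary, phrase_product_service,
                    w_p, w1, w2):
    """TextSimilarity = KeywordsSimilarity * W1 + PhrasesSimilarity * W2."""
    keywords_similarity = blend(word_product_service, word_primary, w_p)
    phrases_similarity = blend(phrase_product_service, phrase_primary, w_p)
    return keywords_similarity * w1 + phrases_similarity * w2


# Illustrative values only: W_p = 0.25, W1 = W2 = 0.5.
example = text_similarity(0.8, 0.4, 0.6, 0.2, w_p=0.25, w1=0.5, w2=0.5)
```

Setting `w_p` to 0 reduces each part to the similarity against the primary keywords alone, which is one way to ignore the product/service information.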

Of course, the above calculation method of text similarity is only used as an example, and any calculation method is feasible for those skilled in the art according to the actual situation, and the present application is not limited thereto.

Step 105, acquiring characteristic attribute parameters of the candidate structured information;

In a preferred embodiment of the present application, the characteristic attribute parameter may include a main operation coefficient, which is a score of the degree to which each product group of a supplier belongs to that supplier's main business.

The similarity algorithm of the embodiment of the application mainly considers two dimensions, the text similarity and the main operation coefficient, and the similarity score is a combined evaluation of the two. Divided by the type of matched keyword, the text similarity may include the text similarity of the keyword word field and the text similarity of the keyword phrase field; divided by text source, it may include matching against the primary keywords of a supplier group and matching against the auxiliary keywords (product/service text information) of the supplier group. The main operation coefficient in this step is a composite index of the proportion of products in the group and the proportion of the group's products placed in the showcase; in practice, it can be extracted from the supplier information file.

Step 106, calculating the score of the candidate structured information according to the text similarity and the characteristic attribute parameters;

In a specific implementation, each product group of each supplier has a main operation coefficient as a characteristic attribute parameter, which scores the degree to which the product group belongs to the supplier's main business. In this case, the score of the candidate structured information may be calculated by using the following formula:

Similarity＝TextSimilarity*(1-W_m)+main operation coefficient*W_m

wherein W_m is the configured weight of the main operation coefficient, and TextSimilarity is the text similarity score from the example of step 104.
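A small sketch of this weighted combination follows. The function name is hypothetical, and the check that the weight lies between 0 and 1 is an added assumption, not a requirement stated in this application.

```python
def similarity_score(text_similarity, main_operation_coefficient, w_m):
    """Similarity = TextSimilarity * (1 - W_m) + main operation coefficient * W_m."""
    if not 0.0 <= w_m <= 1.0:
        # Assumption: the weight is treated as a fraction between 0 and 1.
        raise ValueError("w_m is expected to lie in [0, 1]")
    return text_similarity * (1.0 - w_m) + main_operation_coefficient * w_m
```

With a text similarity of 0.6, a main operation coefficient of 0.9, and W_m = 0.2, the score is 0.6*0.8 + 0.9*0.2 = 0.66.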

In a preferred embodiment of the present application, the feature attribute parameters may further include a common index score and a personalized index score of the candidate structured information.

Specifically, the common index score may be calculated using the following formula:

Score_public＝∑_i P_i*W_i

The personalized index score may be calculated by the following formula:

Score_personalized＝∑_iScore_i

In practice, the search requirement information may further include user requirement information for the supplier type, for the supplier scale, for the main market of the supplier, for the product category, and/or for the minimum order quantity of the supplier. In this case, Score_i may include the matching degree between the supplier type and the corresponding user requirement information, the matching degree between the supplier scale and the corresponding user requirement information, the matching degree between the main market of the supplier and the corresponding user requirement information, the matching degree between the product category of the supplier and the corresponding user requirement information, and/or the matching degree between the minimum order quantity of the supplier and the corresponding user requirement information.
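The two index scores can be sketched as follows. The function names, the dictionary layout, and the rule that each matched dimension contributes a fixed `bonus` are assumptions made for illustration; the application only states that Score_i reflects the matching degree in each dimension.

```python
def public_score(indicators, weights):
    """Score_public = sum over i of P_i * W_i."""
    if len(indicators) != len(weights):
        raise ValueError("each indicator needs exactly one weight")
    return sum(p * w for p, w in zip(indicators, weights))


def personalized_score(supplier, requirements, bonus=1.0):
    """Score_personalized = sum over i of Score_i.

    Assumption: Score_i is `bonus` when the supplier attribute equals the
    buyer's requirement for that dimension, and 0 otherwise.
    """
    return sum(
        bonus
        for dimension, wanted in requirements.items()
        if supplier.get(dimension) == wanted
    )


# Hypothetical supplier and buyer requirements.
supplier = {"type": "manufacturer", "size": "large"}
wanted = {"type": "manufacturer", "size": "small", "market": "EU"}
```

Here only the supplier type matches, so `personalized_score(supplier, wanted)` counts a single bonus.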

In a preferred embodiment of the present application, the step 106 may further include the following sub-steps:

substep S41, normalizing the score value calculated from the text similarity and the main operation coefficient, Similarity＝TextSimilarity*(1-W_m)+main operation coefficient*W_m, to obtain Score'_match；

In a specific implementation, the normalization process may be performed by using the following formula:

wherein X' is the result of normalizing X.

Substep S42, normalizing the score Score_custom of the feature attribute parameters to obtain Score'_custom, wherein Score_custom＝Score_public+Score_personalized, Score_public is the common index score, and Score_personalized is the personalized index score;

and a substep S43, calculating a score of the final candidate structured information by using the following formula for the text similarity and the characteristic attribute parameters after the normalization processing:

Score_total＝λScore’_custom+(1-λ)Score’_match，

where λ is a configured parameter.
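Substeps S41 to S43 can be sketched as follows. The normalization formula is not reproduced in this text, so min-max scaling is used here purely as an assumption; the function names are likewise hypothetical.

```python
def min_max_normalize(values):
    """Scale values into [0, 1]; min-max scaling is an assumption of this sketch."""
    low, high = min(values), max(values)
    if high == low:
        # All candidates tie, so no candidate is preferred on this dimension.
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]


def total_scores(match_scores, custom_scores, lam):
    """Score_total = lam * Score'_custom + (1 - lam) * Score'_match, per candidate."""
    match_norm = min_max_normalize(match_scores)
    custom_norm = min_max_normalize(custom_scores)
    return [lam * c + (1.0 - lam) * m for c, m in zip(custom_norm, match_norm)]
```

Normalizing both inputs first keeps λ meaningful even when Score_custom and Score_match are on different scales.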

In this step, personalized index scores may be calculated for the product grouping information of each candidate supplier. The indexes may include supplier activity, quotation responsiveness, supplier type, supplier scale, and the like; the scoring strategy is to add points to supplier groups that meet the buyer's requirements in the corresponding dimension. In this embodiment, the indexes are divided into two parts, a public part and a personalized part: the public part covers supplier activity and quotation responsiveness, and the personalized part covers supplier type, supplier scale, and the like. The public part score Score_public and the personalized part score Score_personalized are calculated in turn, so that the score used for ordering by these indexes is Score_custom＝Score_public+Score_personalized.

The specific calculation method of the personalized index score is as follows:

1) common part

The public part has two indexes, namely supplier activity and supplier quotation responsiveness. The calculation is as follows: assume that supplier A has a supplier activity of P₁ and a quotation responsiveness of P₂; the score of supplier A in the public index part is then Score_public＝∑_i P_i*W_i＝P₁*W₁+P₂*W₂, wherein W₁ and W₂ are the weights of the two indexes, which are configurable and adjustable.

2) Personalized part

The indexes of this part mainly comprise supplier type, supplier scale, and the like. A supplier that meets the buyer's requirement in a given dimension is awarded points, so Score_personalized＝∑_i Score_i.

3) And combining the results of the previous steps to calculate a total score and sorting the total score.

The final total score is a comprehensive evaluation of similarity and the personalized indexes, and each supplier is represented by the highest-scoring group among that supplier's product groups. Let the calculated similarity score be Score_match, and let Score'_match be the result of normalizing it. The total score is calculated as follows: after Score_custom and Score_match are normalized to Score'_custom and Score'_match, the final score is Score_total＝λScore'_custom+(1-λ)Score'_match, where λ is configurable through a configuration file.
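Representing each supplier by its highest-scoring product group can be sketched as follows. The tuple layout and the function name are hypothetical, and ordering suppliers by that single score is a simplification of the sorting rules of step 107.

```python
def rank_suppliers(group_scores):
    """Keep each supplier's best-scoring group, then sort suppliers by that score.

    `group_scores` is a hypothetical list of (supplier_id, group_id, total_score).
    """
    best = {}
    for supplier_id, group_id, score in group_scores:
        if supplier_id not in best or score > best[supplier_id][1]:
            best[supplier_id] = (group_id, score)
    return sorted(
        ((s, g, sc) for s, (g, sc) in best.items()),
        key=lambda item: item[2],
        reverse=True,
    )


ranked = rank_suppliers([
    ("A", "g1", 0.4),
    ("B", "g1", 0.7),
    ("A", "g2", 0.9),
])
```

In this example supplier A is represented by group g2 and is placed ahead of supplier B.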

Of course, the setting of the above feature attribute parameters and the calculation of the score of the candidate structured information are only used as examples, and those skilled in the art may set other feature attribute parameters and adopt other calculation methods of the score of the candidate structured information, which is not limited in this application.

Step 107, sorting the candidate structured information according to its scores.

In a specific implementation, attributes may be configured for the keyword word field and the keyword phrase field; for example, the attributes of the keyword word field may include a core word attribute and a product word attribute. In this case, the sorting in step 107 may follow these rules:

for example, M is 200, N is 50, and the first threshold is 0.1.

(3) when (2) is the same, the candidate with the larger number of phrases takes priority;

(6) when (5) is the same, the candidate with the higher score takes priority;

(9) when (8) is the same, the candidate with the higher score takes priority;

Of course, the above sorting method is only used as an example, and any sorting method is feasible for those skilled in the art according to the actual situation, and the application is not limited to this.

In a specific implementation, the embodiment of the present application may further include the following steps:

Step 108, displaying the candidate structured information to the user in the sorted order.

It is noted that, for simplicity of explanation, the method embodiments are described as a series of acts or combination of acts, but those skilled in the art will appreciate that the present application is not limited by the order of acts, as some steps may, in accordance with the present application, occur in other orders and concurrently. Further, those skilled in the art should also appreciate that the embodiments described in the specification are preferred embodiments and that the acts and modules referred to are not necessarily required in this application.

Referring to fig. 2, a block diagram of a structured information search apparatus based on a trading platform according to an embodiment of the present application is shown, which may specifically include the following modules:

a requirement receiving module 201, configured to receive search requirement information;

a word segmentation module 202, configured to segment words of the search requirement information to obtain a keyword field;

a candidate structured information searching module 203, configured to search, in a preset structured information base, candidate structured information matched with the keyword field;

a similarity calculation module 204, configured to calculate text similarity in the candidate structured information by using the keyword field;

a feature attribute parameter obtaining module 205, configured to obtain a feature attribute parameter of the candidate structured information;

a score calculation module 206, configured to calculate a score of the candidate structured information according to the text similarity and the feature attribute parameter;

a sorting module 207, configured to sort the candidate structured information according to its scores.

In a specific implementation, the embodiment of the present application may further include the following modules:

and a presentation module 208, configured to present the candidate structured information to the user in order.

In a preferred embodiment of the present application, the search requirement information may include a product information keyword, and the product information keyword includes: product information keywords submitted by a user;

In a specific implementation, the keyword field obtained after the word segmentation may include: a keyword phrase field and a keyword word field.

In this case, the preset structured information library includes a plurality of structured information index files, and the candidate structured information search module 203 may include the following sub-modules:

In another preferred embodiment of the present application, the keyword field obtained after the word segmentation only includes a keyword word field, and the preset structured information library includes a plurality of structured information index files; in this case, the candidate structured information lookup module 203 may include the following sub-modules:

In a specific application of the transaction platform, the structured information may be product grouping information of a vendor, the product grouping information of the vendor includes a primary keyword of a vendor group, and the similarity calculation module 204 may specifically include the following sub-modules:

In another preferred embodiment of the present application, the structured information may be product grouping information of a vendor, and the product grouping information of the vendor may include a primary keyword of the vendor group and a secondary keyword of the vendor group, in which case, the similarity calculation module 204 may specifically include the following sub-modules:

As an example of specific application of the embodiment of the present application, the following formula may be adopted to calculate the first text similarity:

NameOrComment(W_i) represents the weight given to a keyword word field that matches the primary keywords of the supplier group, according to which product information keywords it belongs to; its value is either Name(W_i), the weight used when the matching keyword word field belongs to the product information keywords submitted by the user, or Comment(W_i), the weight used when it belongs to the product information keywords generated in the background；

And/or, the second text similarity may be calculated using the following formula:

wherein weight(PH_i) represents the weight corresponding to the attribute of the keyword phrase field;

NameOrComment(PH_i) represents the weight given to a keyword phrase field that matches the primary keywords of the supplier group, according to which product information keywords it belongs to; its value is either Name(PH_i), the weight used when the matching keyword phrase field belongs to the product information keywords submitted by the user, or Comment(PH_i), the weight used when it belongs to the product information keywords generated in the background；

And/or, the third text similarity may be calculated using the following formula:

KeywordsSimilarity＝ProductServiceWordSimilarity*W_p+(1-W_p)*Similarity

and/or, the fourth text similarity may be calculated using the following formula:

PhrasesSimilarity＝ProductServicePhraseSimilarity*W_p+(1-W_p)*Similarity

The text similarity of the product grouping information of the corresponding candidate supplier can be generated according to the first text similarity, the second text similarity, the third text similarity and the fourth text similarity by adopting the following formula:

TextSimilarity＝KeywordsSimilarity*W₁+PhrasesSimilarity*W₂

wherein W₁ and W₂ are the respectively configured weights.

As an example of specific application of the embodiment of the present application, the characteristic attribute parameter may include a main operation coefficient, which is a score of the degree to which each product group of a supplier belongs to that supplier's main business; in this case, the score of the candidate structured information may be calculated from the text similarity and the feature attribute parameter using the following formula:

Similarity＝TextSimilarity*(1-W_m)+main operation coefficient*W_m

wherein W_m is the weight of the main operation coefficient.

More preferably, the feature attribute parameters may further include a common index score and a personalized index score of the candidate structured information.

The common index score may be calculated using the following formula:

Score_public＝∑_i P_i*W_i

The personalized index score may be calculated by the following formula:

Score_personalized＝∑_iScore_i

For example, the search requirement information may further include user requirement information for the supplier type, for the supplier scale, for the main market of the supplier, and/or for the product category. In this case, Score_i may include the matching degree between the supplier type and the corresponding user requirement information, the matching degree between the supplier scale and the corresponding user requirement information, the matching degree between the main market of the supplier and the corresponding user requirement information, and/or the matching degree between the product category of the supplier and the corresponding user requirement information.

In the embodiment of the present application, the score of the candidate structured information may be further calculated according to the text similarity and the feature attribute parameter in the following manner:

Wherein, the normalization process can be performed by adopting the following formula:

wherein X' is the result of normalizing X.

Score_total＝λScore’_custom+(1-λ)Score’_match，

where λ is a configured parameter.

In a specific implementation, the attributes of the keyword word field may include core word attributes and product word attributes, and the sorting according to the score of the candidate structured information may be performed according to the following rules:

(3) when (2) is the same, the candidate with the larger number of phrases takes priority;

(6) when (5) is the same, the candidate with the higher score takes priority;

(9) when (8) is the same, the candidate with the higher score takes priority;

Because the device embodiment is basically similar to the method embodiment, it is described only briefly; for relevant details, refer to the corresponding description of the method embodiment.

The application is operational with numerous general purpose or special purpose computing system environments or configurations. For example: personal computers, server computers, hand-held or portable devices, tablet-type devices, multiprocessor systems, microprocessor-based systems, set top boxes, programmable consumer electronics, network PCs, minicomputers, mainframe computers, distributed computing environments that include any of the above systems or devices, and the like.

The application may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The application may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.

Finally, it should also be noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element introduced by the phrase "comprising a ..." does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.

The structured information search method based on the trading platform and the structured information search device based on the trading platform provided by the application are introduced in detail, specific examples are applied in the text to explain the principle and the implementation mode of the application, and the description of the above embodiments is only used for helping to understand the method and the core idea of the application; meanwhile, for a person skilled in the art, according to the idea of the present application, there may be variations in the specific embodiments and the application scope, and in summary, the content of the present specification should not be construed as a limitation to the present application.

Claims

1. A structured information searching method based on a trading platform is characterized by comprising the following steps:

receiving search requirement information;

segmenting the search requirement information to obtain a keyword field;

and sorting according to the scores of the candidate structured information.

2. The method of claim 1, wherein the search requirement information comprises a product information keyword, the product information keyword comprising: product information keywords submitted by a user;

3. The method of claim 2, wherein the keyword fields obtained after the word segmentation comprise: a keyword phrase field and a keyword word field.

4. The method according to claim 3, wherein the preset structured information base comprises a plurality of structured information index files, and the step of searching the preset structured information base for candidate structured information matching with the keyword field comprises:

5. The method according to claim 2, wherein the keyword field obtained after the word segmentation only comprises a keyword word field, and the preset structured information base comprises a plurality of structured information index files; the step of searching candidate structured information matched with the keyword field in a preset structured information base comprises the following steps:

6. The method according to claim 2, 3 or 4, wherein the structured information is product grouping information of a supplier, the product grouping information of the supplier comprises main keywords of the supplier grouping, and the step of calculating text similarity in the candidate structured information by using keyword fields comprises:

7. The method according to claim 2, 3 or 4, wherein the structured information is product grouping information of a supplier, the product grouping information of the supplier comprises main keywords of the supplier grouping and auxiliary keywords of the supplier grouping, and the step of calculating text similarity in the candidate structured information by using the keyword field comprises:

8. The method of claim 7, wherein the first text similarity is calculated using the following formula:

percent(W_i) represents the percentage of the keyword word field W_i in the product grouping information of the supplier;

NameOrComment(W_i) represents the weight of the product information keyword to which a keyword word field matching the main keyword of the supplier grouping belongs; its value is either Name(W_i), the weight used when that keyword word field belongs to a product information keyword submitted by the user, or Comment(W_i), the weight used when that keyword word field belongs to a product information keyword generated in the background;

KeywordsSimilarity＝ProductServiceWordSimilarity*W_p+(1-W_p)*Similarity

PhrasesSimilarity＝ProductServicePhraseSimilarity*W_p+(1-W_p)*Similarity

9. The method of claim 8, wherein the text similarity of the product grouping information of the corresponding candidate supplier is generated according to the first text similarity, the second text similarity, the third text similarity and the fourth text similarity using the following formula:

TextSimilarity＝KeywordsSimilarity*W₁+PhrasesSimilarity*W₂

wherein W₁ and W₂ are respectively configured weights.
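
The weighted combinations of claims 8 and 9 might be sketched as follows. The function names are hypothetical, and the `similarity` argument stands for whichever of the first to fourth text similarities the claim pairs with each product-service similarity, which the text reproduced here does not spell out.

```python
def blend(product_service_similarity: float, similarity: float, w_p: float) -> float:
    """X*W_p + (1 - W_p)*Similarity, the shape shared by both claim 8 formulas."""
    return product_service_similarity * w_p + (1.0 - w_p) * similarity


def text_similarity(keywords_similarity: float, phrases_similarity: float,
                    w1: float, w2: float) -> float:
    """TextSimilarity = KeywordsSimilarity*W1 + PhrasesSimilarity*W2 (claim 9)."""
    return keywords_similarity * w1 + phrases_similarity * w2
```

For example, `blend(0.8, 0.4, 0.5)` could serve as KeywordsSimilarity, and the result would then be passed to `text_similarity` together with PhrasesSimilarity and the configured weights. All numeric values here are illustrative only.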

10. The method of claim 9, wherein the characteristic attribute parameters include a marketing factor (the main coefficient of operation in the formula below), which is a score of the degree to which each supplier's product grouping information represents that supplier's main business; in the step of calculating the score of the candidate structured information according to the text similarity and the characteristic attribute parameters, the score is calculated using the following formula:

Similarity＝TextSimilarity*(1-W_m)+(main coefficient of operation)*W_m

wherein W_m is the weight of the main coefficient of operation.
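
The formula of claim 10 might be sketched as follows, reading its last term as the main coefficient of operation multiplied by W_m. The function name is hypothetical, and the restriction of W_m to the range from 0 to 1 is an assumption that the claim does not state.

```python
def score_with_main_coefficient(text_similarity: float,
                                main_coefficient: float,
                                w_m: float) -> float:
    """Similarity = TextSimilarity*(1 - W_m) + main coefficient * W_m."""
    # Assumption: W_m is a blending weight between 0 and 1.
    if not 0.0 <= w_m <= 1.0:
        raise ValueError("w_m is assumed to lie in [0, 1]")
    return text_similarity * (1.0 - w_m) + main_coefficient * w_m
```

With W_m set to 0 the score reduces to the text similarity alone, and with W_m set to 1 it reduces to the main coefficient of operation alone.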

11. The method according to claim 8, 9 or 10, wherein the feature attribute parameters further comprise a common index score of the candidate structured information;

the common index score is calculated by adopting the following formula:

Score_public＝∑_i P_i*W_i
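
The public index score might be sketched as follows, reading the formula as a weighted sum over an index i. The definitions of P_i and W_i are not reproduced in this text, so the sketch assumes that P_i is the value of the i-th public index and W_i is its configured weight. The function name is hypothetical.

```python
def public_index_score(values: list[float], weights: list[float]) -> float:
    """Weighted sum of public index values: sum over i of P_i * W_i."""
    # Assumption: one weight is configured for every public index value.
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    return sum(p * w for p, w in zip(values, weights))
```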

12. The method of claim 11, wherein the search requirement information further comprises: user requirement information for the supplier type, user requirement information for the supplier size, user requirement information for the main market of the supplier, user requirement information for the product category, and/or user requirement information for the minimum quantity of the supplier;

the personalized index score is calculated by the following formula:

Score_personalized＝∑_iScore_i

13. The method of claim 12, wherein the step of calculating the score of the candidate structured information according to text similarity and feature attribute parameters further comprises:

Score_total＝λScore’_custom+(1-λ)Score’_match，

where λ is a configured parameter.
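
The combination in claim 13 might be sketched as follows. The sketch assumes that the primed scores are already normalized values and that λ lies between 0 and 1. Neither the normalization itself nor the range of λ is taken from the text reproduced here, and the function name is hypothetical.

```python
def total_score(custom_norm: float, match_norm: float, lam: float) -> float:
    """Score_total = lambda*Score'_custom + (1 - lambda)*Score'_match."""
    # Assumption: lambda is a blending parameter between 0 and 1.
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lambda is assumed to lie in [0, 1]")
    return lam * custom_norm + (1.0 - lam) * match_norm
```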

14. The method of claim 13, wherein the normalization process is performed using the following equation:

wherein X' is the result of normalizing X.

15. The method of claim 12, 13 or 14, wherein the attributes of the keyword word field include core word attributes and product word attributes, and wherein the step of ranking according to the score of the candidate structured information comprises:

(3) when (2) is the same, the candidate with the larger number of phrases takes priority;

(6) when (5) is the same, the candidate with the higher score value takes priority;

(9) when (8) is the same, the candidate with the higher score takes priority;
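
The cascading priority rules of claim 15 might be sketched as a multi-key sort. Only the rules that survive in this text are modeled (a larger number of phrases first, then a higher score), so this is an illustration of the tie-breaking pattern and not the complete ordering of the claim. The `Candidate` fields and the `rank` function are hypothetical.

```python
from typing import NamedTuple


class Candidate(NamedTuple):
    supplier_id: str
    phrase_count: int  # number of matched keyword phrase fields
    score: float       # score of the candidate structured information


def rank(candidates: list[Candidate]) -> list[Candidate]:
    """Order candidates by phrase count, then by score, both descending."""
    # Each later key is consulted only when all earlier keys are equal.
    return sorted(candidates, key=lambda c: (-c.phrase_count, -c.score))
```

Further criteria could be added as earlier or later elements of the key tuple, so that each rule applies only when the preceding ones tie.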

16. The method of claim 1, further comprising:

17. A structured information search device based on a trading platform is characterized by comprising:

18. The apparatus of claim 17, wherein the search requirement information comprises a product information keyword, the product information keyword comprising: product information keywords submitted by a user;

19. The apparatus of claim 18, wherein the keyword fields obtained after the word segmentation comprise: a keyword phrase field and a keyword word field.

20. The apparatus according to claim 19, wherein the preset structured information library includes a plurality of structured information index files, and the candidate structured information search module includes:

21. The apparatus according to claim 18, wherein the keyword field obtained after the word segmentation only includes a keyword word field, and the preset structured information library includes a plurality of structured information index files; the candidate structured information search module comprises:

22. The apparatus according to claim 18, 19 or 20, wherein the structured information is product grouping information of a supplier, the product grouping information of the supplier comprises main keywords of the supplier grouping, and the similarity calculation module comprises:

23. The apparatus according to claim 18, 19 or 20, wherein the structured information is product grouping information of a supplier, the product grouping information of the supplier comprises main keywords of the supplier grouping and auxiliary keywords of the supplier grouping, and the similarity calculation module comprises:

24. The apparatus of claim 23, wherein the characteristic attribute parameters comprise a marketing factor, which is a score of the degree to which each supplier's product grouping information represents that supplier's main business.

25. The apparatus of claim 24, wherein the feature attribute parameters further comprise a public index score and a personalized index score of the candidate structured information.

26. The apparatus of claim 17, further comprising: