CN116523473A

CN116523473A - Similar enterprise-based item matching method, device, equipment and medium

Info

Publication number: CN116523473A
Application number: CN202310778173.5A
Authority: CN
Inventors: 刘宪锋; 阳晓; 杨阿磊; 彭俊; 潘妮娜; 肖涛
Original assignee: Hunan Shiniu Network Technology Co ltd
Current assignee: Hunan Shiniu Network Technology Co ltd
Priority date: 2023-06-29
Filing date: 2023-06-29
Publication date: 2023-08-01
Anticipated expiration: 2043-06-29
Also published as: CN116523473B

Abstract

The method first obtains public internet data of a target enterprise. It then obtains the policy files and related public files published online for the target enterprise's industry and parses them to identify the enterprises in that industry and form an enterprise list. It also obtains the reporting-condition data that each listed enterprise satisfied when applying for policy items. All of the raw data are normalized into low-dimensional feature vectors, which a kernel function maps into the high-dimensional feature space of a support vector machine. In that space, the new high-dimensional data are centered, and their inner product difference with the support vectors is calculated directly to determine the enterprises similar to the target enterprise. Matching item recommendation data for the target enterprise are then output on the basis of the policy items obtained by those similar enterprises, which greatly improves item matching accuracy.

Description

Similar enterprise-based item matching method, device, equipment and medium

Technical Field

The invention belongs to the technical field of big data processing and relates to an item matching method, device, equipment and medium based on similar enterprises.

Background

With the development of internet technology, the amount of information on the internet is growing exponentially, which makes accurate and useful information harder to obtain. Some enterprise recommendation scenarios start from an enterprise name: they retrieve one or more enterprises similar to the named enterprise and then make recommendations based on enterprise-related information available on the internet. In conventional enterprise recommendation, an enterprise's feature information is generally obtained from public information on the internet (such as the number of employees covered by social insurance, industrial and commercial registration, and bidding records) or from manually entered enterprise information (financial statements, operating information, etc.). A recommendation algorithm is then applied to this feature information. Accurate matching between policies and enterprises is an important indicator both for economic administrators' research and for enterprises' strategic analysis. The prevailing policy-enterprise matching method is label matching, which works in three steps:

- the interpreted content of each policy is labeled and mapped;
- an enterprise portrait is built by labeling the enterprise's qualifications;
- an algorithm matches the two.

Because enterprises are reluctant to fill in and submit actual operating data, the enterprise information disclosed on the internet is limited, and this public information often deviates from the enterprises' actual operating conditions. As a result, the traditional label matching method still suffers from insufficient item matching accuracy.

Disclosure of Invention

To address these problems of the traditional method, the invention provides a similar enterprise-based item matching method, a similar enterprise-based item matching device, computer equipment and a computer readable storage medium, which can greatly improve item matching accuracy.

In order to achieve the above object, the embodiment of the present invention adopts the following technical scheme:

In one aspect, a similar enterprise-based item matching method is provided, including the following steps:

acquiring public internet data corresponding to the name of a target enterprise and storing the public internet data into a first dimension array;

acquiring the policy files and related public files published on official websites, parsing them to obtain the publicly announced enterprise lists and the reporting-condition data of each policy item, and storing the enterprise lists and reporting-condition data in a second dimension array;

carrying out data normalization processing on the original data in the first dimension array and the second dimension array to obtain normalized low-dimensional feature vectors, and mapping the normalized low-dimensional feature vectors into a high-dimensional feature space based on a support vector machine by using a kernel function;

in the high-dimensional feature space, carrying out centering processing on the high-dimensional new data corresponding to the low-dimensional feature vector, and calculating the inner product difference of the high-dimensional new data after centering and the support vector of the high-dimensional feature space;

removing the enterprises whose high-dimensional new data have a negative inner product difference, and determining the enterprises whose high-dimensional new data have a positive inner product difference as similar enterprises of the target enterprise;

collecting the policy items already obtained by each similar enterprise and counting the number of times each obtained policy item was obtained by the similar enterprises;

and calculating a recommendation index for each obtained policy item from the number of times it was obtained by the similar enterprises and the total number of similar enterprises, and outputting the recommendation indexes as matching item recommendation data of the target enterprise.

In one embodiment, the public internet data include the enterprise's industry, years since establishment, number of employees, taxpayer qualification, registered capital, paid-in capital, enterprise type, registered address, number of software copyrights, number of trademarks, number of certificates, number of work copyrights, number of utility model patents, number of utility model authorizations, number of design patents, number of utility model publications, listing type, and bidding data.

In one embodiment, normalization methods employed in the data normalization process include min-max normalization, Z-score normalization, mean variance normalization, decimal scaling normalization, or Log function conversion.

In one embodiment, the kernel function is an RBF kernel function.

In one embodiment, the method further comprises the steps of:

and outputting policy item data matched to the target enterprise by applying a collaborative filtering algorithm to the similar enterprises.
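The text does not specify which collaborative filtering variant is used. The sketch below assumes an enterprise-based (user-based) scheme. Each similar enterprise votes for the policy items it obtained, and each vote is weighted by that enterprise's similarity score to the target enterprise. The function name `cf_scores`, the input shapes and the sample data are hypothetical.

```python
from collections import defaultdict


def cf_scores(similar, obtained):
    """Similarity-weighted collaborative filtering over similar enterprises (hypothetical scheme).

    similar:  {enterprise_name: similarity_score}, scores assumed positive
    obtained: {enterprise_name: set of policy item names it has obtained}
    Returns {item: weighted score in [0, 1]}.
    """
    total_weight = sum(similar.values())
    if total_weight <= 0:
        return {}
    scores = defaultdict(float)
    for name, weight in similar.items():
        for item in obtained.get(name, set()):
            scores[item] += weight  # each enterprise votes with its similarity score
    return {item: s / total_weight for item, s in scores.items()}


# Hypothetical example: two similar enterprises with different similarity scores.
similar = {"Enterprise B": 0.9, "Enterprise C": 0.1}
obtained = {
    "Enterprise B": {"High-tech enterprise", "SME innovation fund"},
    "Enterprise C": {"High-tech enterprise"},
}
print(cf_scores(similar, obtained))
```

Weighting by similarity lets a closely matching enterprise count for more than a marginal one. An unweighted count is the special case where all scores are equal.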

In one embodiment, the method further comprises the steps of:

and generating recommendation documents matched to each of the similar enterprises according to the recommendation document of the target enterprise.

In one embodiment, the method further comprises the steps of:

assigning a recommendation star-level label to each obtained policy item according to preset star-level intervals, in descending order of the recommendation indexes in the matching item recommendation data;

and outputting the recommendations in descending order of the star level of each obtained policy item.
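The text leaves the star-level distribution intervals open. The sketch below assumes hypothetical intervals on the recommendation index (for example, an index of at least 0.8 earns five stars). It assigns star labels and sorts the items from high to low, and it assumes every index is non-negative.

```python
# Hypothetical star-level intervals: (lower bound of recommendation index, stars), high to low.
STAR_INTERVALS = [(0.8, 5), (0.6, 4), (0.4, 3), (0.2, 2), (0.0, 1)]


def assign_stars(index_by_item, intervals=STAR_INTERVALS):
    """Return (item, index, stars) tuples sorted by recommendation index, high to low.

    Assumes every index is >= 0, so the last interval always matches.
    """
    ranked = sorted(index_by_item.items(), key=lambda kv: kv[1], reverse=True)
    result = []
    for item, index in ranked:
        # Use the first interval whose lower bound the index reaches.
        stars = next(s for lower, s in intervals if index >= lower)
        result.append((item, index, stars))
    return result


for row in assign_stars({"Item A": 0.75, "Item B": 0.9, "Item C": 0.1}):
    print(row)
```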

In another aspect, a similar enterprise-based item matching device is provided, including:

the first acquisition module is used for acquiring public internet data corresponding to the name of the target enterprise and storing the public internet data into a first dimension array;

the second acquisition module is used for acquiring the policy files and related public files published on official websites, parsing them to obtain the publicly announced enterprise lists and the reporting-condition data of each policy item, and storing the enterprise lists and reporting-condition data in a second dimension array;

The normalization mapping module is used for carrying out data normalization processing on the original data in the first dimension array and the second dimension array to obtain normalized low-dimensional feature vectors and mapping the normalized low-dimensional feature vectors into a high-dimensional feature space based on a support vector machine by using a kernel function;

the score calculation module is used for carrying out centering processing on the high-dimensional new data corresponding to the low-dimensional feature vector in the high-dimensional feature space and calculating the inner product difference of the high-dimensional new data after centering and the support vector of the high-dimensional feature space;

the enterprise determining module is used for removing the enterprises whose high-dimensional new data have a negative inner product difference, and determining the enterprises whose high-dimensional new data have a positive inner product difference as similar enterprises of the target enterprise;

the item acquisition module is used for collecting the policy items already obtained by each similar enterprise and counting the number of times each obtained policy item was obtained by the similar enterprises;

and the matching output module is used for calculating the recommendation index of each obtained policy item from the number of times it was obtained by the similar enterprises and the total number of similar enterprises, and outputting the recommendation indexes as matching item recommendation data of the target enterprise.

In yet another aspect, a computer device is provided, including a memory storing a computer program and a processor implementing the steps of the similar enterprise-based item matching method described above when the computer program is executed by the processor.

In yet another aspect, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the steps of the similar enterprise-based item matching method described above.

One of the above technical solutions has the following advantages and beneficial effects:

The similar enterprise-based item matching method, device, equipment and medium work as follows:

- The public internet data of the target enterprise are acquired first.
- The policy files and related public files published online for the target enterprise's industry are acquired and parsed to identify all enterprises in that industry and form an enterprise list. At the same time, the reporting-condition data satisfied by each listed enterprise when applying for policy items are acquired.
- All of the acquired raw data are normalized into low-dimensional feature vectors, which a kernel function maps into the high-dimensional feature space of a support vector machine.
- The inner product difference between the centered high-dimensional new data and the support vectors of that space is calculated directly. Enterprises whose high-dimensional new data have a negative inner product difference are removed, and ranked similar-enterprise recommendation data are output for the target enterprise. This realizes enterprise recommendation for the target enterprise.
- Finally, the policy items already obtained by the similar enterprises are collected and their recommendation indexes are calculated, which yields the matching item recommendation data of the target enterprise.

Compared with the traditional method, this technical scheme adds the enterprise's reporting conditions, obtained by reverse deduction, to the enterprise's feature information, which effectively enriches that information. During classification prediction, the scheme does not calculate the distance from the high-dimensional new data to the optimal hyperplane in the high-dimensional feature space. Instead, it uses the support vectors directly to simplify the classification calculation. Policy items that the target enterprise can match are then recommended on the basis of similar enterprises. The result is fast and accurate enterprise item matching recommendation and greatly improved matching accuracy.

Drawings

In order to more clearly illustrate the technical solutions of embodiments or conventional techniques of the present application, the drawings required for the descriptions of the embodiments or conventional techniques will be briefly described below, and it is apparent that the drawings in the following description are only some embodiments of the present application, and other drawings may be obtained according to these drawings without inventive effort for a person of ordinary skill in the art.

FIG. 1 is a flow diagram of a similar enterprise-based item matching method in one embodiment;

FIG. 2 is a schematic diagram of an application flow of a similar enterprise-based item matching method in one embodiment;

FIG. 3 is a flow chart of a similar enterprise-based item matching method in another embodiment;

FIG. 4 is a flow chart of a similar enterprise-based item matching method in yet another embodiment;

FIG. 5 is a flow chart of a similar enterprise-based item matching method in yet another embodiment;

FIG. 6 is a schematic block diagram of a similar enterprise-based item matching device in one embodiment.

Detailed Description

In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be further described in detail with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the present application.

Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the description of the present application is for the purpose of describing particular embodiments only and is not intended to be limiting of the application.

It is noted that reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.

Those skilled in the art will appreciate that the embodiments described herein may be combined with other embodiments. The term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.

During the research and design of the invention, the inventors made the following observations. The official websites of various agencies publish policy files supporting enterprise policy applications, together with public files related to those policies. The policy items obtained by an enterprise can be identified by parsing these files. The principle is that a policy file discloses all of the data on its reporting conditions. Once the conditions required by a policy item are known, the conditions satisfied by the enterprises that obtained the item can be deduced in reverse. Adding these conditions to an enterprise's feature information effectively enriches it and improves the accuracy of enterprise recommendation. For example:

1. Public documents on an official website show that Enterprise A has been recognized as a national enterprise technology center.

2. The reporting conditions for a national enterprise technology center are known: research and development expenses of at least 15 million in the previous year, and at least 300 employees.

3. It can therefore be deduced that Enterprise A's annual research and development expenses are at least 15 million and that it has at least 300 employees.

4. Step 3 yields feature information with more dimensions than the traditional method provides. This enriches the enterprise portrait and can improve the accuracy of enterprise recommendation.
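The reverse deduction in steps 1 to 3 can be sketched as a lookup from obtained policy items to their reporting thresholds. Those thresholds are then recorded as lower bounds on the enterprise's features. This is a minimal sketch, and several parts are assumptions made for illustration: the table `POLICY_CONDITIONS`, the feature keys, the second policy item, and the rule of keeping the tightest bound when several items constrain the same feature. Only the national enterprise technology center thresholds come from the example above.

```python
# Hypothetical table of reporting conditions: policy item -> {feature: minimum value}.
# Only the first entry's thresholds come from the example in the text.
POLICY_CONDITIONS = {
    "National Enterprise Technology Center": {
        "rd_expense_last_year": 15_000_000,
        "employees": 300,
    },
    "Hypothetical Item X": {"employees": 100},
}


def deduce_lower_bounds(obtained_items):
    """Derive the feature lower bounds implied by the policy items an enterprise obtained."""
    bounds = {}
    for item in obtained_items:
        for feature, minimum in POLICY_CONDITIONS.get(item, {}).items():
            # Keep the tightest (largest) lower bound seen for each feature.
            bounds[feature] = max(bounds.get(feature, minimum), minimum)
    return bounds


print(deduce_lower_bounds(["National Enterprise Technology Center", "Hypothetical Item X"]))
```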

Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

Referring to fig. 1, in one embodiment, a similar enterprise-based item matching method is provided, which includes the following processing steps S11 to S17:

S11, acquiring public internet data corresponding to the name of the target enterprise and storing the public internet data into a first dimension array.

It will be appreciated that the target enterprise may be, but is not limited to, an enterprise that the user is currently interested in, wants to find or search for, or wants to analyze to learn which policy items it can apply for. The public internet data are the information about the target enterprise disclosed on the internet. Examples include, but are not limited to, business registration information, industry information, regional information, information on insured employees, and years since establishment. These data can be used to generate an enterprise portrait of the target enterprise.

Specifically, the enterprise recommendation device may, without limitation, obtain the public internet data corresponding to the name of the target enterprise by enterprise-name search and collection, crawling, user input or uploading. It then stores the obtained data in a preconfigured first dimension array, so that processing algorithms in subsequent steps such as data calculation and classification can access the data quickly.

S12, acquiring each policy file and each related public file published on each official network, analyzing each policy file and each related public file, obtaining a public enterprise list and reporting condition data of each policy item, and storing the public enterprise list and reporting condition data into a second dimension array.

It will be appreciated that the policy files and related public files may be the policy notices on enterprise-applicable policy support published online by departments in each region, together with the public files related to those policies. By parsing these files, the enterprise recommendation device can learn which enterprises obtained which policy items. Moreover, by determining the reporting conditions of a policy item, the device can deduce in reverse the actual conditions of the enterprises that obtained it. Adding these actual conditions to the enterprises' feature information enriches that information more effectively and helps ensure the accuracy of enterprise recommendation.

Specifically, the enterprise recommendation device may, without limitation, obtain the policy files and related public files published on official websites by web search, crawling, user input or uploading. It then parses the obtained files and stores the resulting enterprise lists and the reporting-condition data of the corresponding policy items in a preconfigured second dimension array, so that processing algorithms in subsequent steps such as data calculation and classification can access the data quickly.
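The sketch below illustrates one way to lay out the two arrays by flattening enterprise records into rows with a fixed feature order. The target enterprise's public data go into the first array. Each listed enterprise's public data, merged with its deduced reporting conditions, go into the second. The text does not give these details, so the following are all assumptions: the feature names, the record layout, the use of deduced lower bounds as feature values, and filling missing values with 0.0.

```python
# Hypothetical fixed feature order shared by both arrays.
FEATURES = ["employees", "registered_capital", "software_copyrights", "rd_expense_last_year"]


def to_row(record, features=FEATURES, missing=0.0):
    """Flatten one enterprise record into a numeric row; absent features become `missing`."""
    return [float(record.get(name, missing)) for name in features]


target = {"employees": 120, "registered_capital": 5_000_000, "software_copyrights": 8}
listed = [
    # Public data merged with reporting conditions deduced from obtained policy items.
    {"employees": 300, "registered_capital": 20_000_000, "rd_expense_last_year": 15_000_000},
    {"employees": 80, "software_copyrights": 3},
]

first_array = [to_row(target)]              # first dimension array: the target enterprise
second_array = [to_row(r) for r in listed]  # second dimension array: listed enterprises
print(first_array)
print(second_array)
```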

S13, carrying out data normalization processing on the original data in the first dimension array and the second dimension array to obtain normalized low-dimensional feature vectors, and mapping the normalized low-dimensional feature vectors into a high-dimensional feature space based on a support vector machine by using a kernel function.

It will be appreciated that data normalization scales data into a specific range to eliminate differences in units and magnitude between features, so that the data better suit the processing of various machine learning algorithms. In this embodiment, normalization produces a low-dimensional feature vector for each enterprise. Each vector contains multiple features, and each dimension represents one sample feature, so the vector can serve as a representation of the enterprise. In the big data field, different normalization methods suit different data conditions. An appropriate existing method can therefore be selected according to the type and format of the raw data and the data requirements of the machine learning algorithm, which improves the accuracy and stability of the algorithm.

For nonlinear classification or regression problems in machine learning algorithms such as support vector machines, a kernel function is a common tool for mapping low-dimensional data into a high-dimensional feature space. This turns a problem that is linearly inseparable in the original space into one that is linearly separable. In the present embodiment, a kernel function is likewise used to map the low-dimensional feature vectors into the high-dimensional feature space.
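The kernel trick can be checked numerically. The sketch below uses a degree-2 polynomial kernel, chosen only because its feature map can be written out explicitly. It shows that the kernel value equals an inner product in a higher-dimensional space. The sketch also defines the RBF kernel named in one embodiment. The RBF feature space is infinite-dimensional, so it is only ever evaluated through the kernel. The `gamma` value and the sample points are arbitrary assumptions.

```python
import math


def poly2_kernel(x, y):
    """Degree-2 homogeneous polynomial kernel K(x, y) = (x . y)^2."""
    return sum(a * b for a, b in zip(x, y)) ** 2


def poly2_feature_map(x):
    """Explicit feature map for 2-D input: (x1^2, sqrt(2) x1 x2, x2^2)."""
    x1, x2 = x
    return (x1 * x1, math.sqrt(2) * x1 * x2, x2 * x2)


def rbf_kernel(x, y, gamma=0.5):
    """RBF kernel K(x, y) = exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


x, y = (0.3, 0.7), (0.6, 0.2)
explicit = sum(a * b for a, b in zip(poly2_feature_map(x), poly2_feature_map(y)))
print(poly2_kernel(x, y), explicit)  # equal up to floating-point rounding
print(rbf_kernel(x, x), rbf_kernel(x, y))
```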

S14, in the high-dimensional feature space, centering the high-dimensional new data corresponding to the low-dimensional feature vector, and calculating the inner product difference of the high-dimensional new data after centering and the support vector of the high-dimensional feature space.

It will be appreciated that after a kernel function maps the raw data into a new high-dimensional feature space, the data can be classified in that space with a linear classifier (e.g., a support vector machine). In this way, even if the raw data are not linearly separable in the low-dimensional space, a linear decision boundary can be found in the high-dimensional feature space. That is, an optimal classification hyperplane is constructed that maximizes its distance to the nearest data points of the different classes.

Specifically, the general process of solving for the optimal classification hyperplane in the high-dimensional feature space may be as follows:

1. Calculate the mean of all samples in the feature space.
2. Subtract the mean from each sample to obtain a centered sample matrix.
3. Compute the covariance matrix of the sample matrix.
4. Compute the eigenvectors and eigenvalues of the covariance matrix.
5. Select the eigenvector with the largest eigenvalue as the normal vector of the hyperplane.
6. Solve for the bias parameter according to the formula for the distance from a sample point to the hyperplane.
7. Obtain the final expression of the optimal classification hyperplane.
8. Classify input data samples with the optimal classification hyperplane.
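The listed steps can be followed literally with NumPy, as in the minimal sketch below. Two caveats apply. First, the text does not say how the bias is obtained, so the sketch makes a hypothetical choice: the hyperplane passes through the sample mean (b = -w . mean). Second, taking the eigenvector of the covariance matrix with the largest eigenvalue is the same computation as finding the first principal component. It is not the margin-maximizing solver of a standard support vector machine. The sample data are made up.

```python
import numpy as np


def hyperplane_from_steps(samples):
    """Follow the listed steps: mean, centering, covariance, top eigenvector, bias."""
    X = np.asarray(samples, dtype=float)
    mean = X.mean(axis=0)                   # average of all samples
    centered = X - mean                     # centered sample matrix
    cov = np.cov(centered, rowvar=False)    # covariance matrix (features in columns)
    eigvals, eigvecs = np.linalg.eigh(cov)  # symmetric input; eigenvalues in ascending order
    w = eigvecs[:, -1]                      # eigenvector of the largest eigenvalue = normal vector
    b = -float(w @ mean)                    # assumed bias: hyperplane through the mean
    return w, b


def classify(points, w, b):
    """Label each point by the side of the hyperplane w . x + b = 0 it falls on."""
    return np.sign(np.asarray(points, dtype=float) @ w + b)


w, b = hyperplane_from_steps([[-2, 0.1], [-1, -0.1], [1, 0.1], [2, -0.1]])
print(w, b)
print(classify([[3, 0], [-3, 0]], w, b))
```

The sign of an eigenvector is arbitrary, so the two printed labels may come out as [1, -1] or [-1, 1]. Only their opposition carries meaning.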

The inventors then found that, in the formula for the distance from a sample point to the optimal classification hyperplane in the high-dimensional feature space, the norm of the hyperplane's normal vector is difficult to calculate. The support vectors are the data points closest to the optimal classification hyperplane, at a distance of 1 from it. For any input sample point, the distance calculation can therefore be simplified to the inner product difference between the sample point and the support vectors. The support vectors can thus be used directly to simplify classification, which avoids calculating the distance from the high-dimensional new data to the optimal classification hyperplane. The procedure for classifying new data with the support vector machine is accordingly improved as follows:

1. Center the high-dimensional new data.
2. Calculate their inner products with the support vectors.
3. Calculate the inner product difference.
4. Determine the sign of the inner product difference.
5. Classify the new data according to that sign.
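The text leaves the exact form of the inner product difference open. One plausible reading, sketched below, is the standard kernel SVM decision value. It takes the weighted inner products of the new point with the positive-class support vectors, subtracts those with the negative-class support vectors, and adds a bias. Every inner product is computed through the RBF kernel after centering in feature space. Centering relative to the support vectors' feature-space mean is an assumption. The support vectors, weights (`alpha_pos`, `alpha_neg`), bias and `gamma` are also hypothetical; in practice they would come from a trained model.

```python
import math


def rbf(x, y, gamma=2.0):
    """RBF kernel: the inner product <phi(x), phi(y)> in the high-dimensional feature space."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def centered_inner_products(x, refs, gamma=2.0):
    """<phi(x) - m, phi(r_i) - m> for each reference point r_i, where m is the
    feature-space mean of phi over `refs`; computed from kernel values only."""
    n = len(refs)
    k_x = [rbf(x, r, gamma) for r in refs]
    gram = [[rbf(a, c, gamma) for c in refs] for a in refs]
    mean_kx = sum(k_x) / n                      # <phi(x), m>
    row_means = [sum(row) / n for row in gram]  # <phi(r_i), m>
    grand_mean = sum(row_means) / n             # <m, m>
    return [k_x[i] - mean_kx - row_means[i] + grand_mean for i in range(n)]


def inner_product_difference(x, pos_svs, neg_svs, alpha_pos, alpha_neg, bias=0.0, gamma=2.0):
    """Weighted inner products with positive support vectors minus those with negative ones."""
    ips = centered_inner_products(x, pos_svs + neg_svs, gamma)
    pos = sum(a * ip for a, ip in zip(alpha_pos, ips[:len(pos_svs)]))
    neg = sum(a * ip for a, ip in zip(alpha_neg, ips[len(pos_svs):]))
    return pos - neg + bias


# Hypothetical model: one support vector per class, unit weights, zero bias.
pos_svs, neg_svs = [(0.8, 0.8)], [(0.2, 0.2)]
for enterprise in [(0.75, 0.85), (0.15, 0.25)]:
    score = inner_product_difference(enterprise, pos_svs, neg_svs, [1.0], [1.0])
    print(enterprise, round(score, 3), "similar" if score > 0 else "rejected")
```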

S15, removing the enterprises whose high-dimensional new data have a negative inner product difference, and determining the enterprises whose high-dimensional new data have a positive inner product difference as similar enterprises of the target enterprise.

Specifically, enterprises whose score (i.e., inner product difference) is negative are removed from the queue of similar enterprises. The positive samples (i.e., enterprises whose inner product difference is positive) can be ranked from high to low by score, which gives the similar enterprises of the target enterprise and their degrees of similarity. The result can be output as recommendation data for the user's convenience.
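The filtering and ranking in S15 reduce to a filter followed by a descending sort, sketched below under two assumptions. First, scores arrive as a mapping from enterprise name to inner product difference. Second, an exactly zero score counts as not similar; the text only specifies the negative and positive cases. The function name and data are hypothetical.

```python
def rank_similar(scores):
    """Drop enterprises with non-positive scores; rank the rest by score, high to low.

    scores: {enterprise_name: inner product difference}
    A score of exactly 0 is treated as not similar (an assumption).
    """
    kept = [(name, s) for name, s in scores.items() if s > 0]  # positive = similar
    return sorted(kept, key=lambda kv: kv[1], reverse=True)


print(rank_similar({"B": 0.42, "C": -0.10, "D": 0.91, "E": 0.05}))
```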

S16, collecting the policy items already obtained by each similar enterprise and counting the number of times each obtained policy item was obtained by the similar enterprises.

It will be appreciated that obtained policy items are policy items that similar enterprises have already been granted. They can be collected automatically with a common crawler tool, and the collected data can be stored in a configured item database for later use. After collection, the obtained policy items of the similar enterprises can be grouped by item type and counted. This gives the number of times each obtained policy item was obtained by the similar enterprises.
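The grouping and counting in S16 can be sketched with `collections.Counter`. One assumption matters here: each similar enterprise is counted at most once per policy item, so duplicate records for the same enterprise and item are collapsed with `set()`. The text says only "number of times" and does not address duplicates. The item names are hypothetical.

```python
from collections import Counter


def count_obtained_items(obtained_by_enterprise):
    """Count, for each policy item, how many similar enterprises obtained it.

    obtained_by_enterprise: {enterprise_name: iterable of obtained policy item names}
    Each enterprise is counted at most once per item (duplicates removed with set()).
    """
    counts = Counter()
    for items in obtained_by_enterprise.values():
        counts.update(set(items))
    return counts


obtained = {
    "B": ["High-tech enterprise", "Specialized SME"],
    "D": ["High-tech enterprise", "High-tech enterprise"],  # duplicate record
    "E": ["Specialized SME", "R&D subsidy"],
}
print(count_obtained_items(obtained))
```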

S17, calculating the recommendation index of each obtained policy item from the number of times it was obtained by the similar enterprises and the total number of similar enterprises, and outputting the recommendation indexes as the matching item recommendation data of the target enterprise.

Specifically, once the number of times each obtained policy item was obtained by the similar enterprises has been counted, the recommendation index of each item can be calculated with a recommendation index formula, for example:

recommendation index = number of times the obtained policy item was obtained / total number of similar enterprises.

Here, the total number of similar enterprises is the number of all similar enterprises of the target enterprise obtained in the previous step. Finally, the calculated recommendation index of each obtained policy item is output as the matching item recommendation data of the target enterprise, which recommends the matched policy items to the target enterprise. The output format of the matching item recommendation data may be, but is not limited to, a data sheet, a similar-enterprise map, or another form of document, as long as it is convenient for the user.
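The recommendation index formula translates directly into a short function, sketched below. The function name and sample counts are hypothetical, and raising an error for a non-positive total is a defensive choice not stated in the text. With four similar enterprises, an item obtained by three of them gets an index of 0.75.

```python
def recommendation_indexes(counts, total_similar):
    """Recommendation index = times an item was obtained / total number of similar enterprises."""
    if total_similar <= 0:
        raise ValueError("total_similar must be positive")
    return {item: n / total_similar for item, n in counts.items()}


counts = {"High-tech enterprise": 3, "Specialized SME": 2, "R&D subsidy": 1}
print(recommendation_indexes(counts, total_similar=4))
```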

The similar enterprise-based item matching method described above works as follows:

- The public internet data of the target enterprise are acquired first.
- The policy files and related public files published online for the target enterprise's industry are acquired and parsed to identify all enterprises in that industry and form an enterprise list. At the same time, the reporting-condition data satisfied by each listed enterprise when applying for policy items are acquired.
- All of the acquired raw data are normalized into low-dimensional feature vectors, which a kernel function maps into the high-dimensional feature space of a support vector machine.
- In that space, the inner product difference between the centered high-dimensional new data and the support vectors is calculated directly. Enterprises whose high-dimensional new data have a negative inner product difference are removed, and ranked similar-enterprise recommendation data are output for the target enterprise. This realizes enterprise recommendation for the target enterprise.
- Finally, the policy items already obtained by the similar enterprises are collected and their recommendation indexes are calculated, which yields the matching item recommendation data of the target enterprise.

Compared with the traditional method, this technical scheme adds the enterprise's reporting conditions, obtained by reverse deduction, to the enterprise's feature information, which effectively enriches that information. During classification prediction, the scheme does not calculate the distance from the high-dimensional new data to the optimal hyperplane in the high-dimensional feature space. Instead, it uses the support vectors directly to simplify the classification calculation. Policy items that the target enterprise can match are then recommended on the basis of similar enterprises. The result is fast and accurate enterprise item matching recommendation and greatly improved matching accuracy. Similarly, the above method can also be used to recommend target enterprises for policy items.

In one embodiment, the public internet data includes business affiliated industries, established years, practitioners, tax payer qualifications, register funds, real-life capital, business properties, register addresses, software copyright quantity, trademark quantity, certificate quantity, work copyright quantity, utility model patent quantity, utility model authorization quantity, appearance design count, utility model publication quantity, marketing type dimension, and bidding data. Therefore, the low-dimensional feature vectors can also be respectively corresponding to the data of the enterprise in calendar years, for example, the low-dimensional feature vectors can comprise the number of practitioners, registered funds, real-life capital, registered addresses, the number of software copyright, the number of trademarks, the number of certificates, the number of work copyright, the number of utility model patents, the number of utility model authorization, the number of appearance design and the number of utility model publications, and the like, and the reporting condition data of the enterprise can be increased through the basic internet data, so that the enterprise portrait can be effectively enriched, and the accuracy of enterprise recommendation is further improved.

In one embodiment, the data normalization processing may use min-max normalization, Z-score normalization, mean-variance normalization, decimal scaling normalization, or log-function transformation.

Specifically, different normalization methods suit different data conditions, and choosing an appropriate one can improve the accuracy and stability of the machine learning algorithm. Min-max normalization (Min-Max Scaling), also known as dispersion normalization, linearly maps the raw data into the interval [0, 1]. Z-score normalization standardizes the data using its mean and standard deviation, so that the result has zero mean and unit standard deviation. Decimal scaling normalization scales the data by moving the decimal point so that every absolute value is less than 1. Log-function transformation applies a logarithm to features with large values and wide ranges, bringing them closer to a normal distribution. Mean-variance normalization (Standard Scaling) subtracts the mean from the data and divides by the standard deviation, again giving zero mean and unit standard deviation; it uses the same formula as Z-score normalization.
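Decimal scaling and log-function transformation can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the function names and the sample values are hypothetical, and using log(1 + x) instead of log(x) is an assumption made so that zero counts (for example, an enterprise with no trademarks) stay defined.

```python
import math


def decimal_scaling(values):
    """Decimal scaling: divide by 10**j, where j is the smallest integer
    such that every |v| / 10**j is below 1."""
    max_abs = max(abs(v) for v in values)
    j = 0
    while max_abs / 10 ** j >= 1:
        j += 1
    return [v / 10 ** j for v in values]


def log_transform(values):
    """Log transformation using log(1 + v).

    Assumption: values are non-negative counts; log1p keeps zero counts defined."""
    return [math.log1p(v) for v in values]


# Hypothetical registered-capital values for six enterprises.
capital = [50, 80, 120, 400, 70, 200]
scaled = decimal_scaling(capital)  # largest |value| is 400, so divide by 10**3
logged = log_transform(capital)
```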

Taking min-max normalization as an example, the maximum value max and the minimum value min of each feature are taken from the feature data in the two arrays, and each feature value x is normalized with the following formula:

$$x_{\text{norm}} = \frac{x - \min}{\max - \min}$$

After all features are normalized, a new normalized array is obtained for subsequent processing.
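A minimal Python sketch of the min-max formula, applied per feature. It assumes, as in the mean-variance example of this description, that each row of the matrix is one feature (number of employees, registered capital) and each column is one of six enterprises. The function name is hypothetical, and mapping a constant feature to 0 is an assumption that avoids division by zero.

```python
def min_max_normalize(rows):
    """Scale each feature (one row) linearly into [0, 1]: (x - min) / (max - min)."""
    normalized = []
    for row in rows:
        lo, hi = min(row), max(row)
        span = hi - lo
        # Assumption: a constant feature has no spread, so map it to 0.0
        # instead of dividing by zero.
        normalized.append([(x - lo) / span if span else 0.0 for x in row])
    return normalized


# Rows: number of employees, registered capital; columns: six enterprises
# (the example matrix used for mean-variance normalization in this description).
features = [[100, 50, 80, 200, 90, 150],
            [150, 80, 120, 400, 70, 200]]
scaled = min_max_normalize(features)
```

Normalizing row by row keeps features with very different magnitudes, such as headcount and registered capital, on a common [0, 1] scale before kernel mapping.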

As another example, the formula for mean-variance normalization is:

$$z = \frac{x - \mu}{\sigma}$$

where x is the original data, μ is the mean of the original data, σ is the standard deviation of the original data, and z is the converted data. The normalization process can be divided into the following steps: calculate the mean and standard deviation of the original data; subtract the mean from each original value to obtain a difference; divide the difference by the standard deviation to obtain the normalized value. Once the normalized values of all data are obtained, they can be used for subsequent classification, clustering, regression analysis and other processing. Taking the two dimensions of number of employees and registered capital as an example, the data of six enterprises over the years are converted into the enterprise feature matrix [[100 50 80 200 90 150], [150 80 120 400 70 200]], which is then normalized by the above algorithm.
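The steps above can be sketched in Python on the enterprise feature matrix just given, normalizing each row (feature) separately. Using the population standard deviation (dividing by n rather than n - 1) is an assumption, because the text does not say which estimator is used. The function name is hypothetical.

```python
import math


def z_score(rows):
    """Mean-variance (Z-score) normalization of each feature row: z = (x - mu) / sigma.

    Assumption: sigma is the population standard deviation (divide by n)."""
    result = []
    for row in rows:
        n = len(row)
        mu = sum(row) / n
        sigma = math.sqrt(sum((x - mu) ** 2 for x in row) / n)
        result.append([(x - mu) / sigma if sigma else 0.0 for x in row])
    return result


# Enterprise feature matrix from the example: rows are features, columns enterprises.
features = [[100, 50, 80, 200, 90, 150],
            [150, 80, 120, 400, 70, 200]]
z = z_score(features)
```

After the transformation each row has mean 0 and standard deviation 1, so the enterprise with registered capital 400 stands out with the largest z value in the second row.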

Other normalization methods can be applied in the same way, following the processing flow described above. With these data normalization methods, the original data can be normalized efficiently and accurately.

In one embodiment, the kernel function is an RBF kernel function. The RBF (Radial Basis Function) kernel is a commonly used kernel for nonlinear classification or regression in machine learning algorithms such as support vector machines (SVMs). It is defined as follows: for two samples x_i and x_j in the input space, the output value of the RBF kernel function is:

$$K(x_i, x_j) = \exp\left(-\frac{\lVert x_i - x_j \rVert^2}{2\sigma^2}\right)$$

where σ is the bandwidth parameter of the RBF kernel function and ‖x_i − x_j‖ is the Euclidean distance between vectors x_i and x_j. The RBF kernel implicitly maps the original input space into a high-dimensional feature space, so that a problem that is linearly inseparable in the original space becomes linearly separable. In addition, because K(x, x) = 1 for every sample, that is, each sample point is maximally similar to itself (the self-similarity of the RBF kernel), the RBF kernel works well on data with self-similarity properties, such as spatio-temporal data.

When using the RBF kernel function, the bandwidth parameter σ needs to be tuned, typically by cross-validation or a similar method. The larger σ is, the more smoothly the kernel output changes with distance and the weaker its localization; the smaller σ is, the more sensitive the output is to changes in distance and the stronger its localization. σ can therefore be chosen flexibly according to actual needs.
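A minimal Python sketch of the RBF kernel defined above, evaluated for one hypothetical pair of samples at Euclidean distance 2 under several bandwidths, shows the behaviour just described: as σ grows the kernel value approaches 1 (smoother, less local), and as σ shrinks it falls towards 0 (more local). The sample values and the σ grid are illustrative only.

```python
import math


def rbf_kernel(x, y, sigma):
    """RBF kernel K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((p - q) ** 2 for p, q in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))


# Hypothetical pair of samples at Euclidean distance 2.
a, b = [0.0, 0.0], [2.0, 0.0]
for sigma in (0.5, 1.0, 2.0, 4.0):
    # Larger sigma -> value closer to 1 (smoother); smaller -> closer to 0 (more local).
    print(f"sigma={sigma}: K={rbf_kernel(a, b, sigma):.6f}")
```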

Specifically, an example of spatial mapping with the RBF kernel function is as follows. Consider a two-dimensional sample dataset X with corresponding class labels Y. For dataset X, the kernel value between every pair of samples is calculated. First, a suitable σ must be selected, and different values of σ can be tried. The RBF kernel matrix is then calculated, giving:

[[1.         0.60653066 0.60653066 0.00033546 0.00012341]
 [0.60653066 1.         0.13533528 0.00183156 0.00067067]
 [0.60653066 0.13533528 1.         0.00091188 0.00033546]
 [0.00033546 0.00183156 0.00091188 1.         0.60653066]
 [0.00012341 0.00067067 0.00033546 0.60653066 1.        ]]

All data are mapped in the same way; that is, the RBF kernel function quickly maps the original data into a new high-dimensional feature space.
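The kernel matrix computation can be sketched as below. The original dataset X is not given in this description, so the five two-dimensional samples here are hypothetical and the resulting matrix will not reproduce the one above. Even so, with σ = 1 the pair at unit distance yields exp(-0.5) ≈ 0.60653066, the same value that appears in that matrix.

```python
import math


def rbf_kernel_matrix(samples, sigma):
    """Gram matrix K[i][j] = exp(-||x_i - x_j||^2 / (2 * sigma^2)) for all sample pairs."""
    n = len(samples)
    K = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            sq = sum((p - q) ** 2 for p, q in zip(samples[i], samples[j]))
            # The kernel is symmetric, so fill both triangles at once.
            K[i][j] = K[j][i] = math.exp(-sq / (2.0 * sigma ** 2))
    return K


# Hypothetical 2-D samples: one group near the origin, another near (4, 4.5).
X = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [4.0, 4.0], [4.0, 5.0]]
K = rbf_kernel_matrix(X, sigma=1.0)
for row in K:
    print(" ".join(f"{v:.8f}" for v in row))
```

Each row of K can be read as the new high-dimensional representation of one sample: entries close to 1 mark samples in the same group, entries close to 0 mark distant ones.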

Further, after the low-dimensional feature vectors have been mapped into the high-dimensional feature space of the support vector machine, the optimal classification hyperplane can be solved, for example, as follows:

Calculate the mean of all samples in the high-dimensional feature space and denote it μ. The above RBF kernel matrix is used as the example (as in the rest of this embodiment), and μ is calculated from it.

Subtract μ from each sample to obtain the centered sample matrix X:
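Because the feature-space vectors of an RBF kernel are never formed explicitly, one standard way to subtract the feature-space mean μ using only the kernel matrix is double centering, as used in kernel PCA: Kc = K - 1ₙK - K1ₙ + 1ₙK1ₙ, where 1ₙ is the n × n matrix whose entries are all 1/n. The sketch below shows that standard technique. It is a swapped-in illustration, not necessarily the computation behind the centered matrix in this description, and the input matrix and function name are hypothetical.

```python
def center_kernel_matrix(K):
    """Double-center a kernel matrix so the implicit feature vectors have zero mean:
    Kc[i][j] = K[i][j] - row_mean[i] - col_mean[j] + grand_mean."""
    n = len(K)
    row_means = [sum(row) / n for row in K]
    col_means = [sum(K[i][j] for i in range(n)) / n for j in range(n)]
    grand_mean = sum(row_means) / n
    return [[K[i][j] - row_means[i] - col_means[j] + grand_mean for j in range(n)]
            for i in range(n)]


# Hypothetical symmetric kernel matrix for three samples.
K = [[1.0, 0.6, 0.1],
     [0.6, 1.0, 0.3],
     [0.1, 0.3, 1.0]]
Kc = center_kernel_matrix(K)
```

Every row and column of the centered matrix sums to zero, which is the kernel-matrix counterpart of subtracting μ from every sample in the feature space.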

X = [[ 0.69697399  0.30182766  0.30182766 -0.29990674 -0.28989119]
     [ 0.30182766  0.69697399  0.05683738  0.116866    0.42786001]
     [ 0.30182766  0.05683738  0.69697399 -0.28801512 -0.28989119]
     [-0.29990674  0.116866   -0.28801512  0.69697399  0.30182766]
     [-0.28989119  0.42786001 -0.28989119  0.30182766  0.69697399]]

Solve the covariance matrix XX' of the sample matrix X:

XX' = [[0.48358984 0.17124736 0.17124736 0.         0.        ]
       [0.17124736 0.48358984 0.02399147 0.04655467 0.18360223]
       [0.17124736 0.02399147 0.48358984 0.         0.        ]
       [0.         0.04655467 0.         0.48358984 0.17124736]
       [0.         0.18360223 0.         0.17124736 0.48358984]]

Find the eigenvectors and eigenvalues of the covariance matrix XX', paying attention to the signs of the eigenvectors.

Eigenvectors:

[-0.40824829 -0.40824829  0.81649658  0.          0.        ]
[-0.40824829  0.81649658  0.40824829  0.          0.        ]
[-0.40824829  0.40824829 -0.81649658  0.          0.        ]
[ 0.          0.          0.          0.          1.        ]

Eigenvalues:

[2.41597182 1.41597182 1.41597182 0.48358984 0.48358984]

Select the eigenvector with the largest eigenvalue as the hyperplane normal vector w:

w = [-0.40824829 -0.40824829 0.81649658]
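Selecting the eigenvector with the largest eigenvalue can be sketched with power iteration, a standard method for finding this single eigenpair. For a positive semi-definite matrix such as XX', the largest-magnitude eigenvalue found by power iteration is also the largest eigenvalue. The test matrix and function name are hypothetical. The sign of an eigenvector is arbitrary, which is why the text asks for attention to it; here the sign follows the positive starting vector.

```python
import math


def top_eigenpair(A, iters=1000, tol=1e-12):
    """Power iteration: dominant eigenvalue and unit eigenvector of a symmetric matrix A."""
    n = len(A)
    v = [1.0 / math.sqrt(n)] * n  # normalized all-ones starting vector
    lam = 0.0
    for _ in range(iters):
        w = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]  # w = A v
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0, v
        w = [x / norm for x in w]
        # Rayleigh quotient w^T A w estimates the eigenvalue for a unit vector w.
        new_lam = sum(w[i] * sum(A[i][j] * w[j] for j in range(n)) for i in range(n))
        converged = abs(new_lam - lam) < tol
        v, lam = w, new_lam
        if converged:
            break
    return lam, v


# Hypothetical symmetric matrix with eigenvalues 3, 1, 1.
A = [[2.0, 1.0, 0.0], [1.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
lam, v = top_eigenpair(A)  # v is the candidate hyperplane normal vector
```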

From the formula for the distance from a sample point x to the hyperplane,

$$d = \frac{\lvert w^{\top} x + b \rvert}{\lVert w \rVert_2}$$

the bias parameter b is solved, for example b = -1.2267573, where ‖w‖₂ denotes the L2 norm of the vector w.

Finally, the optimal classification hyperplane obtained is:

$$-0.40824829\,x_1 - 0.40824829\,x_2 + 0.81649658\,x_3 - 1.2267573 = 0$$

where x₁, x₂ and x₃ are the three components of a point x on the optimal classification hyperplane.

After the optimal classification hyperplane is obtained, it can be used to classify new data, for example:

1. Center the new data sample x by subtracting the mean μ, obtaining the centered sample x'.

2. Compute the inner product of x' with the hyperplane normal vector w and add the bias parameter b, which gives the signed distance from x' to the optimal classification hyperplane:

$$d = \frac{w^{\top} x' + b}{\lVert w \rVert_2}$$

3. If d > 0, the new data sample lies on the positive side of the hyperplane and is classified as a positive sample; if d < 0, it lies on the negative side and is classified as a negative sample.
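Steps 1 to 3 can be sketched in Python using the normal vector w and bias b from the example above. The mean μ of that example is not given numerically in this description, so the μ values passed in below are hypothetical. Treating d = 0 (a point exactly on the hyperplane) as negative is also an assumption, since the text only specifies the cases d > 0 and d < 0.

```python
import math

W = [-0.40824829, -0.40824829, 0.81649658]  # normal vector w from the example
B = -1.2267573                              # bias b from the example


def classify(x, mu, w=W, b=B):
    """Center x, compute the signed distance d = (w . x' + b) / ||w||, label by sign."""
    x_c = [xi - mi for xi, mi in zip(x, mu)]                  # step 1: x' = x - mu
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    d = (sum(wi * xi for wi, xi in zip(w, x_c)) + b) / norm_w  # step 2
    # Step 3; assumption: d == 0 is treated as negative.
    return d, ("positive" if d > 0 else "negative")


mu = [0.0, 0.0, 0.0]  # hypothetical feature-space mean
print(classify([0.0, 0.0, 3.0], mu))
print(classify([0.0, 0.0, 0.0], mu))
```

Since w is a unit eigenvector, ‖w‖ is about 1 here and d is effectively w · x' + b.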

4. In the high-dimensional feature space, the norm ‖w‖₂ of the hyperplane normal vector w in the distance formula is difficult to calculate, so the support vectors can be used to simplify the calculation. The support vectors are the data points closest to the hyperplane, and their distance to the hyperplane is 1. For any new data x', its distance d to the hyperplane can then be expressed as:

；

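where the remaining vector in the expression is a support vector.

5. Therefore, the two inner-product terms in the expression are calculated directly and their difference is taken: if the difference is greater than 0, the new data is a positive sample; otherwise it is a negative sample.

6. Finally, since w is the normal vector of the hyperplane, the term involving the support vector equals the distance from the support vector to the hyperplane, which can be calculated from the hyperplane equation.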

Finally, classifying new data with the support vector machine reduces to the following steps: 1) center the new data; 2) calculate the inner product of the new data with the support vectors; 3) calculate the inner product difference and determine its sign; 4) classify the new data according to that sign. This avoids directly calculating the distance from the new data to the hyperplane and simplifies the classification calculation. Fig. 2 shows a flow chart of an implementation of the similar-enterprise-based item matching method, in which the similarity threshold may be set to 0.
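For comparison, the standard way an SVM avoids forming w in the feature space is its dual decision function f(x) = Σᵢ αᵢ yᵢ K(sᵢ, x) + b, which needs only kernel values between x and the support vectors sᵢ. The sketch below implements that textbook form with an RBF kernel. It is not claimed to be the exact formula of this method, and the support vectors, labels yᵢ, multipliers αᵢ and bias are hypothetical. With one positive and one negative support vector, equal multipliers and zero bias, f(x) is literally the difference of two kernel inner products, and its sign gives the class.

```python
import math


def rbf(x, y, sigma=1.0):
    """RBF kernel exp(-||x - y||^2 / (2 * sigma^2))."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(x, y)) / (2.0 * sigma ** 2))


def decision(x, support_vectors, labels, alphas, b, sigma=1.0):
    """Standard SVM dual decision f(x) = sum_i alpha_i * y_i * K(s_i, x) + b.

    Only kernel values with the support vectors are needed; w is never formed."""
    return sum(a * y * rbf(s, x, sigma)
               for s, y, a in zip(support_vectors, labels, alphas)) + b


# Hypothetical trained model: one positive and one negative support vector.
SV = [[0.0, 0.0], [3.0, 0.0]]
Y = [+1, -1]
ALPHA = [1.0, 1.0]
B0 = 0.0

for point in ([0.2, 0.0], [2.9, 0.0]):
    f = decision(point, SV, Y, ALPHA, B0)
    print(point, "positive" if f > 0 else "negative", round(f, 4))
```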

In another embodiment, the kernel function may instead be a linear kernel function, a polynomial kernel function, a Sigmoid kernel function or a Laplace kernel function. Besides the RBF kernel, other kernel functions can be used for classification or regression in machine learning algorithms such as support vector machines, and some of them suit a particular dataset or task better than the RBF kernel. Commonly used ones include:

Linear kernel function (Linear Kernel): computes the inner product of two vectors in the input space; suitable when the data are linearly separable or approximately linearly separable.

Polynomial kernel function (Polynomial Kernel): raises an affine function of the inner product of two vectors to a fixed power, which is equivalent to an inner product after polynomial feature expansion; suitable when the data have some nonlinear characteristics.

Sigmoid kernel function (Sigmoid Kernel): applies a Sigmoid-type (hyperbolic tangent) transformation to the inner product of two vectors in the input space; suitable when the dataset has a symmetric distribution.

Laplace kernel function (Laplace Kernel): substitutes the Euclidean distance between two vectors in the input space into a Laplace (exponential) function; suitable when the dataset contains noise or outliers.

In addition to the above several kernel functions, some other kernel functions may be used, such as ANOVA kernel functions and Bessel kernel functions. In practical application, the most suitable kernel function can be selected according to the characteristics of the data set and the requirements of the task, so that a better processing effect is achieved.
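Common textbook forms of the four alternative kernels are sketched below. The parameter names and defaults (gamma, coef0, degree, sigma) are assumptions for illustration. The Laplace kernel here uses the Euclidean distance, as the text describes; note that some libraries define it with the L1 (Manhattan) distance instead.

```python
import math


def dot(x, y):
    return sum(p * q for p, q in zip(x, y))


def linear_kernel(x, y):
    """K(x, y) = x . y"""
    return dot(x, y)


def polynomial_kernel(x, y, gamma=1.0, coef0=1.0, degree=2):
    """K(x, y) = (gamma * x . y + coef0) ** degree"""
    return (gamma * dot(x, y) + coef0) ** degree


def sigmoid_kernel(x, y, gamma=1.0, coef0=0.0):
    """K(x, y) = tanh(gamma * x . y + coef0)"""
    return math.tanh(gamma * dot(x, y) + coef0)


def laplace_kernel(x, y, sigma=1.0):
    """K(x, y) = exp(-||x - y|| / sigma), with the Euclidean distance as in the text."""
    dist = math.sqrt(sum((p - q) ** 2 for p, q in zip(x, y)))
    return math.exp(-dist / sigma)
```

Any of these can replace the RBF kernel in the kernel-matrix computation without changing the rest of the pipeline.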

In one embodiment, as shown in Fig. 3, the above method may further include the following processing step S18:

S18, output, using a collaborative filtering algorithm, policy item data matched to the target enterprise according to the similar enterprises.

It can be understood that, after the similar-enterprise recommendation data of the target enterprise are obtained, the enterprise recommendation device can further be instructed to use a collaborative filtering algorithm to predict, from these data, the policy items for which the target enterprise may be eligible. It then filters out the items the target enterprise has already declared and outputs the remaining eligible but undeclared items as the policy items matched to the target enterprise. Policy items that the target enterprise is also suited to declare are thus recommended directly to the user, which further broadens the applicability of enterprise recommendation.

Specifically, according to the similar-enterprise recommendation data (which may be given as a list of similar enterprises), the policy items obtained by all similar enterprises can be represented as a matrix in which each row is a similar enterprise, each column is a policy item, and each element is a binary variable indicating whether that enterprise has obtained that item (0 = not obtained, 1 = obtained).

Item prediction: the policy items for which the target enterprise may be eligible are predicted from the statistics of the policy items obtained by the similar enterprises. Specifically, the top K similar enterprises (K chosen flexibly according to actual needs) with the highest similarity to the target enterprise are found, their similarities to the target enterprise are used as weights, and the statistics of the corresponding policy items are combined by a similarity-weighted average. Finally, items the target enterprise has already declared are filtered out, and the predicted eligible but undeclared items are output. This can be implemented with an existing collaborative filtering algorithm, for example:

Suppose there are three similar enterprises B, C and D, whose obtained policy item data are as follows:

The similarities between the target enterprise and the similar enterprises B, C and D are 0.9, 0.6 and 0.8 respectively, and the prediction data of the target enterprise are obtained by a similarity-weighted average:

Since the target enterprise has already obtained items E2 and E3, these items are filtered out, and the predicted eligible but undeclared items finally output are E1 and E4.
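The item table and the prediction table of this example are not reproduced in the text. The sketch below therefore combines the stated similarities (B 0.9, C 0.6, D 0.8) with a hypothetical binary enterprise × item matrix chosen to be consistent with the stated outcome. Each item's score is the similarity-weighted average of its binary entries over the top K similar enterprises. The threshold of 0.5 for calling an item eligible is an assumption, because the text does not state one, and the function name is hypothetical.

```python
def predict_items(similarities, obtained, declared, k=3, threshold=0.5):
    """Similarity-weighted collaborative filtering on a binary enterprise x item matrix.

    similarities: {enterprise: similarity to the target enterprise}
    obtained:     {enterprise: {item: 1 if obtained else 0}}
    declared:     items the target enterprise has already declared or obtained
    threshold:    hypothetical cut-off for calling an item eligible
    """
    top_k = sorted(similarities, key=similarities.get, reverse=True)[:k]
    total = sum(similarities[e] for e in top_k)
    items = sorted({item for e in top_k for item in obtained[e]})
    # Weighted average of the 0/1 entries, weighted by similarity.
    scores = {item: sum(similarities[e] * obtained[e].get(item, 0) for e in top_k) / total
              for item in items}
    candidates = [(item, s) for item, s in scores.items()
                  if item not in declared and s >= threshold]
    return sorted(candidates, key=lambda pair: pair[1], reverse=True)


sims = {"B": 0.9, "C": 0.6, "D": 0.8}            # similarities stated in the example
R = {"B": {"E1": 1, "E2": 1, "E3": 0, "E4": 1},  # hypothetical obtained-item matrix
     "C": {"E1": 0, "E2": 1, "E3": 1, "E4": 1},
     "D": {"E1": 1, "E2": 0, "E3": 1, "E4": 0}}
result = predict_items(sims, R, declared={"E2", "E3"})
print(result)  # E1 (about 0.739) ranked before E4 (about 0.652)
```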

In one embodiment, as shown in Fig. 4, the above method may further include the following processing step S19:

S19, generate, according to the recommendation documents of the target enterprise, recommendation documents matched to each of the similar enterprises.

It can be understood that, after the similar-enterprise recommendation data of the target enterprise are obtained, the enterprise recommendation device can further be instructed to generate a recommendation document for each similar enterprise from a recommendation document prepared in advance for the target enterprise, such as a targeted marketing and promotion document. Recommendation documents that apply directly to the similar enterprises are thus obtained, which further broadens the applicability of enterprise recommendation and improves the efficiency and accuracy of automated marketing tasks.

In addition, intelligent marketing tasks can be carried out based on the obtained similar-enterprise recommendation data (list) of the target enterprise. For example, personalized marketing plans can be generated from the list of items that each similar enterprise has not declared, encouraging it to declare those items and improving its declaration success rate. Marketing plan e-mails or text messages may be sent to similar enterprises, item-related marketing analysis data and planning advice may be provided, successful declaration cases and experience may be introduced to them, or the sponsor may offer preferential measures to encourage them to declare these items. The relevant data of the similar enterprises can also be analyzed in depth with an existing intelligent marketing platform to understand their customer needs and consumption habits, and marketing strategy adjustment suggestions for them can be output based on the analysis results.

In one embodiment, as shown in Fig. 5, the above method may further include the following processing steps S20 and S21:

S20, according to the set star-rating intervals, assign a recommended star-rating label to each obtained policy item in descending order of its recommendation index in the matched item recommendation data;

S21, output recommendations in descending order of the recommended star rating of each obtained policy item.

It will be appreciated that, to display the recommendation degree of each obtained policy item more intuitively, each obtained policy item may be assigned a recommendation label in descending order of recommendation index, such as, but not limited to, a recommendation percentage, a score out of ten, or a recommendation level. This embodiment uses a "five-star" rating to display the recommendation level of each obtained policy item.

Specifically, the recommendation indexes of all eligible policy items (that is, the obtained policy items) can be divided from high to low into 5 level intervals. For example, if the recommendation indexes are 45.1, 52.5, 60.8, 79.0, 82.3 and 98.6, then 45.1 and 52.5 are below 55 and rated one star; 60.8 falls in the 56 to 70 interval and is rated two stars; 79.0 falls in the 71 to 80 interval and is rated three stars; 82.3 falls in the 81 to 90 interval and is rated four stars; and 98.6 is above 90 and rated five stars. Each item's recommended star-rating label is thus assigned according to the interval its recommendation index falls in, which makes the matched items easier to read when displayed.
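The star assignment in this example can be sketched with a sorted list of interval upper bounds and a binary search. Treating each upper bound as inclusive and assigning non-integer values that fall between the stated intervals (for example 70.5) to the higher interval are assumptions, as are the function names.

```python
import bisect

# Inclusive upper bounds of the 1- to 4-star intervals in the example; anything
# above 90 is 5 stars. Assumption: values between intervals (e.g. 70.5) go up.
UPPER_BOUNDS = [55, 70, 80, 90]


def star_rating(index):
    """Map a recommendation index to 1-5 stars by binary search over the bounds."""
    return bisect.bisect_left(UPPER_BOUNDS, index) + 1


def rank_items(indices):
    """Hypothetical helper: attach star labels (S20) and list items from the
    highest to the lowest recommendation index (S21)."""
    labelled = [(item, idx, star_rating(idx)) for item, idx in indices.items()]
    return sorted(labelled, key=lambda t: t[1], reverse=True)


ranked = rank_items({"P1": 60.8, "P2": 98.6, "P3": 45.1})
```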

It should be understood that, although the steps in the flowcharts of Figs. 1 to 5 are shown in the order indicated by the arrows, they are not necessarily executed in that order. Unless explicitly stated herein, the execution order of these steps is not strictly limited, and they may be executed in other orders. Moreover, at least some of the steps in Figs. 1 to 5 may include multiple sub-steps or stages, which are not necessarily completed at the same moment but may be executed at different times. Nor are these sub-steps or stages necessarily executed sequentially; they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.

Referring to Fig. 6, in one embodiment a similar-enterprise-based item matching apparatus 100 is provided, including a first obtaining module 11, a second obtaining module 12, a normalization mapping module 13, a score calculating module 14, an enterprise determining module 15, an item collecting module 16 and a matching output module 17. The first obtaining module 11 is configured to obtain public internet data corresponding to the name of a target enterprise and store them in a first dimension array. The second obtaining module 12 is configured to obtain the policy files and related public files published on the official websites, parse them to obtain a public enterprise list and the declaration condition data of each policy item, and store these in a second dimension array. The normalization mapping module 13 is configured to normalize the raw data in the first and second dimension arrays to obtain normalized low-dimensional feature vectors, and to map these vectors with a kernel function into the high-dimensional feature space of a support vector machine. The score calculating module 14 is configured to center the high-dimensional new data corresponding to the low-dimensional feature vectors in the high-dimensional feature space, and to calculate the inner product difference between the centered high-dimensional new data and the support vectors of that space. The enterprise determining module 15 is configured to exclude enterprises whose high-dimensional new data have a negative inner product difference, and to determine enterprises whose high-dimensional new data have a positive inner product difference as similar enterprises of the target enterprise. The item collecting module 16 is configured to collect the obtained policy items of each similar enterprise and count the number of times each obtained policy item has been obtained by the similar enterprises. The matching output module 17 is configured to calculate the recommendation index of each obtained policy item from the number of times it has been obtained by the similar enterprises and the total number of similar enterprises, and to output the result as the matched item recommendation data of the target enterprise.
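The counting and recommendation-index steps of the item collecting module 16 and the matching output module 17 could look roughly like the sketch below. The text does not give the exact formula, so expressing the index as the percentage of similar enterprises that obtained each item (count / total × 100, which fits the 0 to 100 scale of the star-rating example) is an assumption, as is counting each enterprise at most once per item. The function name and input data are hypothetical.

```python
from collections import Counter


def recommendation_index(similar_enterprise_items):
    """Hypothetical index: share of similar enterprises that obtained each policy
    item, as a percentage (count / total * 100)."""
    total = len(similar_enterprise_items)
    if total == 0:
        return {}
    # set() counts each enterprise at most once per item (assumption).
    counts = Counter(item for items in similar_enterprise_items for item in set(items))
    return {item: 100.0 * n / total for item, n in counts.most_common()}


# Hypothetical obtained-item lists of four similar enterprises.
index = recommendation_index([["P1", "P2"], ["P1"], ["P1", "P3"], ["P2"]])
```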

With the similar-enterprise-based item matching apparatus 100, after the public internet data of the target enterprise are acquired, the policy files and related public files of the target enterprise's industry are acquired, all enterprises in that industry are identified to form an enterprise list, and the declaration condition data satisfied by each listed enterprise's declared policy items are acquired. All acquired raw data are normalized into low-dimensional feature vectors, which are mapped with a kernel function into the high-dimensional feature space of a support vector machine. In that space, the inner product difference between the centered high-dimensional new data and the support vectors is calculated directly, enterprises whose high-dimensional new data have a negative inner product difference are excluded, and the ranked similar-enterprise recommendation data of the target enterprise are output, completing the enterprise recommendation for the target enterprise. Finally, the policy items obtained by the similar enterprises are collected and their recommendation indexes are calculated, yielding the matched item recommendation data of the target enterprise.

In one embodiment, the public internet data includes the enterprise's industry, year of establishment, number of employees, taxpayer qualification, registered capital, paid-in capital, enterprise type, registered address, number of software copyrights, number of trademarks, number of certificates, number of work copyrights, number of utility model patents, number of utility model authorizations, number of design patents, number of utility model publications, listing type, and bidding data.

In one embodiment, the kernel function is an RBF kernel function.

In one embodiment, the similar-enterprise-based item matching apparatus 100 may further include an item matching module configured to output, using a collaborative filtering algorithm, policy item data matched to the target enterprise according to the similar enterprises.

In one embodiment, the project matching device 100 based on similar enterprises further includes a recommendation generation module, configured to generate recommendation documents matching each similar enterprise according to the recommendation documents of the target enterprise.

In one embodiment, the item matching module may further be configured to assign a recommended star-rating label to each obtained policy item according to the set star-rating intervals, in descending order of the recommendation index of each obtained policy item in the matched item recommendation data, and to output recommendations in descending order of the recommended star rating of each obtained policy item.

For specific limitations of the similar-enterprise-based item matching apparatus 100, reference may be made to the corresponding limitations of the similar-enterprise-based item matching method hereinabove, and no further description is given here. The various modules in the similar enterprise-based item matching apparatus 100 described above may be implemented in whole or in part in software, hardware, and combinations thereof. The above modules may be embedded in hardware or may be independent of a device with a data processing function, or may be stored in a memory of the device in software, so that the processor may call and execute operations corresponding to the above modules, where the device may be, but is not limited to, various data computing and processing devices existing in the art.

In one embodiment, a computer device is also provided, including a memory and a processor, the memory storing a computer program. When executing the computer program, the processor implements the following processing steps: acquiring public internet data corresponding to the name of a target enterprise and storing them in a first dimension array; acquiring the policy files and related public files published on the official websites, parsing them to obtain a public enterprise list and the declaration condition data of each policy item, and storing these in a second dimension array; normalizing the raw data in the first and second dimension arrays to obtain normalized low-dimensional feature vectors, and mapping these vectors with a kernel function into the high-dimensional feature space of a support vector machine; in the high-dimensional feature space, centering the high-dimensional new data corresponding to the low-dimensional feature vectors, and calculating the inner product difference between the centered high-dimensional new data and the support vectors of that space; excluding enterprises whose high-dimensional new data have a negative inner product difference, and determining enterprises whose high-dimensional new data have a positive inner product difference as similar enterprises of the target enterprise; collecting the obtained policy items of each similar enterprise and counting the number of times each obtained policy item has been obtained by the similar enterprises; and calculating the recommendation index of each obtained policy item from the number of times it has been obtained by the similar enterprises and the total number of similar enterprises, and outputting the result as the matched item recommendation data of the target enterprise.

It will be appreciated that, besides the memory and processor, the computer device may include other software and hardware components not listed in this specification; these depend on the model of the specific computer device in different application scenarios and are not listed here one by one.

In one embodiment, the processor, when executing the computer program, may further implement the steps or sub-steps added to the embodiments of the similar enterprise-based item matching method described above.

In one embodiment, a computer-readable storage medium is also provided, on which a computer program is stored. When executed by a processor, the computer program implements the following processing steps: acquiring public internet data corresponding to the name of a target enterprise and storing them in a first dimension array; acquiring the policy files and related public files published on the official websites, parsing them to obtain a public enterprise list and the declaration condition data of each policy item, and storing these in a second dimension array; normalizing the raw data in the first and second dimension arrays to obtain normalized low-dimensional feature vectors, and mapping these vectors with a kernel function into the high-dimensional feature space of a support vector machine; in the high-dimensional feature space, centering the high-dimensional new data corresponding to the low-dimensional feature vectors, and calculating the inner product difference between the centered high-dimensional new data and the support vectors of that space; excluding enterprises whose high-dimensional new data have a negative inner product difference, and determining enterprises whose high-dimensional new data have a positive inner product difference as similar enterprises of the target enterprise; collecting the obtained policy items of each similar enterprise and counting the number of times each obtained policy item has been obtained by the similar enterprises; and calculating the recommendation index of each obtained policy item from the number of times it has been obtained by the similar enterprises and the total number of similar enterprises, and outputting the result as the matched item recommendation data of the target enterprise.

In one embodiment, the computer program, when executed by the processor, may further implement the steps or sub-steps added to the embodiments of the similar enterprise-based item matching method described above.

Those skilled in the art will appreciate that all or part of the above methods may be implemented by a computer program, which may be stored in a non-volatile computer-readable storage medium and which, when executed, may include the steps of the above method embodiments. Any reference to memory, storage, a database or another medium in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus dynamic RAM (RDRAM) and direct Rambus dynamic RAM (DRDRAM).

The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered within the scope of this specification.

The foregoing embodiments represent only several implementations of the present application. Although they are described specifically and in detail, they are not to be construed as limiting the scope of the invention. It should be noted that those skilled in the art may make various modifications and improvements without departing from the concept of the present application, and all such modifications and improvements fall within the protection scope of the present application. Therefore, the protection scope of this patent is defined by the appended claims.

Claims

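1. A similar enterprise-based item matching method, characterized by comprising the following steps:

acquiring public internet data corresponding to the name of a target enterprise and storing the public internet data in a first dimension array;

acquiring the policy files and related public files published on official websites, parsing them to obtain a public enterprise list and the reporting condition data of each policy item, and storing the public enterprise list and the reporting condition data in a second dimension array;

performing data normalization on the raw data in the first and second dimension arrays to obtain normalized low-dimensional feature vectors, and mapping the normalized low-dimensional feature vectors into a high-dimensional feature space with a kernel function based on a support vector machine;

in the high-dimensional feature space, centering the high-dimensional new data corresponding to the low-dimensional feature vectors, and calculating the inner product difference between the centered high-dimensional new data and the support vectors of the high-dimensional feature space;

removing the enterprises whose high-dimensional new data have a negative inner product difference, and determining the enterprises whose high-dimensional new data have a positive inner product difference as similar enterprises of the target enterprise;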

collecting the policy items obtained by each similar enterprise and counting the number of times each policy item has been obtained by the similar enterprises;

and calculating a recommendation index for each obtained policy item from the number of times it has been obtained by the similar enterprises and the total number of similar enterprises, and outputting the recommendation index as the matched item recommendation data of the target enterprise.

2. The similar enterprise-based item matching method of claim 1, wherein the public internet data include the industry to which the enterprise belongs, years since establishment, number of employees, taxpayer qualification, registered capital, paid-in capital, enterprise nature, registered address, number of software copyrights, number of trademarks, number of certificates, number of work copyrights, number of utility model patents, number of utility model grants, number of design patents, number of utility model publications, listing type, and bidding data.

3. The similar enterprise-based item matching method of claim 1 or 2, wherein the normalization method used in the data normalization includes min-max normalization, Z-score normalization, mean-variance normalization, decimal scaling normalization, or log-function transformation.

4. The similar enterprise-based item matching method of claim 3, wherein the kernel function is an RBF kernel function.

5. The similar enterprise-based item matching method of claim 3, further comprising the steps of:

outputting policy item data matched with the target enterprise by using a collaborative filtering algorithm based on the similar enterprises.

6. The similar enterprise-based item matching method of claim 3, further comprising the steps of:

7. The similar enterprise-based item matching method of claim 1, further comprising the steps of:

assigning a recommended star-level label to each obtained policy item according to the set star-level distribution intervals, in descending order of the recommendation index of each obtained policy item in the matched item recommendation data;

and outputting recommendations in descending order of the recommended star level of each obtained policy item.

8. An item matching device based on similar enterprises, comprising:

an enterprise determining module, configured to remove the enterprises whose high-dimensional new data have a negative inner product difference and to determine the enterprises whose high-dimensional new data have a positive inner product difference as similar enterprises of the target enterprise;

an item collection module, configured to collect the policy items obtained by each similar enterprise and to count the number of times each policy item has been obtained by the similar enterprises; and

a matching output module, configured to calculate the recommendation index of each obtained policy item from the number of times it has been obtained by the similar enterprises and the total number of similar enterprises, and to output the matched item recommendation data of the target enterprise.

9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the similar enterprise-based item matching method of any one of claims 1 to 7.

10. A computer-readable storage medium having stored thereon a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the similar enterprise-based item matching method of any one of claims 1 to 7.
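The final steps of claim 1 and the star-level assignment of claim 7 can be sketched as follows. This is a minimal Python sketch under stated assumptions: the recommendation index is taken to be the share of similar enterprises that obtained an item (an enterprise that obtained the same item repeatedly is counted once, so the index lies in [0, 1]), and the star-level distribution intervals are assumed to be lower bounds on that index. The interval values, function names, and any policy item names used with it are hypothetical.

```python
from collections import Counter

# Hypothetical star-level intervals: (lower bound on the index, stars), checked from high to low.
STAR_INTERVALS = [(0.8, 5), (0.6, 4), (0.4, 3), (0.2, 2), (0.0, 1)]


def recommendation_index(obtained_items: dict[str, list[str]]) -> dict[str, float]:
    """Index = number of similar enterprises that obtained the item / total number of similar enterprises."""
    total = len(obtained_items)
    if total == 0:
        return {}
    # set(items): assumption that each enterprise counts at most once per item.
    counts = Counter(item for items in obtained_items.values() for item in set(items))
    return {item: n / total for item, n in counts.items()}


def assign_stars(index: dict[str, float], intervals=STAR_INTERVALS) -> list[tuple[str, float, int]]:
    """Sort items by index (high to low) and label each with the first interval it reaches."""
    ranked = sorted(index.items(), key=lambda kv: kv[1], reverse=True)
    result = []
    for item, value in ranked:
        # The last interval has lower bound 0.0, so every non-negative index gets a label.
        stars = next(s for lower, s in intervals if value >= lower)
        result.append((item, value, stars))
    return result
```

Because the output of `assign_stars` is already sorted by index, outputting it in order also satisfies the "high to low star level" ordering of claim 7 under these assumptions.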