CN109801094A

CN109801094A - Method and system for a recommendation and prediction model in business analysis management

Info

Publication number: CN109801094A
Application number: CN201811495254.XA
Authority: CN
Inventors: 王涵; 孔晶; 闫骏; 龚雪沅
Original assignee: Zhuhai Zhongke Advanced Technology Research Institute Co Ltd
Current assignee: Zhuhai Zhongke Advanced Technology Research Institute Co Ltd
Priority date: 2018-12-07
Filing date: 2018-12-07
Publication date: 2019-05-24
Anticipated expiration: 2038-12-07
Also published as: CN109801094B

Abstract

The present invention relates to a method and system for a recommendation and prediction model in business analysis management, which: acquires the relevant data according to the needs of the business decision-maker; uses an error function to balance data deviation; creates discrete raw data sets and computes them in parallel to obtain dimension coefficient outputs; integrates the outputs into a measurement index; displays the refined recommendation rules; and, using multi-threaded parallel computation, predicts multiple dimension coefficients of the raw data with the ARIMA machine learning time series algorithm, then outputs and visualizes the results. The beneficial effects of the invention are: it is customizable and balances the subjective factors of the goods themselves and of the reviewing customers, together with the environmental factors involved; it provides trend prediction for the manager's operations and a check on the results of implementing the recommended measures; it integrates the entire analysis system; and it makes full use of CPU resources and speeds up computation on massive data.

Description

Method and system for a recommendation and prediction model in business analysis management

Technical field

The present invention relates to a method and system for a recommendation and prediction model in business analysis management, and belongs to the field of computers.

Background art

As brands expand production, goods and services are no longer limited to a single offering, and product information is becoming increasingly diverse. With the spread of online platforms such as social media websites, online and offline data have become a key data source for merchants' decisions. For example, in the hotel and resort management industry, this information may include customer satisfaction with many indicators such as rooms, service, public facilities, restaurants and shopping; in retail, it may cover all types of goods such as alcoholic drinks, daily necessities and puffed snacks, and may even be refined to individual products of particular brands; in logistics, it may be the transport demand of different regions for the same kind of goods over the same period. The main purpose of the present invention is to make effective use of the integrated data, mine useful information, and provide different merchants and managers with reasonable, scientific, customized decision recommendations at any time. For example, in the hotel industry, a recommendation might be: to improve the hotel's overall revenue, the aspect the hotel should focus on most is room quality, followed by catering; to improve customers' word-of-mouth on room quality, the hotel should focus most on attitude and overall hotel publicity; or, to improve the satisfaction of tourists from inland China with the food, the hotel should purchase Pepsi Cola. In retail, selling chalk and sheets as a bundle helps raise the merchant's overall revenue. In logistics, for the packing and transport of fragile goods, routing from Jiangxi to Fujian and then on to Guangzhou can save costs.

Moreover, because the data are diverse and massive, running a machine learning model on a single machine is too slow and easily causes CPU access blocking and wasted CPU resources. Therefore, when facing massive and complex data sources, multi-core CPUs should be fully utilized to perform multi-core, multi-threaded parallel computation.

At present, most recommender systems recommend suitable goods and services to customers, whereas recommender systems applied to managers' product decisions are rare. Traditional merchant correlation analysis systems can only provide supporting evidence, are slow on large databases, and cannot truly achieve intelligent recommendation and prediction. Using machine learning alone cannot give managers numerically justified reasons for a recommendation, and is likely to overlook possibly associated item combinations, so decision recommendation cannot be carried out effectively and comprehensively. Moreover, traditional recommender systems and business analysis models ignore subjective factors that may be present; for example, a group of customers may be optimistic and generally rate a product too high. As a result, the accuracy of the business analysis results they provide is poor.

Summary of the invention

The present invention provides a method and system for a recommendation and prediction model in business analysis management. The raw data are pre-processed with an error balance function; an Apriori correlation measurement algorithm based on multi-threaded parallel computation is then written in Python and, combined with the correlation rule for two vectors in linear algebra, used to analyze the correlation between vectors representing different dimensions of the same event. Also based on multi-threaded parallel computation, the invention trains on the raw data set with the Xgboost machine learning algorithm and obtains prediction results. In parallel threads, the invention builds an ARIMA time series forecasting model, feeds the raw data into the machine learning model as input, and selects parts of the data for training and testing respectively, so as to provide managers with real-time predictions quickly and effectively. In addition, because the data input to the correlation calculation may contain errors caused by subjective or environmental factors, the invention introduces an error function into the correlation analysis to balance the subjective factors of the goods themselves and of the reviewing customers, together with the environmental factors involved.

The technical solution of the present invention includes a method for a recommendation and prediction model in business analysis management, characterized in that the method comprises the following steps: S100, acquiring the relevant data according to the needs of the business decision-maker; S200, pre-processing the acquired data and, further, balancing data deviation with an error function; S300, creating discrete raw data sets and computing the multiple raw data sets in parallel using multiple threads, based on the Apriori algorithm and the Xgboost machine learning algorithm, to obtain multiple dimension coefficient outputs; S400, integrating the outputs of step S300 and using the integrated result as a measurement index to refine the recommendation rules; S500, displaying the recommendation rules refined in step S400 and, further, using the accuracy coefficient obtained by the Xgboost machine learning algorithm as the validity measurement standard; S600, using multi-threaded parallel computation, predicting multiple dimension coefficients of the raw data with the ARIMA machine learning time series algorithm, and outputting and visualizing the results.

According to the method for a recommendation and prediction model in business analysis management, acquiring the relevant data in step S100 specifically includes: crawling the data or reading them from a specified database, wherein the acquired data include, but are not limited to, relational data that meet the requirements.

According to the method for a recommendation and prediction model in business analysis management, step S200 specifically includes: S201, replacing missing values in the acquired data with the average value and deleting duplicate values; S202, using the error function γ_(i,x) = μ + b_x + b_i + q_i*p_x to balance the data deviation caused by objective and environmental factors, where μ is the market mean, b_x is the subjective bias introduced by the user, b_i is the bias introduced by the commodity, and q_i*p_x is the interaction bias.

According to the method for a recommendation and prediction model in business analysis management, step S300 specifically includes: S301, discretizing the acquired data to obtain multiple raw data sets; S302, computing the support coefficients and confidence coefficients of the multiple raw data sets with multi-threaded parallel computation, and selecting the data that meet the support and confidence thresholds; S303, following the computing rules of the Apriori algorithm, calculating the correlation coefficients between dimensions for the data set output by step S302, applying incremental processing to the correlation coefficients, and calculating the average correlation coefficient over all correlation combinations, where a correlation combination is a combination of at least two correlation coefficients; S304, processing each dimension of the average correlation coefficient with the Xgboost gradient boosting algorithm to obtain the importance index between dimensions; S305, integrating the support coefficients, correlation coefficients and importance coefficients obtained in steps S301 to S304 into a relationship measurement index.

According to the method for a recommendation and prediction model in business analysis management, step S302 specifically includes: a first sub-thread generates the data sets and relation combinations and sends them to a second sub-thread; the second sub-thread calculates the support correlation and confidence correlation between all dimensions of the text file and sends them to a third sub-thread; the third sub-thread selects the combinations that exceed the preset thresholds for support correlation and confidence, and the result is output by the parent thread.

According to the method for a recommendation and prediction model in business analysis management, the correlation coefficient of step S302 is specifically:

where X and Y are different item sets.

According to the method for a recommendation and prediction model in business analysis management, step S305 specifically includes: calculating the relationship measurement index as R(i, x) = [S(i, x) - v] + [r(i, x) - ν] + [I(i, x) - ī], and discretizing the confidence index to 0 and 1: when C(i, x) > ū, C(i, x) = 1; when C(i, x) < ū, C(i, x) = 0; where S(i, x) is the support coefficient, r(i, x) is the correlation coefficient, I(i, x) is the importance coefficient, R(i, x) is the relationship measurement index, C(i, x) is the confidence coefficient, and v, ν, ī and ū are the mean support, correlation, importance and confidence coefficients, respectively.

According to the method for a recommendation and prediction model in business analysis management, step S600 specifically includes: in sub-threads, applying first-order differencing to the raw data in each dimension and visualizing the result as a line chart; if the resulting curve is fairly flat, no further differencing is needed, otherwise higher-order differencing is applied; calculating and visualizing the autocorrelation function and partial autocorrelation function of the differenced series to determine the parameters; inputting the data set of each dimension of the raw data into the ARIMA time series model to forecast the time series in each dimension.

The technical solution of the present invention further includes a system for a recommendation and prediction model in business analysis management for implementing any of the methods above, the system comprising: a data acquisition module for acquiring the relevant data according to the needs of the business decision-maker; a data pre-processing module for pre-processing the acquired data and, further, balancing data deviation with an error function; a correlation calculation module for creating discrete raw data sets and computing the multiple raw data sets in parallel using multiple threads, based on the Apriori algorithm and the Xgboost machine learning algorithm, to obtain multiple dimension coefficient outputs; a measurement index refinement module for integrating the outputs of the correlation calculation module and using the integrated result as a measurement index to refine the recommendation rules; a recommendation rule display module for displaying the recommendation rules refined by the measurement index refinement module and, further, using the accuracy coefficient obtained by the Xgboost machine learning algorithm as the validity measurement standard; and a prediction module for using multi-threaded parallel computation to predict multiple dimension coefficients of the raw data with the ARIMA machine learning time series algorithm, and outputting and visualizing the results.

According to the system for a recommendation and prediction model in business analysis management, the correlation calculation module specifically includes: a data discretization module for discretizing the acquired data to obtain multiple raw data sets; an Apriori correlation calculation module for computing the support coefficients and confidence coefficients of the multiple raw data sets with multi-threaded parallel computation and selecting the data that meet the support and confidence thresholds; a correlation coefficient calculation module that, following the computing rules of the Apriori algorithm, calculates the correlation coefficients between dimensions for the data set output by the Apriori correlation calculation module, applies incremental processing to the correlation coefficients, and calculates the average correlation coefficient over all correlation combinations, where a correlation combination is a combination of at least two correlation coefficients; an Xgboost correlation calculation module that processes each dimension of the average correlation coefficient with the Xgboost gradient boosting algorithm to obtain the importance index between dimensions; and a correlation result integration module that integrates the support coefficients, correlation coefficients and importance coefficients obtained by the data discretization module, Apriori correlation calculation module, correlation coefficient calculation module and Xgboost correlation calculation module into a relationship measurement index.

The beneficial effects of the invention are: a business analysis recommender system is formulated for decision-makers; a balancing error function is introduced for data pre-processing, balancing the subjective factors of the goods themselves and of the reviewing customers, together with the environmental factors involved; data science, linear algebra correlation, the Apriori recommendation algorithm and Xgboost machine learning are combined to analyze associations in the raw data, making the decision recommendations well founded; the ARIMA machine learning algorithm is used to forecast time series from the data, providing trend prediction for the manager's subsequent operations and a check on the results of implementing the recommended measures; the entire analysis system is integrated, from data input to the final recommendation rules, the data support for each recommendation, and the time series forecasting results; and the invention optimizes and upgrades the Apriori model, the Xgboost algorithm and the ARIMA prediction model on the basis of multi-core, multi-threaded parallelism, making full use of CPU resources and speeding up computation on massive data.

Brief description of the drawings

Fig. 1 shows an overall flow chart of the method according to an embodiment of the present invention;

Fig. 2 shows a structural diagram of the system according to an embodiment of the present invention;

Fig. 3 shows a general schematic diagram according to an embodiment of the present invention;

Fig. 4 shows a schematic diagram of the parallel Apriori correlation computation according to an embodiment of the present invention;

Fig. 5 shows a schematic diagram of the parallel Xgboost correlation computation according to an embodiment of the present invention;

Fig. 6 shows a schematic diagram of the parallel time series forecasting computation according to an embodiment of the present invention;

Fig. 7 shows a chart of the importance coefficients of the correlation combinations obtained by Apriori according to an embodiment of the present invention.

Detailed description of the embodiments

The technical solution of the present invention includes a method and system for a recommendation and prediction model in business analysis management, and involves data correlation analysis and machine learning, recommender systems and prediction models. The concept, specific structure and technical effects of the invention are described clearly and completely below with reference to the embodiments and the drawings, so that the purpose, solution and effects of the invention can be fully understood.

It should be noted that, unless otherwise specified, when a feature is said to be "fixed" or "connected" to another feature, it may be directly fixed or connected to the other feature, or indirectly fixed or connected to it. In addition, descriptions such as up, down, left and right used in this disclosure refer only to the relative positions of the components of the disclosure in the drawings. The singular forms "a", "said" and "the" used in this disclosure are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art. The terms used in the description are intended only to describe specific embodiments and not to limit the invention. The term "and/or" as used herein includes any combination of one or more of the associated listed items.

It should be understood that, although the terms first, second, third, etc. may be used in this disclosure to describe various elements, these elements should not be limited by these terms. The terms are used only to distinguish elements of the same type from one another. For example, without departing from the scope of the disclosure, a first element could also be called a second element and, similarly, a second element could also be called a first element. The use of any and all examples or exemplary language ("for example", "such as") provided herein is intended only to better illustrate the embodiments of the invention and, unless otherwise required, does not limit the scope of the invention.

Fig. 1 shows an overall flow chart of the method according to an embodiment of the present invention. It specifically includes: S100, acquiring the relevant data according to the needs of the business decision-maker; S200, pre-processing the acquired data and, further, balancing data deviation with an error function; S300, creating discrete raw data sets and computing the multiple raw data sets in parallel using multiple threads, based on the Apriori algorithm and the Xgboost machine learning algorithm, to obtain multiple dimension coefficient outputs; S400, integrating the outputs of step S300 and using the integrated result as a measurement index to refine the recommendation rules; S500, displaying the recommendation rules refined in step S400 and, further, using the accuracy coefficient obtained by the Xgboost machine learning algorithm as the validity measurement standard; S600, using multi-threaded parallel computation, predicting multiple dimension coefficients of the raw data with the ARIMA machine learning time series algorithm, and outputting and visualizing the results.

Fig. 2 shows a structural diagram of the system according to an embodiment of the present invention. It specifically includes: a data acquisition module for acquiring the relevant data according to the needs of the business decision-maker; a data pre-processing module for pre-processing the acquired data and, further, balancing data deviation with an error function; a correlation calculation module for creating discrete raw data sets and computing the multiple raw data sets in parallel using multiple threads, based on the Apriori algorithm and the Xgboost machine learning algorithm, to obtain multiple dimension coefficient outputs; a measurement index refinement module for integrating the outputs of the correlation calculation module and using the integrated result as a measurement index to refine the recommendation rules; a recommendation rule display module for displaying the recommendation rules refined by the measurement index refinement module and, further, using the accuracy coefficient obtained by the Xgboost machine learning algorithm as the validity measurement standard; and a prediction module for using multi-threaded parallel computation to predict multiple dimension coefficients of the raw data with the ARIMA machine learning time series algorithm, and outputting and visualizing the results.

Fig. 3 shows a general schematic diagram according to an embodiment of the present invention, as follows:

(1) Raw data input. The pymysql Python module connects the Python programming platform to the MySQL management interface, and SQL statements are used to extract relational data on commodities, services, merchants and brands from the SQL database, for example customers' purchases of commodities, or customers' rating scores for different commodity attributes (color, material, etc.).

(2) Data pre-processing. Missing values are replaced with the average value, duplicate values are deleted, and so on. The error function (formula 1) is used to balance the data deviation caused by objective and environmental factors, where γ_(i,x) is the processed data value, μ is the market mean, b_x is the subjective bias introduced by the user, b_i is the bias introduced by the commodity, and q_i*p_x is the bias from the interaction between the two. For example: three-star hotels in a certain area generally receive an online rating of 3.4 points (the mean, μ = +3.4); a certain customer is a level-7 member (out of 10 levels) of hotels in this area (q_i = +0.7); the hotel gave this customer a free room upgrade (p_x = +1; an interaction is recorded as 1 and no interaction as 0); the customer's rating of this hotel is 0.5 points higher (b_x = +0.5), but the hotel's average rating is 1 point lower than μ (b_i = -1). The final customer rating is therefore γ_(i,x) = 3.4 + 0.5 - 1 + 0.7*1 = 3.6 points.

γ_(i,x) = μ + b_x + b_i + q_i*p_x    (formula 1)
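
The hotel example for formula 1 can be checked with a few lines of Python. This is only an illustrative sketch: the function name `balanced_score` and its keyword arguments are hypothetical and simply mirror the symbols of formula 1.

```python
def balanced_score(mu, b_x, b_i, q_i, p_x):
    """Formula 1: gamma_(i,x) = mu + b_x + b_i + q_i * p_x.

    mu  : market mean
    b_x : subjective bias introduced by the user
    b_i : bias introduced by the commodity
    q_i, p_x : factors of the interaction bias
    """
    return mu + b_x + b_i + q_i * p_x


# Values taken from the hotel example in the text.
gamma = balanced_score(mu=3.4, b_x=0.5, b_i=-1.0, q_i=0.7, p_x=1)
print(round(gamma, 2))  # 3.6
```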

(3) Apriori correlation analysis based on multi-threaded parallel computation

Fig. 4 shows a schematic diagram of the parallel Apriori correlation computation according to an embodiment of the present invention. As shown in Fig. 4, the computation runs on a Linux/Unix system, where the fork function is used to create the sub-threads and the parent thread.

Data discretization module: the continuous data set γ is discretized and multiple raw data sets are built. For example, sub-thread 1 maps continuous values in (0, 1] to 1 and all others to 0, giving the first data set; sub-thread 2 maps values in (1, 2] to 1 and all others to 0, giving the second data set; sub-thread m maps values in (m-1, m] to 1 and all others to 0, giving the m-th data set. The sub-threads run concurrently to save time. The wait function enforces the ordering, so that the sub-threads run first and the parent thread runs afterwards. Finally, the m data sets are merged in the parent thread and recorded as the input data M, and a daemon process is used to ensure that the parent process still runs to completion after the sub-threads stop.
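
The interval-based discretization described above can be sketched as follows. The sketch runs the m bins sequentially rather than in separate sub-threads, and the names `discretize` and `merge` are hypothetical helpers, not part of the source.

```python
def discretize(values, m):
    """Return m binary data sets; set k (1-based) flags values in (k-1, k]."""
    datasets = []
    for k in range(1, m + 1):
        datasets.append([1 if k - 1 < v <= k else 0 for v in values])
    return datasets


def merge(datasets):
    """Merge the m indicator lists into one row per record (input data M)."""
    return [list(row) for row in zip(*datasets)]


scores = [0.4, 1.0, 1.7, 2.9, 3.6]   # illustrative continuous values
M = merge(discretize(scores, m=4))
for row in M:
    print(row)
```

Each row of `M` is a one-hot indicator of the interval the original value falls into, which is the form an item-set algorithm such as Apriori expects.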

In the Apriori correlation calculation module, the invention uses Python to write and build an Apriori correlation calculation model based on multi-threaded parallel computation. The main purpose of sub-thread 1 is to generate the data sets and relation combinations; sub-thread 2 calculates the support correlation and confidence correlation between all dimensions of the text file; sub-thread 3 sets the support and confidence thresholds τ and μ according to the actual situation and selects the combinations that exceed them. Assume there are n dimensions; the working mode of the three threads is shown in Fig. 2. In sub-thread 1, each item of the raw data M is a candidate, and the non-repeating candidate 1-item sets C1 are generated. Each time one is generated, it is passed to sub-thread 2 through the pipe Pipe_C1. When sub-thread 2 has received the candidate item sets to be counted, it returns the signal "0" (Signal_0) to sub-thread 1. The progress of the two threads is controlled through wait and by passing the signals "0" or "1": the wait function ensures that sub-thread 2 runs after sub-thread 1 has written to the pipe, and Signal_0 together with the wait function makes sub-thread 1 pause once sub-thread 2 has received the data. Likewise, sub-thread 2 passes the confidence c_1 and support s_1 of the C1 combinations to sub-thread 3 through the pipe Pipe_sc1, with wait and the signals "0" or "1" again controlling the progress of the two threads. Sub-thread 3 and sub-thread 1 communicate in the same way: the pipe Pipe_L1 transmits the data set L1 and the signal Signal_1 carries "1", which, together with the wait function, restarts sub-thread 1. In the final round, the items in L(n-1) form the candidate set; sub-thread 1 generates the non-repeating combinations of n items, sub-thread 2 calculates the confidence and support of each combination, and sub-thread 3 selects the combinations that exceed the thresholds. The resulting data set Ln is output by the parent thread.
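
The generate, count and filter cycle that the three sub-threads repeat can be sketched in plain Python. This is a simplified, sequential illustration: the three functions stand in for sub-threads 1 to 3, there are no pipes or signals, and only the support threshold `tau` is applied (the source also filters on confidence). All function names are hypothetical.

```python
from itertools import combinations


def generate_candidates(prev_frequent, k):
    """Stage 1: build non-repeating candidate item sets of size k."""
    items = sorted({item for itemset in prev_frequent for item in itemset})
    return [frozenset(c) for c in combinations(items, k)]


def count_support(candidates, transactions):
    """Stage 2: fraction of transactions that contain each candidate."""
    n = len(transactions)
    return {c: sum(c <= t for t in transactions) / n for c in candidates}


def filter_frequent(support, tau):
    """Stage 3: keep the combinations whose support exceeds tau."""
    return {c: s for c, s in support.items() if s > tau}


def apriori(transactions, tau):
    """Repeat the three stages level by level until nothing survives."""
    transactions = [frozenset(t) for t in transactions]
    singletons = [frozenset([i]) for i in sorted(set().union(*transactions))]
    frequent = filter_frequent(count_support(singletons, transactions), tau)
    result, k = dict(frequent), 2
    while frequent:
        candidates = generate_candidates(frequent, k)
        frequent = filter_frequent(count_support(candidates, transactions), tau)
        result.update(frequent)
        k += 1
    return result


baskets = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
frequent_sets = apriori(baskets, tau=0.4)
for itemset, s in frequent_sets.items():
    print(sorted(itemset), s)
```

In the parallel design described in the text, each stage would run in its own child and hand its output to the next stage through a pipe; the data flow between stages is the same as in this sketch.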

The Apriori algorithm is used for correlation calculations across two or more dimensions. It works by calculating the probability that two or more events occur together (the support coefficient) and the probability that another event occurs given that one event has occurred (the confidence coefficient). The mean confidence coefficient ū and the mean support coefficient v are also calculated. In practical terms: let event A be a customer buying class A goods and event B be a customer buying class B goods. In the Apriori result for A => B, when the confidence coefficient C(i, x) is greater than ū, the larger the support coefficient S(i, x), the more readily the occurrence of A raises the incidence of B, and the decision-maker should promote sales of A to drive sales of B. When the confidence coefficient is less than ū, the larger the support coefficient, the more readily the occurrence of A lowers the incidence of B; the two may be competitors or goods that can be considered together, and the decision-maker may consider selling them as a bundle.
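
As a small worked illustration of the two coefficients for a rule A => B, the following sketch computes support as the joint frequency of A and B, and confidence as that joint frequency divided by the frequency of A. The basket data and function names are made up for the example.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    items = set(items)
    return sum(items <= set(t) for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of the consequent given the antecedent."""
    joint = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return joint / base if base else 0.0


baskets = [{"A", "B"}, {"A", "B"}, {"A"}, {"B"}, {"C"}]
s_ab = support(baskets, {"A", "B"})        # 2 of 5 baskets contain A and B
c_ab = confidence(baskets, {"A"}, {"B"})   # 2 of the 3 baskets with A also contain B
print(s_ab, c_ab)
```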

(4) Correlation coefficient calculation

The invention uses Python array correlation calculations to obtain the correlation between each pair of dimensions; the larger the correlation coefficient r(i, x), the closer the connection between the two dimensions. Incremental processing is then applied to these results, and the average correlation coefficient ν over all correlation combinations is calculated.
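
A minimal sketch of the pairwise correlation step is shown below. It assumes the Pearson correlation coefficient, which the source does not name explicitly, and it omits the incremental processing step, whose details are not given. The column names and values are illustrative only.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def pairwise_correlations(columns):
    """Correlation for every pair of dimensions, keyed by sorted name pair."""
    return {
        (i, j): pearson(columns[i], columns[j])
        for i, j in combinations(sorted(columns), 2)
    }


columns = {
    "room": [1, 2, 3, 4],
    "food": [2, 4, 6, 8],
    "service": [4, 3, 2, 1],
}
r = pairwise_correlations(columns)
nu = sum(r.values()) / len(r)   # average correlation over all pairs
print(r, nu)
```

A constant column has zero variance and would cause a division by zero in `pearson`, so such columns would need to be dropped or handled separately.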

(5) Xgboost correlation analysis based on multi-threaded parallel computation. Fig. 5 shows a schematic diagram of the parallel Xgboost correlation computation according to an embodiment of the present invention.

To obtain the correlation between the dimensions while improving running speed, the invention runs the Xgboost machine learning algorithm with multi-threaded parallel computation. The Xgboost gradient boosting algorithm is applied to each dimension. The model has 8 sub-threads in total, each performing Xgboost processing on one dimension, with 80% of the data selected as the training set and 20% as test data. Once the Xgboost boosted trees have been built, each thread can obtain the importance score of every attribute fairly directly. The importance score measures how valuable a feature is in building the boosted decision trees of the model: the more an attribute is used to build decision trees in the model, the higher its relative importance. Attribute importance is calculated for each attribute in the data set and then ranked. Within a single decision tree, the importance of an attribute is calculated from the amount by which each split point on that attribute improves the performance measure, weighted by the number of records the node is responsible for. That is, the more an attribute's split points improve the performance measure (the closer to the root node), the larger the weight, and the more boosted trees select an attribute, the more important it is. The performance measure can be the purity (Gini index) used to select split nodes, or another metric function. Finally, the results for an attribute across all boosted trees are summed and averaged to obtain its importance score.

The entire parallel computation runs on a Linux/Unix system, where the fork function is used to create the sub-threads and the parent thread. As shown in Fig. 5, the invention uses the Xgboost Python package and tunes its parameters to obtain the influence importance coefficient of each dimension. Each sub-thread takes one dimension as the result and the other dimensions as impact factors, splits the result data set into two segments labeled "1" and "0", and obtains and ranks the influence importance coefficients I(i, x) between the result and the impact factors. The wait function makes the parent process run after the child processes, and the data sets obtained by the child processes are passed to the parent process through a pipe. The parent process gathers the importance coefficients between the dimensions and calculates the importance coefficient corresponding to each correlation combination obtained by Apriori (if a combination has three dimensions, the mean of the pairwise importances is taken), then calculates the mean importance coefficient ī over all combinations. In addition, the invention daemonizes the parent process so that, after the child processes finish, the parent process can still run to completion in the background.
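
The fork, pipe and wait pattern described above can be sketched with Python's standard `os` module. This is a POSIX-only illustration (it will not run on Windows), it forks a single child rather than one per dimension, and `run_in_child` and `mean_importance` are hypothetical names; the placeholder computation stands in for the Xgboost step.

```python
import json
import os


def run_in_child(func, arg):
    """Run func(arg) in a forked child and return its result via a pipe."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: compute, write the JSON-encoded result, then exit at once.
        os.close(read_fd)
        try:
            with os.fdopen(write_fd, "w") as writer:
                writer.write(json.dumps(func(arg)))
        finally:
            os._exit(0)
    # Parent: read until the child closes its end, then reap the child.
    os.close(write_fd)
    with os.fdopen(read_fd) as reader:
        payload = reader.read()
    os.waitpid(pid, 0)
    return json.loads(payload)


def mean_importance(scores):
    """Placeholder for the per-dimension work done in a child."""
    return sum(scores) / len(scores)


result = run_in_child(mean_importance, [0.2, 0.4, 0.6])
print(result)
```

The parent blocks on the pipe and then on `os.waitpid`, so it only continues after the child has finished, which matches the ordering role the text assigns to the wait function.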

(6) Correlation result integration module

The support coefficient S(i, x), correlation coefficient r(i, x) and importance coefficient I(i, x) are integrated into a single relationship measurement index R(i, x) with the following formula, and the confidence index is discretized to "0" and "1":

R(i, x) = [S(i, x) - v] + [r(i, x) - ν] + [I(i, x) - ī]    (4)

If C(i, x) > ū, C(i, x) = 1;

If C(i, x) < ū, C(i, x) = 0;    (5)

where v, ν, ī and ū are the mean support, correlation, importance and confidence coefficients, respectively.
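
Formulas (4) and (5) translate directly into code. In this sketch the parameter names are hypothetical, the numeric inputs are made up, and the case C(i, x) = ū, which the formulas leave unspecified, is mapped to 0 as an assumption.

```python
def relationship_index(S, r, I, s_mean, r_mean, i_mean):
    """Formula (4): sum of the deviations of S, r and I from their means."""
    return (S - s_mean) + (r - r_mean) + (I - i_mean)


def discretize_confidence(C, c_mean):
    """Formula (5): 1 if C is above the mean confidence, otherwise 0.

    The source does not define the C == c_mean case; it is treated as 0 here.
    """
    return 1 if C > c_mean else 0


R = relationship_index(S=0.5, r=0.8, I=0.6, s_mean=0.3, r_mean=0.5, i_mean=0.4)
C = discretize_confidence(C=0.7, c_mean=0.6)
print(round(R, 2), C)
```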

(7) Recommendation rule refinement module

For a combination of i and x: when C(i, x) = 1, i and x are positively related, and the decision-maker can develop x by promoting i; the larger R(i, x), the stronger this positive relationship. When C(i, x) = 0, i and x are in a competitive relationship, and the decision-maker can achieve synchronized development of both by selling i and x as a bundle; the larger R(i, x), the stronger this competitive relationship.
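
The rule above can be turned into a small ranking routine. This is an illustrative sketch only: the function name `recommend`, the wording of the advice strings and the sample values are all assumptions.

```python
def recommend(pairs):
    """pairs: list of (i, x, C, R); returns advice, strongest relation first."""
    advice = []
    for i, x, C, R in sorted(pairs, key=lambda p: p[3], reverse=True):
        if C == 1:
            # Positively related: promote i to develop x.
            advice.append(f"promote {i} to drive {x}")
        else:
            # Competitive: sell i and x together.
            advice.append(f"bundle {i} with {x}")
    return advice


rules = recommend([("A", "B", 1, 0.2), ("C", "B", 0, 0.7)])
for line in rules:
    print(line)
```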

(8) Recommendation rule display and validity measurement module

Fig. 6 shows a schematic diagram of the parallel time series forecasting computation according to an embodiment of the invention. The recommendation rules are displayed in the order required by the manager, and the accuracy coefficient obtained by the XGBoost machine learning algorithm is used as the validity measure. For example, suppose the manager's main requirement is to promote fast sales of the supermarket's fresh food. Calculation by the above modules shows that, among R_{(i, fresh food)}, R_{(instant noodles, fresh food)} is the highest and C_{(instant noodles, fresh food)} = 0, so selling instant noodles and fresh food as a bundle is the most effective way to promote fresh food sales. If the XGBoost machine learning model that takes fresh food as the outcome and the other dimensions as impact factors has an accuracy of 97%, then the validity of the recommendation rule "bundle instant noodles with fresh food to promote fresh food sales" is 97%.

(9) Time series forecasting module based on multithreaded computation

On a Linux or Unix system, the child and parent threads are created through the Python fork module. The child threads obtain the time series forecasts for all dimensions, and the parent thread is responsible for outputting and visualizing the forecasts. In the child threads, the invention uses the ARIMA time series algorithm: γ is first-order differenced in each dimension (second-order or higher-order differencing may be applied depending on the actual situation) and visualized as a line chart. If the resulting plot is fairly flat (the values concentrate around the mean), no further differencing is needed; otherwise higher-order differencing is applied. The autocorrelation function (ACF) and partial autocorrelation function (PACF) of the differenced series are then calculated and visualized to determine the parameters. Each dimension's data set from the raw data is fed into the ARIMA time series model, which produces the time series forecast for that dimension.
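The differencing and autocorrelation steps that precede the ARIMA fit can be sketched in plain Python. This is an illustration only: the function names are assumptions, the ARIMA fitting itself is not shown, and the `acf` helper assumes a non-constant series so that the denominator is nonzero.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times to a sequence of numbers."""
    out = list(series)
    for _ in range(order):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (assumes the series is not constant)."""
    n = len(series)
    mean = sum(series) / n
    centered = [value - mean for value in series]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t + lag] for t in range(n - lag)) / denom
        for lag in range(max_lag + 1)
    ]
```

A differenced series whose autocorrelations drop quickly toward zero is the "fairly flat" case in which no further differencing is applied.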

The technical solution of the invention further discloses a more detailed embodiment, as follows:

(1) In this example, web crawler technology is used to collect the weekly sales volumes of 8 categories of goods at a certain supermarket from 2015 to 2018, and the data are stored in an SQL database. A Python program is written in the PyCharm IDE; it imports the pymysql module, connects to the local SQL database, and reads the data.

(2) Data preprocessing module: the data are processed with an error-balancing function. Here μ is the market-average weekly sales volume of commodity i; the subjective error is b_x (purchase quantity of commodity i in week x minus μ); the commodity error is b_i (purchase quantity of the commodity in week x minus μ); and the interaction deviation is q_i*p_x (for example, if commodity i is on promotion for 3 days of week x, the interaction deviation is 3/7).

γ_(i,x) = μ + b_x + b_i + q_i*p_x    (1)
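Equation (1) can be evaluated as in the following sketch. The function names and the sample numbers are illustrative assumptions; the interaction term q_i*p_x is passed in as a single value, computed here from the number of promotion days as in the 3/7 example.

```python
def promotion_interaction(promo_days, days_in_week=7):
    """Interaction deviation q_i*p_x as the fraction of the week spent on promotion."""
    return promo_days / days_in_week


def balanced_value(mu, b_x, b_i, interaction):
    """Equation (1): gamma = mu + b_x + b_i + q_i*p_x."""
    return mu + b_x + b_i + interaction
```

For instance, with μ = 100, b_x = 5, b_i = -3, and 3 promotion days, `balanced_value(100, 5, -3, promotion_interaction(3))` evaluates 100 + 5 - 3 + 3/7.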

(3) Multithreaded computation model one, the Apriori correlation analysis, is established on a 4-core Linux operating system. Eight child threads each scan the data and discretize it into 8 data sets; each time a child thread finds a data item that falls in its assigned interval, it passes the item to the parent thread through a pipe. The intervals are (0,100], (100,200], (200,300], (300,400], (400,500], (500,600], (600,700], and (700,800]. The parent thread receives the data from the child threads and stores the output in the SQL database as M.
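The interval assignment above can be sketched as a lookup into the eight half-open intervals. The function names and the error handling are illustrative assumptions; the interval width of 100 and the count of 8 come from the text.

```python
import math


def interval_index(value, width=100, n_bins=8):
    """Return the 0-based index k of the interval (k*width, (k+1)*width] containing value."""
    if not 0 < value <= width * n_bins:
        raise ValueError(f"{value} is outside (0, {width * n_bins}]")
    # ceil maps a right endpoint such as 100 to the interval (0,100], not (100,200].
    return math.ceil(value / width) - 1


def interval_label(value, width=100, n_bins=8):
    """Return the interval containing value, written as in the text, e.g. '(200,300]'."""
    k = interval_index(value, width, n_bins)
    return f"({k * width},{(k + 1) * width}]"
```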

(3) The invention uses Python to build the Apriori correlation calculation model based on multithreaded computation. The main purpose of child thread one is to generate the data sets and item combinations; child thread two calculates the support correlation and confidence correlation between all dimensions of the text file; child thread three sets the support and confidence thresholds τ and μ according to the actual situation and retains the combinations that exceed the thresholds. There are 8 dimensions in total. In child thread one, each item of the raw data M is a candidate item, and the thread generates the non-repeating candidate 1-item set C1; each time an item set is generated, it is passed to child thread two through the pipe Pipe_C1. On receiving the item set, child thread two returns the "0" signal Signal_0 to child thread one. The progress of the two threads is controlled through wait and the signals "0" and "1": the wait function makes child thread two proceed after it has received the data from child thread one through the pipe, and Signal_0 together with the wait function makes child thread one stop once child thread two has received the data. Likewise, child thread two passes the confidence c_1 and support s_1 of the C1 combinations to child thread three through the pipe Pipe_sc1, with the progress of the two threads controlled through wait and the signals "0" and "1". Child thread three and child thread one communicate in the same way: Pipe_L1 passes the data set L1, and the signal Signal_1 ("1") together with the wait function restarts child thread one. In the end, the items in L(n-1) form the candidate set, thread one generates the non-repeating combinations with 8 items, thread two calculates the confidence and support of each combination, the combinations that exceed the thresholds are retained, and the final data set Ln is output and sorted by the parent thread.

For example, the candidate 2-item set "fresh food → instant food" ∈ C_2 is generated and passed to child thread two through the pipe Pipe_2 together with the signal Signal_1. Child thread two calculates the confidence C_{(fresh food, instant food)} and support S_{(fresh food, instant food)} of this combination and passes them to thread three through the pipe Pipe_sc2 together with the signal Signal_1. If C_{(fresh food, instant food)} > μ and S_{(fresh food, instant food)} > τ, then "fresh food → instant food" is passed to thread one through Pipe_L2 together with Signal_1 as a member of L2, and the parent thread outputs "fresh food → instant food", C_{(fresh food, instant food)}, and S_{(fresh food, instant food)} and stores them in the database. Thread one then starts scanning and builds the combinations with 3 items, such as "fresh food → instant food → articles for daily use". Finally, after the combinations with 8 items have been completed, the parent thread sorts all combinations by support S_(x,i).
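The support and confidence filtering performed by child threads two and three can be sketched for 2-item combinations as follows. This single-process illustration omits the pipes and signals; the function names and the sample transactions are assumptions, and `min_support` and `min_confidence` play the roles of the thresholds τ and μ.

```python
from itertools import permutations


def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def pair_rules(transactions, min_support, min_confidence):
    """Return rules a -> b whose support and confidence both exceed the thresholds."""
    transactions = [frozenset(t) for t in transactions]
    items = sorted(set().union(*transactions))
    rules = []
    for a, b in permutations(items, 2):
        s_ab = support(transactions, {a, b})
        # Confidence of a -> b is support(a, b) / support(a); support(a) > 0 by construction.
        confidence = s_ab / support(transactions, {a})
        if s_ab > min_support and confidence > min_confidence:
            rules.append((a, b, s_ab, confidence))
    # Sort by support, as the parent thread does for the final output.
    rules.sort(key=lambda rule: rule[2], reverse=True)
    return rules
```

Extending the sketch to 3 or more items would repeat the same filter on combinations built from the retained sets, which is the level-by-level step from L(n-1) to Ln described above.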

(3) The pairwise correlations r_(i,x) between the dimensions of a combination are calculated, and their average is taken as the correlation among the items of the combination. For example:

r_{(fresh food, instant food, articles for daily use)} = [r_{(fresh food, articles for daily use)} + r_{(fresh food, instant food)} + r_{(instant food, articles for daily use)}] / 3.
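The averaging rule above generalizes to any number of items, as in this minimal sketch. The function name, the use of `frozenset` keys for unordered pairs, and the sample correlation values are illustrative assumptions.

```python
from itertools import combinations


def combination_correlation(items, pairwise):
    """Mean of the pairwise correlations over all unordered pairs in items."""
    pairs = list(combinations(items, 2))
    return sum(pairwise[frozenset(pair)] for pair in pairs) / len(pairs)


# Hypothetical pairwise correlations for the three-item example in the text.
pairwise = {
    frozenset(("fresh food", "articles for daily use")): 0.2,
    frozenset(("fresh food", "instant food")): 0.6,
    frozenset(("instant food", "articles for daily use")): 0.4,
}
items = ["fresh food", "instant food", "articles for daily use"]
mean_r = combination_correlation(items, pairwise)
```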

Each dimension is then processed with the XGBoost gradient boosting algorithm. This model has 8 child threads in total, and each thread applies XGBoost to one dimension, using 80% of the data as the training set and 20% as the test set. Once the boosted trees have been built, each thread can obtain the importance score of each attribute fairly directly. The importance score measures how valuable a feature is in constructing the boosted decision trees in the model: the more an attribute is used to build decision trees in the model, the higher its relative importance. Attribute importance is calculated for each attribute in the data set and then ranked. Within a single decision tree, the importance of an attribute is computed from the amount by which each split point on that attribute improves the performance measure, weighted by the number of records the node is responsible for. In other words, the more an attribute's split point improves the performance measure (the closer it is to the root node), the greater its weight, and the more boosted trees select an attribute, the more important it is. The performance measure can be the Gini purity used to select split nodes, or another metric function. Finally, an attribute's results over all boosted trees are weighted, summed, and averaged to obtain its importance score.

The entire parallel computation runs on a Linux or Unix system, where the fork function creates the child and parent threads. As shown in Fig. 5, the invention uses the XGBoost Python package and obtains the influence importance coefficient of each dimension through parameter tuning. Each child thread takes one dimension as the outcome and the other dimensions as impact factors, and discretizes the outcome data into two intervals labeled "1" and "0". The influence importance coefficients I_(i,x) between the outcome and the impact factors are then obtained and sorted. The wait function makes the parent process run after the child processes, and the data sets obtained by the child processes are passed to the parent process through a pipe. The parent process aggregates the importance coefficients between dimensions and computes the importance coefficient of each combination obtained by the Apriori correlation analysis (for a combination of three dimensions, the mean of the pairwise importance coefficients is taken), together with the mean importance coefficient ī over all combinations. In addition, the parent process is run as a daemon, so that it can still run to completion in the background after the child processes have finished. For example, thread one takes total sales volume as the predicted value and computes its mean; values above the mean sales volume are labeled 1 and values below it are labeled 0. The resulting attribute importance is shown in Fig. 7.

(5) The relationship measurement index R(i, x) is calculated, and the confidence index is discretized into "0" and "1":

R_(i,x) = [S_(i,x) - v] + [r_(i,x) - ν] + [I_(i,x) - ī]    (4)

If C_(i,x) > ū, then C_(i,x) = 1; if C_(i,x) < ū, then C_(i,x) = 0.

(7) Recommendation rule extraction. For a combination i → x: when C(i, x) = 1, i and x are positively related, and the decision maker can promote i in order to develop x; the larger R(i, x) is, the stronger this positive relationship. When C(i, x) = 0, i and x are in a competitive relationship, and the decision maker can sell i and x as a bundle so that the two develop together; the larger R(i, x) is, the stronger this competitive relationship.

(8) The recommendation rules are displayed in the order required by the manager, and the accuracy coefficient obtained by the XGBoost machine learning algorithm is used as the validity measure. For example, suppose the manager's main requirement is to promote fast sales of the supermarket's fresh food. Calculation by the above modules shows that, among R_{(i, fresh food)}, R_{(instant food, fresh food)} is the highest and C_{(instant food, fresh food)} = 0, so selling instant noodles and fresh food as a bundle is the most effective way to promote fresh food sales. If the XGBoost machine learning model that takes fresh food as the outcome and the other dimensions as impact factors has an accuracy of 97%, then the validity of the recommendation rule "bundle instant noodles with fresh food to promote fresh food sales" is 97%.

On a Linux or Unix system, the child and parent threads are created through the Python fork module. The child threads obtain the time series forecasts for all dimensions, and the parent thread is responsible for outputting and visualizing the forecasts. In the child threads, the invention uses the ARIMA time series algorithm: γ is first-order differenced in each dimension (second-order or higher-order differencing may be applied depending on the actual situation) and visualized as a line chart. If the resulting plot is fairly flat (the values concentrate around the mean), no further differencing is needed; otherwise higher-order differencing is applied. The autocorrelation function (ACF) and partial autocorrelation function (PACF) of the differenced series are then calculated and visualized to determine the parameters. Each dimension's data set from the raw data is fed into the ARIMA time series model, which produces the time series forecast for that dimension. The result is a next-period forecast for the sales volume of each commodity and for total sales volume, which can later be used to verify the recommendation rules.

It should be understood that embodiments of the invention can be implemented by computer hardware, by a combination of hardware and software, or by computer instructions stored in non-transitory computer-readable memory. The method can be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and drawings described in the specific embodiments. Each program can be implemented in a high-level procedural or object-oriented programming language to communicate with a computer system. If desired, however, the program can be implemented in assembly or machine language. In any case, the language can be a compiled or interpreted language. In addition, the program can run on an application-specific integrated circuit programmed for this purpose.

In addition, the operations of the processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) can be performed under the control of one or more computer systems configured with executable instructions, and can be implemented as code (for example, executable instructions, one or more computer programs, or one or more applications) executing collectively on one or more processors, by hardware, or by a combination thereof. The computer program comprises a plurality of instructions executable by one or more processors.

Further, the method can be implemented on any type of suitable, operably connected computing platform, including but not limited to a personal computer, minicomputer, mainframe, workstation, networked or distributed computing environment, a standalone or integrated computer platform, or a platform in communication with a charged particle tool or other imaging device. Aspects of the invention can be implemented as machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into the computing platform, such as a hard disk, optically read and/or written storage media, RAM, or ROM, so that it can be read by a programmable computer; when the storage medium or device is read by the computer, it can be used to configure and operate the computer to perform the processes described herein. Furthermore, the machine-readable code, or portions thereof, can be transmitted over a wired or wireless network. The invention described herein includes these and other types of non-transitory computer-readable storage media when such media contain instructions or programs that, in conjunction with a microprocessor or other data processor, implement the steps described above. The invention also includes the computer itself when it is programmed according to the methods and techniques described herein.

A computer program can be applied to input data to perform the functions described herein, thereby transforming the input data to generate output data that is stored in non-volatile memory. The output information can also be applied to one or more output devices such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including a particular visual depiction of the physical and tangible objects produced on a display.

The above is only a preferred embodiment of the invention, and the invention is not limited to the above embodiment. Any modification, equivalent replacement, or improvement made within the spirit and principles of the invention that achieves the technical effect of the invention by the same means shall fall within the scope of protection of the invention. Within that scope, the technical solution and/or the embodiments may have various modifications and variations.

Claims

1. A method for a recommendation prediction model in business analysis management, characterized in that the method comprises the following steps:

S100: acquiring the corresponding data according to the requirements of the decision-making manager in business management;

S200: preprocessing the acquired data, and further using an error function to balance data deviation;

S300: creating discrete raw data sets, and using multithreading to compute the multiple raw data sets in parallel based on the Apriori algorithm and the XGBoost machine learning algorithm, obtaining output results for multiple dimension coefficients;

S400: integrating the results output by S300, and refining recommendation rules using the integrated result as a measurement index;

S500: displaying the recommendation rules refined in S400, and further using the accuracy coefficient obtained by the XGBoost machine learning algorithm as the validity measure;

S600: using multithreaded computation to forecast the multiple dimension coefficients of the raw data with the ARIMA machine learning time series algorithm, and outputting and visualizing the results.

2. The method for a recommendation prediction model in business analysis management according to claim 1, characterized in that acquiring the corresponding data in S100 specifically comprises: crawling the data or reading it from a specified database, wherein the acquired data includes but is not limited to relational data that meets the requirements.

3. The method for a recommendation prediction model in business analysis management according to claim 1, characterized in that S200 specifically comprises:

S201: replacing missing values in the acquired data with the mean value, and deleting duplicate values;

S202: using the error function γ[i, x] = μ + b_x + b_i + q_i*p_x to balance the data deviation caused by objective factors and environmental factors, wherein μ is the market mean, b_x is the subjective deviation introduced by users, b_i is the deviation introduced by the commodity, and q_i*p_x is the interaction deviation.

4. The method for a recommendation prediction model in business analysis management according to claim 1, characterized in that S300 specifically comprises:

S301: discretizing the acquired data to obtain multiple raw data sets;

S302: computing the support coefficient and confidence coefficient of the multiple raw data sets using multithreading, and selecting the data that meet the support and confidence thresholds;

S303: calculating, based on the computation rules of the Apriori algorithm, the correlation coefficients between dimensions for the data set output by S302, applying incremental processing to the correlation coefficients, and calculating the average correlation coefficient of every correlation coefficient combination, wherein a correlation coefficient combination is a combination of at least two correlation coefficients;

S304: processing each dimension of the average correlation coefficients with the XGBoost gradient boosting algorithm to obtain the importance index between dimensions;

S305: integrating the support coefficient, correlation coefficient, and importance coefficient obtained in S301 to S304 into the relationship measurement index.

5. The method for a recommendation prediction model in business analysis management according to claim 4, characterized in that S302 specifically comprises:

a first child thread generating the data sets and item combinations and sending them to a second child thread;

the second child thread calculating the support correlation and confidence correlation between all dimensions of the text file and sending them to a third child thread;

the third child thread selecting, according to preset thresholds for the support correlation and the confidence, the combinations that exceed the thresholds, which are output by the parent thread.

6. The method for a recommendation prediction model in business analysis management according to claim 4, characterized in that the correlation coefficient of S302 is specifically:

wherein X and Y are different item sets.

7. The method for a recommendation prediction model in business analysis management according to claim 6, characterized in that S305 specifically comprises:

calculating the relationship measurement index using R(i, x) = [S(i, x) - v] + [r(i, x) - ν] + [I(i, x) - ī], and discretizing the confidence index into 0 and 1;

when C(i, x) > ū, C(i, x) = 1;

when C(i, x) < ū, C(i, x) = 0;

wherein S(i, x) is the support coefficient, r(i, x) is the correlation coefficient, I(i, x) is the importance coefficient, R(i, x) is the relationship measurement index, and C(i, x) is the confidence coefficient.

8. The method for a recommendation prediction model in business analysis management according to claim 1, characterized in that S600 specifically comprises:

using child threads to apply first-order differencing to the raw data in each dimension and visualizing the result as a line chart; if the resulting plot is fairly flat, no further differencing is needed, and otherwise higher-order differencing is applied;

calculating and visualizing the autocorrelation function and partial autocorrelation function of the differenced series to determine the parameters;

feeding each dimension's data set from the raw data into the ARIMA time series model to forecast the time series in each dimension.

9. A system for a recommendation prediction model in business analysis management, for executing the method of any one of claims 1-9, characterized in that the system comprises:

a data acquisition module, for acquiring the corresponding data according to the requirements of the decision-making manager in business management;

a data preprocessing module, for preprocessing the acquired data, and further using an error function to balance data deviation;

a correlation calculation module, for creating discrete raw data sets and using multithreading to compute the multiple raw data sets in parallel based on the Apriori algorithm and the XGBoost machine learning algorithm, obtaining output results for multiple dimension coefficients;

a measurement index refinement module, for integrating the results output by the correlation calculation module and refining recommendation rules using the integrated result as a measurement index;

a recommendation rule display module, for displaying the recommendation rules refined by the measurement index refinement module, and further using the accuracy coefficient obtained by the XGBoost machine learning algorithm as the validity measure;

a prediction module, for using multithreaded computation to forecast the multiple dimension coefficients of the raw data with the ARIMA machine learning time series algorithm, and outputting and visualizing the results.

10. The system for a recommendation prediction model in business analysis management according to claim 9, characterized in that the correlation calculation module specifically comprises:

a data discretization module, for discretizing the acquired data to obtain multiple raw data sets;

an Apriori correlation calculation module, for computing the support coefficient and confidence coefficient of the multiple raw data sets using multithreading, and selecting the data that meet the support and confidence thresholds;

a correlation coefficient calculation module, for calculating, based on the computation rules of the Apriori algorithm, the correlation coefficients between dimensions for the data set output by the Apriori correlation calculation module, applying incremental processing to the correlation coefficients, and calculating the average correlation coefficient of every correlation coefficient combination, wherein a correlation coefficient combination is a combination of at least two correlation coefficients;

an XGBoost correlation calculation module, for processing each dimension of the average correlation coefficients with the XGBoost gradient boosting algorithm to obtain the importance index between dimensions;

a correlation calculation result integration module, for integrating the support coefficient, correlation coefficient, and importance coefficient obtained by the data discretization module, the Apriori correlation calculation module, the correlation coefficient calculation module, and the XGBoost importance calculation module into the relationship measurement index.