Summary of the invention
The present invention provides a method and system for a business analysis management recommendation and prediction model. Raw data is first preprocessed with an error-balancing function. An Apriori correlation measurement algorithm based on multi-threaded parallel computing is then written in Python and, combined with the linear-algebra rule for the correlation of two vectors, used to analyze the correlation between vectors of different dimensions of the same event. Also on the basis of multi-threaded parallel computing, the invention trains the raw data set with the XGBoost machine learning algorithm and obtains prediction results. In parallel threads, the invention builds an ARIMA time series prediction model, feeds the raw data into the model as input, and selects portions of the data for training and testing respectively, so as to give managers fast and effective real-time predictions. In addition, because the data fed into the correlation calculation may carry errors caused by subjective or environmental factors, the invention introduces an error function into the correlation analysis to balance the factors of the goods themselves, the subjective factors of the reviewing customers, and the environmental factors involved.
The technical solution of the invention includes a method for a business analysis management recommendation and prediction model, characterized in that the method comprises the following steps: S100, collecting the corresponding data as required by the decision-making manager in business management; S200, preprocessing the collected data and, further, balancing data deviation with an error function; S300, creating discrete raw data sets and computing over the multiple raw data sets in parallel with multiple threads, based on the Apriori algorithm and the XGBoost machine learning algorithm, to obtain output results for multiple dimension coefficients; S400, integrating the results output in step S300 and refining recommendation rules using the integrated results as the measurement index; S500, displaying the recommendation rules refined in step S400 and, further, using the accuracy coefficient obtained by the XGBoost machine learning algorithm as the validity measure; S600, using multi-threaded parallel computing to predict multiple dimension coefficients of the raw data with the ARIMA machine learning time series algorithm, and outputting and visualizing the results.
According to the method for the business analysis management recommendation and prediction model, collecting the corresponding data in step S100 specifically includes: crawling the data or reading it from a specified database, where the data obtained includes but is not limited to relational data that meets the requirements.
According to the method for the business analysis management recommendation and prediction model, step S200 specifically includes: S201, replacing missing values in the collected data with the mean and deleting duplicate values; S202, using the error function γ(i,x) = μ + b_x + b_i + q_i*p_x to balance the data deviation caused by objective factors and environmental factors, where μ is the market mean, b_x is the subjective deviation introduced by the user, b_i is the deviation introduced by the commodity, and q_i*p_x is the interaction deviation.
According to the method for the business analysis management recommendation and prediction model, step S300 specifically includes: S301, discretizing the collected data to obtain multiple raw data sets; S302, computing the support coefficient and confidence coefficient over the multiple raw data sets with multi-threaded parallel computing, and selecting the data that meets the support and confidence thresholds; S303, based on the Apriori computing rule, calculating the correlation coefficients between dimensions on the data set output by step S302, applying increment processing to the correlation coefficients, and calculating the average correlation coefficient over all correlation coefficient combinations, where a correlation coefficient combination is a combination of at least two correlation coefficients; S304, processing each dimension of the average correlation coefficient with the XGBoost gradient boosting algorithm to obtain the importance index between dimensions; S305, integrating the support coefficient, correlation coefficient and importance coefficient obtained in steps S301 to S304 into a relationship measurement index.
According to the method for the business analysis management recommendation and prediction model, step S302 specifically includes: a first child thread generates the data sets and relation combinations and sends them to a second child thread; the second child thread calculates the support and the confidence between all dimensions of the text file and sends them to a third child thread; the third child thread, according to the preset support and confidence thresholds, selects the combinations that exceed the thresholds, and the result is output by the parent thread.
According to the method for the business analysis management recommendation and prediction model, the correlation coefficient of step S302 is specifically:
where X and Y are different item sets.
According to the method for the business analysis management recommendation and prediction model, step S305 specifically includes: calculating the relationship measurement index as R(i,x) = [S(i,x) - v] + [r(i,x) - ν] + [I(i,x) - ī], and discretizing the confidence index into 0 and 1: when C(i,x) > ū, C(i,x) = 1; when C(i,x) < ū, C(i,x) = 0; where S(i,x) is the support coefficient, r(i,x) is the correlation coefficient, I(i,x) is the importance coefficient, R(i,x) is the relationship measurement index, C(i,x) is the confidence coefficient, and v, ν, ī and ū are the means of the support, correlation, importance and confidence coefficients respectively.
According to the method for the business analysis management recommendation and prediction model, step S600 specifically includes: in the child threads, applying first-order differencing to the raw data in each dimension and visualizing it as a line chart; if the resulting curve is fairly flat, no further differencing is needed, otherwise higher-order differencing is applied; calculating and visualizing the autocorrelation function and the partial autocorrelation function of the differenced series to determine the parameters; inputting the data set of each dimension of the raw data into the ARIMA time series model to forecast the time series in each dimension.
The technical solution of the invention further includes a system of the business analysis management recommendation and prediction model for implementing any of the above methods. The system includes: a data acquisition module, for collecting the corresponding data as required by the decision-making manager in business management; a data preprocessing module, for preprocessing the collected data and, further, balancing data deviation with an error function; a correlation calculation module, for creating discrete raw data sets and computing over the multiple raw data sets in parallel with multiple threads, based on the Apriori algorithm and the XGBoost machine learning algorithm, to obtain output results for multiple dimension coefficients; a measurement index refinement module, for integrating the results output by the correlation calculation module and refining recommendation rules using the integrated results as the measurement index; a recommendation rule display module, for displaying the recommendation rules refined by the measurement index refinement module and, further, using the accuracy coefficient obtained by the XGBoost machine learning algorithm as the validity measure; and a prediction module, for using multi-threaded parallel computing to predict multiple dimension coefficients of the raw data with the ARIMA machine learning time series algorithm, and outputting and visualizing the results.
According to the system of the business analysis management recommendation and prediction model, the correlation calculation module specifically includes: a data discretization module, for discretizing the collected data to obtain multiple raw data sets; an Apriori correlation calculation module, for computing the support coefficient and confidence coefficient over the multiple raw data sets with multi-threaded parallel computing and selecting the data that meets the support and confidence thresholds; a correlation coefficient calculation module, which, based on the Apriori computing rule, calculates the correlation coefficients between dimensions on the data set output by the Apriori correlation calculation module, applies increment processing to the correlation coefficients, and calculates the average correlation coefficient over all correlation coefficient combinations, where a correlation coefficient combination is a combination of at least two correlation coefficients; an XGBoost correlation calculation module, which processes each dimension of the average correlation coefficient with the XGBoost gradient boosting algorithm to obtain the importance index between dimensions; and a correlation calculation result integration module, which integrates the support coefficient, correlation coefficient and importance coefficient obtained by the data discretization module, the Apriori correlation calculation module, the correlation coefficient calculation module and the XGBoost correlation calculation module into a relationship measurement index.
The beneficial effects of the invention are as follows: a business analysis recommendation system is formulated for decision-making managers; a balancing error function is introduced for data preprocessing, to balance the factors of the goods themselves, the subjective factors of the reviewing customers, and the environmental factors involved; data science, the correlation from applied linear algebra, the Apriori recommendation algorithm and XGBoost machine learning are combined to perform association analysis on the raw data, so that the decision recommendations are well founded; the ARIMA machine learning algorithm is used for time series forecasting on the data, providing trend forecasts for the manager's later operations and a check on the results of implementing the recommended measures; the whole analysis system is integrated, running from data input to the final recommendation rules, the data support for each recommendation, and the time series forecasting results; and the Apriori model, the XGBoost algorithm and the ARIMA forecasting model are optimized and upgraded on the basis of multi-core thread parallelism, making full use of CPU resources and raising computation speed on massive data.
Specific embodiment
The technical solution of the invention includes a method and system for a business analysis management recommendation and prediction model, and relates to data correlation analysis and machine learning, recommender systems and prediction models. The concept, specific structure and technical effects of the invention are described clearly and completely below with reference to the embodiments and the drawings, so that the purpose, solutions and effects of the invention can be fully understood.
It should be noted that, unless otherwise specified, when a feature is said to be "fixed" or "connected" to another feature, it may be fixed or connected to the other feature directly or indirectly. In addition, descriptions such as upper, lower, left and right used in this disclosure refer only to the relative positions of the components of the disclosure in the drawings. The singular forms "a", "said" and "the" used in this disclosure are intended to include the plural forms as well, unless the context clearly indicates otherwise. Furthermore, unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the art. The terms used in this description serve only to describe specific embodiments and are not intended to limit the invention. The term "and/or" as used herein includes any combination of one or more of the associated listed items.
It should be understood that although the terms first, second, third and so on may be used in this disclosure to describe various elements, these elements should not be limited by these terms. The terms are used only to distinguish elements of the same type from one another. For example, without departing from the scope of the disclosure, a first element could be termed a second element and, similarly, a second element could be termed a first element. Any and all examples or exemplary language ("for example", "such as") provided herein are intended only to better illustrate the embodiments of the invention and, unless otherwise claimed, do not limit the scope of the invention.
Fig. 1 shows the overall flow chart of the method according to an embodiment of the invention. It specifically includes: S100, collecting the corresponding data as required by the decision-making manager in business management; S200, preprocessing the collected data and, further, balancing data deviation with an error function; S300, creating discrete raw data sets and computing over the multiple raw data sets in parallel with multiple threads, based on the Apriori algorithm and the XGBoost machine learning algorithm, to obtain output results for multiple dimension coefficients; S400, integrating the results output in step S300 and refining recommendation rules using the integrated results as the measurement index; S500, displaying the recommendation rules refined in step S400 and, further, using the accuracy coefficient obtained by the XGBoost machine learning algorithm as the validity measure; S600, using multi-threaded parallel computing to predict multiple dimension coefficients of the raw data with the ARIMA machine learning time series algorithm, and outputting and visualizing the results.
Fig. 2 shows the system structure diagram according to an embodiment of the invention. It specifically includes: a data acquisition module, for collecting the corresponding data as required by the decision-making manager in business management; a data preprocessing module, for preprocessing the collected data and, further, balancing data deviation with an error function; a correlation calculation module, for creating discrete raw data sets and computing over the multiple raw data sets in parallel with multiple threads, based on the Apriori algorithm and the XGBoost machine learning algorithm, to obtain output results for multiple dimension coefficients; a measurement index refinement module, for integrating the results output by the correlation calculation module and refining recommendation rules using the integrated results as the measurement index; a recommendation rule display module, for displaying the recommendation rules refined by the measurement index refinement module and, further, using the accuracy coefficient obtained by the XGBoost machine learning algorithm as the validity measure; and a prediction module, for using multi-threaded parallel computing to predict multiple dimension coefficients of the raw data with the ARIMA machine learning time series algorithm, and outputting and visualizing the results.
Fig. 3 shows the overall schematic diagram according to an embodiment of the invention. The details are as follows:
(1) Raw data input: the pymysql Python module connects the Python programming environment to the MySQL management interface, and SQL commands are used to extract relational data on commodities, services, merchants and brands from the SQL database. For example: customers' purchases of commodities; customers' rating scores for different attributes of a commodity (color, material, etc.).
(2) Data preprocessing: missing values are replaced with the mean, duplicate values are deleted, and so on. Error function (1) is used to balance the data deviation caused by objective factors and environmental factors:
γ(i,x) = μ + b_x + b_i + q_i*p_x (formula 1)
where γ(i,x) is the processed data, μ is the market mean, b_x is the subjective deviation introduced by the user, b_i is the deviation introduced by the commodity, and q_i*p_x is the interaction deviation between the two. For example: three-star hotels in a certain region are generally rated 3.4 points online (the mean, μ = +3.4); a certain customer is a level-7 member, out of 10 levels, of hotels in this region (q_i = +0.7); the hotel gave the customer a free room upgrade (p_x = +1; an interaction is recorded as 1 and no interaction as 0); the customer rates this hotel 0.5 points higher (b_x = +0.5), but the hotel's average rating is 1 point below μ (b_i = -1). The customer's final rating is therefore γ(i,x) = 3.4 + 0.5 - 1 + 0.7*1 = 3.6 points.
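The preprocessing step and formula 1 can be sketched in a few lines of Python. This is a minimal illustration only: the function names `fill_missing_with_mean`, `drop_duplicates` and `balanced_score` are hypothetical and do not come from the patent, and missing values are assumed to be represented as `None`.

```python
def fill_missing_with_mean(values):
    """Replace missing entries (assumed to be None) with the mean of the known ones."""
    known = [v for v in values if v is not None]
    mean = sum(known) / len(known)
    return [mean if v is None else v for v in values]


def drop_duplicates(rows):
    """Delete repeated rows while keeping the first occurrence of each."""
    return list(dict.fromkeys(rows))


def balanced_score(mu, b_x, b_i, q_i, p_x):
    """Formula 1: gamma(i, x) = mu + b_x + b_i + q_i * p_x."""
    return mu + b_x + b_i + q_i * p_x


# Hotel example from the text: mu = 3.4, b_x = 0.5, b_i = -1, q_i = 0.7, p_x = 1.
hotel_score = balanced_score(3.4, 0.5, -1.0, 0.7, 1)
print(round(hotel_score, 1))
```

With the hotel figures given above, the call evaluates 3.4 + 0.5 - 1 + 0.7*1, which matches the 3.6 points stated in the example.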
(3) Apriori correlation analysis based on multi-threaded parallel computing
Fig. 4 shows the Apriori correlation parallel computation schematic diagram according to an embodiment of the invention. As shown in Fig. 4, the computation runs on a Linux or Unix system, and the child threads and the parent thread are created by the fork function.
Data discretization module: the continuous data set γ is discretized, and multiple raw data sets are built. For example, child thread one maps the continuous data in (0,1] to 1 and all other data to 0, giving the first data set; child thread two maps the continuous data in (1,2] to 1 and all other data to 0, giving the second data set; child thread m maps the continuous data in (m-1,m] to 1 and all other data to 0, giving the m-th data set. The child threads run simultaneously to save time. The wait function is used to order the operations, so that the child threads run first and the parent thread afterwards. Finally, the m data sets are merged in the parent thread and recorded as the input data M, and a daemon process is used to ensure that the parent process still runs smoothly after the children stop.
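The discretization into m binary data sets can be sketched as follows. This is an illustrative sketch under stated assumptions: it uses a standard-library thread pool in place of the fork, wait and daemon mechanism described above, and the names `mark_interval` and `discretize` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def mark_interval(values, k):
    """Data set k: 1 for values in the half-open interval (k-1, k], otherwise 0."""
    return [1 if k - 1 < v <= k else 0 for v in values]


def discretize(values, m):
    """Build the m binary data sets, one worker task per interval.

    pool.map returns results in interval order, so the merged result M
    is deterministic regardless of which worker finishes first.
    """
    with ThreadPoolExecutor(max_workers=m) as pool:
        return list(pool.map(lambda k: mark_interval(values, k), range(1, m + 1)))


M = discretize([0.5, 1.0, 1.5, 3.0], 3)
for k, dataset in enumerate(M, start=1):
    print(k, dataset)
```

Each worker handles one interval, mirroring the "one child thread per interval" layout; the merge in the parent corresponds to collecting the list returned by `discretize`.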
In the Apriori correlation calculation module, the invention uses Python to write and build an Apriori correlation calculation model based on multi-threaded parallel computing. The main purpose of child thread one is to generate the data sets and relation combinations; child thread two calculates the support and the confidence between all dimensions of the text file; child thread three sets the support and confidence thresholds τ and μ according to the actual situation and selects the combinations that exceed them. Suppose there are n dimensions and the three threads work as shown in Fig. 2. In child thread one, every item of the input data M is a candidate item, and the non-repeating candidate 1-item set C1 (item count 1) is generated; each item generated is passed to child thread two through the pipe Pipe_C1. When child thread two receives the candidate combinations with item count 2, it returns the "0" signal to child thread one through Signal_0. The progress of the two threads is controlled by wait and by passing the signal "0" or "1": the wait function makes thread two proceed only after it has received the pipe data from child thread one, and Signal_0 together with the wait function makes child thread one stop once child thread two has received the data. Likewise, child thread two passes the confidence c_1 and support s_1 of the C1 combinations to child thread three through the pipe Pipe_sc1, and the progress of these two threads is again controlled by wait and the signal "0" or "1". In the same way, child thread three passes the data set L1 to child thread one through Pipe_L1 and the signal "1" through Signal_1, which, together with the wait function, starts child thread one. Finally, with the items in L(n-1) as the candidate set, thread one generates the non-repeating combinations with item count n, thread two calculates the confidence and support of each combination, thread three selects the combinations that exceed the thresholds, and the resulting data set Ln is output by the parent thread.
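One level of the three-stage pipeline can be sketched as below. This is a simplified illustration, not the patented implementation: it swaps in `threading` and `queue.Queue` for the fork, pipe and signal mechanism described above, generates only one ordering of each combination, and takes the confidence of a combination as the support of the whole combination divided by the support of all items except the last. All function names are hypothetical.

```python
import threading
from itertools import combinations
from queue import Queue

DONE = object()  # sentinel that marks the end of a stream


def generate(items, size, out_q):
    """Stage 1: emit every non-repeating combination with the given item count."""
    for combo in combinations(sorted(items), size):
        out_q.put(combo)
    out_q.put(DONE)


def score(transactions, in_q, out_q):
    """Stage 2: attach support and confidence to each candidate combination."""
    baskets = [set(t) for t in transactions]
    while (combo := in_q.get()) is not DONE:
        both = sum(1 for b in baskets if set(combo) <= b)
        antecedent = sum(1 for b in baskets if set(combo[:-1]) <= b)
        support = both / len(baskets)
        confidence = both / antecedent if antecedent else 0.0
        out_q.put((combo, support, confidence))
    out_q.put(DONE)


def select(tau, mu, in_q, results):
    """Stage 3: keep combinations whose support exceeds tau and confidence exceeds mu."""
    while (row := in_q.get()) is not DONE:
        _, support, confidence = row
        if support > tau and confidence > mu:
            results.append(row)


def run_level(transactions, items, size, tau, mu):
    """Run the three stages concurrently and return the selected combinations."""
    q12, q23, results = Queue(), Queue(), []
    stages = [
        threading.Thread(target=generate, args=(items, size, q12)),
        threading.Thread(target=score, args=(transactions, q12, q23)),
        threading.Thread(target=select, args=(tau, mu, q23, results)),
    ]
    for stage in stages:
        stage.start()
    for stage in stages:
        stage.join()
    return results
```

The queues play the role of Pipe_C1 and Pipe_sc1, and the sentinel replaces the "0"/"1" signalling; feeding the selected items of one level back in as the `items` of the next level would correspond to passing L1 back to thread one.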
The Apriori algorithm is used for correlation calculations across two or more dimensions. It works by calculating the probability that two or more events occur together (the support coefficient) and the probability that another event occurs given that a single event has occurred (the confidence coefficient). The mean confidence coefficient ū and the mean support coefficient v are also calculated. In practical terms: let event A be a customer buying category-A goods and event B be a customer buying category-B goods. In the Apriori result for A => B, when the confidence coefficient C(i,x) is greater than ū, the larger the support coefficient S(i,x), the more readily the occurrence of event A raises the incidence of event B, and the decision maker should promote sales of A to drive sales of B. When the confidence coefficient is less than ū, the larger the support coefficient, the more readily the occurrence of event A lowers the incidence of event B; the two may be in a competitive relationship or may be goods that can be considered together, and the decision maker may consider selling the two as a bundle.
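The two Apriori quantities and their means can be written out directly. The sketch below assumes each purchase record is a set of item names; the function names are hypothetical and chosen only for illustration.

```python
def support(transactions, items):
    """Fraction of transactions that contain every item in `items`."""
    wanted = set(items)
    return sum(1 for t in transactions if wanted <= set(t)) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Estimated probability of `consequent` given `antecedent` (rule A => B)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base


def mean(values):
    """Arithmetic mean, used for the support mean v and the confidence mean u-bar."""
    return sum(values) / len(values)


baskets = [{"A", "B"}, {"A", "B"}, {"A"}, {"B"}]
s_ab = support(baskets, {"A", "B"})
c_ab = confidence(baskets, {"A"}, {"B"})
c_ba = confidence(baskets, {"B"}, {"A"})
print(s_ab, c_ab, mean([c_ab, c_ba]))
```

Comparing each rule's confidence with the mean confidence, as described above, then decides whether a pair is treated as "promote A to drive B" or as a bundling candidate.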
(4) Correlation coefficient calculation
The invention uses Python array correlation calculation to obtain the correlation between each pair of dimensions; the larger the correlation coefficient r(i,x), the closer the relationship between the two dimensions. Increment processing is then applied to these results, and the average correlation coefficient ν of all correlation combinations is calculated.
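A pairwise correlation and its average over a combination of dimensions can be sketched as follows. The text does not name the correlation measure, so the Pearson coefficient is an assumption here, and the names `pearson` and `combination_correlation` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def combination_correlation(data, dims):
    """Mean of the pairwise correlations among the named dimensions."""
    pairs = list(combinations(dims, 2))
    return sum(pearson(data[a], data[b]) for a, b in pairs) / len(pairs)


weekly_sales = {
    "fresh food": [1, 2, 3, 4],
    "instant food": [2, 4, 6, 8],
    "daily goods": [4, 3, 2, 1],
}
print(combination_correlation(weekly_sales, ["fresh food", "instant food", "daily goods"]))
```

For three dimensions the function averages the three pairwise values, which is the same pattern as averaging the pairwise correlations of a three-item combination.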
(5) XGBoost correlation analysis based on multi-threaded parallel computing. Fig. 5 shows the XGBoost correlation parallel computation schematic diagram according to an embodiment of the invention.
To obtain the correlation between the dimensions and to improve running speed, the invention runs the XGBoost machine learning algorithm with multi-threaded parallel computing. Each dimension is calculated with the XGBoost gradient boosting algorithm. The model has 8 child threads in total, and each thread applies XGBoost to one dimension, taking 80% of the data as the training set and 20% as the test data. After the XGBoost boosted trees are built, each thread can obtain the importance score of every attribute fairly directly; the score measures how valuable a feature is in building the boosted decision trees in the model. The more an attribute is used to build decision trees in the model, the higher its relative importance. Attribute importance is calculated for each attribute in the data set and then ranked. Within a single decision tree, attribute importance is calculated from the amount by which each split point on the attribute improves the performance measure, weighted by the number of records the node is responsible for. That is, the more a split point on an attribute improves the performance measure (the closer it is to the root node), the greater its weight; and the more boosted trees select the attribute, the more important it is. The performance measure may be the Gini purity used to select split nodes, or another metric function. Finally, the results for an attribute across all boosted trees are summed with weights and then averaged to give the importance score.
The entire parallel computation runs on a Linux or Unix system, and the child threads and the parent thread are created by the fork function. As shown in Fig. 5, the invention uses the XGBoost Python package and tunes the parameters to obtain the influence importance coefficient of each dimension. Each child thread takes one dimension as the result and the others as impact factors, and discretizes the result data set into two intervals, recorded as "1" and "0". It obtains the influence importance coefficients I(i,x) between the result and the impact factors and ranks them. The wait function makes the parent process run after the child processes. The data sets obtained by the child processes are passed to the parent process through Pipe; the parent process gathers the importance coefficients between the dimensions and calculates the importance coefficient corresponding to each correlation combination obtained by Apriori (for a combination of three dimensions, the mean of the pairwise importances is taken), and then calculates the mean importance coefficient ī over all combinations. In addition, the invention daemonizes the parent process so that, after the child processes finish, the parent process can still run to completion in the background.
(6) Correlation calculation result integration module
The support coefficient S(i,x), the correlation coefficient r(i,x) and the importance coefficient I(i,x) are integrated by the following formula into one relationship measurement index R(i,x), and the confidence index is discretized into "0" and "1".
R (i, x)=[S (i, x)-v]+[r (i, x)-ν]+[I (i, x)-ī] (4)
If C (i, x) > ū, C (i, x)=1;
If C (i, x) < ū, C (i, x)=0; (5)
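Formulas (4) and (5) translate into two small functions. This is a sketch only: the names are hypothetical, and because the text does not define the case C(i,x) = ū, the sketch assumes it maps to 0.

```python
def relationship_index(s, r, imp, s_mean, r_mean, imp_mean):
    """Formula (4): R = [S - v] + [r - nu] + [I - i-bar]."""
    return (s - s_mean) + (r - r_mean) + (imp - imp_mean)


def discretize_confidence(c, c_mean):
    """Formula (5): 1 when C exceeds the mean confidence, otherwise 0.

    The boundary case c == c_mean is not specified in the text;
    returning 0 for it is an assumption of this sketch.
    """
    return 1 if c > c_mean else 0


# Hypothetical coefficients for one pair (i, x), chosen only for illustration.
R = relationship_index(s=0.5, r=0.8, imp=0.6, s_mean=0.3, r_mean=0.5, imp_mean=0.4)
C = discretize_confidence(c=0.45, c_mean=0.6)
print(R, C)
```

Each bracket measures how far one coefficient sits above or below its own mean, so R is positive when a pair is above average on balance across support, correlation and importance.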
(7) Recommendation rule refinement module
For a combination of i and x: when C(i,x) = 1, i and x are positively related, and the decision maker promotes i in order to develop x; the larger R(i,x), the stronger this positive relationship. When C(i,x) = 0, i and x are in a competitive relationship, and the decision maker sells i and x as a bundle so that the two develop together; the larger R(i,x), the stronger this competitive relationship.
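The refinement logic above amounts to a lookup on C(i,x) followed by a ranking on R(i,x). A minimal sketch, with hypothetical function names and made-up input values:

```python
def refine_rule(i, x, c, r):
    """Turn one (i, x) pair into a recommendation based on C(i, x) and R(i, x)."""
    if c == 1:
        action = f"promote {i} to drive {x}"
    else:
        action = f"bundle {i} with {x}"
    return {"pair": (i, x), "action": action, "strength": r}


def rank_rules(candidates):
    """Refine every candidate and list the strongest relationship first."""
    rules = [refine_rule(*candidate) for candidate in candidates]
    return sorted(rules, key=lambda rule: rule["strength"], reverse=True)


# Illustrative inputs as (i, x, C, R); the numbers are invented for the example.
candidates = [
    ("milk", "fresh food", 1, 0.2),
    ("instant noodles", "fresh food", 0, 0.7),
]
for rule in rank_rules(candidates):
    print(rule["action"], rule["strength"])
```

Sorting by R places the pair with the strongest relationship at the top, which is the rule a manager would see first.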
(8) Recommendation rule display and validity measurement module
Fig. 6 shows the time series forecasting parallel computation schematic diagram according to an embodiment of the invention. The recommendation rules are displayed in the order required by the manager, and the accuracy coefficient obtained by the XGBoost machine learning algorithm is used as the validity measure. For example: the manager's main requirement is to boost sales of the supermarket's fresh food quickly. From the calculations of the modules above, among the values R(i, fresh food), R(instant noodles, fresh food) is the highest, and C(instant noodles, fresh food) = 0; therefore, to push fresh food, selling instant noodles and fresh food as a bundle is the most beneficial to fresh food sales. For the XGBoost machine learning model with fresh food as the result and the other dimensions as impact factors, the accuracy is 97%, so the validity of the recommendation rule "sell instant noodles and fresh food as a bundle to push fresh food sales" is 97%.
(9) Time series forecasting module based on multi-threaded parallel computing. On a Linux or Unix system, the child threads and the parent thread are created through the Python fork function. The purpose of the child threads is to obtain the time series forecast values of all dimensions, and the parent thread is responsible for outputting and visualizing the forecast results. In the child threads, the invention uses the ARIMA machine learning time series algorithm: first-order differencing is applied to γ in each dimension (second-order or higher-order differencing may be applied depending on the actual situation), and the result is visualized as a line chart. If the resulting curve is fairly flat (the values are concentrated around the mean), no further differencing is needed; otherwise higher-order differencing is applied. The autocorrelation function (ACF) and the partial autocorrelation function (PACF) of the differenced series are calculated and visualized to determine the parameters. The data set of each dimension of the raw data is input into the ARIMA time series model, and the time series in each dimension is forecast.
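The differencing and autocorrelation steps that precede the ARIMA fit can be sketched with the standard library alone. The function names are hypothetical; the ARIMA fit itself and the partial autocorrelation are omitted here and would normally come from a time series library.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times."""
    for _ in range(order):
        series = [later - earlier for earlier, later in zip(series, series[1:])]
    return series


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag.

    Raises ZeroDivisionError for a constant series (zero variance).
    """
    n = len(series)
    mean = sum(series) / n
    total = sum((value - mean) ** 2 for value in series)
    return [
        sum((series[t] - mean) * (series[t + lag] - mean) for t in range(n - lag)) / total
        for lag in range(max_lag + 1)
    ]


weekly_sales = [1, 3, 6, 10, 15, 21]
once = difference(weekly_sales)            # still trending upward
twice = difference(weekly_sales, order=2)  # flat, so no further differencing
print(once, twice)
print(acf(once, max_lag=2))
```

In this toy series the first difference still rises steadily while the second difference is constant, which is the "flat curve" condition under which the text stops differencing; the ACF values of the chosen series then guide the choice of the ARIMA order parameters.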
The technical solution of the invention also discloses a more detailed embodiment, as follows:
(1) In this example, web crawling is used to collect the weekly sales of 8 categories of goods at a certain supermarket for 2015-2018 and store them in an SQL database. A Python program is written in PyCharm, and the pymysql module is imported to connect to the local SQL database and read the data.
(2) Data preprocessing module: balanced-error processing is applied to the data, using the market average weekly sales μ of commodity i, the subjective error b_x (purchase quantity of commodity i in week x, minus μ), the commodity error b_i (purchase quantity of the commodity in week x, minus μ), and the interaction deviation q_i*p_x (if commodity i is on promotion for 3 days in week x, the interaction deviation is 3/7): γ(i,x) = μ + b_x + b_i + q_i*p_x (1).
(3) Multi-threaded parallel computing model one, the Apriori correlation analysis, is built on a 4-core Linux operating system. Eight child threads each scan the data and discretize it into 8 data sets; each time a value in the assigned interval is scanned, it is sent to the parent thread through a pipe. The intervals are (0,100], (100,200], (200,300], (300,400], (400,500], (500,600], (600,700] and (700,800]. The parent thread receives the data from the child threads and stores the output in the SQL database as M.
(3) present invention is write using Python and is established the Apriori correlation calculations mould based on multithreads computing
Type.One main purpose of sub thread is to generate data set and composition of relations;Sub thread two is for calculating all dimensions of text file
Between support correlation and confidence level correlation;Sub thread is third is that be set according to actual conditions related support and confidence level
Threshold tau and μ, filter out the combination greater than the threshold value.Share 8 dimensions, in sub thread one, each item of initial data M is
Candidate item, generating item number is 1 not repeat 1 item collection C1 of candidate, and every to generate one, this is transmitted to son by pipeline Pipe_C1
Thread two, and when sub-thread two receives a combination with two items it returns the "0" signal
Signal_0 to sub-thread one. The progress of the two threads is controlled by the wait function together
with the transmitted signal "0" or "1": the wait function makes sub-thread two proceed only after the
data from sub-thread one's pipe has been received, and Signal_0 together with the wait function makes
sub-thread one stop once sub-thread two has received the data. Likewise, sub-thread two passes the
confidence c_1 and support s_1 of the C1 combinations to sub-thread three through the pipe Pipe_sc1, and
the progress of these two threads is again controlled by wait and the transmitted signal "0" or "1".
Sub-thread three and sub-thread one communicate in the same way, using Pipe_L1 and the signal Signal_1:
the pipe carries the data set L1, and the signal "1" together with the wait function restarts sub-thread
one. Finally, with the items in L(n-1) as the candidate set, thread one generates non-repeating
combinations of up to 8 items, thread two computes the confidence and support of each combination, the
combinations above the thresholds are retained, and the final data set Ln is output and sorted by the
parent thread. For example: the candidate 2-itemset "fresh food → instant food" ∈ C_2 is generated and
passed to sub-thread two through the pipe Pipe_2 together with the signal Signal_1. Sub-thread two
computes the confidence C(fresh food, instant food) and support S(fresh food, instant food) of this
combination and passes them to thread three through the pipe Pipe_sc2 together with Signal_1. If
C(fresh food, instant food) > μ and S(fresh food, instant food) > τ, then "fresh food → instant food" is
passed, as a subset of L2, to thread one through Pipe_L2 together with Signal_1; at the same time,
"fresh food → instant food", C(fresh food, instant food) and S(fresh food, instant food) are stored in
the output database of the parent thread. Thread one then starts scanning and builds the combinations
with three items, such as "fresh food → instant food → articles for daily use". Finally, after the
combinations with 8 items are complete, the main thread sorts all combinations by support S(x,i).
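The support and confidence filtering in the example above can be sketched in a few lines of Python. This is a single-process illustration only: the pipes, signals and three sub-threads are omitted, the function names and the toy transactions are hypothetical, and the rule direction (first item → second item, in alphabetical order) is an assumption made to keep the sketch short. The thresholds mu (confidence) and tau (support) correspond to μ and τ in the text.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    s_antecedent = support(antecedent, transactions)
    if s_antecedent == 0:
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / s_antecedent


def frequent_pairs(transactions, mu, tau):
    """Keep 2-item combinations with confidence > mu and support > tau,
    sorted by support in descending order."""
    items = sorted(set().union(*transactions))
    kept = []
    for a, b in combinations(items, 2):
        s = support({a, b}, transactions)
        c = confidence({a}, {b}, transactions)
        if c > mu and s > tau:
            kept.append((a, b, c, s))
    return sorted(kept, key=lambda row: row[3], reverse=True)


# Hypothetical toy data: each transaction is the set of purchased categories.
transactions = [
    {"fresh food", "instant food"},
    {"fresh food", "instant food", "articles for daily use"},
    {"fresh food"},
    {"articles for daily use"},
]
result = frequent_pairs(transactions, mu=0.6, tau=0.4)
```

With this toy data only the pair ("fresh food", "instant food") passes both thresholds; the surviving pairs would then serve as the candidate set for the three-item combinations.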
(3) Compute the correlation r(i,x) between each pair of dimensions in a combination, and take the
average of these pairwise values as the correlation among the items of that combination. For example:
r(fresh food, instant food, articles for daily use) = [r(fresh food, articles for daily use)
+ r(fresh food, instant food) + r(instant food, articles for daily use)] / 3.
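The averaging of pairwise correlations can be illustrated as follows. The text does not state which correlation measure is used, so the Pearson coefficient here is an assumption, and the function names and column values are hypothetical.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def combination_correlation(columns, names):
    """Mean of the pairwise correlations among the named dimensions."""
    pairs = list(combinations(names, 2))
    return sum(pearson(columns[a], columns[b]) for a, b in pairs) / len(pairs)


# Hypothetical sales columns for three dimensions.
columns = {
    "fresh food": [1, 2, 3, 4],
    "instant food": [2, 4, 6, 8],
    "articles for daily use": [4, 3, 2, 1],
}
r_combo = combination_correlation(columns, list(columns))
```

For three dimensions this evaluates exactly the three-term average written above; for a two-item combination it reduces to the single pairwise value.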
Using the Xgboost gradient boosting algorithm, each dimension is computed. This model uses 8 sub-threads
in total, and each thread applies Xgboost to one dimension, taking 80% of the data as the training set
and 20% as the test set. After the Xgboost boosted trees have been built, each thread can obtain the
importance score of every attribute fairly directly. The importance score measures how valuable a
feature is in building the boosted decision trees in the model: the more an attribute is used to build
decision trees in the model, the higher its relative importance. Attribute importance is computed for
each attribute in the data set and then ranked. Within a single decision tree, the importance of an
attribute is computed from the amount by which each split point on that attribute improves the
performance metric, weighted by the number of records the node is responsible for. That is, the more a
split point on an attribute improves the performance metric (the closer it is to the root node), the
larger its weight; and the more boosted trees select the attribute, the more important it is. The
performance metric can be the Gini purity used to select split nodes, or another metric function.
Finally, the results for the attribute across all boosted trees are weighted, summed and averaged to
obtain its importance score. The entire parallel computation runs on a Linux or Unix system, with the
sub-threads and the parent thread cached through the fork function. As shown in Figure 5, the present
invention uses the Xgboost Python package and tunes its parameters to obtain the influence importance
coefficient of each dimension. Each sub-thread takes one dimension as the result and the other
dimensions as impact factors, and splits the result data into two intervals, denoted "1" and "0". The
influence importance coefficients I(i,x) between the result and the impact factors are obtained and
sorted. The wait function makes the parent process run after the child processes, and the data sets
obtained by the child processes are passed to the parent process through Pipe. The parent process
aggregates the importance coefficients between the dimensions and computes the importance coefficient
corresponding to each correlated combination obtained by Apriori (if a combination has three dimensions,
the pairwise importance values are averaged), and then computes the mean importance coefficient ī over
all combinations. In addition, the parent process in the present invention is run as a daemon, which
ensures that after the child processes finish, the parent process can still run to completion in the
background. For example, thread one takes total sales volume as the predicted value and computes its
average; values greater than the average sales volume are denoted 1 and values less than the average are
denoted 0. The resulting attribute importance is shown in Figure 7:
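The process handling described in this step (fork one child per dimension, return its result through a pipe, have the parent wait for each child) can be sketched with the Python standard library on Linux or Unix. The Xgboost training itself is deliberately left out: each child only performs the "1"/"0" split around the mean. The function names and sample values are hypothetical, and mapping a value equal to the mean to 0 is an assumption, since the text only covers the greater-than and less-than cases.

```python
import json
import os


def binarize_by_mean(values):
    """Label each value 1 if it is above the mean, otherwise 0."""
    mean = sum(values) / len(values)
    return [1 if v > mean else 0 for v in values]


def run_in_children(columns):
    """Fork one child per dimension and collect each result through a pipe."""
    children = []
    for name, values in columns.items():
        read_fd, write_fd = os.pipe()
        pid = os.fork()
        if pid == 0:  # child process: compute, write to the pipe, exit
            status = 1
            try:
                os.close(read_fd)
                payload = json.dumps(binarize_by_mean(values)).encode()
                with os.fdopen(write_fd, "wb") as writer:
                    writer.write(payload)
                status = 0
            finally:
                os._exit(status)
        os.close(write_fd)  # parent keeps only the read end
        children.append((name, pid, read_fd))

    results = {}
    for name, pid, read_fd in children:
        with os.fdopen(read_fd, "rb") as reader:
            results[name] = json.loads(reader.read().decode())
        os.waitpid(pid, 0)  # parent continues only after the child has ended
    return results


# Hypothetical sales figures for two dimensions.
labels = run_in_children({"total sales": [10, 20, 30], "fresh food": [5, 1, 1]})
```

In a fuller version each child would train its model on the binarized result and send back the importance coefficients instead of the labels; the parent-side aggregation would stay the same.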
(5) Compute the relationship measurement index R(i,x), and discretize the confidence index into "0" and
"1":
R(i,x) = [S(i,x) - v] + [r(i,x) - ν] + [I(i,x) - ī]    (4)
If C(i,x) > ū, then C(i,x) = 1; if C(i,x) < ū, then C(i,x) = 0;
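Formula (4) and the discretization rule can be written out directly. In this sketch the reference values v, nu, i_bar and u_bar are simply passed in as numbers, the sample inputs are hypothetical, and treating a confidence exactly equal to ū as 0 is an assumption because the text leaves that case open.

```python
def relation_index(S, r, I, v, nu, i_bar):
    """R(i,x) = [S(i,x) - v] + [r(i,x) - nu] + [I(i,x) - i_bar], as in formula (4)."""
    return (S - v) + (r - nu) + (I - i_bar)


def discretize_confidence(C, u_bar):
    """Return 1 when the confidence exceeds u_bar, otherwise 0."""
    return 1 if C > u_bar else 0


# Hypothetical values for one combination i -> x.
R = relation_index(S=0.5, r=0.8, I=0.6, v=0.3, nu=0.4, i_bar=0.5)
C_high = discretize_confidence(0.7, u_bar=0.6)
C_low = discretize_confidence(0.5, u_bar=0.6)
```

Each bracket measures how far one quantity lies above its reference value, so a combination that is above the reference in support, correlation and importance receives a large positive R.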
(7) Extract the recommendation rules. For a combination i → x: when C(i,x) = 1, i and x are in a
proportional relationship, and the decision maker can develop x by promoting i; the larger R(i,x) is,
the stronger this proportional relationship. When C(i,x) = 0, i and x are in a competitive relationship,
and the decision maker can bundle i and x for sale so that the two develop together; the larger R(i,x)
is, the stronger this competitive relationship.
(8) Present the recommendation rules in the order required by the manager, using the accuracy
coefficient obtained by the Xgboost machine learning algorithm as the measure of validity. For example:
suppose the manager's requirement is to boost the fast sale of the supermarket's fresh food. From the
modules above, among the values R(i, fresh food), R(instant food, fresh food) is the highest, and
C(instant food, fresh food) = 0, so bundling instant noodles with fresh food is the most effective way
to promote the sale of fresh food. The Xgboost machine learning model with fresh food as the result and
the other dimensions as impact factors has an accuracy of 97%, so the validity of the recommendation
rule "bundle instant noodles with fresh food to promote fresh food sales" is 97%.
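Steps (7) and (8) amount to a small selection rule, sketched below. The class and function names are hypothetical and the R values are invented for illustration; only the 97% accuracy and the fresh food example come from the text.

```python
from dataclasses import dataclass


@dataclass
class Relation:
    item: str    # the dimension i
    target: str  # the dimension x that the manager wants to promote
    R: float     # relationship measurement index R(i,x)
    C: int       # discretized confidence, 0 or 1


def recommend(relations, target, accuracy):
    """Pick the relation with the highest R for the target and derive the action."""
    candidates = [rel for rel in relations if rel.target == target]
    best = max(candidates, key=lambda rel: rel.R)
    if best.C == 1:
        action = f"promote {best.item} to develop {target}"
    else:
        action = f"bundle {best.item} with {target}"
    return {"action": action, "validity": accuracy}


# Hypothetical R values; C(instant food, fresh food) = 0 as in the example.
relations = [
    Relation("instant food", "fresh food", R=0.9, C=0),
    Relation("articles for daily use", "fresh food", R=0.4, C=1),
]
rule = recommend(relations, target="fresh food", accuracy=0.97)
```

Here the accuracy of the model trained with the target as its result is attached unchanged as the validity of the selected rule, mirroring the example.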
On a Linux or Unix system, the sub-threads and the parent thread are cached and backed up using the
Python fork facility. The purpose of the sub-threads is to obtain the time series forecast values of all
dimensions, while the parent thread is responsible for outputting and visualizing the prediction
results. In each sub-thread, the present invention uses the ARIMA time series algorithm. First, γ in
each dimension is first-order differenced (second-order or higher-order differencing may be applied
depending on the actual situation) and visualized as a line chart; if the resulting plot is relatively
flat (the mean is fairly concentrated), no further differencing is needed, otherwise differencing of a
higher order is applied. The autocorrelation function (ACF) and the partial autocorrelation function
(PACF) of the differenced series are then computed and visualized to determine the parameters. Each
dimension's data set from the raw data is fed into the ARIMA time series model, producing a time series
forecast in each dimension. Finally, the next-step predicted values of the sales volume of each
commodity and of the total sales volume are obtained, in order to verify the recommendation rules in
the future.
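The differencing and autocorrelation steps that precede the ARIMA fit can be illustrated with plain Python. This sketch covers only those two preparatory steps; the ARIMA fitting, the PACF and the plotting are omitted and would normally come from a time series library. The function names and the sample series are hypothetical.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


def acf(series, max_lag):
    """Sample autocorrelation for lags 0..max_lag (series must not be constant)."""
    n = len(series)
    mean = sum(series) / n
    var = sum((x - mean) ** 2 for x in series)
    return [
        sum((series[t] - mean) * (series[t + k] - mean) for t in range(n - k)) / var
        for k in range(max_lag + 1)
    ]


# Hypothetical sales series with an accelerating trend.
sales = [1, 3, 6, 10]
first = difference(sales)            # still trending upward
second = difference(sales, order=2)  # flat, so no further differencing is needed
lags = acf([1, 2, 3, 4, 5], max_lag=1)
```

The number of differencing passes needed to flatten the series gives the d parameter of the model, and the shape of the ACF and PACF is what guides the choice of the remaining parameters.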
It should be appreciated that the embodiments of the present invention can be implemented or carried out
by computer hardware, by a combination of hardware and software, or by computer instructions stored in
a non-transitory computer-readable memory. The method can be implemented in computer programs using
standard programming techniques, including a non-transitory computer-readable storage medium configured
with a computer program, where the storage medium so configured causes a computer to operate in a
specific and predefined manner, according to the methods and drawings described in the specific
embodiments. Each program can be implemented in a high-level procedural or object-oriented programming
language to communicate with a computer system. However, if desired, the program can be implemented in
assembly or machine language. In any case, the language can be a compiled or an interpreted language.
In addition, the program can run on an application-specific integrated circuit programmed for this
purpose.
In addition, the operations of the processes described herein can be performed in any suitable order,
unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described
herein (or variations and/or combinations thereof) can be performed under the control of one or more
computer systems configured with executable instructions, and can be implemented as code (for example,
executable instructions, one or more computer programs, or one or more applications) executing
collectively on one or more processors, by hardware, or by a combination thereof. The computer program
includes a plurality of instructions executable by one or more processors.
Further, the method can be implemented on any suitable type of computing platform to which it is
operably connected, including but not limited to a personal computer, minicomputer, mainframe,
workstation, networked or distributed computing environment, or a separate or integrated computer
platform, or one in communication with a charged particle tool or other imaging device, and so on.
Aspects of the present invention can be implemented as machine-readable code stored on a non-transitory
storage medium or device, whether removable or integrated into the computing platform, such as a hard
disk, an optically read and/or written storage medium, RAM or ROM, so that it can be read by a
programmable computer; when the storage medium or device is read by the computer, it can be used to
configure and operate the computer to perform the processes described herein. In addition, the
machine-readable code, or portions thereof, can be transmitted over a wired or wireless network. When
such media contain instructions or programs that, in conjunction with a microprocessor or other data
processor, implement the steps described above, the invention described herein includes these and other
different types of non-transitory computer-readable storage media. When programmed according to the
methods and techniques described in the present invention, the invention also includes the computer
itself.
A computer program can be applied to input data to perform the functions described herein, thereby
transforming the input data to generate output data that is stored in non-volatile memory. The output
information can also be applied to one or more output devices such as a display. In a preferred
embodiment of the present invention, the transformed data represents physical and tangible objects,
including specific visual depictions of the physical and tangible objects produced on the display.
The above are only preferred embodiments of the present invention, and the invention is not limited to
the above embodiments. As long as the technical effect of the invention is achieved by the same means,
any modification, equivalent replacement, improvement and the like made within the spirit and principles
of the present invention shall be included within its scope of protection. Within the scope of
protection of the present invention, its technical solutions and/or embodiments can have a variety of
different modifications and variations.