
CN112966210B - A method and device for storing user data - Google Patents

Info

Publication number
CN112966210B
Authority
CN
China
Prior art keywords
index
users
distribution
rank
user
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911272851.0A
Other languages
Chinese (zh)
Other versions
CN112966210A (en)
Inventor
黄建杰
黄明星
赖晨东
温道新
晁子亮
周亚楠
李银锋
刘婷婷
周彬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd
Priority to CN201911272851.0A
Publication of CN112966210A
Application granted
Publication of CN112966210B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/15 Correlation function computation including computation of convolution operations
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/18 Complex mathematical operations for evaluating statistical data, e.g. average values, frequency distributions, probability functions, regression analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Mathematics (AREA)
  • Mathematical Analysis (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Strategic Management (AREA)
  • Entrepreneurship & Innovation (AREA)
  • General Engineering & Computer Science (AREA)
  • Finance (AREA)
  • Algebra (AREA)
  • Databases & Information Systems (AREA)
  • Marketing (AREA)
  • Software Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Operations Research (AREA)
  • Accounting & Taxation (AREA)
  • Game Theory and Decision Science (AREA)
  • Computing Systems (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Complex Calculations (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a method and a device for storing user data, and relates to the technical field of computers. The method comprises: performing rank distribution conversion on the index values of the users and calculating the rank ranking rate of each user; performing distribution mapping on the rank ranking rates based on the inverse function of a cumulative distribution function to obtain a mapping result value for each user; determining the index performance level of each user based on a mapping threshold and the user's mapping result value; and storing the mapping result values and index performance levels to a big data platform for a database system to call. This implementation addresses the technical problem that the per-user data stored in a big data platform is not accurate enough.

Description

Method and device for storing user data
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and apparatus for storing user data.
Background
A large number of third-party merchants operate shops on an e-commerce platform, and their daily sales generate massive amounts of new data. To ensure that platform users enjoy high-quality service, the platform needs to judge each third-party merchant's service performance level from the feedback of certain data indexes. At present, the index performance level for a given data index is judged mainly by standardization.
In the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:
Because a data index may suffer from small samples, a skewed index distribution, extreme-value outliers and similar problems, identifying the index performance level by standardization produces inaccurate results. The per-user data stored in the big data platform is then inaccurate, which affects the other service systems that use it.
Disclosure of Invention
In view of the above, the embodiments of the present invention provide a method and an apparatus for storing user data, so as to solve the technical problem that the per-user data stored in a big data platform is not accurate enough.
To achieve the above object, according to one aspect of the embodiments of the present invention, there is provided a method of storing user data, including:
performing rank distribution conversion on the index values of the users, and calculating the rank ranking rate of each user;
performing distribution mapping on the rank ranking rates of the users based on an inverse function of a cumulative distribution function to obtain a mapping result value for each user;
determining the index performance level of each user based on a mapping threshold and the mapping result value of each user; and
storing the mapping result values and the index performance levels of the users to a big data platform for a database system to call.
Optionally, performing rank distribution conversion on the index values of the users includes:
rank-ordering the index values of the users such that the better an index value performs, the smaller its rank.
Optionally, rank-ordering the index values of the users such that the better an index value performs, the smaller its rank includes:
if a larger index value means better performance, ordering the users from the largest index value to the smallest;
and if a smaller index value means better performance, ordering the users from the smallest index value to the largest.
Optionally, for any one user, the rank ranking rate of the user is calculated as follows:
calculating the rank ranking rate of the user according to the total number of users in the ranking and the rank of the user in the ranking.
Optionally, performing distribution mapping on the rank ranking rates of the users based on an inverse function of a cumulative distribution function includes:
if the index values of the users follow a skewed distribution, performing distribution mapping on the rank ranking rates of the users based on the inverse function of the cumulative distribution function of an exponential distribution;
and if the index values of the users follow a normal distribution, performing distribution mapping on the rank ranking rates of the users based on the inverse function of the cumulative distribution function of a bell-shaped distribution.
Optionally, before performing rank distribution conversion on the index values of the users, the method further includes:
judging whether the number of users with the same index value is greater than or equal to a compression threshold;
and if so, compressing the users with the same index value into a user group and mapping the index value directly to the user group.
Optionally, before performing rank distribution conversion on the index values of the users, the method further includes:
judging whether the total number of users is smaller than a sample size threshold;
and if so, applying a Bayesian weighted correction to the index value of each user.
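The text above does not say what form the Bayesian weighted correction takes. As a rough illustration only, the sketch below assumes a simple shrinkage estimate: a weighted average of the user's own index value and a prior mean, weighted by sample counts. The function name `bayes_corrected_value` and the parameters `user_samples`, `prior_mean` and `prior_weight` are hypothetical and do not come from the source.

```python
def bayes_corrected_value(user_value, user_samples, prior_mean, prior_weight):
    """Shrink a user's index value toward a prior mean.

    Assumed form (not taken from the source): a weighted average in which
    the user's own value counts `user_samples` times and the prior mean
    counts `prior_weight` times.
    """
    if user_samples < 0 or prior_weight < 0:
        raise ValueError("sample counts and weights must be non-negative")
    total = user_samples + prior_weight
    if total == 0:
        return prior_mean
    return (user_samples * user_value + prior_weight * prior_mean) / total


# A merchant with a single perfect order is pulled strongly toward the prior,
# while a merchant with many orders keeps almost all of its own value.
few = bayes_corrected_value(1.0, user_samples=1, prior_mean=0.8, prior_weight=9)
many = bayes_corrected_value(1.0, user_samples=991, prior_mean=0.8, prior_weight=9)
```

Under this assumption, the correction matters most for users with very few samples, which is where a raw index value is least reliable.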
In addition, according to another aspect of an embodiment of the present invention, there is provided an apparatus for storing user data, including:
a ranking module, configured to perform rank distribution conversion on the index values of the users and calculate the rank ranking rate of each user;
a mapping module, configured to perform distribution mapping on the rank ranking rates of the users based on an inverse function of a cumulative distribution function to obtain a mapping result value for each user;
an identification module, configured to determine the index performance level of each user based on a mapping threshold and the mapping result value of each user; and
a storage module, configured to store the mapping result values and the index performance levels of the users to a big data platform for a database system to call.
Optionally, the ranking module is further configured to:
rank-order the index values of the users such that the better an index value performs, the smaller its rank.
Optionally, the ranking module is further configured to:
if a larger index value means better performance, order the users from the largest index value to the smallest;
and if a smaller index value means better performance, order the users from the smallest index value to the largest.
Optionally, the ranking module is further configured to calculate, for any one user, the rank ranking rate of the user as follows:
calculating the rank ranking rate of the user according to the total number of users in the ranking and the rank of the user in the ranking.
Optionally, the mapping module is further configured to:
if the index values of the users follow a skewed distribution, perform distribution mapping on the rank ranking rates of the users based on the inverse function of the cumulative distribution function of an exponential distribution;
and if the index values of the users follow a normal distribution, perform distribution mapping on the rank ranking rates of the users based on the inverse function of the cumulative distribution function of a bell-shaped distribution.
Optionally, the ranking module is further configured to:
before performing rank distribution conversion on the index values of the users, judge whether the number of users with the same index value is greater than or equal to a compression threshold;
and if so, compress the users with the same index value into a user group and map the index value directly to the user group.
Optionally, the ranking module is further configured to:
before performing rank distribution conversion on the index values of the users, judge whether the total number of users is smaller than a sample size threshold;
and if so, apply a Bayesian weighted correction to the index value of each user.
According to another aspect of an embodiment of the present invention, there is also provided an electronic device including:
one or more processors; and
a storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of the embodiments described above.
According to another aspect of an embodiment of the present invention, there is also provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method according to any of the embodiments described above.
An embodiment of the invention has the following advantage: by calculating the rank ranking rate of each user and performing distribution mapping on those rates to determine each user's index performance level, it overcomes the prior-art problem that the per-user data stored in a big data platform is inaccurate. Rank distribution conversion of the index values eliminates differences in scale between data in a service scenario, handles differences between data distributions, and resolves extreme-value outliers, so the result is highly robust. Distribution mapping based on the inverse function of a cumulative distribution function then compensates for the information lost in the rank distribution conversion and expresses the data characteristics better, so the user data stored in the big data platform is more accurate.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main flow of a method of storing user data according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the main flow of a method of storing user data according to one referenceable embodiment of the invention;
FIG. 3 is a schematic diagram of the main flow of a method of storing user data according to another referenceable embodiment of the invention;
FIG. 4 is a schematic diagram of the main flow of a method of storing user data according to yet another referenceable embodiment of the invention;
FIG. 5 is a schematic diagram of the main modules of an apparatus for storing user data according to an embodiment of the present invention;
FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 7 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
At present, the index performance level is judged mainly by standardization, calculated as follows:
$$Y = \frac{X - \bar{X}}{\sigma}$$
where Y is the index performance level (the larger Y is, the better the merchant's data index performs), X is the original index value of the merchant, \bar{X} is the industry average level, and \sigma is the industry standard deviation, calculated as follows:
$$\sigma = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(X_i - \bar{X}\right)^2}$$
where N is the number of merchants in the industry and X_i is the original index value of the i-th merchant.
Judging the index performance level by this standardization method has the following problems:
1) It places high demands on the data distribution: the method performs well only when the original data is approximately normally distributed and has no extreme-value outliers.
2) When the original data distribution is skewed (for example, the data leans heavily to the left), the calculated index performance level Y also tends to be skewed, which does not match reality, and a large number of Y values below or above the average level may appear. A merchant at an ordinary level then receives a very high or a very low performance score.
3) When the original data contains extreme-value outliers, an abnormal extreme value affects the calculation of the industry standard deviation σ, so a single abnormal data point can change the index performance level of every other merchant, which does not match actual business conditions.
4) Because only the original index value is considered, small samples can make a merchant's index unreliable; for example, data from a merchant with only one sale is not enough to show whether the merchant's index performs well or poorly.
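Problem 3) can be made concrete with a small experiment. The sketch below standardizes a handful of made-up index values with and without one extreme value; the data and the helper name `standardized_scores` are illustrative assumptions, and the population standard deviation from Python's `statistics` module stands in for the industry standard deviation.

```python
from statistics import mean, pstdev


def standardized_scores(values):
    """Return (value - mean) / standard deviation for every value."""
    mu = mean(values)
    sigma = pstdev(values)  # population standard deviation
    return [(v - mu) / sigma for v in values]


# Five ordinary dispute rates, then the same five plus one abnormal value
# (a 3000% rate recorded as 30.0).
ordinary = [0.01, 0.02, 0.03, 0.04, 0.05]
with_outlier = ordinary + [30.0]

before = standardized_scores(ordinary)
after = standardized_scores(with_outlier)

# Spread between the first and fifth merchant, before and after the outlier.
spread_before = before[4] - before[0]
spread_after = after[4] - after[0]
```

In this example the single abnormal value inflates the standard deviation so much that the five ordinary merchants become almost indistinguishable after standardization, even though none of their own data changed.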
The method for storing user data provided by the embodiment of the invention addresses the small-sample, skewed-distribution and extreme-value-outlier problems of data indexes, and thus yields a robust index performance level.
Fig. 1 is a schematic diagram of a main flow of a method of storing user data according to an embodiment of the present invention. As an embodiment of the present invention, as shown in fig. 1, the method of storing user data may include:
Step 101, performing rank distribution conversion on index values of all users, and calculating rank ranking rates of all the users.
Different indexes have different dimensions (units and scales), their distributions differ greatly, and extreme-value outliers occur. The embodiment of the invention therefore converts the index values into rank statistics from nonparametric statistics and then calculates the rank ranking rate, which eliminates the dimension differences and handles both the differences in the original distributions and the outliers. For merchants on the e-commerce platform, different indexes can be configured for each service scenario. For example, an evaluation scenario may include commodity quality satisfaction, material speed satisfaction, commodity description satisfaction and seller service satisfaction; a consultation scenario may include the 30-second response rate and average response time; an after-sale scenario may include the refund repair rate and after-sale service time; a logistics scenario may include the 48-hour package passing rate and date-of-sale arrival rate; and a dispute scenario may include the transaction dispute rate, self-service transaction completion rate, and dispute handling and time-following rate.
Optionally, performing rank distribution conversion on the index values of the users comprises rank-ordering the index values such that the better an index value performs, the smaller its rank. That is, the better the index value X_1, the smaller the rank Rank_{X_1} and the higher the rank ranking rate X_2.
Optionally, this rank ordering comprises: if a larger index value means better performance, ordering the users from the largest index value to the smallest; and if a smaller index value means better performance, ordering the users from the smallest index value to the largest. Either way, the better the index value X_1, the smaller Rank_{X_1} and the higher X_2.
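The direction-aware ordering described above can be sketched in a few lines. The function name `rank_users`, the `higher_is_better` flag and the shop data are illustrative assumptions; ties are not treated specially here and simply keep their input order.

```python
def rank_users(index_values, higher_is_better=True):
    """Return {user: rank}, where rank 1 is the best-performing user."""
    ordered = sorted(index_values, key=index_values.get, reverse=higher_is_better)
    return {user: position for position, user in enumerate(ordered, start=1)}


# Satisfaction: larger is better. Dispute rate: smaller is better.
satisfaction = {"shop_a": 4.9, "shop_b": 4.2, "shop_c": 4.7}
dispute_rate = {"shop_a": 0.03, "shop_b": 0.01, "shop_c": 0.02}

satisfaction_rank = rank_users(satisfaction, higher_is_better=True)
dispute_rank = rank_users(dispute_rate, higher_is_better=False)
```

With a single flag per index, every index ends up on the same footing: rank 1 always means the best performer, whatever the direction of the raw values.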
Optionally, for any one user, calculating the rank ranking rate of the user by adopting the following method:
and calculating the rank ranking rate of the users according to the total number of the users in the ranking and the rank ranking of the users in the ranking.
For example, the rank ranking rate X_2 may be calculated using the following formula:
where Rank_{X_1} represents the rank of the user in the rank ordering.
An index value X_1 that is better within the industry receives a larger rank ranking rate X_2. The result eliminates the dimension differences between data sources, is unaffected by skew in the data distribution, and is highly resistant to extreme-value outliers.
Step 101 thus converts the original index values into rank statistics from nonparametric statistics, which eliminates dimension differences and effectively handles differences in the original distributions and extreme-value outliers. An extreme-value outlier is a value extremely far from the data mean, that is, a severe outlier; it typically appears as an "unreasonable value" caused by special circumstances, for example a transaction dispute rate of 3000% or an after-sales service duration of 4000 h. A rank statistic is an order statistic used in nonparametric statistical tests: it records the position (rank) that a sample value occupies among all samples when they are sorted by size. In a practical scenario it is equivalent to an index ranking.
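The text states only that the rank ranking rate is computed from the total number of users and the user's rank. The sketch below therefore uses an assumed formula, (N - rank + 1) / (N + 1), chosen because it gives better ranks a larger rate and keeps every rate strictly between 0 and 1; it is not claimed to be the formula of the source. The name `rank_ranking_rate` is likewise hypothetical.

```python
def rank_ranking_rate(rank, total_users):
    """Map a rank (1 = best) to a rate in the open interval (0, 1).

    Assumed formula: (N - rank + 1) / (N + 1). The source only says the
    rate depends on the total number of users and the user's rank.
    """
    if not 1 <= rank <= total_users:
        raise ValueError("rank must lie between 1 and total_users")
    return (total_users - rank + 1) / (total_users + 1)


# Four users ranked 1..4 receive evenly spaced rates.
rates = [rank_ranking_rate(r, 4) for r in (1, 2, 3, 4)]
```

Because the rate depends only on the rank, replacing the worst user's raw value with an absurd one (such as a 3000% dispute rate) leaves every rate unchanged, which is the robustness property described above.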
Step 102, performing distribution mapping on the rank ranking rates of the users based on an inverse function of a cumulative distribution function to obtain a mapping result value for each user.
After the rank conversion of step 101, the differences between indexes are eliminated, but part of the information is lost. Therefore, step 102 finds a distribution (the target distribution) that expresses the original index values well, and derives the positive and negative feedback information of the index in combination with the actual service scenario.
After the target distribution is determined, since the rank ranking rate X_2 obtained from rank conversion is approximately uniformly distributed, the distribution of X_2 can be converted into the target distribution by mapping through the inverse of the target distribution's cumulative distribution function F:
X_3 = F^{-1}(X_2), where F^{-1} denotes the inverse function of F.
Inverse function distribution mapping: if the cumulative distribution function F is continuous and strictly increasing, its inverse function F^{-1}(y), y ∈ [0, 1], exists. The inverse of the cumulative distribution function can be used to generate random variables that obey a given distribution. Suppose F_X(x) is the cumulative distribution function of the probability distribution X and has an inverse function F_X^{-1}. If a is a random variable uniformly distributed over the interval [0, 1], then F_X^{-1}(a) obeys the distribution X.
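The inverse-function property can be checked numerically. The sketch below draws uniform random numbers and pushes them through the inverse cumulative distribution function of an exponential distribution, F(x) = 1 - exp(-rate * x); the choice of distribution, the rate of 2.0 and the function name `exponential_inverse_cdf` are assumptions made for the illustration.

```python
import math
import random


def exponential_inverse_cdf(p, rate=1.0):
    """Inverse of F(x) = 1 - exp(-rate * x), defined for 0 <= p < 1."""
    if not 0.0 <= p < 1.0:
        raise ValueError("p must be in [0, 1)")
    return -math.log(1.0 - p) / rate


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [exponential_inverse_cdf(rng.random(), rate=2.0) for _ in range(100_000)]

# If the mapped values follow the exponential distribution, their mean
# should be close to 1 / rate = 0.5.
sample_mean = sum(samples) / len(samples)
```

The same mechanism is what turns the approximately uniform rank ranking rates into values that follow the chosen target distribution.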
A threshold for the mapping result value X_3 is defined according to the actual service scenario, and it is checked whether X_3 matches the positive and negative feedback that the business gives on the index in that scenario. If it does, the target distribution is used for mapping; if not, another target distribution is sought until X_3 matches. Positive and negative feedback here means feedback information derived from input information. In the embodiment of the invention, it is the result value obtained by applying rank distribution conversion and inverse-function distribution mapping to the original index value: a result value greater than 0 means the index performs positively in the scenario, and the larger the value, the better the index performs; the converse also holds.
Optionally, if the index values of the users follow a skewed distribution, the rank ranking rates of the users are mapped based on the inverse function of the cumulative distribution function of an exponential distribution; if the index values of the users follow a normal distribution, the rank ranking rates are mapped based on the inverse function of the cumulative distribution function of a bell-shaped distribution. A skewed distribution is a frequency distribution whose curve is asymmetric about its peak, in contrast to the normal distribution; it is a probability distribution of a continuous random variable. The degree of skewness can be measured by calculating skewness and kurtosis.
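One way to act on the skewness and kurtosis measurement is sketched below using population central moments. The selection rule and its cut-off `skew_limit=1.0` are hypothetical, as are the function names; the text does not give a numerical criterion for choosing the target distribution.

```python
from statistics import mean


def skewness_and_kurtosis(values):
    """Return (skewness, excess kurtosis) from population central moments."""
    n = len(values)
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    m3 = sum((v - mu) ** 3 for v in values) / n
    m4 = sum((v - mu) ** 4 for v in values) / n
    if m2 == 0:
        raise ValueError("all values are identical")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def choose_target_distribution(values, skew_limit=1.0):
    """Pick a target distribution name from the measured skewness."""
    # skew_limit is a hypothetical cut-off, not a value from the source.
    skew, _ = skewness_and_kurtosis(values)
    return "exponential" if abs(skew) > skew_limit else "bell-shaped"


symmetric = [1, 2, 3, 4, 5]
skewed = [0, 0, 0, 0, 0, 0, 0, 0, 1, 10]

symmetric_choice = choose_target_distribution(symmetric)
skewed_choice = choose_target_distribution(skewed)
```

In practice the cut-off would need tuning per index, and the text additionally requires checking the mapped values against business feedback before settling on a target distribution.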
Typically, when the original index value distribution is severely skewed (e.g., the transaction dispute rate), an exponential distribution mapping may be employed, as follows:
X_3 = 0.7 * ln(X_2)
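The exponential mapping above translates directly into code. In the sketch below, the function name `exponential_mapping` and the input check are illustrative additions; the coefficient 0.7 and the natural logarithm come from the formula in the text.

```python
import math


def exponential_mapping(rank_rate, scale=0.7):
    """Apply X_3 = scale * ln(X_2) to a rank ranking rate in (0, 1]."""
    if not 0.0 < rank_rate <= 1.0:
        raise ValueError("rank_rate must be in (0, 1]")
    return scale * math.log(rank_rate)


best = exponential_mapping(1.0)     # 0.0 for the top of the ranking
median = exponential_mapping(0.5)   # about -0.485
tail = exponential_mapping(0.05)    # about -2.10
```

All mapped values are non-positive, and a rank ranking rate of 0.05 lands near -2, so the bulk of users fall between roughly -2 and 0.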
Typically, for an original index value distribution that is roughly symmetric (e.g., the space-time yield), a bell-shaped distribution mapping may be employed, as follows:
In the embodiment of the invention, the process from rank distribution conversion to inverse-function distribution mapping is collectively called positive and negative feedback: judging from the original value of a user's index whether the index is positive or negative feedback for that user, and how strong the feedback is.
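The formula for the bell-shaped mapping does not survive in the text. Purely as an assumption, the sketch below takes the bell-shaped target to be the standard normal distribution and maps rank ranking rates through its inverse cumulative distribution function, using `statistics.NormalDist.inv_cdf` from the Python standard library. The function name `bell_mapping` is hypothetical.

```python
from statistics import NormalDist


def bell_mapping(rank_rate):
    """Map a rank ranking rate in (0, 1) through the standard normal inverse CDF.

    Assumption: the bell-shaped target distribution is the standard normal.
    """
    if not 0.0 < rank_rate < 1.0:
        raise ValueError("rank_rate must be strictly between 0 and 1")
    return NormalDist().inv_cdf(rank_rate)


middle = bell_mapping(0.5)     # the median user maps to 0
upper = bell_mapping(0.975)    # about 1.96
lower = bell_mapping(0.025)    # about -1.96
```

Under this assumption a user above the median receives a positive value and a user below it a negative one, and about 95% of users fall between roughly -2 and 2.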
Step 103, determining the index performance level of each user based on the mapping threshold and the mapping result value of each user.
In general, the size of the mapping threshold may be set according to the target distribution and the confidence interval. Taking the 95% confidence interval as an example, if the target distribution is an exponential distribution, the mapping threshold is [ -2,0], and if the target distribution is a bell-shaped distribution, the mapping threshold is [ -2,2]. After the mapping result values are obtained, the index performance level of each user is analyzed based on the corresponding mapping threshold values.
Optionally, the recorded result of the performance level may take two forms. One is a feedback value: a number expressing the strength of the positive or negative feedback, where a value greater than 0 indicates that the index performs well and a value below 0 indicates the opposite. The other is a feedback type: a classification expressing the judgment of the index, such as good, medium, bad, or inferior.
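Both kinds of record can be produced from one mapping result value. In the sketch below, the cut points (-2.0, 0.0, 2.0) and the assignment of the four labels to the resulting bands are hypothetical choices for illustration; the text names the labels and the thresholds but does not say how they combine.

```python
from bisect import bisect_right

# Labels ordered from worst to best; the banding is an assumption.
LEVELS = ("inferior", "bad", "medium", "good")


def performance_level(mapped_value, cut_points=(-2.0, 0.0, 2.0)):
    """Return (feedback value, feedback type) for one mapping result value."""
    return mapped_value, LEVELS[bisect_right(cut_points, mapped_value)]


very_low = performance_level(-2.5)
below_zero = performance_level(-0.5)
above_zero = performance_level(1.0)
very_high = performance_level(2.5)
```

Keeping the numeric feedback value next to the label lets downstream consumers choose between the fine-grained strength and the coarse category.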
Step 104, storing the mapping result values and the index performance levels of the users to a big data platform for a database system to call.
After the mapping result value and the index performance level of each user are obtained, they can be stored in the big data platform so that a database system can call them directly, which meets the feature extraction needs of other models.
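The storage step might look like the sketch below, which uses an in-memory SQLite database from the Python standard library purely as a stand-in for the big data platform. The table name `index_performance`, its columns and the function name `store_results` are hypothetical.

```python
import sqlite3


def store_results(connection, index_name, results):
    """Write {user: (mapping result value, performance level)} rows to a table."""
    connection.execute(
        "CREATE TABLE IF NOT EXISTS index_performance ("
        "user_id TEXT, index_name TEXT, mapped_value REAL, level TEXT, "
        "PRIMARY KEY (user_id, index_name))"
    )
    connection.executemany(
        "INSERT OR REPLACE INTO index_performance VALUES (?, ?, ?, ?)",
        [(user, index_name, value, level) for user, (value, level) in results.items()],
    )
    connection.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the real platform
store_results(
    conn,
    "dispute_rate",
    {"shop_a": (1.2, "medium"), "shop_b": (-2.4, "inferior")},
)
rows = conn.execute(
    "SELECT user_id, mapped_value, level FROM index_performance ORDER BY user_id"
).fetchall()
```

Keying the table on user and index name means a recomputed result overwrites the previous one instead of accumulating duplicates.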
According to the embodiments described above, by calculating the rank ranking rate of each user and performing distribution mapping on those rates to determine each user's index performance level, the invention solves the prior-art problem that the per-user data stored in a big data platform is not accurate enough. Rank distribution conversion of the index values eliminates differences in scale between data in a service scenario, handles differences between data distributions, and resolves extreme-value outliers, so the result is highly robust. Distribution mapping based on the inverse function of a cumulative distribution function then compensates for the information lost in the rank distribution conversion and expresses the data characteristics better, so the user data stored in the big data platform is more accurate.
Fig. 2 is a schematic diagram of the main flow of a method of storing user data according to one referenceable embodiment of the invention. As still another embodiment of the present invention, the method of storing user data may include the steps of:
Step 201, judging whether the number of users with the same index value is greater than or equal to a compression threshold, if yes, executing step 202, and if not, executing step 203.
Step 202, compressing the users with the same index value into a user group, and directly mapping the index value to the user group.
Before performing rank distribution conversion, the tie (node) problem of the rank distribution is considered first, that is, whether a large number of users share the same index value. If such ties exist, the shared index value is mapped directly to a suitable rank Rank_{X_1} by tie compression: users with the same index value are compressed into one user group, and the index value is mapped directly to that group. When the index values are converted into the rank distribution, the user group is ranked as a single user, so all users in the group have the same Rank_{X_1} and therefore the same rank ranking rate X_2.
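The compression of users with identical index values can be sketched as follows. The function names `compress_ties` and `rank_entries`, the threshold of 3 and the sample data are illustrative assumptions.

```python
from collections import defaultdict


def compress_ties(index_values, compression_threshold):
    """Group users that share an index value once the group is large enough.

    Returns a list of (index value, [users]) entries. A compressed group
    appears as one entry and is later ranked as if it were a single user.
    """
    by_value = defaultdict(list)
    for user, value in index_values.items():
        by_value[value].append(user)

    entries = []
    for value, users in by_value.items():
        if len(users) >= compression_threshold:
            entries.append((value, users))  # one entry for the whole group
        else:
            entries.extend((value, [u]) for u in users)  # keep users separate
    return entries


def rank_entries(entries, higher_is_better=True):
    """Give every user the rank of the entry (user or user group) it belongs to."""
    ordered = sorted(entries, key=lambda e: e[0], reverse=higher_is_better)
    return {
        user: rank
        for rank, (_, users) in enumerate(ordered, start=1)
        for user in users
    }


# Three shops share a perfect rate and are compressed into one group.
rates = {"a": 1.0, "b": 1.0, "c": 1.0, "d": 0.9, "e": 0.8}
ranks = rank_entries(compress_ties(rates, compression_threshold=3))
```

Here the three tied shops share rank 1 and the remaining shops take ranks 2 and 3, so a large block of identical values no longer pushes everyone else far down the ranking.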
Step 203, rank ordering the index values of the users such that the better the index value performance, the smaller the rank.
Optionally, this rank ordering comprises: if a larger index value indicates better performance, ordering the users by index value from largest to smallest; and if a smaller index value indicates better performance, ordering the users by index value from smallest to largest. Thus, the better the index value $X_1$, the smaller the rank $\mathrm{Rank}_{X_1}$ and the higher the rank ranking rate $X_2$.
Step 204, calculating rank ranking rates of the users.
The rank ranking rate $X_2$ may be calculated using the following formula:
where $\mathrm{Rank}_{X_1}$ represents the rank of the user in the rank ordering.
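Steps 203 and 204 can be sketched as follows. The formula itself is not reproduced in this text, so the expression used here, (N − Rank + 1) / N with N the total number of users, is an assumption chosen only because it satisfies the stated properties: it depends on the total number of users and the user's rank, and a smaller rank gives a higher rate. The function name and input shape are likewise hypothetical.

```python
def rank_rates(index_values, higher_is_better=True):
    """Return {user_id: (rank, rank_rate)}; rank 1 is the best performer.

    index_values: dict mapping user id -> index value (hypothetical input shape).
    """
    # Sort so that the best-performing user comes first.
    ordered = sorted(index_values.items(),
                     key=lambda item: item[1],
                     reverse=higher_is_better)
    n = len(ordered)
    result = {}
    for rank, (user_id, _value) in enumerate(ordered, start=1):
        # ASSUMED formula: best user -> 1.0, worst user -> 1/n (never 0).
        result[user_id] = (rank, (n - rank + 1) / n)
    return result
```

Tied values receive consecutive ranks in this sketch; handling large ties as a single group is a separate step.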
Step 205, performing distribution mapping on the rank ranking rates of the users based on an inverse function of the cumulative distribution function to obtain mapping result values of the users.
If the index values of the users follow a skewed distribution, the distribution mapping of the rank ranking rates is performed based on the inverse function of the cumulative distribution function of the exponential distribution; if the index values of the users follow a normal distribution, the distribution mapping is performed based on the inverse function of the cumulative distribution function of the bell-shaped distribution.
Step 206, determining the index performance level of each user based on the mapping threshold value and the mapping result value of each user.
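The distribution mapping of step 205 can be sketched as follows. For the skewed case the sketch uses the logarithmic form X3 = 0.7 * ln(X2) that the description gives in a later embodiment. For the bell-shaped case the source formula is not reproduced in this text, so the standard normal quantile function is used as an assumption. The clamping parameter `eps` and the function name are also hypothetical.

```python
import math
from statistics import NormalDist


def map_rank_rate(rank_rate, skewed, eps=1e-6):
    """Map a rank ranking rate in [0, 1] to a mapping result value."""
    # Clamp into the open interval (0, 1), where both mappings are defined.
    p = min(max(rank_rate, eps), 1.0 - eps)
    if skewed:
        # Form given in the description for skewed indices: X3 = 0.7 * ln(X2).
        return 0.7 * math.log(p)
    # ASSUMPTION: standard normal inverse CDF for the bell-shaped mapping.
    return NormalDist().inv_cdf(p)
```

Both branches are monotonically increasing in the rank ranking rate, so the ordering of users is preserved while the spacing between them is reshaped.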
Step 207, storing the mapping result value and the index performance level of each user to a big data platform for the database system to call.
The specific implementation of this embodiment has been described in detail in the method of storing user data above and is therefore not repeated here.
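Step 206 compares each mapping result value against mapping thresholds. A minimal sketch under stated assumptions: the thresholds and level labels are hypothetical placeholders, since this passage does not give concrete values, and the function name is illustrative.

```python
from bisect import bisect_right


def performance_level(mapped_value, thresholds, labels):
    """Pick the level whose interval contains mapped_value.

    thresholds: ascending cut points (hypothetical values).
    labels: one more label than there are thresholds, worst level first.
    """
    if len(labels) != len(thresholds) + 1:
        raise ValueError("need exactly one more label than thresholds")
    # bisect_right counts how many cut points are <= mapped_value.
    return labels[bisect_right(thresholds, mapped_value)]
```

For example, with the assumed thresholds `[-1.0, 1.0]` and labels `["low", "medium", "high"]`, a mapping result value of 0.0 falls into the "medium" level.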
Fig. 3 is a schematic diagram of the main flow of a method of storing user data according to another referenceable embodiment of the invention. As another embodiment of the present invention, the method of storing user data may include the steps of:
Step 301, judging whether the total number of users is smaller than the sample size threshold, if yes, executing step 302, and if not, executing step 303.
Step 302, applying a Bayesian weighted correction to the index values of the users.
If the sample size of the index is small (that is, the total number of users is small; in general, more users means a larger sample size), the index result is less reliable and its value fluctuates widely, which makes it hard to identify the index performance level. Therefore, in step 302, the results from small-sample data are corrected according to Bayesian theory to improve the overall stability of the model.
Optionally, the Bayesian weighted correction formula is as follows:
$m_c = n_t - n_0$
where $X_0$ represents the index value before correction, $n_0$ represents the sample size of the index, $P$ represents the industry average of the index value, and the parameter $n_t$, which is determined by the actual business scenario, represents the minimum sample size at which the index result is credible; its default value may be set to 15, 20, 30, 40 or 50.
Bayes' theorem is a theorem of probability theory that describes the probability of an event under known conditions. Bayesian weighted correction starts from the assumption that the index result equals the industry average and adjusts it according to the actual sample data. The embodiment of the invention applies this correction to small-sample data to address its low reliability and insufficient service value.
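The correction can be sketched as a weighted average that pulls a small-sample index value toward the industry average. Only the relation m_c = n_t − n_0 comes from the text. The weighting (n_0·X_0 + m_c·P) / (n_0 + m_c) is an assumed, commonly used form, because the full formula is not reproduced here. Applying the correction only when n_0 is below n_t, and the function name, are also assumptions.

```python
def bayes_corrected(x0, n0, p, n_t=30):
    """Shrink index value x0 toward the industry average p for small samples.

    x0: index value before correction; n0: sample size of the index;
    p: industry average; n_t: minimum sample size considered credible.
    """
    if n0 >= n_t:
        # Enough samples: the observed value is used unchanged.
        return x0
    m_c = n_t - n0  # from the text: m_c = n_t - n_0
    # ASSUMED weighting: m_c pseudo-observations at the industry average.
    return (n0 * x0 + m_c * p) / (n0 + m_c)
```

With n_0 = 10, n_t = 30, X_0 = 1.0 and P = 0.4, this assumed form gives (10·1.0 + 20·0.4) / 30 = 0.6, so a value based on few samples moves most of the way toward the industry average.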
Step 303, rank ordering the index values of the respective users with the aim of better index value performance and smaller rank ranking.
Optionally, this rank ordering comprises: if a larger index value indicates better performance, ordering the users by index value from largest to smallest; and if a smaller index value indicates better performance, ordering the users by index value from smallest to largest. Thus, the better the index value $X_1$, the smaller the rank $\mathrm{Rank}_{X_1}$ and the higher the rank ranking rate $X_2$.
Step 304, calculating rank ranking rate of each user.
The rank ranking rate $X_2$ may be calculated using the following formula:
where $\mathrm{Rank}_{X_1}$ represents the rank of the user in the rank ordering.
Step 305, performing distribution mapping on the rank ranking rates of the users based on an inverse function of the cumulative distribution function to obtain mapping result values of the users.
When the original index value distribution is severely skewed (e.g., the transaction dispute rate), an exponential distribution mapping may be used, as follows:
$X_3 = 0.7 \ln(X_2)$
When the original index value distribution is more symmetric (e.g., the space-time arrival rate), a bell-shaped distribution mapping may be used, as follows:
Step 306, determining the index performance level of each user based on the mapping threshold value and the mapping result value of each user.
Step 307, storing the mapping result values and the index performance levels of the users to a big data platform for the database system to call.
The specific implementation of this embodiment has been described in detail in the method of storing user data above and is therefore not repeated here.
Fig. 4 is a schematic diagram of the main flow of a method of storing user data according to still another referenceable embodiment of the invention. As still another embodiment of the present invention, the method of storing user data may include the steps of:
Step 401, calculating industry mean, variance and extremum of index item.
Step 402, judging whether the industry mean, variance and extremum of the index item have significant differences, if not, ending, and if so, executing step 403.
A selected index can provide valid information for subsequent use only if it carries differences in information. Accordingly, step 402 determines whether the industry information for the index is sufficient, which eliminates spurious logic and improves the overall performance of the model.
In the embodiment of the invention, whether the industry mean, variance and extremum of the index item have significant differences is mainly considered, and the judgment logic is as follows:
Industry average
Industry variance
Industry extremum: $X_t = \max(X) - \min(X) <$ threshold C
where the thresholds A, B and C are derived from an evaluation of the actual data distribution. For example, the default values of A, B and C may be 0, 0.0001 and 0.01, respectively.
If the three conditions are satisfied at the same time, the industry mean, variance and extremum of the index item are considered to have no significant difference.
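Steps 401 and 402 can be sketched as follows. Only the extremum condition, max(X) − min(X) < threshold C, is reproduced in the text, so the sketch computes all three statistics but tests only that one condition. The mean and variance conditions are left to the caller because their expressions are not given here. The use of population variance and the function names are assumptions.

```python
from statistics import mean, pvariance


def industry_stats(values):
    """Step 401: industry mean, variance and extremum (range) of an index item."""
    return {
        "mean": mean(values),
        # ASSUMPTION: population variance; the text does not say which kind.
        "variance": pvariance(values),
        "range": max(values) - min(values),
    }


def range_below_threshold(values, threshold_c=0.01):
    """Extremum condition from the text: max(X) - min(X) < threshold C."""
    return max(values) - min(values) < threshold_c
```

An index whose values all lie within 0.01 of each other, for instance, satisfies the extremum condition with the default threshold C.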
Step 403, rank ordering the index values of the users with the aim of better index value performance and smaller rank ranking.
Step 404, calculating rank ranking rate of each user.
Step 405, performing distribution mapping on the rank ranking rates of the users based on an inverse function of the cumulative distribution function to obtain mapping result values of the users.
Step 406, determining the index performance level of each user based on the mapping threshold value and the mapping result value of each user.
Step 407, storing the mapping result values and the index performance levels of the users to a big data platform for the database system to call.
The specific implementation of this embodiment has been described in detail in the method of storing user data above and is therefore not repeated here.
Fig. 5 is a schematic diagram of the main modules of an apparatus for storing user data according to an embodiment of the present invention. As shown in fig. 5, the apparatus 500 for storing user data includes a ranking module 501, a mapping module 502, an identification module 503, and a storage module 504. The ranking module 501 is used for performing rank distribution conversion on the index values of the users and calculating the rank ranking rate of each user. The mapping module 502 is used for performing distribution mapping on the rank ranking rates of the users based on an inverse function of a cumulative distribution function to obtain the mapping result value of each user. The identification module 503 is used for determining the index performance level of each user based on a mapping threshold and the mapping result value of each user. The storage module 504 is used for storing the mapping result values and the index performance levels of the users to a big data platform for a database system to call.
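The division of labour among the four modules can be sketched as a small pipeline. This is an illustrative arrangement only: the class name, the callable-based modules, and the in-memory dictionary standing in for the big data platform are all assumptions, not part of the described apparatus.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Tuple


@dataclass
class UserDataStorageDevice:
    # Ranking module 501: index values -> rank ranking rates per user.
    rank: Callable[[Dict[str, float]], Dict[str, float]]
    # Mapping module 502: rank ranking rate -> mapping result value.
    map_rate: Callable[[float], float]
    # Identification module 503: mapping result value -> performance level.
    identify: Callable[[float], str]
    # Storage module 504: a dict stands in for the big data platform.
    store: Dict[str, Tuple[float, str]] = field(default_factory=dict)

    def run(self, index_values: Dict[str, float]) -> Dict[str, Tuple[float, str]]:
        rates = self.rank(index_values)
        for user_id, rate in rates.items():
            mapped = self.map_rate(rate)
            # Store both the mapping result value and the level.
            self.store[user_id] = (mapped, self.identify(mapped))
        return self.store
```

Passing the modules in as callables keeps each stage replaceable, which mirrors the optional configurations of the ranking and mapping modules.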
Optionally, the ranking module 501 is further configured to:
perform rank ordering on the index values of the users such that the better the index value performance, the smaller the rank.
Optionally, the ranking module 501 is further configured to:
if a larger index value indicates better performance, so that a larger index value corresponds to a smaller rank, rank the users in order of index value from largest to smallest;
and if a smaller index value indicates better performance, so that a smaller index value corresponds to a smaller rank, rank the users in order of index value from smallest to largest.
Optionally, the ranking module 501 is further configured to calculate, for any one user, a rank ranking rate of the user by:
calculating the rank ranking rate of the user according to the total number of users in the ranking and the rank of the user in the ranking.
Optionally, the mapping module 502 is further configured to:
if the index values of the users follow a skewed distribution, perform distribution mapping on the rank ranking rate of each user based on an inverse function of the cumulative distribution function of the exponential distribution;
and if the index values of the users follow a normal distribution, perform distribution mapping on the rank ranking rate of each user based on an inverse function of the cumulative distribution function of the bell-shaped distribution.
Optionally, the ranking module 501 is further configured to:
Before carrying out rank distribution conversion on index values of all users, judging whether the number of users with the same index value is greater than or equal to a compression threshold value;
If yes, compressing the users with the same index value into a user group, and directly mapping the index value to the user group.
Optionally, the ranking module 501 is further configured to:
Before carrying out rank distribution conversion on index values of all users, judging whether the total number of the users is smaller than a sample size threshold value;
If yes, applying a Bayesian weighted correction to the index value of each user.
As the embodiments described above show, the present invention calculates the rank ranking rate of each user and performs distribution mapping on those rank ranking rates to determine the index performance level of each user, thereby solving the prior-art technical problem that the user data stored in a big data platform is not accurate enough. In the embodiments of the invention, rank distribution conversion of the index values eliminates differences in data magnitude across service scenarios, handles differences between data distributions, and removes the effect of abnormal extreme values, so the data result is highly robust. Distribution mapping based on an inverse function of a cumulative distribution function then compensates for the information lost in the rank distribution conversion and gives a better expression of the data characteristics, so the user data stored in the big data platform is more accurate.
The specific implementation of the apparatus for storing user data has been described in detail in the method of storing user data above and is therefore not repeated here.
Fig. 6 illustrates an exemplary system architecture 600 of a method of storing user data or an apparatus storing user data to which embodiments of the present invention may be applied.
As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 is used as a medium to provide communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 605 via the network 604 using the terminal devices 601, 602, 603 to receive or send messages, etc. Various communication client applications such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 601, 602, 603.
The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 605 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping-type websites browsed by users using terminal devices 601, 602, 603. The background management server may analyze and process received data such as an article information query request, and feed back the processing result (e.g., target push information or article information, by way of example only) to the terminal device.
It should be noted that, the method for storing user data provided by the embodiment of the present invention is generally performed by the server 605, and accordingly, the device for storing user data is generally provided in the server 605. The method for storing user data provided by the embodiment of the present invention may also be performed by the terminal devices 601, 602, 603, and accordingly, the apparatus for storing user data may be provided in the terminal devices 601, 602, 603.
It should be understood that the number of terminal devices, networks and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 7, there is illustrated a schematic diagram of a computer system 700 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM703, various programs and data required for the operation of the system 700 are also stored. The CPU 701, ROM 702, and RAM703 are connected to each other through a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.
Connected to the I/O interface 705 are an input section 706 including a keyboard, a mouse, and the like, an output section 707 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and the like, a speaker, and the like, a storage section 708 including a hard disk, and the like, and a communication section 709 including a network interface card such as a LAN card, a modem, and the like. The communication section 709 performs communication processing via a network such as the internet. The drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read therefrom is mounted into the storage section 708 as necessary.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer readable medium, the computer program comprising program code for performing the method shown in the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section 709, and/or installed from the removable medium 711. The above-described functions defined in the system of the present invention are performed when the computer program is executed by the Central Processing Unit (CPU) 701.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of a computer-readable storage medium may include, but are not limited to, an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer programs according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor, for example, a processor may be described as comprising a ranking module, a mapping module and an identification module, wherein the names of the modules do not in some cases constitute a limitation of the module itself.
As a further aspect, the invention also provides a computer readable medium, which may be included in the device described in the above embodiments or may exist alone without being fitted into the device. The computer readable medium carries one or more programs which, when executed by the device, cause the device to perform the following: performing rank distribution conversion on index values of respective users and calculating rank ranking rates of the respective users; performing distribution mapping on the rank ranking rates of the respective users based on an inverse function of a cumulative distribution function to obtain mapping result values of the respective users; determining index performance levels of the respective users based on a mapping threshold and the mapping result values of the respective users; and storing the mapping result values and the index performance levels of the respective users to a big data platform for a database system to call.
According to the technical scheme of the embodiment of the invention, the rank ranking rate of each user is calculated and distribution mapping is performed on those rank ranking rates to determine the index performance level of each user, which solves the prior-art technical problem that the user data stored in a big data platform is inaccurate. In the embodiments of the invention, rank distribution conversion of the index values eliminates differences in data magnitude across service scenarios, handles differences between data distributions, and removes the effect of abnormal extreme values, so the data result is highly robust. Distribution mapping based on an inverse function of a cumulative distribution function then compensates for the information lost in the rank distribution conversion and gives a better expression of the data characteristics, so the user data stored in the big data platform is more accurate.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (9)

1. A method of storing user data, comprising:
performing rank distribution conversion on index values of all users, and calculating rank ranking rates of all the users;
Performing distribution mapping on the rank ranking rates of all the users based on an inverse function of the cumulative distribution function to obtain mapping result values of all the users;
Determining the index performance level of each user based on the mapping threshold value and the mapping result value of each user;
Storing the mapping result values and the index expression levels of the users to a big data platform for the database system to call, wherein the expression levels are feedback values or feedback types;
performing distribution mapping on rank ranking rates of the users based on an inverse function of the cumulative distribution function, including:
if the index values of the users follow a skewed distribution, performing distribution mapping on the rank ranking rate of each user based on an inverse function of a cumulative distribution function of the exponential distribution;
if the index values of the users follow a normal distribution, performing distribution mapping on the rank ranking rate of each user based on an inverse function of a cumulative distribution function of the bell-shaped distribution;
If in the evaluation scene, the index comprises a plurality of commodity quality satisfaction, material speed satisfaction, commodity description satisfaction and seller service satisfaction;
if under the consultation scene, the index comprises a 30 second response rate and average response time length;
if in the after-sales scenario, the indicators include return to goods repair rate and after-sales service duration;
if the index comprises a 48-hour item collecting time rate and a daily interval arrival rate in a logistics scene;
if in the dispute scene, the index comprises a plurality of transaction dispute rates, self-help transaction ending rates and dispute processing compliance rates.
2. The method of claim 1, wherein rank distribution converting the index values of each user comprises:
And aiming at better index value performance and smaller rank ranking, performing rank ranking on the index values of the users.
3. The method of claim 2, wherein rank ordering the index values for each user with the goal of better index value performance and smaller rank ranking comprises:
If the index value is expressed as the larger index value, the rank ranking corresponding to the index value is smaller, and each user is ranked according to the sequence from the larger index value to the smaller index value;
and if the index value is expressed as smaller, the corresponding rank is smaller, and each user is ranked according to the order of the index value from small to large.
4. The method of claim 1, wherein for any one user, the rank ranking rate of the user is calculated using the following method:
and calculating the rank ranking rate of the users according to the total number of the users in the ranking and the rank ranking of the users in the ranking.
5. The method of claim 1, further comprising, prior to rank distribution converting the index values of the respective users:
Judging whether the number of users with the same index value is greater than or equal to a compression threshold value;
If yes, compressing the users with the same index value into a user group, and directly mapping the index value to the user group.
6. The method of claim 1, further comprising, prior to rank distribution converting the index values of the respective users:
Judging whether the total number of users is smaller than a sample size threshold value;
If yes, applying a Bayesian weighted correction to the index value of each user.
7. An apparatus for storing user data, comprising:
the ranking module is used for carrying out rank distribution conversion on index values of all users and calculating rank ranking rates of all the users;
The mapping module is used for carrying out distribution mapping on the rank ranking rates of the users based on the inverse function of the cumulative distribution function to obtain mapping result values of the users;
the identification module is used for determining the index expression level of each user based on the mapping threshold value and the mapping result value of each user, wherein the expression level is a feedback value or a feedback type;
The storage module is used for storing the mapping result values and the index expression levels of the users to a big data platform for the database system to call;
The mapping module is further configured to:
if the index values of the users follow a skewed distribution, performing distribution mapping on the rank ranking rate of each user based on an inverse function of a cumulative distribution function of the exponential distribution;
if the index values of the users follow a normal distribution, performing distribution mapping on the rank ranking rate of each user based on an inverse function of a cumulative distribution function of the bell-shaped distribution;
If in the evaluation scene, the index comprises a plurality of commodity quality satisfaction, material speed satisfaction, commodity description satisfaction and seller service satisfaction;
if under the consultation scene, the index comprises a 30 second response rate and average response time length;
if in the after-sales scenario, the indicators include return to goods repair rate and after-sales service duration;
if the index comprises a 48-hour item collecting time rate and a daily interval arrival rate in a logistics scene;
if in the dispute scene, the index comprises a plurality of transaction dispute rates, self-help transaction ending rates and dispute processing compliance rates.
8. An electronic device, comprising:
One or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-6.
9. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-6.
CN201911272851.0A 2019-12-12 2019-12-12 A method and device for storing user data Active CN112966210B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911272851.0A CN112966210B (en) 2019-12-12 2019-12-12 A method and device for storing user data

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911272851.0A CN112966210B (en) 2019-12-12 2019-12-12 A method and device for storing user data

Publications (2)

Publication Number Publication Date
CN112966210A CN112966210A (en) 2021-06-15
CN112966210B true CN112966210B (en) 2025-02-25

Family

ID=76271006

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911272851.0A Active CN112966210B (en) 2019-12-12 2019-12-12 A method and device for storing user data

Country Status (1)

Country Link
CN (1) CN112966210B (en)


Also Published As

Publication number Publication date
CN112966210A (en) 2021-06-15

Similar Documents

Publication Publication Date Title
CN111967167B (en) A Reliability Evaluation Method for Nonlinear Degradation Process
CN108595448B (en) Information pushing method and device
CN113743971B (en) Data processing method and device
CN113095893A (en) Method and device for determining sales of articles
CN114500339B (en) Node bandwidth monitoring method and device, electronic equipment and storage medium
CN112287208B (en) User portrait generation method, device, electronic equipment and storage medium
CN110866698A (en) Device for assessing service score of service provider
CN110895761A (en) Method and device for processing after-sale service application information
CN110472190A (en) The method and apparatus for filling ordered sequence
CN114049072B (en) Index determination method and device, electronic equipment and computer readable medium
CN119850265A (en) Data processing method, device, electronic equipment and storage medium
CN117113613A (en) Data processing method and device
CN110599281A (en) Method and device for determining target shop
CN112966210B (en) A method and device for storing user data
CN110490682B (en) Method and device for analyzing commodity attributes
CN117788115A (en) Method, device, equipment and storage medium for determining article demand information
CN110110267B (en) Method and device for extracting object characteristics and searching objects
CN112819555A (en) Article recommendation method and device
CN113342903B (en) A method and device for managing models in a data warehouse
CN112783956B (en) Information processing method and device
CN115545341A (en) Event prediction method and device, electronic equipment and storage medium
CN113763070B (en) Information recommendation method and device
CN110838019A (en) Method and device for determining the population of trial product distribution
CN113362097B (en) User determination method and device
CN110472645B (en) A method and device for selecting a target object

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant