Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to give a thorough understanding of embodiments of the application. One skilled in the relevant art will recognize, however, that the application may be practiced without one or more of the specific details, or with other methods, components, devices, steps, etc. In other instances, well-known methods, devices, implementations, or operations are not shown or described in detail to avoid obscuring aspects of the application.
The block diagrams depicted in the figures are merely functional entities and do not necessarily correspond to physically separate entities. That is, the functional entities may be implemented in software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow diagrams depicted in the figures are exemplary only, and do not necessarily include all of the elements and operations/steps, nor must they be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the order of actual execution may be changed according to actual situations.
It should be noted that "a plurality" as used herein means two or more.
With the research and progress of artificial intelligence technology, artificial intelligence has been researched and applied in many fields and is playing an increasingly important role.
Artificial intelligence (AI) is the theory, method, technology and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use the knowledge to obtain optimal results. In other words, artificial intelligence is a comprehensive technology of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a way similar to human intelligence. The application of artificial intelligence in machine learning is taken as an example for illustration below:
Machine learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithm complexity theory and other disciplines. It specifically studies how a computer simulates or implements human learning behavior to acquire new knowledge or skills and reorganizes existing knowledge structures to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to make computers intelligent, and it is applied throughout all areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning and inductive learning. The solution of the present application mainly uses machine learning to learn from the feature data of different types of users to obtain a classification model, and the classification model is used to classify the feature data of a user to be classified to obtain the type of that user.
Before the detailed description, the terms related to the present application are explained as follows:
The initial model refers to an algorithm model for making decisions or assigning items to categories. In the present application, the at least two initial models may be at least two of a random forest model, a machine learning model, a logistic regression model, a decision tree model, a support vector machine model, a naive Bayes model, and the like.
The training samples refer to the feature data of authenticated users of different types, collected while the users log in to and use the APP. The training samples comprise positive training samples and negative training samples: the positive training samples are the feature data of users of a first type, and the negative training samples are the feature data of users of a second type. Users of different types may be users of different ages (for example, users in a first age group and users in a second age group), users in different states of physical health (for example, users in good health and users in poor health), users with different consumption capacities (for example, users with high consumption capacity and users with low consumption capacity), users in different working conditions (for example, employed users and unemployed users), and the like. If the first type of users are users in a first age group and the second type of users are users in a second age group, the first age group may be the minor age group, that is, under 18 years old, and the second age group may be the adult age group, that is, 18 years old or older. Alternatively, the first age group may cover ages below a preset age and the second age group ages greater than or equal to the preset age, where the preset age may be 12, 14, or 15 years old, or the like.
The training set comprises the positive and negative training samples corresponding to at least one stage. The feature data of training samples from different stages contain different feature types. Therefore, if a training set comprises positive and negative training samples from two stages, the union of the feature types of the two stages is used as the feature types of every training sample in the training set, and any feature type that a training sample lacks is filled with a specified value (such as zero). For example, if the feature data of the first-stage training samples contain the four types A, B, C and D, and the feature data of the second-stage training samples contain the five types B, C, D, E and F, then a training set comprising the training samples of both stages has the six feature types A, B, C, D, E and F.
The verification set: each verification sample in the verification set is obtained based on a user type identification operation performed on a user in the registration stage or the use stage, and the training samples in a training set have the same feature types as the verification samples in the corresponding verification set. The verification set may include feature data of the registration stage or the use stage, collected after the trained model is deployed on the server, together with the user type obtained for that feature data by identifying the user.
Training parameters: the training parameters include at least one of precision, recall, AUC value, and F1 score. Recall is the number of first-type users (positive training samples) correctly predicted as first-type users divided by the actual number of first-type users. The higher the recall, the higher the probability that a user who is actually of the first type is predicted as such. Precision refers to the prediction results: it is the probability that a training sample predicted to be positive is actually a positive training sample. AUC (Area Under Curve) is defined as the area enclosed between the ROC curve (receiver operating characteristic curve, also called a sensitivity curve) and the coordinate axes, which obviously cannot exceed 1. Further, since the ROC curve generally lies above the line y=x, the AUC typically ranges between 0.5 and 1. The closer the AUC is to 1.0, the better the model; an AUC of 0.5 is the worst case, in which the model has no application value. The F1 score combines precision and recall into a single value that balances the two and is high only when both are high; it is calculated as F1 = 2 × precision × recall / (precision + recall).
Verification parameters: like the training parameters, the verification parameters include at least one of precision, recall, AUC value, F1 score, and the like. For the meaning of each verification parameter, reference may be made to the foregoing description of the training parameters.
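The feature-type alignment described above can be sketched in Python. This is a minimal illustration, assuming each training sample is stored as a mapping from feature type to value; the function name `align_feature_types` and the sample values are hypothetical.

```python
from typing import Dict, List


def align_feature_types(
    samples: List[Dict[str, float]], fill_value: float = 0.0
) -> List[Dict[str, float]]:
    """Give every sample the union of all feature types, filling gaps with fill_value."""
    all_types = sorted({name for sample in samples for name in sample})
    return [{name: sample.get(name, fill_value) for name in all_types} for sample in samples]


# Hypothetical samples: the first stage has types A-D, the second stage has types B-F.
stage1_sample = {"A": 1.0, "B": 2.0, "C": 3.0, "D": 4.0}
stage2_sample = {"B": 5.0, "C": 6.0, "D": 7.0, "E": 8.0, "F": 9.0}

aligned = align_feature_types([stage1_sample, stage2_sample])
print(list(aligned[0]))  # ['A', 'B', 'C', 'D', 'E', 'F']
print(aligned[0]["E"], aligned[1]["A"])  # 0.0 0.0
```

Zero is used as the fill value here only because the description gives it as an example of the specified value.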
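As a sketch of how these training parameters can be computed, the Python functions below take label 1 for the first type of user (positive) and 0 for the second type (negative). The AUC function uses the pairwise-ranking form of AUC, which equals the area under the ROC curve. The function names and toy labels are illustrative assumptions, not part of the described method.

```python
from typing import List, Tuple


def precision_recall_f1(y_true: List[int], y_pred: List[int]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


def auc(y_true: List[int], scores: List[float]) -> float:
    """Probability that a random positive scores higher than a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


y_true = [1, 1, 0, 0, 0]
y_pred = [1, 0, 1, 0, 0]
scores = [0.9, 0.4, 0.6, 0.3, 0.1]
print(precision_recall_f1(y_true, y_pred))  # (0.5, 0.5, 0.5)
print(round(auc(y_true, scores), 4))  # 0.8333
```

The pairwise form avoids building the ROC curve explicitly; for large sample counts a rank-based computation is usually preferred over this quadratic loop.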
At present, various application programs (APPs) have begun to adopt control measures that prevent users (some users) from becoming addicted to the APP. Taking game APPs as an example, online game companies may currently provide online game services to minors (here, users under 18 years old) only during limited time periods and may not provide such services at any other time; the requirements for real-name registration and login of online game accounts must be strictly implemented, and game services may not be provided in any form to users who have not completed real-name registration and login. Therefore, in the related art, a user of a game APP is usually required to register and complete real-name authentication so that it can be verified whether the user is a minor. Fig. 1 is a schematic diagram of a mandatory real-name pop-up window presented by a game to users who have not completed real-name authentication or to newly registered users; the user must enter the authentication information required for real-name authentication in the pop-up window. After the authentication information is entered and submitted, if the identity needs further verification, the user is further required to complete face recognition verification (see the interface shown in fig. 2), and if the user is confirmed to be a minor, restrictions for minors are imposed. If every user had to perform the above login and face recognition verification operations, users could be disturbed, and even game retention could be affected.
Therefore, the present application provides a classification model training method; a classification model obtained with this method can accurately identify whether a user is a minor, effectively alleviating the above problems. The method comprises: acquiring at least two initial models and a plurality of training sets obtained based on training samples, wherein the training samples comprise positive and negative training samples obtained in a registration stage and positive and negative training samples obtained in a use stage, each training set comprises the positive and negative training samples corresponding to at least one stage, the positive training samples are feature data of users of a first type, and the negative training samples are feature data of users of a second type; for each initial model, training the initial model separately with the positive and negative training samples in each training set to obtain a plurality of classification models corresponding to the initial model and the training parameters of each classification model, wherein each classification model corresponds to one training set; determining a target initial model based on the training parameters of the classification models corresponding to each initial model; acquiring a verification set corresponding to each training set, wherein each verification sample in a verification set is obtained based on a user type identification operation performed on a user in the registration stage or the use stage, and the training samples in a training set have the same feature types as the verification samples in the corresponding verification set; for each classification model corresponding to the target initial model, verifying the classification model with the verification set corresponding to the training set used to train it, to obtain verification parameters of the classification model; and determining a target classification model based on the verification parameters of each classification model. The target classification model is then used to classify the feature data of a user to be identified and obtain the user type (for example, the user's age group), and whether the user is restricted when using the game APP is determined according to the identified user type. Because the above identification operation (such as face recognition verification) no longer needs to be performed for every user, the user experience is improved.
The implementation details of the technical scheme of the embodiment of the application are described in detail below:
Fig. 3 is a schematic diagram of an application scenario according to an embodiment of the present application. As shown in fig. 3, the application scenario includes a terminal device 10 and a server 20 communicatively connected to the terminal device 10 through a network, which may be a wide area network, a local area network, or a combination of both. The terminal device 10 may be a smartphone, a tablet computer, a computer, or the like; fig. 3 shows only the case in which the terminal device 10 is a smartphone.
The user may log in to or use the APP through the terminal device 10, so that the terminal device 10 can obtain feature data while the user logs in to or uses the APP, identify the user to obtain the user type (for example, the age group of the user, identified through the interface shown in fig. 2, as illustrated in fig. 3), and associate the feature data with the user type to obtain a sample. After obtaining a sample, the terminal device 10 may, for example, upload it to the server 20, or may store it locally for retrieval by the server 20.
The server 20 may obtain at least two initial models and a plurality of training sets obtained based on training samples, where the training samples include positive and negative training samples obtained in a registration stage and positive and negative training samples obtained in a use stage, each training set includes the positive and negative training samples corresponding to at least one stage, the positive training samples are feature data of users of a first type, and the negative training samples are feature data of users of a second type; for each initial model, train the initial model separately with the positive and negative training samples in each training set to obtain a plurality of classification models corresponding to the initial model and the training parameters of each classification model, where each classification model corresponds to one training set; determine a target initial model based on the training parameters of the classification models corresponding to each initial model; acquire a verification set corresponding to each training set, where each verification sample in a verification set is obtained based on a user type identification operation performed on a user in the registration stage or the use stage, and the training samples in a training set have the same feature types as the verification samples in the corresponding verification set; for each classification model corresponding to the target initial model, verify the classification model with the verification set corresponding to the training set used to train it, to obtain verification parameters of the classification model; and determine a target classification model based on the verification parameters of each classification model.
In this way, the target classification model is trained and can subsequently be used to predict, from their feature data, the user types of users who log in to or use the APP.
It should be understood that the above prediction process may be performed on the server 20 or on the terminal device 10. When the prediction process is performed on the server 20, the target classification model is deployed on the server 20; when the prediction process is performed on the terminal device 10, the target classification model is deployed on the terminal device 10. Similarly, besides the server 20, the model training process described above may also be performed on the terminal device 10.
Fig. 4 is a flowchart illustrating a classification model training method according to an embodiment of the present application, which may be performed by an electronic device with processing capabilities, for example, by a server, a terminal, or by a server and terminal interacting to implement the present solution, etc., without specific limitation. Referring to fig. 4, the method at least includes steps S110 to S140, and is described in detail as follows:
Step S110, a plurality of training sets obtained based on training samples and at least two initial models are obtained.
The training samples comprise positive and negative training samples obtained in a registration stage and positive and negative training samples obtained in a use stage; each training set comprises the positive and negative training samples corresponding to at least one stage; the positive training samples are feature data of users of the first type, and the negative training samples are feature data of users of the second type.
The first type of users and the second type of users may be distinguished by age group, physical health, consumption capacity, or working condition, which is not limited herein.
In one embodiment, users in the first age group may be users younger than a preset age, and users in the second age group may be users whose age is greater than or equal to the preset age. The preset age may be 12, 14, 15, or 18 years old, or the like. In one embodiment of the present application, the preset age is 18 years old, that is, users in the first age group are minors and users in the second age group are adults.
Feature data of the registration stage may specifically be feature data obtained on the day the user registers with the APP, and feature data of the use stage may be feature data obtained after the APP has been used for a period of time, such as three days, one week, two weeks, or one month.
The feature data of the registration stage may include one or more of: the number of registration attempts within a specified period (such as one week, two weeks, or one month) before the registration time; the number of registration attempts by the user since the APP went online with the real-name policy; the age filled in on the registration day; attributes of the registration day (such as whether it is a holiday, a workday, or a specified date); the earliest and latest active times on the registration day; the active duration in different periods of the registration day; login data of the device used on the registration day (such as the number of all accounts, the number of natural persons, and the number of first-age-group user accounts logged in on that device); login data of the registration device over a period of time (such as one week or one month), for example the number of first-age-group user accounts; the number of accounts that triggered the real-name authentication pop-up window; and the like.
The feature data of the use stage may include one or more items of the registration-stage feature data described above, and may further include one or more of: the number of active days in the use stage after registration, the proportion of active days, active periods during holidays, and the usage duration in different active periods.
The at least two initial models obtained may be at least two of a random forest model, a machine learning model, a logistic regression model, a decision tree model, a support vector machine model, a naive Bayes model, and the like; this is not specifically limited herein and may be set according to actual requirements.
The at least two initial models may be obtained from a database or from a memory of the electronic device; that is, a plurality of algorithm models may be stored in the database or in the electronic device.
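A few of the use-stage features listed above can be computed from a user's activity dates, as in the hypothetical sketch below. It assumes the use stage is the `window_days` days following the registration day and that a holiday calendar is supplied by the caller; the function and key names are illustrative only.

```python
from datetime import date, timedelta
from typing import Dict, Set


def use_stage_features(
    register_day: date, active_days: Set[date], window_days: int, holidays: Set[date]
) -> Dict[str, float]:
    """Activity features over the window_days days that follow the registration day."""
    window = [register_day + timedelta(days=i) for i in range(1, window_days + 1)]
    active_in_window = [d for d in window if d in active_days]
    return {
        "active_days": float(len(active_in_window)),
        "active_day_ratio": len(active_in_window) / window_days,
        "holiday_active_days": float(sum(1 for d in active_in_window if d in holidays)),
    }


features = use_stage_features(
    register_day=date(2024, 1, 1),
    active_days={date(2024, 1, 2), date(2024, 1, 3), date(2024, 1, 6), date(2024, 1, 10)},
    window_days=7,
    holidays={date(2024, 1, 6), date(2024, 1, 7)},
)
print(features["active_days"], features["holiday_active_days"])  # 3.0 1.0
```

Activity on 10 January falls outside the one-week window and is therefore ignored.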
The manner of obtaining the plurality of training sets based on the training samples may be:
Specifically, the training samples may be obtained by collecting, from a plurality of terminal devices, the feature data generated when users register for or use the APP after registration, and identifying the users so that each item of feature data is associated with a user type (such as the user's age group). For the feature data included in the training samples, reference may be made to the foregoing description of the feature data, which is not repeated here.
Typically, different types of APPs are used by different proportions of user types. For example, for game APPs, video playback APPs, content interaction APPs and the like, the number of teenage users is typically much smaller than the number of adult users, whereas for education APPs, live class APPs and the like, the number of teenage users is typically much larger than the number of adult users. Therefore, the proportion of positive to negative samples in the obtained training samples generally varies greatly with the APP type.
For example, for game APPs, video playback APPs, content interaction APPs and the like, where the first age group is minors and the second age group is adults, the ratio of positive to negative training samples is typically small in the training samples obtained during registration and during use after registration. For education APPs or live class APPs, with the same age groups, the ratio of positive to negative training samples obtained during registration and during use after registration is usually large.
If the positive and negative training samples are imbalanced, the model learns much more information about the majority class (for example, adults) than about the minority class during machine learning, so it can reach a high accuracy merely by favoring the majority class, which harms the final prediction effect. In the embodiment of the present application, to improve the classification models obtained by training on the positive and negative training samples in the subsequent training stage, obtaining the plurality of training sets based on the training samples may specifically include the following steps:
Step S112: obtain training samples, wherein the training samples comprise positive training samples and negative training samples, the positive training samples comprise positive training samples obtained in the registration stage and positive training samples obtained in the use stage, and the negative training samples comprise negative training samples obtained in the registration stage and negative training samples obtained in the use stage.
For a specific description of acquiring the training sample, reference may be made to the foregoing specific description, which is not repeated here.
Step S114: if the ratio of the positive training sample to the negative training sample in the training samples is smaller than a preset threshold, expanding the positive training sample in the training samples to obtain expanded training samples, and obtaining a plurality of training sets based on the expanded training samples.
The preset threshold may be 0.5, 0.8, 0.9, or any other value chosen so that a ratio at or above the threshold indicates only a small difference between the numbers of positive and negative training samples.
The positive training samples may be expanded with an oversampling algorithm, such as random oversampling, the SMOTE algorithm, or the DOPING algorithm, or may be expanded with a data augmentation algorithm. It should be understood that the positive training samples may be expanded in a variety of ways; the above are examples only and should not be taken as limiting the present solution.
In an embodiment of the present application, step S114 may expand the positive training samples based on the SMOTE algorithm to obtain the expanded training samples.
If the ratio of minor samples to adult samples in the training set is n : m×n (m is a positive integer), the SMOTE algorithm expands the minor class to a×n samples, where a is a positive integer and 0 < a ≤ m.
Referring to figs. 5, 6 and 7, take as an example positive training samples that include the feature data of minor users in the registration stage and the use stage, and negative training samples that include the feature data of adult users in the registration stage and the use stage. The obtained training set is shown in fig. 5, where triangles represent positive training samples and circles represent negative training samples. As can be seen from the figure, the number of negative training samples is far greater than the number of positive training samples, so the positive training samples need to be expanded. When the SMOTE algorithm is used to expand the positive training samples, the Euclidean distances between a selected positive training sample (such as the positive training sample outlined by the solid line in fig. 6) and the other positive training samples are calculated, and the K nearest neighbor samples of the selected positive training sample (such as the positive training samples within the dashed line in fig. 6) are selected. Then, a ratio i (i is a positive integer) is set, new sample points are randomly generated between the selected positive training samples and their K neighbor samples, and the number of minor training samples is finally increased to i×n, reducing the gap between the number of minor training samples and the number of adult training samples. For example, as shown in fig. 7, a new sample may be obtained by applying the sample expansion formula to a selected positive training sample and any one of its target neighbor samples; if the expansion formula is applied N times, the number of positive training samples increases from the original M to M+N.
It should be understood that if the ratio of positive training samples to negative training samples is greater than a specified threshold, the negative training samples may instead be expanded to obtain expanded training samples, and a plurality of training sets may be obtained based on the expanded training samples. The specified threshold may be 1.5, 2, 3, 5, or the like, and may be set according to actual requirements.
The negative training samples may be expanded in the same way as the positive training samples described above, which is not repeated here.
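The expansion step can be sketched as a compact SMOTE-style oversampler in plain Python. The description does not spell out the sample expansion formula; this sketch assumes the standard SMOTE interpolation x_new = x + gap × (x_nb − x), with gap drawn uniformly from [0, 1) and x_nb one of the K nearest positive neighbors of x. Unlike the original SMOTE, which iterates over every minority sample, this variant picks the base sample at random for each new point. The function name, parameters and toy data are hypothetical.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def smote(minority: Sequence[Vector], n_new: int, k: int = 5, seed: int = 0) -> List[Vector]:
    """Generate n_new synthetic minority samples by SMOTE-style interpolation."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic: List[Vector] = []
    for _ in range(n_new):
        idx = rng.randrange(len(minority))
        x = minority[idx]
        # K nearest neighbors of x among the other minority samples (Euclidean distance).
        others = [minority[j] for j in range(len(minority)) if j != idx]
        neighbors = sorted(others, key=lambda m: math.dist(x, m))[:k]
        x_nb = rng.choice(neighbors)
        gap = rng.random()  # uniform in [0, 1)
        # The new point lies on the segment between x and the chosen neighbor.
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, x_nb)])
    return synthetic


# Toy data: n = 4 minor samples against m * n = 12 adult samples (m = 3).
positives = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
n, m, a = len(positives), 3, 3
new_samples = smote(positives, n_new=a * n - n, k=2)
expanded_positives = positives + new_samples
print(len(expanded_positives))  # 12
```

Choosing a = m, as here, balances the two classes fully; a smaller a leaves some imbalance but adds fewer synthetic points.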
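The threshold checks of step S114 and of the negative-sample case can be sketched as a small planning function. The default thresholds 0.5 and 2 are taken from the example values given above; bringing the minority class up to the size of the majority class (a = m in the SMOTE notation) is an assumption of this sketch, and the function name is hypothetical.

```python
from typing import Optional, Tuple


def expansion_plan(
    n_pos: int, n_neg: int, low: float = 0.5, high: float = 2.0
) -> Tuple[Optional[str], int]:
    """Decide which class to expand and how many synthetic samples to add."""
    if n_pos <= 0 or n_neg <= 0:
        raise ValueError("both classes must be non-empty")
    ratio = n_pos / n_neg
    if ratio < low:  # too few positive (first-type) samples
        return "positive", n_neg - n_pos
    if ratio > high:  # too few negative (second-type) samples
        return "negative", n_pos - n_neg
    return None, 0  # balanced enough; no expansion


print(expansion_plan(100, 1200))  # ('positive', 1100)
print(expansion_plan(900, 1000))  # (None, 0)
print(expansion_plan(3000, 1000))  # ('negative', 2000)
```

The returned count could be passed as the number of new samples to an oversampler such as SMOTE.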
Step S120, for each initial model, training the initial model by utilizing positive and negative training samples in each training set to obtain a plurality of classification models corresponding to the initial model and training parameters of each classification model, wherein each classification model corresponds to one training set.
When the positive and negative training samples in each training set are used for respectively training the initial model, specifically, the training samples in each training set are respectively input into the initial model for training, model parameters are continuously adjusted in the training process until the model converges, a plurality of classification models corresponding to the initial model are obtained, and each classification model corresponds to one training set.
The training parameters of each classification model may be obtained as follows: after the classification model is obtained, some of the positive and negative training samples in the corresponding training set are selected and input into the classification model for prediction to obtain prediction results (for example, predicted age groups), and the training parameters of the classification model are calculated from the user type (for example, the user's age group) and the predicted type (for example, the predicted age group) of each positive and negative training sample.
The training parameters of a classification model are parameters for evaluating how good its classification effect is. Specifically, the training parameters may include at least one of precision, recall, AUC value, and F1 score.
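The per-model, per-training-set loop of step S120, together with the computation of a training parameter on part of the samples, might look like the sketch below. The two toy classifiers stand in for the initial models (the description names models such as random forests or logistic regression instead), the F1 score is used as the single training parameter, and holding every third sample out of fitting for evaluation is an assumption. All names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Tuple

Vector = List[float]
Dataset = Tuple[List[Vector], List[int]]  # feature vectors, labels (1 = first type of user)


class NearestCentroid:
    """Toy initial model: predicts the class whose mean vector is closest."""

    def fit(self, X: List[Vector], y: List[int]) -> "NearestCentroid":
        self.centroids: Dict[int, Vector] = {}
        for label in set(y):
            rows = [x for x, t in zip(X, y) if t == label]
            self.centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
        return self

    def predict(self, X: List[Vector]) -> List[int]:
        return [min(self.centroids, key=lambda c: math.dist(x, self.centroids[c])) for x in X]


class OneNearestNeighbor:
    """Toy initial model: copies the label of the closest training sample."""

    def fit(self, X: List[Vector], y: List[int]) -> "OneNearestNeighbor":
        self.X, self.y = list(X), list(y)
        return self

    def predict(self, X: List[Vector]) -> List[int]:
        return [self.y[min(range(len(self.X)), key=lambda i: math.dist(x, self.X[i]))] for x in X]


def f1_score(y_true: List[int], y_pred: List[int]) -> float:
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0  # equals 2PR / (P + R)


def train_grid(
    initial_models: Dict[str, Callable[[], object]],
    training_sets: Dict[str, Dataset],
    eval_every: int = 3,
) -> Dict[Tuple[str, str], Tuple[object, float]]:
    """Train every initial model on every training set; score each result by F1."""
    grid = {}
    for model_name, make_model in initial_models.items():
        for set_name, (X, y) in training_sets.items():
            fit_idx = [i for i in range(len(X)) if i % eval_every != 0]
            eval_idx = [i for i in range(len(X)) if i % eval_every == 0]
            model = make_model().fit([X[i] for i in fit_idx], [y[i] for i in fit_idx])
            score = f1_score([y[i] for i in eval_idx], model.predict([X[i] for i in eval_idx]))
            grid[(model_name, set_name)] = (model, score)
    return grid


X = [[0.0, 0.0], [5.0, 5.0], [0.0, 1.0], [5.0, 6.0], [1.0, 0.0],
     [6.0, 5.0], [1.0, 1.0], [6.0, 6.0], [0.5, 0.5], [5.5, 5.5]]
y = [1, 0, 1, 0, 1, 0, 1, 0, 1, 0]
training_sets = {"registration": (X, y), "registration+use": ([x + [0.0] for x in X], y)}
grid = train_grid({"nearest_centroid": NearestCentroid, "1nn": OneNearestNeighbor}, training_sets)
for key in sorted(grid):
    print(key, grid[key][1])
```

Each (initial model, training set) pair yields one classification model, matching the statement that each classification model corresponds to one training set.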
Step S130, a target initial model is determined based on training parameters of the classification model corresponding to each initial model.
If each classification model has multiple training parameters, the target initial model may be determined in multiple ways.
In one embodiment, one training parameter may be selected from the training parameters of each classification model. Since the training parameter characterizes how good the classification effect of the classification model is, a classification model may be determined from the plurality of classification models based on the magnitude of the selected training parameter of each classification model, and the initial model corresponding to the determined classification model is used as the target initial model.
In another embodiment, the training parameters of each classification model may be weighted and summed to obtain a combined training parameter value for each classification model; a classification model is determined based on these combined values, and the initial model corresponding to the determined classification model is used as the target initial model, wherein, for training parameters of the same kind, a larger value corresponds to a larger weight.
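Both selection embodiments can be sketched with one function that ranks classification models either by a single training parameter or by a weighted sum. Fixed, caller-supplied weights are an assumption of this sketch, and the parameter values below are made up for illustration only; the function name is hypothetical.

```python
from typing import Dict, Optional, Tuple

# (initial model, training set) -> training parameters of the resulting classification model
Scores = Dict[Tuple[str, str], Dict[str, float]]


def select_target_initial_model(
    scores: Scores, metric: Optional[str] = None, weights: Optional[Dict[str, float]] = None
) -> str:
    """Return the initial model behind the best classification model.

    Pass `metric` to rank by one training parameter, or `weights` to rank by a weighted sum.
    """
    if (metric is None) == (weights is None):
        raise ValueError("pass exactly one of metric or weights")

    def value(params: Dict[str, float]) -> float:
        if metric is not None:
            return params[metric]
        assert weights is not None
        return sum(w * params[name] for name, w in weights.items())

    best_key = max(scores, key=lambda key: value(scores[key]))
    return best_key[0]


# Made-up training parameters, for illustration only.
scores = {
    ("random_forest", "registration"): {"recall": 0.70, "auc": 0.85, "f1": 0.66},
    ("random_forest", "registration+use"): {"recall": 0.78, "auc": 0.90, "f1": 0.74},
    ("logistic_regression", "registration"): {"recall": 0.80, "auc": 0.82, "f1": 0.70},
    ("logistic_regression", "registration+use"): {"recall": 0.75, "auc": 0.86, "f1": 0.72},
}
print(select_target_initial_model(scores, metric="recall"))  # logistic_regression
print(select_target_initial_model(scores, weights={"auc": 0.5, "f1": 0.5}))  # random_forest
```

As the example shows, the two embodiments can pick different target initial models from the same training parameters.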
Step S140, obtaining a verification set corresponding to each training set, wherein each verification sample in the verification set is obtained based on a user type identification operation performed on the user in the registration stage or the use stage, and the training samples in each training set have the same feature types as the verification samples in the corresponding verification set.
The verification set corresponding to each training set may be obtained as follows: after feature data of a user is collected in the registration stage, the feature data collected in the registration stage is associated with the recognition result obtained by classifying and recognizing it (through face recognition) to obtain a verification sample; likewise, feature data of the user collected in the use stage after registration is classified and recognized (through face recognition), and the recognition result is associated with the corresponding feature data to obtain a verification sample.
It should be appreciated that the proportion of positive to negative verification samples in each verification set obtained in this way is determined by the users who actually use the APP.
The verification set corresponding to each training set may also be obtained as follows. For each classification model corresponding to the target initial model, the classification model is deployed on a server and used to perform classification prediction on a plurality of pieces of feature data to be predicted, yielding for each piece the probability that it is predicted as a user of the first age group. The feature data to be predicted are sorted by this probability in descending order, and the target feature data to be predicted that are ranked before the release threshold corresponding to the classification model are taken, together with the user type of each corresponding target registered user, as verification samples in the verification set corresponding to the training set used to train that classification model. The user type of each target registered user is obtained by performing a user type recognition operation (e.g., a face recognition operation) on the user in the registration stage or the use stage.
Specifically, since the verification set is generally collected while the user registers in the APP and while the user uses the APP after registration, when the target feature data to be predicted ranked before the release threshold and the user type (e.g., age group) of each corresponding target registered user are to be obtained, a face recognition verification window for verifying the user's age may be displayed as a popup to the user corresponding to the target feature data, so as to verify whether the user belongs to the first age group or the second age group.
The release threshold may be determined according to the proportion of positive and negative samples in the training samples, or according to this proportion together with a preset confidence level. It may also be determined according to the actual usage rate of the APP, the proportion of positive and negative samples in the training samples, and a preset confidence level, and may be set according to actual requirements.
In one embodiment of the present application, the release threshold of each classification model corresponding to the target initial model may be calculated as
$X = \frac{(Z^{*})^{2}\, P^{*}(1 - P^{*})}{m^{2}}$
where X is the release threshold, which corresponds to the designated number of pieces of feature data for which user type labels (such as age group labels) are to be obtained for the classification model; m is an absolute difference (the allowable error); P* is the proportion of first-type users (e.g., users of the first age group) in the training samples; and Z* is the value of the standard normal distribution at the chosen confidence level, e.g., about 1.96 at a confidence level of 95%.
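The ranking step described above can be sketched as follows. This is a minimal sketch under stated assumptions: `predict_proba` is a hypothetical callable standing in for the deployed classification model, returning the probability that one piece of feature data belongs to the first age group, and the release threshold is treated as a count of top-ranked samples.

```python
from typing import Callable, List, Sequence, Tuple


def select_for_verification(
    features: Sequence[Sequence[float]],
    predict_proba: Callable[[Sequence[float]], float],
    release_threshold: int,
) -> List[Tuple[int, float]]:
    """Return (index, probability) pairs for the feature data ranked before the
    release threshold, highest probability of the first age group first."""
    scored = [(i, predict_proba(x)) for i, x in enumerate(features)]
    scored.sort(key=lambda item: item[1], reverse=True)  # descending probability
    return scored[:release_threshold]
```

The users behind the returned indices would then be shown the face recognition popup, and their verified user types attached as labels to form verification samples.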
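Given the variable definitions (an allowable error m, a proportion P*, and a normal value Z* at 95% confidence), the release threshold has the shape of the standard sample-size formula for estimating a proportion. The sketch below assumes that form; it uses Python's `statistics.NormalDist` to obtain Z*, and rounding up to a whole number of samples is an added assumption.

```python
import math
from statistics import NormalDist


def release_threshold(p_star: float, m: float, confidence: float = 0.95) -> int:
    """Assumed form X = Z*^2 * P*(1 - P*) / m^2, rounded up to a whole number of
    feature data to be labeled."""
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # two-sided, ~1.96 at 95%
    x = z_star ** 2 * p_star * (1 - p_star) / m ** 2
    return math.ceil(x)


if __name__ == "__main__":
    print(release_threshold(0.5, 0.05))  # 385
```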
Step S150, for each classification model corresponding to the target initial model, verifying the classification model based on a verification set corresponding to a training set for training the classification model to obtain verification parameters of the classification model.
Specifically, the classification model may be verified as follows: each verification sample in the verification set is input into the corresponding classification model, which recognizes the verification samples to obtain a user type prediction result for each verification sample; the verification parameters of the classification model are then obtained from the user type label of each verification sample and its user type prediction result.
The verification parameters of a classification model evaluate how good its classification effect is. Specifically, they may include at least one of precision, recall, the AUC value, and the F1 score.
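The verification parameters named above can be computed from the user type labels and the model outputs as sketched below. This is a minimal pure-Python sketch, assuming label 1 marks a first-type user, `scores` are the model's classification calculation values, and a 0.5 threshold by default; AUC is computed as the probability that a random positive sample outscores a random negative one, with ties counted as half.

```python
from typing import Dict, Sequence


def verification_parameters(
    labels: Sequence[int], scores: Sequence[float], threshold: float = 0.5
) -> Dict[str, float]:
    """Precision, recall, F1 (at `threshold`) and AUC for label 1 = first-type user."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 1)
    fp = sum(1 for y, p in zip(labels, preds) if y == 0 and p == 1)
    fn = sum(1 for y, p in zip(labels, preds) if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # AUC: fraction of (positive, negative) pairs ranked correctly, ties count half.
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    pairs = len(pos) * len(neg)
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    auc = wins / pairs if pairs else float("nan")
    return {"precision": precision, "recall": recall, "f1": f1, "auc": auc}
```

The pairwise AUC loop is quadratic in the number of samples; for large verification sets a rank-based or library implementation would be preferable.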
Step S160, determining a target classification model based on the verification parameters of each classification model.
When each classification model has multiple verification parameters, the target classification model may be determined in multiple ways. The verification parameters may be at least one of the accuracy rate, the recall rate, the AUC value, and the F1 score.
In one embodiment, one verification parameter may be selected from the verification parameters of each classification model. Since the verification parameters characterize the quality of the classification effect of the classification model, the target classification model may be determined from the plurality of classification models according to the magnitude of the selected verification parameter. For example, the classification model with the largest value of the selected verification parameter may be determined as the target classification model.
In another embodiment, the verification parameters of each classification model may be weighted and summed to obtain a verification parameter calculation value for each classification model, and the target classification model is determined based on these calculation values, where the larger the value of a given kind of verification parameter, the larger its weight. For example, the classification model with the largest verification parameter calculation value may be determined as the target classification model.
With the classification model training method provided by the present application, at least two initial models and a plurality of training sets obtained from training samples are acquired. During model training, each initial model is trained with the positive and negative training samples in each training set to obtain a plurality of classification models corresponding to the initial models and the training parameters of each classification model, and a target initial model is determined based on the training parameters of the classification models corresponding to each initial model. This helps ensure that the selected type of initial model is the most suitable for user classification. In addition, since each verification sample in the verification set is obtained based on a user type identification operation performed on the user in the registration stage or the use stage, the ratio between positive and negative verification samples is generally highly imbalanced. Therefore, in the verification stage, each classification model corresponding to the target initial model is verified on the verification set corresponding to the training set used to train it to obtain its verification parameters, and the target classification model is determined based on these verification parameters. This helps ensure a good classification effect even when the numbers of positive and negative verification samples differ greatly, and thereby improves the accuracy of the user age obtained when the target classification model is later used to recognize feature data.
Referring to fig. 8, another embodiment of the present application provides a classification model training method, which includes:
Step S210, obtaining a plurality of training sets based on training samples, and at least two initial models.
The training samples include positive and negative training samples obtained in the registration stage and positive and negative training samples obtained in the use stage; each training set includes positive and negative training samples of the corresponding stage, where a positive training sample is feature data of a first type of user and a negative training sample is feature data of a second type of user.
The specific description of step S210 may refer to the foregoing specific description of step S110, and will not be repeated in this embodiment.
Step S220, for each initial model, selecting positive and negative training samples from each training set according to a target preset ratio to train the initial model, and obtaining a first classification model for each of a plurality of preset classification thresholds together with the training parameters of each first classification model.
Each preset classification threshold corresponds to one first classification model, and a preset ratio represents the ratio of positive training samples to negative training samples selected from a training set.
The target preset ratio may be set in advance or determined from a plurality of candidate ratios, according to actual requirements.
If the target preset ratio is set in advance, it may be, for example, 5:1, 2:1, 3:2, 1:1, 1:2, 2:3, or 1:5.
If the target preset ratio is determined from a plurality of ratios, it may be determined as follows. Multiple groups of training samples with different ratios are selected from a training set and input into at least one initial model, yielding for each initial model a second classification model trained on each group. For each second classification model, a plurality of specified feature data with user type labels (such as age group labels) are predicted with that model, and the prediction parameters of the second classification model are obtained from the prediction result and the label of each specified feature data; the prediction parameters may be at least one of accuracy, recall, AUC value, and the like. The target ratio is then determined based on the prediction parameters of each second classification model. The specified feature data may be selected from the training set.
That is, before performing step S220, the method further includes:
for each initial model, selecting multiple groups of positive and negative training samples from a training set according to a plurality of preset ratios, each preset ratio corresponding to one group, and training the initial model on each group, under the classification threshold set for that group, to obtain a second classification model and its training parameters; and determining the target preset ratio according to the training parameters of the second classification models corresponding to the preset ratios.
The plurality of preset ratios may be set in advance and may include at least two of 10:1, 5:1, 2:1, 1:1, 1:2, 1:5, 1:10, and the like; they may be set according to actual requirements, which is not specifically limited in this embodiment.
The training set used to determine the target preset ratio may consist of the positive and negative training samples obtained in the registration stage, those obtained in the use stage, or those obtained in both stages, according to actual requirements.
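The ratio search described above can be sketched as follows. This is a minimal sketch under stated assumptions: each group is drawn by random sampling without replacement as the largest group that fits the ratio, and `train_and_score` is a hypothetical callable that trains the initial model on one group and returns its (possibly weighted) training parameter.

```python
import random
from typing import Callable, Dict, List, Sequence, Tuple

Ratio = Tuple[int, int]  # (positive parts, negative parts), e.g. (1, 5) for 1:5


def sample_at_ratio(pos: Sequence, neg: Sequence, ratio: Ratio, seed: int = 0) -> Tuple[List, List]:
    """Draw the largest group whose positive:negative counts equal `ratio`."""
    p, n = ratio
    k = min(len(pos) // p, len(neg) // n)  # number of whole ratio units available
    rng = random.Random(seed)
    return rng.sample(list(pos), k * p), rng.sample(list(neg), k * n)


def choose_target_ratio(
    pos: Sequence,
    neg: Sequence,
    ratios: Sequence[Ratio],
    train_and_score: Callable[[List, List], float],
) -> Ratio:
    """Train once per preset ratio and keep the ratio whose second classification
    model has the largest training parameter value."""
    scores: Dict[Ratio, float] = {}
    for r in ratios:
        group_pos, group_neg = sample_at_ratio(pos, neg, r)
        scores[r] = train_and_score(group_pos, group_neg)
    return max(scores, key=scores.get)
```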
In an embodiment of the present application, the training set for determining the target preset ratio is a training set formed by positive and negative training samples obtained in the registration stage.
When each second classification model has multiple training parameters, the target preset ratio may be determined in multiple ways from the training parameters of the second classification models corresponding to the respective preset ratios.
In one embodiment, one training parameter may be selected from the training parameters of each second classification model. Because the training parameters characterize the quality of the classification effect of a classification model, the target preset ratio may be determined from the plurality of preset ratios according to the magnitude of the selected training parameter. For example, the preset ratio corresponding to the second classification model with the largest value of the selected training parameter may be determined as the target preset ratio.
In another embodiment, the training parameters of the second classification model corresponding to each preset ratio are weighted and summed to obtain a training parameter calculation value for that preset ratio; a target second classification model is determined based on these calculation values, and the preset ratio corresponding to the target second classification model is used as the target preset ratio, where the larger the value of a given kind of training parameter, the larger its weight. For example, the preset ratio corresponding to the second classification model with the largest training parameter calculation value may be determined as the target preset ratio.
For the process of selecting positive and negative samples from each training set according to the target preset ratio to train each initial model, reference may be made to the foregoing detailed description of step S120, which is not repeated here.
It should be noted that the classification threshold is the critical point for classification: when the classification calculation value obtained by performing classification calculation on certain feature data is greater than or equal to the classification threshold, the user corresponding to the feature data may be determined to be a first type user (i.e., in the first age group); accordingly, if the classification calculation value is less than the classification threshold, the user may be determined to be a second type user (i.e., in the second age group). During training of each initial model, different classification thresholds yield classification models with different model precision and, accordingly, different training parameters.
The plurality of classification thresholds may be selected or set in advance and may include at least two of 0.5, 0.6, 0.65, 0.7, 0.75, 0.8, 0.9, and the like; they may be set according to actual needs, which is not specifically limited in the embodiments of the present application.
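The threshold rule and the comparison across candidate thresholds can be sketched as follows. This is a minimal sketch, assuming label 1 denotes a first-type user and using the F1 score as the single training parameter compared across thresholds; the text allows other parameters or weighted combinations.

```python
from typing import Dict, Sequence

FIRST_TYPE, SECOND_TYPE = 1, 0  # e.g. first age group vs. second age group


def classify(score: float, threshold: float) -> int:
    """Score at or above the threshold -> first type user, otherwise second type."""
    return FIRST_TYPE if score >= threshold else SECOND_TYPE


def f1_by_threshold(
    labels: Sequence[int], scores: Sequence[float], thresholds: Sequence[float]
) -> Dict[float, float]:
    """F1 score of the first-type class at each candidate classification threshold."""
    result: Dict[float, float] = {}
    for t in thresholds:
        preds = [classify(s, t) for s in scores]
        tp = sum(1 for y, p in zip(labels, preds) if y == FIRST_TYPE and p == FIRST_TYPE)
        fp = sum(1 for y, p in zip(labels, preds) if y == SECOND_TYPE and p == FIRST_TYPE)
        fn = sum(1 for y, p in zip(labels, preds) if y == FIRST_TYPE and p == SECOND_TYPE)
        result[t] = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return result
```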
Step S230, determining a target preset classification threshold and a target initial model according to training parameters of the first classification model corresponding to each initial model under a plurality of preset classification thresholds.
The first classification model corresponding to the target preset classification threshold is trained from the target initial model, i.e., it is a classification model corresponding to the target initial model.
The training parameters of a first classification model evaluate how good its classification effect is and may specifically include at least one of the accuracy rate, the recall rate, the AUC value, the F1 score, and the like.
When each first classification model has multiple training parameters, the target initial model may be determined in multiple ways.
In one embodiment, one training parameter may be selected from the training parameters of each first classification model. Because the training parameter characterizes the quality of the classification effect of a classification model, a target classification threshold and a target first classification model may be determined from the plurality of classification thresholds according to the magnitude of the selected parameter of the first classification models corresponding to those thresholds, and the initial model corresponding to the target first classification model is used as the target initial model. For example, the classification threshold corresponding to the largest value of the selected training parameter may be selected as the target classification threshold.
In another embodiment, the training parameters of each first classification model may be weighted and summed to obtain a training parameter calculation value for each first classification model; a target classification model and a target classification threshold are determined based on these calculation values, and the initial model corresponding to the target classification model is used as the target initial model. For example, the classification threshold corresponding to the largest training parameter calculation value may be selected as the target classification threshold.
In still another embodiment, for each initial model, the training parameters of each first classification model obtained from that initial model under the plurality of classification thresholds may be weighted and summed to obtain a training parameter calculation value for each first classification model, and a candidate classification model and a candidate classification threshold are determined based on these calculation values. The target classification model and the target classification threshold are then determined according to the training parameter calculation values of the candidate classification models of the respective initial models, and the initial model corresponding to the target classification model is used as the target initial model.
In yet another embodiment, for each preset classification threshold, the training parameters of the first classification model corresponding to each initial model under that threshold are weighted and summed to obtain a training parameter calculation value of each initial model under that threshold, and a target parameter calculation value is determined from the training parameter calculation values of the initial models, where the larger the value of a given training parameter, the larger its weight. The preset classification threshold with the largest target parameter calculation value is determined as the target preset classification threshold, and the initial model with the largest training parameter calculation value under the target preset classification threshold is determined as the target initial model.
Here, for each preset classification threshold, the target parameter calculation value may be the maximum of the training parameter calculation values of the plurality of initial models under that threshold, or alternatively their average.
It should be understood that the target initial model and the target classification threshold may be determined in various ways; the above examples are merely illustrative and should not be construed as limiting.
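The last embodiment above can be sketched as follows. This is a minimal sketch, assuming the per-model training parameter calculation values have already been computed (for example with the `weighted_value` helper, a hypothetical name) and are arranged as `calc[threshold][initial_model]`; `use_mean` switches between the maximum and the average as the target parameter calculation value.

```python
from statistics import mean
from typing import Callable, Iterable, Mapping, Tuple


def weighted_value(metrics: Mapping[str, float], weights: Mapping[str, float]) -> float:
    """Training parameter calculation value: weighted sum of one model's parameters."""
    return sum(w * metrics[name] for name, w in weights.items())


def select_threshold_and_model(
    calc: Mapping[float, Mapping[str, float]], use_mean: bool = False
) -> Tuple[float, str]:
    """calc[threshold][initial_model] holds the calculation value of the first
    classification model trained from that initial model at that threshold."""
    aggregate: Callable[[Iterable[float]], float] = mean if use_mean else max
    # Target parameter calculation value per threshold (max or mean over models).
    target = {t: aggregate(list(per_model.values())) for t, per_model in calc.items()}
    best_threshold = max(target, key=target.get)
    # Target initial model: the best model at the chosen threshold.
    per_model = calc[best_threshold]
    best_model = max(per_model, key=per_model.get)
    return best_threshold, best_model
```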
Step S240, obtaining verification sets corresponding to each training set, wherein each verification sample in the verification set is obtained based on user type identification operation of a user in a registration stage or a use stage, and the feature types of the training samples in the training set and the verification samples in the corresponding verification sets are the same.
Step S250, for each classification model corresponding to the target initial model, based on a verification set corresponding to a training set for training the classification model, verifying the classification model to obtain verification parameters of the classification model.
Step S260, determining a target classification model based on the verification parameters of each classification model.
For a detailed description of steps S240 to S260, refer to the description of steps S140 to S160 in the foregoing embodiment; it is not repeated in this embodiment.
With this classification model training method, at least two initial models and a plurality of training sets obtained from training samples are acquired. During model training, for each initial model, positive and negative training samples are selected from each training set according to a target preset ratio to train the initial model, yielding first classification models corresponding to a plurality of preset classification thresholds and the training parameters of each first classification model, where the target preset ratio is determined from a plurality of preset ratios; a target preset classification threshold and a target initial model are then determined according to the training parameters of the first classification models of each initial model under the preset classification thresholds. This helps ensure that the ratio of positive to negative samples, the classification threshold, and the type of initial model selected are all the most suitable for user classification. In addition, since each verification sample in the verification set is obtained based on a user type identification operation performed on the user in the registration stage or the use stage, the ratio between positive and negative verification samples is generally highly imbalanced. Therefore, in the verification stage, each classification model corresponding to the target initial model is verified on the verification set corresponding to the training set used to train it to obtain its verification parameters, and the target classification model is determined based on these verification parameters, which helps ensure a good classification effect even when the numbers of positive and negative verification samples differ greatly.
This helps ensure the accuracy with which the target classification model identifies feature data to determine the user type in subsequent actual use.
Referring to fig. 9, the present application further provides a user classification method, which may include the following steps:
Step S310, obtaining feature data of a user to be classified.
The feature data of the user to be classified may be collected while the user is using the APP or in the registration stage, which is not limited herein.
Step S320, processing the feature data by using the target classification model to obtain the user type corresponding to the feature data of the user to be classified, where the user type is a first type user or a second type user.
It should be noted that, the feature types included in the feature data of the user to be classified should be the same as the feature types included in the training sample for training the target classification model, so the specific description of the feature data may refer to the specific description of the feature data in the training sample, which is not described in detail herein.
Specifically, in step S320, classification calculation may be performed on the feature data using the target classification model to obtain a classification calculation result, which is compared with the target classification threshold of the target classification model to determine the class of the feature data: when the result is greater than or equal to the target classification threshold, the user to be classified is determined to be a first type user; when it is less than the target classification threshold, the user is determined to be a second type user.
For example, if the positive training samples of the target classification model are feature data of users in a first age group and the negative training samples are feature data of users in a second age group, then when the classification calculation result is greater than or equal to the target classification threshold, the age group of the user to be classified is determined to be the first age group, and when it is less than the target classification threshold, the age group is determined to be the second age group.
The process of obtaining the target classification model may refer to the foregoing specific description of the training method of the classification model, which is not described in detail in the embodiment of the present application.
With this classification method, the user to be classified can be classified accurately, which makes it convenient to determine, from the age classification result, how that user's use of the APP should be restricted.
Referring also to fig. 10, taking a game as the scenario in which the classification model is trained and used, the present application proposes a classification model training method for determining whether a user registering for or using a game is an adult user or a minor user. The method includes the following steps:
Step S410, obtaining at least two initial models and training samples obtained through popup authentication.
The manner of obtaining training samples through popup authentication is shown in fig. 1.
The at least two initial models specifically include a random forest model (RF model) and a gradient boosting decision tree model (GBDT model). Each training sample is obtained in the manner shown in fig. 3 in the user registration stage or the use stage, that is, after the face recognition operation is performed.
Step S420, if the ratio of positive training samples to negative training samples is smaller than a preset threshold, expanding the positive training samples based on an oversampling algorithm to obtain expanded training samples, and obtaining a plurality of training sets based on the expanded training samples.
Three training sets are obtained from the training samples: a first training set, a second training set, and a third training set. The first training set includes feature data of users collected on the day of registration (registration stage), the second training set includes feature data of users collected during a period of time after registration (use stage), and the third training set includes feature data of users collected both on the day of registration and during the period after registration. Each training set includes positive and negative training samples, where the positive training samples are feature data of minor users and the negative training samples are feature data of adult users. For the categories of feature data in the different training sets, refer to the description of step S110 above.
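The expansion in step S420 can be sketched as follows. The text does not name the oversampling algorithm; this minimal sketch swaps in plain random oversampling with replacement (SMOTE-style interpolation is a common alternative), and assumes the positives are expanded until the positive-to-negative ratio reaches the preset threshold `min_ratio`.

```python
import math
import random
from typing import List, Sequence, Tuple


def oversample_positives(
    pos: Sequence, neg: Sequence, min_ratio: float, seed: int = 0
) -> Tuple[List, List]:
    """Random oversampling: duplicate randomly chosen positive samples until
    len(pos) / len(neg) reaches min_ratio. Negatives are left unchanged."""
    pos, neg = list(pos), list(neg)
    if not pos or not neg or len(pos) / len(neg) >= min_ratio:
        return pos, neg  # ratio already satisfied, or nothing to copy from
    target = math.ceil(min_ratio * len(neg))
    rng = random.Random(seed)
    extra = [rng.choice(pos) for _ in range(target - len(pos))]
    return pos + extra, neg
```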
Step S430: for each initial model, selecting multiple groups of positive and negative training samples from the first training set according to a plurality of preset ratios and training the initial model on each group, under the classification threshold set for that group, to obtain a second classification model and its training parameters.
The training parameters of the second classification model include the precision rate, the recall rate, the AUC value, and the F1 score. Table 1 shows the training parameters of the second classification models obtained by training each initial model (RF model or GBDT model) with positive-to-negative sample ratios of 1:1, 1:5, and 1:10, respectively. Table 1 is as follows:
Step S440: performing weighted summation on the training parameters of the second classification model corresponding to each preset ratio to obtain a training parameter calculation value for each preset ratio, and selecting the preset ratio with the largest training parameter calculation value as the target preset ratio.
Here, the larger the value of a given training parameter, the larger its weight. For the results in Table 1, the target preset ratio obtained through step S440 is 1:1.
Step S450, for each initial model, selecting positive and negative training samples from each training set according to a target preset ratio to train the initial model, and obtaining a plurality of first classification models corresponding to preset classification thresholds and training parameters of each first classification model.
Wherein a predetermined classification threshold corresponds to a first classification model. The training parameters of the first classification model may include the precision, recall, AUC values, and F1 scores, as shown in table 2, table 3, and table 4, and the training parameters of the first classification models obtained by training each initial model (RF model or GBDT model) with classification thresholds of 0.5, 0.6, 0.7, 0.8, and 0.9, respectively.
Table 2 shows the training parameters of each first classification model at classification thresholds of 0.5, 0.6, 0.7, 0.8, and 0.9 for each model, with training samples selected from the first training set according to the target preset ratio of 1:1. Table 2 is as follows:
Table 3 shows the training parameters of each first classification model at classification thresholds of 0.5, 0.6, 0.7, 0.8, and 0.9 for each model, with training samples selected from the second training set according to the target preset ratio of 1:1. Table 3 is as follows:
Table 4 shows the training parameters of each first classification model at classification thresholds of 0.5, 0.6, 0.7, 0.8, and 0.9 for each model, with training samples selected from the third training set according to the target preset ratio of 1:1. Table 4 is as follows:
Step S460, determining a target preset classification threshold and a target initial model according to training parameters of the first classification model corresponding to each initial model under a plurality of preset classification thresholds.
The first classification model that is trained from the target initial model and corresponds to the target preset classification threshold is a classification model corresponding to the target initial model.
Specifically, for each preset classification threshold, the training parameters of the first classification model corresponding to each initial model under that threshold are weighted and summed to obtain a training parameter calculated value for each initial model under that threshold, and a target parameter calculated value is determined from the training parameter calculated values of the initial models; here, the larger the value of a given training parameter, the larger its corresponding weight. The preset classification threshold with the maximum target parameter calculated value is determined as the target preset classification threshold, and among the training parameter calculated values of the initial models at the target preset classification threshold, the initial model with the maximum value is determined as the target initial model. Applying this method to tables 2 to 4 shows that the classification effect is optimal when the target preset classification threshold is 0.7, and that, under the same classification threshold, the GBDT model classifies significantly better than the RF model. That is, the target initial model is determined to be the GBDT model.
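The selection rule in step S460 can be sketched as follows. The text does not say how the target parameter calculated value of a threshold is derived from the per-model values; this sketch assumes it is the best per-model score at that threshold. The function names, weights, and metric values are hypothetical placeholders, not the values in tables 2 to 4.

```python
def weighted_score(params, weights):
    """Training parameter calculated value: weighted sum of the parameters."""
    return sum(weights[name] * value for name, value in params.items())

def select_threshold_and_model(results, weights):
    """results[threshold][model_name] holds that first classification model's parameters."""
    scores = {
        t: {model: weighted_score(p, weights) for model, p in by_model.items()}
        for t, by_model in results.items()
    }
    # Assumption: a threshold's target parameter calculated value is its best model score.
    target_threshold = max(scores, key=lambda t: max(scores[t].values()))
    # Target initial model: highest calculated value at the target threshold.
    target_model = max(scores[target_threshold], key=scores[target_threshold].get)
    return target_threshold, target_model

# Hypothetical weights and placeholder parameters.
weights = {"precision": 0.5, "recall": 0.5}
results = {
    0.6: {"RF": {"precision": 0.70, "recall": 0.80}, "GBDT": {"precision": 0.75, "recall": 0.82}},
    0.7: {"RF": {"precision": 0.78, "recall": 0.74}, "GBDT": {"precision": 0.86, "recall": 0.80}},
    0.8: {"RF": {"precision": 0.85, "recall": 0.60}, "GBDT": {"precision": 0.90, "recall": 0.65}},
}
choice = select_threshold_and_model(results, weights)
```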
Step S470: determining a release threshold for each classification model corresponding to the target initial model according to the proportions of positive and negative training samples in the training samples.
Specifically, the release threshold of each classification model corresponding to the target initial model can be calculated with a formula in which X is the release threshold, that is, the specified number of age-labeled feature data to be obtained by means of the classification model; m is the absolute difference (the permitted absolute error); P* is the proportion of users in the first age group among the training samples; and Z* is the value of the standard normal distribution at a 95% confidence level.
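The variables defined above (X, m, P*, Z*) match the standard sample-size formula for estimating a proportion, X = (Z*)^2 P*(1 - P*) / m^2. Assuming the release threshold is computed in this form (an assumption, not confirmed by the text), it can be sketched as below; `release_threshold` is a hypothetical helper, and `statistics.NormalDist` from the Python standard library supplies Z*.

```python
import math
from statistics import NormalDist

def release_threshold(p_star, m, confidence=0.95):
    """Assumed form: X = Z*^2 * P*(1 - P*) / m^2, rounded up to a whole number of samples.

    p_star: proportion of first-age-group users in the training samples.
    m: permitted absolute difference (margin of error).
    """
    # Two-sided critical value of the standard normal distribution (about 1.96 at 95%).
    z_star = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil(z_star ** 2 * p_star * (1 - p_star) / m ** 2)
```

For example, with P* = 0.5 and m = 0.05 this gives the familiar figure of 385 samples.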
Step S480, acquiring a verification set corresponding to each classification model based on the release threshold of each classification model.
Specifically, in step S480, each classification model corresponding to the target initial model is deployed on a server and used to perform classification prediction on a plurality of feature data to be predicted, yielding for each feature data the probability that the corresponding user belongs to the first age group. The feature data are sorted by this probability in descending order, and the feature data ranked before the release threshold corresponding to the classification model, together with the user age groups of the corresponding registered users, are taken as verification samples in the verification set corresponding to the training set used to train that classification model. The user age group of each registered user is obtained through a face recognition operation performed on the user in the registration stage or the use stage.
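Building the verification set in step S480 amounts to a top-X selection over predicted probabilities. In this sketch the function name, the tuple layout of `predictions`, and the `age_group_by_user` mapping (standing in for age groups obtained through face recognition) are hypothetical; users without a recognized age group are skipped, so the result can be shorter than the release threshold.

```python
def build_verification_set(predictions, release_threshold, age_group_by_user):
    """predictions: list of (user_id, feature_data, probability of first age group).

    Returns (feature_data, age_group) pairs for the top `release_threshold` predictions.
    """
    # Sort by predicted probability, largest first.
    ranked = sorted(predictions, key=lambda item: item[2], reverse=True)
    samples = []
    for user_id, features, _prob in ranked[:release_threshold]:
        # Keep only registered users whose age group is known.
        if user_id in age_group_by_user:
            samples.append((features, age_group_by_user[user_id]))
    return samples

# Toy feature vectors and probabilities.
predictions = [
    ("u1", [5.0, 1.0], 0.91),
    ("u2", [0.5, 0.0], 0.35),
    ("u3", [3.0, 1.0], 0.78),
    ("u4", [2.0, 0.0], 0.62),
]
age_group_by_user = {"u1": "first", "u2": "second", "u3": "second", "u4": "first"}
verification_set = build_verification_set(predictions, 2, age_group_by_user)
```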
Step S490: for each classification model corresponding to the target initial model, verifying the classification model with the verification set corresponding to the training set used to train it, so as to obtain the verification parameters of the classification model, and determining the target classification model according to the verification parameters of the classification models.
After the target classification model is trained, it may be deployed on a server to predict whether a newly online or newly registered user is an adult. This removes the need for every user to perform face authentication when logging in to the game APP, which improves both the efficiency of user age classification and the user experience.
As shown in table 5, after the classification models were deployed on the game server, logins by underage users on the game server were analyzed over one period (for example, from December 23 to December 26 of year XX). For each of the following, table 5 gives the proportion of users confirmed as underage through authentication, the authentication pass proportion, and the proportion determined to be underage: online prediction with the classification models corresponding to the target initial model that were trained on the first training set and the second training set; online prediction with the classification model corresponding to the third training set; the pop-up window used by the game server during game login authentication, where the pass proportion refers to newly registered users in the portion authenticated as underage; and a preset high-risk rule, with the pass proportion defined in the same way. Table 5 is as follows:
As can be seen from table 5, the proportion of users confirmed as underage through authentication based on the first and second training sets is 10.09%, a relative improvement of 601% over the pop-up window authentication result in the same period and of 345% over the high-risk rule in the same period; the proportion determined to be underage is 72%, a relative improvement of 6% over the same-period pop-up window and a relative decrease of 4% compared with the same-period high-risk rule. The proportion confirmed as underage based on the third training set is 10.76%, a relative improvement of 647% over the same-period pop-up window and of 374% over the same-period high-risk rule; the proportion determined to be underage is 77%, a relative improvement of 13% over the same-period pop-up window and of 2% over the same-period high-risk rule. The classification model trained on the third training set therefore performs best; that is, the target classification model is the classification model obtained from the third training set.
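If "a relative improvement of 601%" is read as (result - baseline) / baseline = 6.01 (an assumption about how the percentages were computed), the same-period baselines implied by the two sets of comparisons can be recovered and cross-checked. The baselines computed below are derived arithmetic, not figures reported in table 5, and `implied_baseline` is a hypothetical helper.

```python
def implied_baseline(value, relative_change):
    """Baseline b satisfying (value - b) / b == relative_change."""
    return value / (1 + relative_change)

# Percentages and relative changes reported in the text for table 5.
checks = {
    "pop-up, confirmed underage": (implied_baseline(10.09, 6.01), implied_baseline(10.76, 6.47)),
    "high-risk rule, confirmed underage": (implied_baseline(10.09, 3.45), implied_baseline(10.76, 3.74)),
    "pop-up, determined underage": (implied_baseline(72, 0.06), implied_baseline(77, 0.13)),
    "high-risk rule, determined underage": (implied_baseline(72, -0.04), implied_baseline(77, 0.02)),
}
for name, (from_first_second, from_third) in checks.items():
    print(f"{name}: {from_first_second:.2f}% vs {from_third:.2f}%")
```

Under this reading, both comparisons imply baselines of roughly 1.44% (pop-up) and 2.27% (high-risk rule) for the confirmed-underage proportion, which suggests the reported relative figures are mutually consistent.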
After the target classification model described above is obtained, the target classification model may be deployed to a game server to make online predictions of whether the user registering and using the game is an underage user. Corresponding game restrictions may also be performed on users predicted to be underage, e.g., users may be allowed to perform game operations only for a certain period of time.
The following describes embodiments of the apparatus of the present application that may be used to perform the methods of the above-described embodiments of the present application. For details not disclosed in the embodiments of the apparatus of the present application, please refer to the above-described method embodiments of the present application.
Referring to fig. 11, the embodiment of the application further provides a classification model training apparatus 500 applicable to an electronic device, where the apparatus 500 includes: a first acquisition module 510, a training module 520, an initial model determination module 530, a second acquisition module 540, a verification module 550, and a classification model determination module 560.
The first obtaining module 510 is configured to obtain at least two initial models and a plurality of training sets obtained based on training samples, where the training samples include positive and negative training samples obtained in a registration stage and positive and negative training samples obtained in a use stage, each training set includes the positive and negative training samples corresponding to at least one stage, the positive training samples are feature data of a first type of user, and the negative training samples are feature data of a second type of user. The training module 520 is configured to train each initial model with the positive and negative training samples in each training set to obtain a plurality of classification models corresponding to the initial model and the training parameters of each classification model, where each classification model corresponds to one training set. The initial model determining module 530 is configured to determine a target initial model based on the training parameters of the classification models corresponding to each initial model. The second obtaining module 540 is configured to obtain a verification set corresponding to each training set, where each verification sample in the verification set is obtained based on a user type identification operation performed on the user in the registration stage or the use stage, and the feature types of the training samples in a training set are the same as those of the verification samples in the corresponding verification set. The verification module 550 is configured to verify, for each classification model corresponding to the target initial model, the classification model with the verification set corresponding to the training set used to train it, so as to obtain the verification parameters of the classification model. The classification model determination module 560 is configured to determine a target classification model based on the verification parameters of each classification model.
In one embodiment, the first acquisition module 510 includes a first acquisition sub-module and an expansion sub-module. The first acquisition sub-module is configured to acquire the training samples, which include positive training samples and negative training samples; the positive training samples include positive training samples obtained in the registration stage and positive training samples obtained in the use stage, and the negative training samples include negative training samples obtained in the registration stage and negative training samples obtained in the use stage. The expansion sub-module is configured to expand the positive training samples when the ratio of positive to negative training samples in the training samples is smaller than a preset threshold, obtain expanded training samples, and obtain a plurality of training sets based on the expanded training samples.
Under this embodiment, the expansion sub-module is further configured to expand a positive training sample in the training samples based on an oversampling algorithm, so as to obtain an expanded training sample.
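The expansion sub-module's oversampling can be illustrated with plain random oversampling, which duplicates randomly chosen positive samples until the positive-to-negative ratio reaches the preset threshold. The text does not name the oversampling algorithm; random oversampling is one simple choice (SMOTE, which synthesizes new samples by interpolating between neighbors, is another), and the function and parameter names here are hypothetical.

```python
import math
import random

def oversample_positives(positives, negatives, min_ratio, seed=0):
    """Random oversampling: duplicate randomly chosen positives until
    len(positives) / len(negatives) >= min_ratio. Returns a new list."""
    rng = random.Random(seed)
    expanded = list(positives)
    target = math.ceil(min_ratio * len(negatives))
    # Only expands when the ratio is below the preset threshold.
    while len(expanded) < target:
        expanded.append(rng.choice(positives))
    return expanded

# Toy feature vectors: 3 positives versus 20 negatives, preset threshold 0.5.
positives = [[0.9, 1.0], [0.8, 0.0], [0.7, 1.0]]
negatives = [[0.1, 0.0]] * 20
expanded = oversample_positives(positives, negatives, min_ratio=0.5)
```

Because the duplicates are exact copies, a model trained on the expanded set may overfit the minority class; interpolation-based methods are one way to reduce that risk.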
In an embodiment, the training module 520 is further configured to, for each initial model, select positive and negative training samples from each training set according to a target preset ratio and train the initial model, obtaining a plurality of first classification models corresponding to preset classification thresholds and the training parameters of each first classification model, where each preset classification threshold corresponds to one first classification model, and the target preset ratio characterizes the ratio of positive to negative training samples in the training set. The initial model determining module 530 is further configured to determine a target preset classification threshold and a target initial model according to the training parameters of the first classification models corresponding to each initial model under the plurality of preset classification thresholds, where the first classification model that is trained from the target initial model and corresponds to the target preset classification threshold is a classification model corresponding to the target initial model.
In this embodiment, the classification model training apparatus 500 further includes a preset ratio determining module. The training module 520 is further configured to, for each initial model, select multiple groups of positive and negative training samples from a training set according to multiple preset ratios and train the initial model on each group separately, obtaining for each group a second classification model and its training parameters at the classification threshold set for that group, where each preset ratio corresponds to one group of positive and negative training samples. The preset ratio determining module is configured to determine the target preset ratio according to the training parameters of the second classification models corresponding to the preset ratios.
In one embodiment, there are at least two training parameters of the second classification model, and the preset ratio determining module is further configured to perform weighted summation on the training parameters of the second classification model corresponding to each preset ratio to obtain a training parameter calculated value for each preset ratio, where the larger the value of a given training parameter, the larger its corresponding weight; and to select the preset ratio with the maximum training parameter calculated value as the target preset ratio.
In one embodiment, the initial model determining module 530 includes a calculation sub-module, a classification threshold determining sub-module, and an initial model determining sub-module. The calculation sub-module is configured to, for each preset classification threshold, perform weighted summation on the training parameters of the first classification model corresponding to each initial model under that threshold to obtain a training parameter calculated value for each initial model under that threshold, and to determine a target parameter calculated value based on the training parameter calculated values of the initial models, where the larger the value of a given training parameter, the larger its corresponding weight. The classification threshold determining sub-module is configured to determine the preset classification threshold with the maximum target parameter calculated value as the target preset classification threshold. The initial model determining sub-module is configured to determine, among the training parameter calculated values of the initial models at the target preset classification threshold, the initial model with the maximum value as the target initial model.
In one implementation, the first type of users includes users in a first age group, the second type of users includes users in a second age group, and the second acquisition module 540 includes a threshold determination sub-module and a second acquisition sub-module. The threshold determination sub-module is configured to determine a release threshold for each classification model corresponding to the target initial model according to the proportions of positive and negative training samples in the training samples. The second acquisition sub-module is configured to, for each classification model corresponding to the target initial model, deploy the classification model on a server; perform classification prediction on a plurality of feature data to be predicted with the classification model to obtain, for each feature data, the probability that the classification prediction result is a first-type user; sort the feature data by this probability in descending order; obtain the target feature data ranked before the release threshold corresponding to the classification model, together with the user type of each corresponding target registered user; and take these target feature data and user types as verification samples in the verification set corresponding to the training set used to train the classification model. The user type of each target registered user is obtained based on a user type identification operation performed on the user in the registration stage or the use stage.
In one embodiment, there are at least two verification parameters for each classification model, and the classification model determination module 560 includes a calculation sub-module and a classification model determination sub-module. The calculation sub-module is configured to perform weighted summation on the verification parameters of each classification model to obtain a verification parameter calculated value for each classification model, where the larger the value of a given type of verification parameter, the larger its corresponding weight. The classification model determination sub-module is configured to select the classification model with the maximum verification parameter calculated value as the target classification model.
In one embodiment, the training parameters include at least one of a precision rate, a recall rate, an AUC value, and an F1 score, and the verification parameters include at least one of a precision rate, a recall rate, an AUC value, and an F1 score.
Referring to fig. 12, an embodiment of the present application further provides a user classification apparatus 600 applicable to an electronic device, where the apparatus 600 includes a feature data obtaining module 610 and a user classification module 620.
A feature data obtaining module 610, configured to obtain feature data of a user to be classified; the user classification module 620 is configured to process the feature data by using the target classification model to obtain a user type corresponding to the feature data of the user to be classified, where the user type is a first type user or a second type user.
It should be noted that, in the present application, the device embodiment and the foregoing method embodiment correspond to each other, and specific principles in the device embodiment may refer to the content in the foregoing method embodiment, which is not described herein again.
An electronic device 100 according to the present application will be described with reference to fig. 13.
Referring to fig. 13, based on the classification model training method and the user classification method provided by the foregoing embodiments, another electronic device 100 including a processor 102 that may execute the foregoing method is provided in the present application, where the electronic device 100 may be a server 10 or a terminal device, and the terminal device may be a smart phone, a tablet computer, a computer, or a portable computer.
The electronic device 100 also includes a memory 104. The memory 104 stores therein a program capable of executing the contents of the foregoing embodiments, and the processor 102 can execute the program stored in the memory 104.
The processor 102 may include one or more cores for processing data and a message matrix unit, among other things. The processor 102 connects the various parts of the electronic device 100 through various interfaces and lines, and performs the functions of the electronic device 100 and processes data by running or executing the instructions, programs, code sets, or instruction sets stored in the memory 104 and invoking the data stored in the memory 104. Optionally, the processor 102 may be implemented in hardware in the form of at least one of a digital signal processor (DSP), a field-programmable gate array (FPGA), and a programmable logic array (PLA). The processor 102 may integrate one or a combination of a central processing unit (CPU), a graphics processing unit (GPU), a modem, and the like. The CPU mainly handles the operating system, the user interface, application programs, and the like; the GPU is responsible for rendering and drawing display content; and the modem handles wireless communication. It will be appreciated that the modem may also be implemented separately by a communication chip instead of being integrated into the processor 102.
The memory 104 may include random access memory (RAM) or read-only memory (ROM). The memory 104 may be used to store instructions, programs, code sets, or instruction sets. The memory 104 may include a program storage area and a data storage area, where the program storage area may store instructions for implementing an operating system, instructions for implementing at least one function, instructions for implementing the method embodiments described above, and the like. The data storage area may also store data (e.g., training samples and verification samples) acquired by the electronic device 100 during use, and so forth.
The electronic device 100 may further include a network module and a screen, where the network module is configured to receive and transmit electromagnetic waves, and implement mutual conversion between the electromagnetic waves and the electrical signals, so as to communicate with a communication network or other devices, such as an audio playing device. The network module may include various existing circuit elements for performing these functions, such as an antenna, a radio frequency transceiver, a digital signal processor, an encryption/decryption chip, a Subscriber Identity Module (SIM) card, memory, and the like. The network module may communicate with various networks such as the internet, intranets, wireless networks, or with other devices via wireless networks. The wireless network may include a cellular telephone network, a wireless local area network, or a metropolitan area network. The screen may display interface content and perform data interaction.
In some embodiments, the electronic device 100 may further include a peripheral interface 106 and at least one peripheral device. The processor 102, the memory 104, and the peripheral interface 106 may be connected by a bus or signal lines. Each peripheral device may be connected to the peripheral interface 106 via a bus, a signal line, or a circuit board. Specifically, the peripheral devices include at least one of a radio frequency component 108, a camera 114, an audio component 116, a display screen 118, a power supply 122, and the like.
The peripheral interface 106 may be used to connect at least one Input/Output (I/O) related peripheral device to the processor 102 and the memory 104. In some embodiments, the processor 102, the memory 104, and the peripheral interface 106 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 102, the memory 104, and the peripheral interface 106 may be implemented on a separate chip or circuit board, and the embodiments of the application are not limited in this respect.
The radio frequency (RF) component 108 is configured to receive and transmit RF signals, also known as electromagnetic signals. The radio frequency component 108 communicates with communication networks and other communication devices via electromagnetic signals, converting electrical signals into electromagnetic signals for transmission or converting received electromagnetic signals into electrical signals. Optionally, the radio frequency component 108 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so forth. The radio frequency component 108 can communicate with other terminals via at least one wireless communication protocol, including but not limited to: the world wide web, metropolitan area networks, intranets, mobile communication networks of various generations (2G, 3G, 4G, and 5G), wireless local area networks, and/or WiFi (Wireless Fidelity) networks. In some embodiments, the radio frequency component 108 may also include NFC (Near Field Communication) related circuitry, which is not limited in the application.
The camera 114 is used to capture images or video (e.g., images to be detected in the present scenario). Optionally, the camera 114 includes a front camera and a rear camera. Typically, the front camera is disposed on the front panel of the electronic device 100, and the rear camera is disposed on the back of the electronic device 100. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth camera, a wide-angle camera, and a telephoto camera, so that the main camera and the depth camera can be fused to realize a background blurring function, and the main camera and the wide-angle camera can be fused to realize panoramic shooting, Virtual Reality (VR) shooting, or other fusion shooting functions. In some embodiments, the camera 114 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash combines a warm-light flash and a cold-light flash and can be used for light compensation under different color temperatures.
The audio component 116 may include a microphone and a speaker. The microphone is used to collect sound waves from the user and the environment, convert them into electrical signals, and input the signals to the processor 102 for processing or to the radio frequency component 108 for voice communication. For stereo acquisition or noise reduction, there may be multiple microphones disposed at different locations of the electronic device 100. The microphone may also be an array microphone or an omnidirectional pickup microphone. The speaker is used to convert electrical signals from the processor 102 or the radio frequency component 108 into sound waves. The speaker may be a conventional thin-film speaker or a piezoelectric ceramic speaker. A piezoelectric ceramic speaker can convert electrical signals not only into sound waves audible to humans but also into sound waves inaudible to humans for ranging and other purposes. In some embodiments, the audio component 116 may also include a headphone jack.
The display screen 118 is used to display a UI (User Interface). The UI may include graphics, text, icons, video, and any combination thereof. When the display screen 118 is a touch display screen, it also has the ability to collect touch signals at or above its surface. The touch signals may be input to the processor 102 as control signals for processing. In this case, the display screen 118 may also be used to provide virtual buttons and/or a virtual keyboard, also referred to as soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 118, disposed on the front panel of the electronic device 100; in other embodiments, there may be at least two display screens 118, respectively disposed on different surfaces of the electronic device 100 or in a folding design; in still other embodiments, the display screen 118 may be a flexible display screen disposed on a curved or folded surface of the electronic device 100. The display screen 118 may even be arranged in a non-rectangular irregular shape, i.e., a shaped screen. The display screen 118 may be made of LCD (Liquid Crystal Display), OLED (Organic Light-Emitting Diode), or other materials.
The power supply 122 is used to power the various components in the electronic device 100. The power supply 122 may be an alternating-current supply, a direct-current supply, a disposable battery, or a rechargeable battery. When the power supply 122 includes a rechargeable battery, the rechargeable battery may be a wired rechargeable battery, charged through a wired line, or a wireless rechargeable battery, charged through a wireless coil. The rechargeable battery may also support fast charging technology.
The embodiment of the application also provides a computer readable storage medium. The computer readable medium has stored therein program code which is callable by a processor to perform the method described in the method embodiments described above.
The computer readable storage medium may be an electronic memory such as a flash memory, an EEPROM (electrically erasable programmable read-only memory), an EPROM, a hard disk, or a ROM. Optionally, the computer readable storage medium comprises a non-transitory computer-readable storage medium. The computer readable storage medium has storage space for program code that performs any of the method steps described above. The program code can be read from or written to one or more computer program products. The program code may, for example, be compressed in a suitable form.
Embodiments of the present application also provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computer device, and executed by the processor, cause the computer device to perform the methods described in the various alternative implementations described above.
It should be noted that, the information of the user (for example, one or more of the device information of the user, the identity information of the user (information for performing real-name authentication), and the like) and the feature data (including, but not limited to, one or more of login data, registration time, active duration, and the like for analysis) related to the above-described embodiments of the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use, and processing of the related data are required to comply with the related laws and regulations and standards of the related country and region.
It should be noted that although several modules or units of a device for performing actions are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the application, the features and functions of two or more modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided among a plurality of modules or units.
From the above description of the embodiments, those skilled in the art will readily appreciate that the example embodiments described herein may be implemented in software, or in software combined with the necessary hardware. Thus, the technical solution according to the embodiments of the present application may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (such as a CD-ROM, a USB flash drive, or a removable hard disk) or on a network, and which includes several instructions that cause a computing device (such as a personal computer, a server, a touch terminal, or a network device) to perform the method according to the embodiments of the present application.
Other embodiments of the application will be apparent to those skilled in the art from consideration of the specification and practice of the embodiments disclosed herein. This application is intended to cover any variations, uses, or adaptations of the application following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the application pertains.
It is to be understood that the application is not limited to the precise arrangements and instrumentalities shown in the drawings, which have been described above, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the application is limited only by the appended claims.