
CN117035003A - Hyperparameter optimization methods, devices, computing equipment, storage media and program products - Google Patents


Info

Publication number
CN117035003A
CN117035003A (application number CN202211206130.1A)
Authority
CN
China
Prior art keywords
super, parameters, superparameter, candidate, parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211206130.1A
Other languages
Chinese (zh)
Inventor
孙中宇
周雪
杜朦旭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202211206130.1A
Publication of CN117035003A
Legal status: Pending


Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/44Arrangements for executing specific programs
    • G06F9/445Program loading or initiating
    • G06F9/44505Configuring for program initiating, e.g. using registry, configuration files
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Stored Programmes (AREA)

Abstract

A hyperparameter optimization method is described. It comprises performing an initialization step and then performing a confidence determination step, an optimal candidate step, and a judgment step according to predetermined logic. The initialization step includes obtaining a candidate hyperparameter set. The confidence determination step includes determining, for each hyperparameter outside the first portion of the hyperparameters, a confidence that it is the optimal hyperparameter. The optimal candidate step includes determining the optimal candidate hyperparameter among the hyperparameters outside the first portion. The judgment step includes determining whether the performance index of the optimal candidate hyperparameter satisfies a predetermined condition; if it does not, the optimal candidate hyperparameter is added to the first portion of the hyperparameters, its performance index is added to the candidate hyperparameter set, and the flow returns to the confidence determination step; if it does, the optimal candidate hyperparameter is determined to be the optimal hyperparameter. The hyperparameter optimization method according to embodiments of the present disclosure can achieve fast, efficient, and accurate hyperparameter optimization.

Description

Super-parameter optimization method, device, computing equipment, storage medium and program product
Technical Field
The present disclosure relates to the field of computers, and more particularly, to a method, apparatus, computing device, storage medium, and program product for super-parameter optimization.
Background
With the development of computer technology, machine learning models have been widely used. The super-parameters of a machine learning model are parameters whose values are set before the learning process starts, rather than parameters learned during training. Super-parameters define higher-level concepts of the machine learning model than ordinary parameters do, such as model complexity or learning capability, so selecting a set of optimal super-parameters for a machine learning model is critical to improving its learning performance and effectiveness.
In the prior art, the super-parameters of a machine learning model are usually configured based on manual experience. However, super-parameters obtained by manual tuning are often sub-optimal solutions rather than optimal super-parameters, and the manual tuning process consumes a long time and has poor applicability.
Disclosure of Invention
The inventors note that how to achieve fast, efficient, and accurate super-parameter optimization is a problem to be solved. In view of the above, the present application provides a super-parameter optimization method, apparatus, computer-readable storage medium, and computer program product, which are expected to solve the above-mentioned problems.
According to a first aspect of the present disclosure, a method for optimizing a super-parameter of a machine learning model is disclosed. The method includes performing an initializing step and then performing a confidence determining step, an optimal candidate step, and a judging step according to predetermined logic. Initializing step: acquiring a candidate super-parameter set, wherein the candidate super-parameter set comprises a plurality of candidate super-parameters and performance indexes of a first part of the plurality of candidate super-parameters, the performance indexes indicating the performance of a machine learning model configured with the corresponding super-parameters. Confidence determining step: determining, for each super-parameter in the non-first part of the plurality of candidate super-parameters, a confidence that it is the optimal super-parameter, according to the performance indexes of the first part of the super-parameters, wherein the optimal super-parameter is the super-parameter with the optimal performance index. Optimal candidate step: determining the optimal candidate super-parameter in the non-first part of the super-parameters based on the confidence that each super-parameter in the non-first part is the optimal super-parameter, and acquiring the performance index of the optimal candidate super-parameter. Judging step: determining whether the performance index of the optimal candidate super-parameter meets a predetermined condition; in response to the performance index not meeting the predetermined condition, adding the optimal candidate super-parameter to the first part of the super-parameters, adding its performance index to the candidate super-parameter set, and returning to the confidence determining step; and, in response to the performance index meeting the predetermined condition, determining the optimal candidate super-parameter as the optimal super-parameter.
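As a rough illustration only (the function names `evaluate` and `confidence_fn`, the initial sample size, and the round limit are assumptions for the sketch, not part of the claims), the loop of initialization, confidence determination, optimal candidate selection, and judgment might look like this:

```python
import random

def optimize(candidates, evaluate, confidence_fn, max_rounds=50, target=None):
    """Sketch of the claimed loop. `evaluate(h)` returns a performance
    index for one super-parameter; `confidence_fn(h, scores)` scores an
    unevaluated candidate's confidence of being optimal given the
    already-evaluated first part (`scores`)."""
    # Initializing step: evaluate a first part of the candidates.
    first_part = random.sample(candidates, k=min(5, len(candidates)))
    scores = {h: evaluate(h) for h in first_part}

    for _ in range(max_rounds):  # judging step's round limit
        rest = [h for h in candidates if h not in scores]
        if not rest:
            break
        # Confidence determining + optimal candidate steps.
        best = max(rest, key=lambda h: confidence_fn(h, scores))
        scores[best] = evaluate(best)  # joins the first part
        if target is not None and scores[best] >= target:
            return best, scores[best]  # predetermined condition met
    best = max(scores, key=scores.get)
    return best, scores[best]
```

With a small candidate pool the loop simply exhausts all candidates and returns the one with the best measured performance index.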
According to a second aspect of the present disclosure, a super-parameter optimizing apparatus of a machine learning model is disclosed. The apparatus is configured to perform an initializing step and then perform a confidence determining step, an optimal candidate step, and a judging step according to predetermined logic, and comprises: an initialization module configured to perform the initializing step: acquiring a candidate super-parameter set, wherein the candidate super-parameter set comprises a plurality of candidate super-parameters and performance indexes of a first part of the plurality of candidate super-parameters, the performance indexes indicating the performance of a machine learning model configured with the corresponding super-parameters; a confidence module configured to perform the confidence determining step: determining, for each super-parameter in the non-first part of the plurality of candidate super-parameters, a confidence that it is the optimal super-parameter, according to the performance indexes of the first part of the super-parameters, wherein the optimal super-parameter is the super-parameter with the optimal performance index; an optimal candidate module configured to perform the optimal candidate step: determining the optimal candidate super-parameter in the non-first part of the super-parameters based on the confidence that each super-parameter in the non-first part is the optimal super-parameter, and acquiring the performance index of the optimal candidate super-parameter; and a judgment module configured to perform the judging step: determining whether the performance index of the optimal candidate super-parameter meets a predetermined condition; in response to the performance index not meeting the predetermined condition, adding the optimal candidate super-parameter to the first part of the super-parameters, adding its performance index to the candidate super-parameter set, and returning to the confidence determining step; and, in response to the performance index meeting the predetermined condition, determining the optimal candidate super-parameter as the optimal super-parameter.
In the super-parameter optimizing apparatus according to some embodiments of the present application, the confidence determining step further includes: performing a first transformation on the super-parameters in the super-parameter set and determining a super-parameter transformation value for each super-parameter, wherein the super-parameter transformation values satisfy a normal distribution; performing a second transformation on the performance indexes of the first part of the super-parameters in the super-parameter set and determining performance index transformation values for the first part of the super-parameters, wherein the performance index transformation values satisfy a normal distribution; determining, for each super-parameter transformation value in the non-first part of the super-parameters, a first confidence that it has the maximum performance index transformation value, based on the super-parameter transformation values of the first part of the super-parameters and their performance index transformation values; and determining the first confidence of the super-parameter transformation value of each super-parameter as the confidence that the corresponding super-parameter is the optimal super-parameter.
In a hyper-parametric optimization apparatus according to some embodiments of the application, performing a first transformation on a hyper-parameter in a hyper-parameter set, determining a hyper-parameter transformation value for the hyper-parameter comprises: acquiring a normalization value of the super parameter, wherein the normalization value is between 0 and 1; performing first mapping on the normalized value of the super parameter, and determining a first mapping value of the normalized value of the current super parameter; and determining a first mapping value of the normalized value of the superparameter as a superparameter transform value of the current superparameter, wherein the normalized value of the superparameter and the first mapping value of the normalized value of the current superparameter satisfy the equation:
where h_i is the normalized value of the super-parameter, H_i is the first mapping value of the normalized value of the super-parameter, a is the normalized value of the smallest super-parameter in the super-parameter set, and b is the normalized value of the largest super-parameter in the super-parameter set.
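The first-mapping equation itself appears only as an image in the original publication and is not reproduced here; as a hedged illustration, one reading consistent with the stated definitions of a and b is a min-max rescaling of the normalized values:

```python
def first_transform(h_values):
    """Hypothetical first mapping: rescale each normalized value h_i
    using the smallest (a) and largest (b) normalized values in the
    set. This is one consistent reading of the definitions above, not
    the patent's actual (unreproduced) equation."""
    a, b = min(h_values), max(h_values)
    return [(h - a) / (b - a) for h in h_values]
```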
In the super-parameter optimizing apparatus according to some embodiments of the present application, performing a second transformation on the performance index of the first partial super-parameter of the super-parameter set, determining the performance index transformation value of the first partial super-parameter includes: acquiring performance indexes of the super parameters from the super parameter set; performing second mapping on the performance index of the super parameter, and determining a second mapping value of the performance index of the super parameter; and determining a second mapped value of the performance index of the superparameter as a performance index transformed value of the superparameter, wherein the performance index of the superparameter and the second mapped value of the performance index of the superparameter satisfy the equation:
P_i = (p_i − μ) / σ,
where P_i is the second mapping value of the performance index of the super-parameter, p_i is the performance index of the super-parameter, μ is the mean of the performance indexes in the super-parameter set, and σ is the standard deviation of the performance indexes in the super-parameter set.
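Assuming the second mapping is the standard-score transformation implied by the mean and standard deviation definitions above, a minimal sketch is:

```python
import statistics

def second_transform(p_values):
    """Standard-score mapping P_i = (p_i - mean) / std over the set of
    performance indexes, matching the stated definitions. Population
    standard deviation is assumed; sample std is equally plausible."""
    mu = statistics.mean(p_values)
    sigma = statistics.pstdev(p_values)
    return [(p - mu) / sigma for p in p_values]
```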
In a super-parameter optimization apparatus according to some embodiments of the application, determining, for each super-parameter transformation value of the non-first part of the super-parameters, a first confidence that it has the maximum performance index transformation value, based on the super-parameter transformation values of the first part of the super-parameters and their performance index transformation values, comprises: establishing a statistical model based on the super-parameter transformation values of the first part of the super-parameters and their performance index transformation values, wherein the statistical model is used to determine the expectation and variance of the performance index transformation value of a super-parameter transformation value; determining, based on the statistical model, the expectation and variance of the performance index transformation value of the super-parameter transformation value of each super-parameter in the non-first part of the super-parameters; and determining, based on the expectation and variance of the performance index transformation value of the super-parameter transformation value of each super-parameter in the non-first part, a first confidence that the super-parameter transformation value of the corresponding super-parameter has the maximum performance index transformation value.
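The patent does not fix a particular statistical model beyond the Gaussian and Gaussian-mixture forms named later; as one illustrative realization only, a minimal Gaussian-process posterior with an RBF kernel yields the required expectation and variance at a query point:

```python
import math

def rbf(x, y, ls=1.0):
    """Squared-exponential (RBF) kernel for scalar inputs."""
    return math.exp(-((x - y) ** 2) / (2 * ls ** 2))

def solve(A, y):
    """Solve A x = y by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [y[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c and M[r][c]:
                f = M[r][c] / M[c][c]
                for k in range(c, n + 1):
                    M[r][k] -= f * M[c][k]
    return [M[i][n] / M[i][i] for i in range(n)]

def gp_posterior(x_train, y_train, x_query, noise=1e-6):
    """Posterior mean (expectation) and variance of the performance
    index transformation value at x_query, given observed pairs."""
    n = len(x_train)
    K = [[rbf(x_train[i], x_train[j]) + (noise if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    k = [rbf(x, x_query) for x in x_train]
    a = solve(K, y_train)
    b = solve(K, k)
    mean = sum(ki * ai for ki, ai in zip(k, a))
    var = rbf(x_query, x_query) - sum(ki * bi for ki, bi in zip(k, b))
    return mean, max(var, 0.0)
```

At an observed point the posterior reproduces the observation with near-zero variance; between observations the variance grows, which is what drives the exploration term of the acquisition functions below.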
In a super-parameter optimization apparatus according to some embodiments of the application, determining, based on the expectation and variance of the performance index transformation value of the super-parameter transformation value of each super-parameter in the non-first part of the super-parameters, a first confidence that the super-parameter transformation value of the corresponding super-parameter has the maximum performance index transformation value comprises: acquiring a confidence function, wherein the confidence function is used to determine the confidence based on the expectation and the variance; substituting the expectation and variance of the performance index transformation value of the super-parameter transformation value of each super-parameter in the non-first part into the confidence function; and determining the output of the confidence function as the first confidence that the super-parameter transformation value of the corresponding super-parameter has the maximum performance index transformation value.
In a hyper-parametric optimization device according to some embodiments of the application, obtaining a confidence function comprises: acquiring a first acquisition function, a second acquisition function and a third acquisition function; and determining a confidence function based on the first, second, and third acquisition functions, the confidence function, the first, second, and third acquisition functions satisfying the equation:
wherein,is a confidence function, ++ >Is a first acquisition function, +.>Is a second acquisition function, +.>Is a third acquisition function, +.>Is the weight of the first acquisition function, +.>Is the weight of the second acquisition function, +.>Is the weight of the third acquisition function and
in the super parameter optimization apparatus according to some embodiments of the present application, the first acquisition function, the second acquisition function, and the third acquisition function each include: one or more of the expected increment acquisition function, the probability increment acquisition function and the confidence upper bound acquisition function, and the weights of the first acquisition function, the second acquisition function and the third acquisition function are all between 0 and 1 and are randomly generated.
In the super parameter optimization device according to some embodiments of the present application, the statistical model includes one or both of a gaussian mixture model and a gaussian model.
In a hyper-parameter optimization apparatus according to some embodiments of the application, obtaining a candidate hyper-parameter set comprises: obtaining a plurality of candidate hyper-parameters; determining a first partial hyper-parameter of the plurality of candidate hyper-parameters; acquiring performance indexes of a first part of super parameters in a plurality of candidate super parameters; and establishing a candidate superparameter set, wherein the candidate superparameter set comprises a plurality of candidate superparameters and performance indexes of a first part of superparameters in the plurality of candidate superparameters.
In a hyper-parameter optimization apparatus according to some embodiments of the application, determining a first partial hyper-parameter of a plurality of candidate hyper-parameters comprises: dividing the candidate hyper-parameters into a first number of candidate hyper-parameter sets, wherein the probabilities that each candidate hyper-parameter set in the first number of candidate hyper-parameter sets contains the optimal hyper-parameters are equal; and randomly acquiring a second number of candidate hyper-parameters from each candidate hyper-parameter set in the first number of candidate hyper-parameter sets respectively as first partial hyper-parameters.
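A minimal sketch of this stratified selection, assuming scalar candidates and equal-size strata (both are assumptions for the illustration, not claim language):

```python
import random

def first_partial(candidates, num_groups, per_group, seed=None):
    """Split sorted candidates into `num_groups` equal strata and draw
    `per_group` at random from each -- a Latin-hypercube-like scheme
    matching the equal-probability division described above."""
    rng = random.Random(seed)
    ordered = sorted(candidates)
    size = len(ordered) // num_groups
    picked = []
    for g in range(num_groups):
        lo = g * size
        hi = (g + 1) * size if g < num_groups - 1 else len(ordered)
        stratum = ordered[lo:hi]
        picked.extend(rng.sample(stratum, k=min(per_group, len(stratum))))
    return picked
```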
In a hyper-parameter optimization apparatus according to some embodiments of the application, obtaining performance metrics of a first part of hyper-parameters in a plurality of candidate hyper-parameters includes: acquiring a training set for training a machine learning model; configuring a machine learning model with each of the first portion of superparameters; training the machine learning model with a training set to determine parameters of the machine learning model; testing the performance of the trained machine learning model to determine performance metrics of the trained machine learning model; and determining the performance index of the trained machine learning model as the performance index of the corresponding one of the first portion of superparameters.
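A hypothetical evaluation harness illustrating this configure-train-test step (the `build_model`/`fit`/`score` interface is an assumption for illustration, and `MeanModel` is a toy stand-in, not the patent's model):

```python
def performance_index(hyperparam, train_set, test_set, build_model):
    """Configure a model with one super-parameter, train it, and use
    the test score as that super-parameter's performance index."""
    model = build_model(hyperparam)
    model.fit(train_set)          # training determines model parameters
    return model.score(test_set)  # tested performance -> performance index

class MeanModel:
    """Toy stand-in model: the 'super-parameter' is a shrinkage factor
    applied to the learned mean; score is negative mean squared error."""
    def __init__(self, shrink):
        self.shrink, self.mean = shrink, 0.0
    def fit(self, data):
        self.mean = self.shrink * sum(data) / len(data)
    def score(self, data):
        return -sum((x - self.mean) ** 2 for x in data) / len(data)
```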
In the super parameter optimizing apparatus according to some embodiments of the present application, the predetermined condition includes: the performance index of the optimal candidate super-parameter exceeds a predetermined threshold or the number of execution times of the judging step exceeds a predetermined number.
In the super-parameter optimizing apparatus according to some embodiments of the present application, obtaining the performance index of the optimal candidate super-parameter includes: acquiring a training set for training a machine learning model; configuring a machine learning model with each of the optimal candidate hyper-parameters; training the machine learning model with a training set to determine parameters of the machine learning model; testing the performance of the trained machine learning model to determine performance metrics of the trained machine learning model; and determining the performance index of the trained machine learning model as the performance index of the corresponding super-parameter in the optimal candidate super-parameters.
According to a third aspect of the present disclosure, a computing device is disclosed that includes a memory configured to store computer-executable instructions; a processor configured to perform any of the methods described above when the computer-executable instructions are executed by the processor.
According to a fourth aspect of the present disclosure, a computer-readable storage medium is disclosed, storing computer-executable instructions that, when executed, perform any of the methods as described above.
According to a fifth aspect of the present disclosure, a computer program product is disclosed comprising computer executable instructions, wherein the computer executable instructions, when executed by a processor, perform any of the methods as described above.
In the super parameter optimizing method and apparatus according to some embodiments of the present disclosure, an initializing step is first performed, and then a confidence determining step, an optimal candidate step, and a judging step are performed in predetermined logic. The initialization step includes obtaining a candidate superparameter set, wherein the candidate superparameter set includes a plurality of candidate superparameters and performance indexes of a first part of the plurality of candidate superparameters, and the performance indexes are used for indicating performance of a machine learning model configured with corresponding superparameters. The confidence determining step includes determining, for each of the non-first portion of the plurality of candidate superparameters, a confidence that it is an optimal superparameter based on performance metrics of the first portion of the plurality of candidate superparameters, the optimal superparameter being a superparameter whose performance metrics are optimal. The optimal candidate step includes determining an optimal candidate superparameter in the non-first partial superparameter based on a confidence that each superparameter in the non-first partial superparameter is an optimal superparameter, and obtaining a performance index of the optimal candidate superparameter. After the optimal candidate step is performed, the process goes to the judgment step. 
The judging step includes determining whether the performance index of the optimal candidate super-parameter meets a predetermined condition, adding the optimal candidate super-parameter to the first part of super-parameters in response to the performance index of the optimal candidate super-parameter not meeting the predetermined condition, adding the performance index of the optimal candidate super-parameter to the candidate super-parameter set, turning to the confidence determining step, and determining the optimal candidate super-parameter as the optimal super-parameter in response to the performance index of the optimal candidate super-parameter meeting the predetermined condition. It can be seen that by executing the above steps according to predetermined logic, the present disclosure avoids configuring the hyper-parameters of the machine learning model based on human experience, and achieves fast, efficient and accurate hyper-parameter optimization.
These and other advantages of the present disclosure will become apparent from and elucidated with reference to the embodiments described hereinafter.
Drawings
Embodiments of the present disclosure will now be described in more detail and with reference to the accompanying drawings, in which:
FIG. 1 illustrates an exemplary application scenario of a hyper-parametric optimization method according to some embodiments of the application;
FIG. 2 illustrates an exemplary flow chart of a hyper-parameter optimization method, according to some embodiments of the application;
FIG. 3 illustrates an exemplary flow chart of a confidence level determination method according to some embodiments of the application;
FIG. 4 illustrates an exemplary flow chart of a hyper-parameter optimization method, according to some embodiments of the application;
FIG. 5 illustrates an exemplary schematic diagram of a hyper-parametric optimization system, according to some embodiments of the application;
FIG. 6 illustrates performance of a hyper-parametric optimization method according to some embodiments of the application;
FIG. 7 illustrates performance of a hyper-parametric optimization method according to some embodiments of the application;
FIG. 8 illustrates performance of a super parameter optimization method according to some embodiments of the application;
FIG. 9 illustrates an exemplary block diagram of a super-parameter optimization apparatus, according to some embodiments of the application; and
FIG. 10 illustrates an example system including an example computing device that represents one or more systems and/or devices that can implement the various methods described herein.
Detailed Description
The following description provides specific details of various embodiments of the disclosure so that those skilled in the art may fully understand and practice the various embodiments of the disclosure. It should be understood that the technical solutions of the present disclosure may be practiced without some of these details. In some instances, well-known structures or functions have not been shown or described in detail to avoid obscuring the description of embodiments of the present disclosure with such unnecessary description. The terminology used in the present disclosure should be understood in its broadest reasonable manner, even though it is being used in conjunction with a particular embodiment of the present disclosure.
First, some terms or expressions involved in the embodiments of the present application are described so as to be easily understood by those skilled in the art.
Super-parameters: in the context of machine learning, a super-parameter is a parameter whose value is set before the learning process starts, rather than parameter data obtained through training. In general, super-parameters need to be optimized, and a group of optimal super-parameters is selected for the machine learning model so as to improve its learning performance and effect.
Machine learning: Machine Learning (ML) is a multi-domain interdisciplinary subject involving probability theory, statistics, approximation theory, convex analysis, algorithmic complexity theory, and other disciplines. It studies how a computer can simulate or implement human learning behavior to acquire new knowledge or skills, and how to reorganize existing knowledge structures so as to continuously improve its own performance. Machine learning is the core of artificial intelligence and the fundamental way to give computers intelligence; it is applied throughout the various areas of artificial intelligence. Machine learning and deep learning typically include techniques such as artificial neural networks, belief networks, reinforcement learning, transfer learning, inductive learning, and learning from demonstrations.
Latin hypercube sampling: latin hypercube sampling (Latin hypercube sampling, abbreviated LHS) is a method of approximately random sampling from a multivariate parameter distribution, and belongs to a hierarchical sampling technique, and is commonly used in computer experiments or Monte Carlo integration and the like.
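One standard LHS construction (illustrative only, not taken from the patent) splits each dimension of the unit hypercube into n equal intervals, draws one point per interval, and shuffles the intervals independently per dimension:

```python
import random

def latin_hypercube(n, dims, seed=None):
    """Generate n LHS points in [0, 1)^dims: each dimension's range is
    divided into n equal intervals, each interval contributes exactly
    one coordinate, and interval order is shuffled per dimension."""
    rng = random.Random(seed)
    cols = []
    for _ in range(dims):
        col = [(i + rng.random()) / n for i in range(n)]
        rng.shuffle(col)
        cols.append(col)
    return [tuple(col[i] for col in cols) for i in range(n)]
```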
Gaussian model: a model that precisely quantifies things using a single Gaussian probability density function (normal distribution curve).
Gaussian mixture model: a model that decomposes a thing into a combination of several components, each formed from a Gaussian probability density function (normal distribution curve).
Acquisition function: a function used to determine the next batch of sample points in statistical learning, such as the expected improvement function (Expected Improvement), the probability of improvement function (Probability of Improvement), and the upper confidence bound function (Upper Confidence Bound).
Fig. 1 illustrates an exemplary application scenario 100 of a super-parameter optimization method according to some embodiments of the application. As shown in fig. 1, the application scenario 100 includes a server 110, a terminal 120, a server 130, and a network 140. The server 110, the terminal 120, and the server 130 are coupled together via the network 140. As an example, the super-parameter optimization of the machine learning model may be performed on the server 110, and the optimal super-parameters may then be transmitted to the terminal 120 or the server 130 through the network 140.
As an example, the initializing step may be performed on the server 110, and then the confidence determining step, the optimal candidate step, and the judging step may be performed in predetermined logic. The initialization step includes obtaining a candidate superparameter set, wherein the candidate superparameter set includes a plurality of candidate superparameters and performance indexes of a first part of the plurality of candidate superparameters, and the performance indexes are used for indicating performance of a machine learning model configured with corresponding superparameters. The confidence determining step includes determining, for each of the non-first partial superparameters of the plurality of candidate superparameters, a confidence that the superparameter is an optimal superparameter, based on performance metrics of the first partial superparameter of the plurality of candidate superparameters, the optimal superparameter being a superparameter whose performance metrics are optimal. The optimal candidate step includes determining an optimal candidate superparameter in the non-first partial superparameter based on a confidence that each superparameter in the non-first partial superparameter is an optimal superparameter, and obtaining a performance index of the optimal candidate superparameter. 
The judging step includes determining whether the performance index of the optimal candidate super-parameter meets a predetermined condition, adding the optimal candidate super-parameter to the first part of super-parameters in response to the performance index of the optimal candidate super-parameter not meeting the predetermined condition, adding the performance index of the optimal candidate super-parameter to the candidate super-parameter set, turning to the confidence determining step, and determining the optimal candidate super-parameter as the optimal super-parameter in response to the performance index of the optimal candidate super-parameter meeting the predetermined condition.
As an example, the initializing step may also be performed on the server 130 or the terminal 120, and then one or more of the confidence determining step, the optimal candidate step, and the judging step may be performed according to predetermined logic.
In some embodiments, one or more of the above-described initialization step, confidence determination step, optimal candidate step, and judgment step may also be performed on the server 110, the server 130, or the terminal 120, respectively, with the network 140 used to carry out these steps according to predetermined logic.
Alternatively, the server 110 may be an independent physical server, a server cluster or distributed system formed by a plurality of physical servers, or a cloud server that provides basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDN, big data, and artificial intelligence platforms. The terminal 120 may include, but is not limited to, at least one of: terminals capable of presenting content, such as mobile phones, tablet computers, notebook computers, desktop PCs, digital televisions, and the like. The network 140 may be, for example, a wide area network (WAN), a local area network (LAN), a wireless network, a public telephone network, an intranet, or any other type of network known to those skilled in the art.
It should be noted that the scenario described above is only one example in which embodiments of the present disclosure may be implemented, and is not limiting. For example, in some example scenarios, it is also possible to implement super-parameter optimization on a particular terminal or server.
FIG. 2 illustrates an exemplary flow chart of a hyper-parameter optimization method 200, according to some embodiments of the application. The method 200 may be implemented, for example, on the server 110 in fig. 1, although this is not limiting. As shown in fig. 2, method 200 includes steps 210, 220, 230, 240, 250, and 260.
In step 210, a set of candidate superparameters is obtained, the set of candidate superparameters including a plurality of candidate superparameters and performance metrics for a first portion of the plurality of candidate superparameters, the performance metrics being indicative of performance of a machine learning model configured with the superparameters. In some embodiments, the performance indicator of the first portion of the superparameter may be obtained by testing the performance of a machine learning model configured with the first portion of the superparameter. In some embodiments, step 210 may be used as an initialization step.
In some embodiments, obtaining the candidate hyper-parameter set may include: obtaining a plurality of candidate hyper-parameters, as an example, a range of candidate hyper-parameters may be sampled to obtain a plurality of candidate hyper-parameters; determining a first partial superparameter of the plurality of candidate superparameters, the first partial superparameter being determined by, as an example, randomly sampling from the candidate superparameters, or by a particular algorithm; acquiring performance indexes of a first part of super parameters in a plurality of candidate super parameters; and establishing a candidate superparameter set, wherein the candidate superparameter set comprises a plurality of candidate superparameters and performance indexes of a first part of superparameters in the plurality of candidate superparameters. For example, the candidate hyper-parameters may be a plurality of learning rates, and the performance index of the first portion of hyper-parameters may be a performance of the model after the machine learning model is configured at the first portion of learning rates.
As an example, determining a first portion of the plurality of candidate hyper-parameters may include: dividing the candidate hyper-parameters into a first number of candidate hyper-parameter sets, wherein the probabilities that each candidate hyper-parameter set in the first number of candidate hyper-parameter sets contains the optimal hyper-parameters are equal; and randomly acquiring a second number of candidate hyper-parameters from each candidate hyper-parameter set in the first number of candidate hyper-parameter sets respectively as first partial hyper-parameters. For example, a cumulative density function of a plurality of candidate superparameters may be obtained, then the plurality of candidate superparameters may be divided into n partitions according to a Cumulative Density Function (CDF) thereof, the probabilities of each partition may be equal, then the same number of candidate superparameters may be selected from each partition, and the selected superparameters may be used as the first partial superparameters.
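The partition-and-sample scheme above can be sketched as follows (a hedged illustration: the function name, the use of equal-count partitions of the sorted candidates as a stand-in for equal CDF mass, and the concrete numbers are assumptions, not from the original):

```python
import numpy as np

def stratified_initial_sample(candidates, n_bins, per_bin, seed=None):
    """Split the candidates into n_bins partitions of (approximately) equal
    probability mass, using equal counts over the sorted candidates as a
    stand-in for equal CDF mass, then draw per_bin candidates uniformly
    from each partition.  Returns the indices of the chosen candidates."""
    rng = np.random.default_rng(seed)
    order = np.argsort(candidates)                 # sort candidates by value
    partitions = np.array_split(order, n_bins)     # equal-probability partitions
    picked = [rng.choice(part, size=per_bin, replace=False)
              for part in partitions]
    return np.sort(np.concatenate(picked))

# e.g. 60 candidate learning rates, 3 partitions, 2 samples per partition
candidates = np.linspace(1e-3, 0.2, 60)
idx = stratified_initial_sample(candidates, n_bins=3, per_bin=2, seed=0)
first_portion = candidates[idx]
```

Each partition contributes the same number of candidates, so the first-portion hyper-parameters cover the whole search range instead of clustering in one region.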
In some embodiments, obtaining the performance indicator for a first portion of the plurality of candidate superparameters may include: acquiring a training set for training a machine learning model, for example, historical data of a target object to be learned may be acquired; for each of the first partial hyper-parameters, configuring a machine learning model therewith, e.g., the machine learning model may be configured at a first partial learning rate; training the machine learning model with a training set to determine parameters of the machine learning model, for example, the machine learning model configured with the learning rate may be trained with the training set to obtain a trained machine learning model; testing the performance of the trained machine learning model to determine performance metrics of the trained machine learning model; and determining the performance index of the trained machine learning model as the performance index of the corresponding one of the first portion of superparameters. As an example, when an application scenario requires that a hyper-parameter be configured for a machine learning model for predicting target object behavior, a training set for training the machine learning model may include historical data of the target object, and then test the prediction accuracy of the trained machine learning model for the target object as a performance index of the hyper-parameter.
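As a minimal, hypothetical illustration of obtaining a performance index, the machine learning model below is stood in by closed-form ridge regression, with the regularization strength as the hyper-parameter and negative held-out MSE as the performance index; the function name and the synthetic data are invented for illustration:

```python
import numpy as np

def performance_index(lam, X_tr, y_tr, X_te, y_te):
    """Hypothetical performance index for one hyper-parameter value `lam`:
    fit ridge regression (closed form) on the training set, then score it
    on the held-out set as negative MSE, so higher is better."""
    d = X_tr.shape[1]
    w = np.linalg.solve(X_tr.T @ X_tr + lam * np.eye(d), X_tr.T @ y_tr)
    return -np.mean((X_te @ w - y_te) ** 2)

# synthetic "historical data of the target object" (invented for illustration)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
w_true = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ w_true + 0.1 * rng.normal(size=200)
scores = {lam: performance_index(lam, X[:150], y[:150], X[150:], y[150:])
          for lam in (0.01, 1.0, 100.0)}
```

Configuring the model with each first-portion hyper-parameter, training, and testing in this way yields one performance index per hyper-parameter, as the text describes.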
In step 220, a confidence level that each of the non-first partial superparameters of the plurality of candidate superparameters is an optimal superparameter is determined based on the performance index of the first partial superparameter of the plurality of candidate superparameters, the optimal superparameter being the superparameter whose performance index is optimal. In some embodiments, the confidence level may be obtained using an acquisition function. In some embodiments, step 220 may serve as a confidence determination step.
In step 230, an optimal candidate superparameter in the non-first partial superparameter is determined based on the confidence that each superparameter in the non-first partial superparameter is an optimal superparameter, and a performance indicator of the optimal candidate superparameter is obtained. In some embodiments, whether the optimal candidate superparameter is the optimal superparameter will then be determined based on its performance indicator. In some embodiments, step 230 may serve as the optimal candidate step.
In some embodiments, obtaining the performance index for the optimal candidate hyper-parameters may include: acquiring a training set for training a machine learning model, for example, historical data of a target object to be learned may be acquired; for each of the optimal candidate hyper-parameters, configuring a machine learning model therewith, e.g., the machine learning model may be configured with an optimal candidate learning rate; training the machine learning model with a training set to determine parameters of the machine learning model, for example, the machine learning model configured with the learning rate may be trained with the training set to obtain a trained machine learning model; testing the performance of the trained machine learning model to determine performance metrics of the trained machine learning model; and determining the performance index of the trained machine learning model as the performance index of the corresponding super-parameter in the optimal candidate super-parameters. As an example, when an application scenario requires that a hyper-parameter be configured for a machine learning model for predicting target object behavior, a training set for training the machine learning model may include historical data of the target object, and then test the prediction accuracy of the trained machine learning model for the target object as a performance index of the hyper-parameter.
In step 240, it is determined whether the performance index of the optimal candidate hyper-parameters satisfies a predetermined condition.
In some embodiments, the predetermined condition may include: the performance index of the optimal candidate super-parameter exceeds a predetermined threshold, or the number of executions of the judging step exceeds a predetermined number. As an example, the number of iterations of the steps may be limited to end the super-parameter optimization, e.g., the predetermined condition may be that the judging step has been executed 100 times.
In step 250, in response to the performance index of the optimal candidate superparameter not meeting the predetermined condition, adding the optimal candidate superparameter to the first portion of superparameters, adding the performance index of the optimal candidate superparameter to the candidate superparameter set, and then moving to step 220 for the next iteration.
In step 260, the optimal candidate hyper-parameters are determined to be optimal hyper-parameters in response to the performance index of the optimal candidate hyper-parameters meeting a predetermined condition. In some embodiments, steps 240, 250 and 260 may be part of the determining step.
As can be seen, by performing the above steps according to predetermined logic, the method 200 achieves fast, efficient and accurate super-parameter optimization.
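Steps 210 through 260 can be sketched as a generic loop (a hedged skeleton, not the disclosed implementation: the function names, the dictionary bookkeeping, and the toy confidence function are assumptions):

```python
def optimize(candidates, initial, evaluate, confidence_fn, stop, max_iters=100):
    """Skeleton of steps 210-260 (hypothetical names).  `initial` maps the
    first-portion candidates to their performance indexes; `confidence_fn`
    returns, per untested candidate, the confidence that it is optimal;
    `stop` is the predetermined condition on a performance index."""
    tested = dict(initial)                              # step 210
    for _ in range(max_iters):                          # iteration cap
        untested = [c for c in candidates if c not in tested]
        if not untested:
            break
        conf = confidence_fn(tested, untested)          # step 220
        best = max(untested, key=lambda c: conf[c])     # step 230
        score = evaluate(best)
        if stop(score):                                 # steps 240 / 260
            return best, score
        tested[best] = score                            # step 250: iterate again
    best = max(tested, key=tested.get)
    return best, tested[best]

def target(c):                                          # toy performance index
    return -(c - 7) ** 2

def greedy_conf(tested, untested):                      # toy confidence: prefer
    incumbent = max(tested, key=tested.get)             # points near incumbent
    return {c: -abs(c - incumbent) for c in untested}

best, score = optimize(range(10), {0: target(0), 5: target(5)},
                       target, greedy_conf, stop=lambda s: s == 0)
```

With the toy confidence function, the loop walks from the incumbent 5 toward 7, the candidate whose performance index satisfies the stopping condition.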
FIG. 3 illustrates an exemplary flow chart of a confidence level determination method 300 according to some embodiments of the application. As an example, the method 300 may be performed in step 220 described above. As shown in fig. 3, method 300 includes steps 310, 320, 330 and 340.
In step 310, a first transformation is performed on the superparameters in the superparameter set to determine superparameter transformed values for the superparameters, the superparameter transformed values satisfying a normal distribution. As an example, the first transformation may include a Kumaraswamy transformation such that the transformed hyper-parameters satisfy a normal distribution.
In some embodiments, performing a first transformation on the super-parameters in the super-parameter set to determine super-parameter transform values may include: acquiring a normalized value of the current super-parameter, the normalized value being between 0 and 1 (for example, the normalized value may be acquired by normalizing the super-parameter); performing a first mapping on the normalized value of the current super-parameter, and determining a first mapping value of the normalized value of the current super-parameter; and determining the first mapping value of the normalized value of the current super-parameter as the super-parameter transform value of the current super-parameter. The normalized value of the current super-parameter and its first mapping value satisfy a first mapping equation in which h_i denotes the normalized value of the super-parameter, H_i denotes the first mapping value of that normalized value, a denotes the normalized value of the smallest super-parameter in the super-parameter set, and b denotes the normalized value of the largest super-parameter in the super-parameter set.
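Since the first transformation is named as a Kumaraswamy transformation of a normalized value lying between a and b, one plausible form of the first mapping equation is the Kumaraswamy CDF applied to the min-max rescaled value. This is an assumed reconstruction, as the shape parameters are not given in the text:

```latex
% Assumed reconstruction: Kumaraswamy CDF of the min-max rescaled value.
% \alpha and \beta are Kumaraswamy shape parameters (assumptions).
H_i = 1 - \left( 1 - \left( \frac{h_i - a}{b - a} \right)^{\alpha} \right)^{\beta},
\qquad a \le h_i \le b, \quad \alpha, \beta > 0
```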
In step 320, a second transformation is performed on the performance indicators of the first portion of the superparameter set to determine performance indicator transformed values of the first portion of the superparameter, the performance indicator transformed values satisfying a normal distribution. As an example, the second transformation may include a z-score normalization transformation.
In some embodiments, performing a second transformation on the performance indicator of the first portion of the superparameter set may include: acquiring performance indexes of the super parameters from the super parameter set; performing second mapping on the performance index of the super parameter, and determining a second mapping value of the performance index of the super parameter; and determining a second mapped value of the performance index of the superparameter as a performance index transformed value of the superparameter, wherein the performance index of the superparameter and the second mapped value of the performance index of the superparameter satisfy the equation:
P_i = (p_i − μ) / σ, where P_i denotes the second mapped value of the performance index of the super-parameter, p_i denotes the performance index of the super-parameter, μ denotes the mean of the performance indexes in the super-parameter set, and σ denotes the standard deviation of the performance indexes in the super-parameter set.
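The second mapping is the standard z-score, which can be checked in a few lines (the sample values are hypothetical):

```python
import numpy as np

def zscore(perf):
    """Second transformation: P_i = (p_i - mean) / std over the performance
    indexes of the first-portion hyper-parameters (population std)."""
    perf = np.asarray(perf, dtype=float)
    return (perf - perf.mean()) / perf.std()

P = zscore([0.71, 0.74, 0.80, 0.95])   # hypothetical performance indexes
```

After the transformation the performance indexes have zero mean and unit standard deviation, and their ordering is preserved.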
In step 330, a first confidence level is determined for each of the superparameter transformed values of the non-first portion of the superparameters that has a maximum performance index transformed value based on the superparameter transformed values of the first portion of the superparameters and their performance index transformed values. In some embodiments, the determination confidence may be determined using an acquisition function.
In step 340, the first confidence of the superparameter transform value of each superparameter is determined as the confidence that each corresponding superparameter is the optimal superparameter.
It can be seen that the method 300 performs the first transformation on the super-parameters and the second transformation on the performance indexes of the first part of super-parameters so as to suppress noise interference as much as possible, and then determines, based on the super-parameter transform values of the first part of super-parameters and their performance index transform values, the confidence that each super-parameter in the non-first part has the maximum performance index transform value, which yields a more accurate and interference-resistant confidence. By performing the above steps, the method 300 lays the foundation for fast, efficient and accurate super-parameter optimization.
In some embodiments, at step 330, determining a first confidence for each of the super-parameter transform values of the super-parameters in the non-first portion based on the super-parameter transform values of the first portion of super-parameters and their performance index transform values may comprise: establishing a statistical model based on the super-parameter transform values of the first part of super-parameters and their performance index transform values, wherein the statistical model is used for determining the expectation and variance of the performance index transform value of a super-parameter transform value; determining, for the super-parameter transform value of each super-parameter in the non-first portion, the expectation and variance of its performance index transform value based on the statistical model; and determining, based on that expectation and variance, a first confidence that the super-parameter transform value of the corresponding super-parameter has the largest performance index transform value. As an example, the statistical model may estimate the performance indexes of other super-parameters based on the performance indexes of known super-parameters, and provide the expectation and variance of the possible distribution of those performance indexes. As an example, the statistical model may include one or both of a Gaussian mixture model and a Gaussian model.
In some embodiments, determining, based on the expected and variance of the performance indicator transform values of the superparameter transform values of each of the superparameters in the non-first portion of the superparameters, a first confidence that the superparameter transform value of the corresponding superparameter has the largest performance indicator transform value comprises: acquiring a confidence function, wherein the confidence function is used for determining the confidence based on the expectation and the variance; substituting the expected and variance of the performance index transformation values of the super-parameter transformation values of each super-parameter in the non-first part of super-parameters into the confidence function; and determining an output of the confidence function as a first confidence that the corresponding hyper-parametric transform value of the hyper-parameter has the largest performance index transform value. As an example, the confidence function may include an acquisition function for determining a confidence based on the expectations and the variance.
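As a hedged stand-in for the statistical model, a minimal Gaussian-process posterior (RBF kernel, written from scratch; the kernel choice, length scale, and noise level are assumptions, and the disclosure itself names a Gaussian or Gaussian mixture model) yields the expectation and variance that the confidence function consumes:

```python
import numpy as np

def gp_posterior(X_train, y_train, X_query, length=0.2, noise=1e-6):
    """Posterior mean and variance of a zero-mean Gaussian process with an
    RBF kernel over 1-D inputs: a simple stand-in for the statistical model
    that predicts the performance index transform of untested
    hyper-parameter transform values."""
    def k(a, b):
        return np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / length ** 2)
    K = k(X_train, X_train) + noise * np.eye(len(X_train))
    Ks = k(X_train, X_query)
    sol = np.linalg.solve(K, Ks)                    # K^{-1} Ks
    mean = sol.T @ y_train                          # posterior expectation
    var = np.clip(np.diag(k(X_query, X_query)) - np.diag(Ks.T @ sol),
                  0.0, None)                        # posterior variance
    return mean, var

# tested transform values (hypothetical) and two query points
mean, var = gp_posterior(np.array([0.0, 0.5, 1.0]),
                         np.array([0.0, 1.0, 0.0]),
                         np.array([0.5, 5.0]))
```

Near tested points the model reproduces the observed value with low variance; far from all data the variance reverts to the prior, which is exactly the exploration signal the acquisition functions exploit.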
In some embodiments, obtaining the confidence function includes: acquiring a first acquisition function, a second acquisition function and a third acquisition function; and determining a confidence function based on the first, second, and third acquisition functions, the confidence function, the first, second, and third acquisition functions satisfying the equation:
α(x) = w1·α1(x) + w2·α2(x) + w3·α3(x) and w1 + w2 + w3 = 1, where α denotes the confidence function, α1, α2 and α3 denote the first, second and third acquisition functions, and w1, w2 and w3 denote the weights of the first, second and third acquisition functions, respectively.
In some embodiments, the first, second, and third acquisition functions each comprise one or more of the expected improvement acquisition function, the probability of improvement acquisition function, and the upper confidence bound acquisition function, and the weights of the first, second, and third acquisition functions are all between 0 and 1 and are randomly generated. Because the confidence function combines three acquisition functions, the confidence value does not depend excessively on any single acquisition function, which improves the generality of the finally determined optimal super-parameters. And because the weights of the acquisition functions are randomly generated, the probability that the machine learning model overfits certain training sets is reduced and the sparsity of the machine learning model is increased, so that the obtained optimal super-parameters have better adaptability and accuracy.
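A sketch of such a confidence function, with EI, PI, and UCB computed from the posterior mean/std and mixed with random weights. Drawing the weights from a Dirichlet distribution keeps each in (0, 1); that they also sum to 1 is an assumption about the constraint:

```python
import math
import numpy as np

def combined_confidence(mean, std, best, kappa=1.0, seed=None):
    """Confidence as a random convex combination of EI, PI and UCB, computed
    from the posterior mean/std of each untested candidate and the incumbent
    best performance index transform `best`."""
    rng = np.random.default_rng(seed)
    mean = np.asarray(mean, dtype=float)
    std = np.maximum(np.asarray(std, dtype=float), 1e-12)
    z = (mean - best) / std
    pdf = np.exp(-0.5 * z ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.vectorize(math.erf)(z / math.sqrt(2.0)))
    ei = std * (z * cdf + pdf)        # expected improvement
    pi = cdf                          # probability of improvement
    ucb = mean + kappa * std          # upper confidence bound
    w = rng.dirichlet(np.ones(3))     # random weights in (0, 1), summing to 1
    return w[0] * ei + w[1] * pi + w[2] * ucb

conf = combined_confidence(np.array([0.0, 1.0]), np.array([0.5, 0.5]),
                           best=0.5, seed=0)
```

Since all three acquisition terms increase with the posterior mean at fixed std, any convex combination of them preserves that ordering, whatever weights are drawn.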
FIG. 4 illustrates an exemplary flow chart 400 of a hyper-parameter optimization method, according to some embodiments of the application. As shown in fig. 4, the method 400 may include step S410, step S420, step S430, step S440, and step S450.
In step S410, candidate hyper-parameters are first acquired, the candidate hyper-parameters are sampled using a Latin hypercube sampling method, and performance indexes of the sampled hyper-parameters are acquired. For example, M candidate super-parameters are divided into n equal-probability spaces via their cumulative density function, then m super-parameters are randomly extracted from each equal-probability space, the resulting m×n super-parameters are taken as the sampled super-parameters, and the performance indexes of these m×n super-parameters are obtained.
In step S420, the Kumaraswamy transformation is performed on the M candidate super-parameters to obtain super-parameter transform values, the z-score transformation is performed on the performance indexes of the m×n sampled super-parameters to obtain performance index transform values, and a Gaussian mixture model is then built from the super-parameter transform values of the sampled super-parameters and their performance index transform values, for predicting the performance index transform values of the remaining candidate super-parameters.
In step S430, the expectation and variance of the performance index transform values of the M candidate super-parameters are obtained from the established Gaussian mixture model, and the optimal candidate super-parameter among the M candidate super-parameters is then determined using a confidence function that combines an expected improvement function (EI, Expected Improvement), a probability of improvement function (PI, Probability of Improvement), and an upper confidence bound function (UCB, Upper Confidence Bound). For example, the confidence function may be determined by weighting the EI, PI, and UCB functions and summing them.
In step S440, it is determined whether the performance index of the optimal candidate super-parameter satisfies a predetermined condition. If the predetermined condition is satisfied, the process goes to step S450, i.e., the optimal candidate super-parameter is determined as the optimal super-parameter; if the predetermined condition is not satisfied, the process goes to step S420, where the optimal candidate super-parameter and its performance index are added to the next optimization round and the iteration continues. The predetermined condition may include that the value of the performance index is greater than a certain threshold or that the number of iterations exceeds an iteration limit; for example, it may be set that the iteration ends once step S440 has been performed 100 times.
FIG. 5 illustrates an exemplary schematic diagram of a super-parameter optimization system 500, according to some embodiments of the application. The super-parameter optimization system 500 is used for online parameter tuning to determine optimal super-parameters in real time. It should be noted that in other embodiments, offline parameter tuning may also be performed to determine the optimal super-parameters offline. As depicted in FIG. 5, the super-parameter optimization system 500 includes a tuning part and a business part, wherein the business part includes a business background 510 and a database 520, and the tuning part includes a scheduler 530, a tuning background 540, and a management background 550.
First, the business background 510 notifies the management background 550 to start the parameter tuning task; the management background 550 then takes offline the machine learning model based on the old super-parameters in the business background 510 and brings online the machine learning model based on the new super-parameters, where the new super-parameters may be the optimal candidate super-parameters mentioned above. The business background 510 then deploys the machine learning model based on the new super-parameters to a plurality of application scenarios, and collects real-time data (e.g., performance indexes) of the machine learning model in each scenario for storage in the database 520. The scheduler 530 then schedules portions of the data in the database 520 according to the tuning requirements and sends them to the tuning background 540. After acquiring the data of the machine learning model, the tuning background 540 may determine optimal candidate super-parameters based on the embodiments illustrated by the method 200, the method 400, etc. described above, train a new machine learning model based thereon, and send it to the management background 550, where the management background 550 may bring the latest machine learning model online in the business background 510 and take the old machine learning model offline. Alternatively, the iteration may be terminated by limiting the number of times the management background 550 brings the latest machine learning model online in the business background 510. In some embodiments, the scheduler 530 may divide the scheduled data into multiple groups that are passed to the tuning background separately, to enable subsequent experimental collation.
As an example, the hyper-parametric optimization system 500 may be used for tuning experiments of the deep learning model, for example, for tuning the learning rate of the deep learning model to obtain an optimal learning rate. Illustratively, when an optimal learning rate is to be determined from the range of (0,0.2), it may be implemented by the system 500.
First, the business background 510 sets up the created parameter tuning experiment: the learning rate range to be searched is (0.001, 0.2), the number of groups of parameters searched in each batch is 3, and the optimization index is the average viewing duration per user. After tuning starts, the business background 510 divides all users equally into six parts, each corresponding to one super-parameter; the super-parameters may be set in pairs, where each pair comprises a control version and an experimental version, the learning rate of the control version may be set to increase stepwise by 0.001, and the learning rate of the experimental version is configured as the optimal candidate super-parameter. The scheduler 530 then transmits the six collected learning rates and the corresponding average viewing durations per user to the tuning background 540 every 72 hours, and the tuning background calculates the 3 groups of learning rates to be searched in the next batch (i.e., the next batch of control groups and the next batch of optimal candidate super-parameters) according to the average viewing durations and the learning rate range to be searched. The tuning background 540 then sends the tuning result to the management background 550, and the management background 550 takes offline the six deep learning models corresponding to the currently online 3 pairs of super-parameters and brings online the six deep learning models corresponding to the next 3 groups of parameters to be searched. The above steps are repeated until an average viewing duration that satisfies the business is obtained; the corresponding learning rate is the finally searched optimal learning rate (i.e., the optimal super-parameter).
To test the performance of the hyper-parameter optimization methods shown in this disclosure, a first data set and a second data set were randomly acquired for performing a first set of experiments and a second set of experiments, respectively. In each set of experiments, 10 rounds of tests were carried out, each round iterating 100 times to obtain the optimal super-parameters; finally, the model score of the optimal super-parameters was obtained as the performance index of the super-parameters by scoring the machine learning model configured with the optimal super-parameters. To present the comparison among the methods more intuitively, the scoring results of each model were normalized. Figs. 6, 7 and 8 illustrate the performance of super-parameter optimization methods according to some embodiments of the application.
As shown in figs. 6 and 7, the abscissa in each graph represents the serial number of the test (i.e., the serial number within the 10 rounds of tests, e.g., 3 represents the third round), and the ordinate represents the performance index of the super-parameters (i.e., the normalized score of the model corresponding to the super-parameters). Here, random search denotes randomly searching for the optimal super-parameter among the candidate super-parameters, PI denotes the super-parameter optimization method with the PI acquisition function as the confidence function, EI denotes the super-parameter optimization method with the EI acquisition function as the confidence function, UCB denotes the super-parameter optimization method with the UCB acquisition function as the confidence function, and method 400 denotes the super-parameter optimization method illustrated by the embodiment of the method 400 described above.
As can be seen from figs. 6 and 7, the optimal super-parameters determined by the method 400 have the most stable and highest performance indexes in both sets of experiments. The random search algorithm, although occasionally able to achieve a high performance index, performs very unstably. Moreover, the PI, UCB and EI variants described in the present disclosure are better than the random search algorithm in both performance stability and performance index, but inferior to the method 400 proposed by the present disclosure; in some experiments, some algorithms even yield a performance index of 0. It can be seen that the method of the present disclosure is optimal in terms of stability.
FIG. 8 illustrates the performance of hyper-parameter optimization methods according to some embodiments of the application, where the average score represents the average of the scores of the several algorithms over the first set of experiments and the second set of experiments. As can be seen from fig. 8, determining the optimal super-parameters using the method 400 presented in the present disclosure performs well in terms of both accuracy and stability.
Fig. 9 illustrates an exemplary block diagram of a super parameter optimization device 900, according to some embodiments of the application. As shown in fig. 9, the super parameter optimization apparatus 900 includes an initialization module 910, a confidence module 920, an optimal candidate module 930, and a judgment module 940. The super parameter optimizing apparatus 900 is configured to perform an initializing step, a confidence determining step, an optimal candidate step, and a judging step in accordance with predetermined logic.
The initialization module 910 is configured to perform an initialization step: and obtaining a candidate hyper-parameter set, wherein the candidate hyper-parameter set comprises a plurality of candidate hyper-parameters and performance indexes of a first part of hyper-parameters in the plurality of candidate hyper-parameters, and the performance indexes are used for indicating the performance of a machine learning model configured by the corresponding hyper-parameters. In some embodiments, the performance indicator of the first portion of the superparameter may be obtained by testing the performance of a machine learning model configured with the first portion of the superparameter. For example, the candidate hyper-parameters may be a plurality of learning rates, and the performance index of the first portion of hyper-parameters may be a performance of the model after the machine learning model is configured at the first portion of learning rates.
The confidence module 920 is configured to perform the confidence determining step: based on the performance indexes of the first portion of superparameters in the plurality of candidate superparameters, determining, for each superparameter in the non-first portion of superparameters in the plurality of candidate superparameters, the confidence that the superparameter is the optimal superparameter, wherein the optimal superparameter refers to the superparameter with the optimal performance index. In some embodiments, the confidence may be obtained using an acquisition function. In some embodiments, step 220 may serve as the confidence determining step.
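As a non-limiting sketch of one such acquisition function (the probability-of-improvement form; the function names are illustrative, not from the disclosure), the confidence that a candidate improves on the best observed performance index can be computed from a posterior mean and standard deviation:

```python
import math

def normal_cdf(x):
    # Standard normal CDF via the error function.
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def probability_of_improvement(mean, std, best_so_far, xi=0.01):
    """Confidence that a candidate with posterior mean/std improves on the
    best performance index observed so far (PI acquisition function)."""
    if std == 0.0:
        return 1.0 if mean > best_so_far + xi else 0.0
    return normal_cdf((mean - best_so_far - xi) / std)

# A candidate with a higher posterior mean gets a higher confidence.
hi_conf = probability_of_improvement(mean=0.9, std=0.1, best_so_far=0.8)
lo_conf = probability_of_improvement(mean=0.7, std=0.1, best_so_far=0.8)
```

Here the mean and standard deviation would come from a statistical model (e.g. a Gaussian model) fitted on the already-evaluated first portion of superparameters.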
The optimal candidate module 930 is configured to perform the optimal candidate step: determining the optimal candidate superparameter in the non-first portion of superparameters based on the confidence that each superparameter in the non-first portion of superparameters is the optimal superparameter, and acquiring the performance index of the optimal candidate superparameter. In some embodiments, whether the optimal candidate superparameter is the optimal superparameter will then be determined based on its performance index.
The judgment module 940 is configured to perform the judging step: determining whether the performance index of the optimal candidate superparameter meets a predetermined condition; in response to the performance index of the optimal candidate superparameter not meeting the predetermined condition, adding the optimal candidate superparameter to the first portion of superparameters, adding the performance index of the optimal candidate superparameter to the candidate superparameter set, and proceeding to the confidence determining step; and, in response to the performance index of the optimal candidate superparameter meeting the predetermined condition, determining the optimal candidate superparameter as the optimal superparameter. In some embodiments, the predetermined condition may include: the performance index of the optimal candidate superparameter exceeds a predetermined threshold, or the number of executions of the judging step exceeds a predetermined number. As an example, the number of iterations may be limited to end the superparameter optimization, e.g., limiting the predetermined condition to 100 executions of the judging step.
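Putting the four steps together, the iterative logic of the apparatus 900 might be sketched as follows (the function names, and the use of a simple threshold plus iteration budget as the predetermined condition, are illustrative assumptions):

```python
def optimize(candidate_set, evaluate, confidence, max_iters=100, threshold=None):
    """Judgment-step loop: repeatedly pick the unevaluated candidate with
    the highest confidence, evaluate it, and stop when its performance
    index meets the predetermined condition (threshold) or the iteration
    budget (max_iters) is exhausted."""
    perf = dict(candidate_set["performance"])
    for _ in range(max_iters):
        remaining = [c for c in candidate_set["candidates"] if c not in perf]
        if not remaining:
            break
        best = max(remaining, key=lambda c: confidence(c, perf))
        perf[best] = evaluate(best)
        if threshold is not None and perf[best] >= threshold:
            return best  # predetermined condition met
    # Otherwise return the best superparameter evaluated so far.
    return max(perf, key=perf.get)

# Toy usage: `confidence` is a stand-in for an acquisition function.
cand = {"candidates": [0.001, 0.01, 0.1], "performance": {0.1: 0.91}}
best_lr = optimize(cand,
                   evaluate=lambda c: 1.0 - abs(c - 0.01),
                   confidence=lambda c, perf: -abs(c - 0.01),
                   threshold=0.99)
```

Each pass through the loop corresponds to one execution of the confidence determining, optimal candidate, and judging steps.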
Alternatively, the hyper-parameter optimization apparatus 900 may be used to perform the above-described methods 200, 300, 400, etc. It can be seen that with the super parameter optimization apparatus 900, the above steps can be performed according to predetermined logic to achieve fast, efficient and accurate super parameter optimization.
FIG. 10 illustrates an example system 1000 that includes an example computing device 1010 that represents one or more systems and/or devices that can implement the various techniques described herein. Computing device 1010 may be, for example, a server of a service provider, a device associated with a server, a system-on-chip, and/or any other suitable computing device or computing system. The apparatus 900 for super parameter optimization described above with reference to fig. 9 may take the form of a computing device 1010. Alternatively, the hyper-parameter optimization apparatus 900 may be implemented as a computer program in the form of an application 1016.
The example computing device 1010 as illustrated includes a processing system 1011, one or more computer-readable media 1012, and one or more I/O interfaces 1013 communicatively coupled to each other. Although not shown, computing device 1010 may also include a system bus or other data and command transfer system that couples the various components to one another. The system bus may include any one or combination of different bus structures, such as a memory bus or memory controller, a peripheral bus, a universal serial bus, and/or a processor or local bus that utilizes any of a variety of bus architectures. Various other examples are also contemplated, such as control and data lines.
The processing system 1011 represents functionality that performs one or more operations using hardware. Thus, the processing system 1011 is illustrated as including hardware elements 1014 that may be configured as processors, functional blocks, and the like. This may include implementation in hardware as application specific integrated circuits or other logic devices formed using one or more semiconductors. The hardware elements 1014 are not limited by the materials from which they are formed or the processing mechanisms employed therein. For example, the processor may be comprised of semiconductor(s) and/or transistors (e.g., electronic Integrated Circuits (ICs)). In such a context, the processor-executable instructions may be electronically-executable instructions.
Computer-readable medium 1012 is illustrated as including memory/storage 1015. Memory/storage 1015 represents memory/storage capacity associated with one or more computer-readable media. Memory/storage 1015 may include volatile media such as Random Access Memory (RAM) and/or nonvolatile media such as Read Only Memory (ROM), flash memory, optical disks, magnetic disks, and so forth. The memory/storage 1015 may include fixed media (e.g., RAM, ROM, a fixed hard drive, etc.) and removable media (e.g., flash memory, a removable hard drive, an optical disk, and so forth). The computer-readable medium 1012 may be configured in a variety of other ways as described further below.
The one or more I/O interfaces 1013 represent functions that allow a user to input commands and information to the computing device 1010 using various input devices, and optionally also allow information to be presented to the user and/or other components or devices using various output devices. Examples of input devices include keyboards, cursor control devices (e.g., mice), microphones (e.g., for voice input), scanners, touch functions (e.g., capacitive or other sensors configured to detect physical touches), cameras (e.g., motion that does not involve touches may be detected as gestures using visible or invisible wavelengths such as infrared frequencies), and so forth. Examples of output devices include a display device, speakers, printer, network card, haptic response device, and the like. Accordingly, computing device 1010 may be configured in a variety of ways to support user interaction as described further below.
Computing device 1010 also includes applications 1016. The application 1016 may be, for example, a software instance of the hyper-parameter optimization apparatus 900, and implements the techniques described herein in combination with other elements in the computing device 1010.
Various techniques may be described herein in the general context of software, hardware elements, or program modules. Generally, these modules include routines, programs, objects, elements, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The terms "module," "functionality," and "component" as used herein generally represent software, firmware, hardware, or a combination thereof. The features of the techniques described herein are platform-independent, meaning that the techniques may be implemented on a variety of computing platforms having a variety of processors.
An implementation of the described modules and techniques may be stored on or transmitted across some form of computer readable media. Computer-readable media can include a variety of media that are accessible by computing device 1010. By way of example, and not limitation, computer readable media may comprise "computer readable storage media" and "computer readable signal media".
"computer-readable storage medium" refers to a medium and/or device that can permanently store information and/or a tangible storage device, as opposed to a mere signal transmission, carrier wave, or signal itself. Thus, computer-readable storage media refers to non-signal bearing media. Computer-readable storage media include hardware such as volatile and nonvolatile, removable and non-removable media and/or storage devices implemented in methods or techniques suitable for storage of information such as computer-readable instructions, data structures, program modules, logic elements/circuits or other data. Examples of a computer-readable storage medium may include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital Versatile Disks (DVD) or other optical storage, hard disk, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or other storage devices, tangible media, or articles of manufacture adapted to store the desired information and which may be accessed by a computer.
"Computer-readable signal medium" refers to a signal-bearing medium configured to transmit instructions to hardware of the computing device 1010, such as via a network. Signal media may typically embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave, data signal, or other transport mechanism. Signal media also include any information delivery media. The term "modulated data signal" means a signal that has one or more of its characteristics set or changed in such a manner as to encode information in the signal. By way of example, and not limitation, communication media include wired media such as a wired network or direct-wired connection, and wireless media such as acoustic, RF, infrared, and other wireless media.
As previously described, the hardware elements 1014 and computer-readable media 1012 represent instructions, modules, programmable device logic, and/or fixed device logic implemented in hardware that may be used in some embodiments to implement at least some aspects of the techniques described herein. The hardware elements may include integrated circuits or components of a system on a chip, application Specific Integrated Circuits (ASICs), field Programmable Gate Arrays (FPGAs), complex Programmable Logic Devices (CPLDs), and other implementations in silicon or other hardware devices. In this context, the hardware elements may be implemented as processing devices that perform program tasks defined by instructions, modules, and/or logic embodied by the hardware elements, as well as hardware devices that store instructions for execution, such as the previously described computer-readable storage media.
Combinations of the foregoing may also be used to implement the various techniques and modules described herein. Thus, software, hardware, or program modules and other program modules may be implemented as one or more instructions and/or logic embodied on some form of computer readable storage medium and/or by one or more hardware elements 1014. Computing device 1010 may be configured to implement particular instructions and/or functions corresponding to software and/or hardware modules. Thus, for example, by using the computer-readable storage medium of the processing system and/or the hardware elements 1014, a module may be implemented at least in part in hardware as a module executable by the computing device 1010 as software. The instructions and/or functions may be executable/operable by one or more articles of manufacture (e.g., one or more computing devices 1010 and/or processing systems 1011) to implement the techniques, modules, and examples described herein.
In various implementations, the computing device 1010 may take on a variety of different configurations. For example, computing device 1010 may be implemented as a computer-like device including a personal computer, desktop computer, multi-screen computer, laptop computer, netbook, and the like. Computing device 1010 may also be implemented as a mobile appliance-like device including mobile devices such as mobile telephones, portable music players, portable gaming devices, tablet computers, multi-screen computers, and the like. Computing device 1010 may also be implemented as a television-like device that includes devices having or connected to generally larger screens in casual viewing environments. Such devices include televisions, set-top boxes, gaming machines, and the like.
The techniques described herein may be supported by these various configurations of computing device 1010 and are not limited to the specific examples of techniques described herein. The functionality may also be implemented in whole or in part on the "cloud" 1020 through the use of a distributed system, such as through the platform 1022 described below.
Cloud 1020 includes and/or is representative of a platform 1022 for resources 1024. The platform 1022 abstracts underlying functionality of hardware (e.g., servers) and software resources of the cloud 1020. The resources 1024 may include applications and/or data that can be used when executing computer processing on servers remote from the computing device 1010. The resources 1024 may also include services provided over the internet and/or over subscriber networks such as cellular or Wi-Fi networks.
The platform 1022 may abstract resources and functions to connect the computing device 1010 with other computing devices. The platform 1022 may also be used to abstract the scaling of resources, to provide a corresponding level of scale for encountered demand for the resources 1024 implemented via the platform 1022. Thus, in an interconnected device embodiment, implementation of the functionality described herein may be distributed throughout the system 1000. For example, the functionality may be implemented in part on the computing device 1010 and in part by the platform 1022 that abstracts the functionality of the cloud 1020.
The present disclosure provides a computer readable storage medium having stored thereon computer readable instructions that when executed implement the above-described method for hyper-parameter optimization.
The present disclosure provides a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The computer instructions are read from the computer-readable storage medium by a processor of a computing device, and executed by the processor, cause the computing device to perform the methods for hyper-parameter optimization provided in the various alternative implementations described above.
It should be understood that for clarity, embodiments of the present disclosure have been described with reference to different functional units. However, it will be apparent that the functionality of each functional unit may be implemented in a single unit, in a plurality of units or as part of other functional units without departing from the present disclosure. For example, functionality illustrated to be performed by a single unit may be performed by multiple different units. Thus, references to specific functional units are only to be seen as references to suitable units for providing the described functionality rather than indicative of a strict logical or physical structure or organization. Thus, the present disclosure may be implemented in a single unit or may be physically and functionally distributed between different units and circuits.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various devices, elements, components or sections, these devices, elements, components or sections should not be limited by these terms. These terms are only used to distinguish one device, element, component, or section from another device, element, component, or section.
Although the present disclosure has been described in connection with some embodiments, it is not intended to be limited to the specific form set forth herein. Rather, the scope of the present disclosure is limited only by the appended claims. Additionally, although individual features may be included in different claims, these may possibly be advantageously combined, and the inclusion in different claims does not imply that a combination of features is not feasible and/or advantageous. The order of features in the claims does not imply any specific order in which the features must be worked. Furthermore, in the claims, the word "comprising" does not exclude other elements, and the term "a" or "an" does not exclude a plurality. Reference signs in the claims are provided merely as a clarifying example and shall not be construed as limiting the scope of the claims in any way.
It will be appreciated that specific embodiments of the present application may involve data such as account numbers and passwords for user registration. When the above embodiments of the present application involving such data are applied to a specific product or technology, user approval or consent is required, and the collection, use, and processing of the relevant data must comply with the relevant laws, regulations, and standards of the relevant countries and regions.

Claims (18)

1. A method for optimizing superparameters of a machine learning model, the method comprising performing, according to predetermined logic, an initialization step, a confidence determining step, an optimal candidate step, and a judging step:
the initialization step: acquiring a candidate superparameter set, wherein the candidate superparameter set comprises a plurality of candidate superparameters and performance indexes of a first portion of superparameters in the plurality of candidate superparameters, and the performance indexes are used to indicate the performance of a machine learning model configured with the corresponding superparameter;
the confidence determining step: based on the performance indexes of the first portion of superparameters in the plurality of candidate superparameters, determining, for each superparameter in the non-first portion of superparameters, a confidence that the superparameter is the optimal superparameter, wherein the optimal superparameter refers to the superparameter with the optimal performance index;
the optimal candidate step: determining the optimal candidate superparameter in the non-first portion of superparameters based on the confidence that each superparameter in the non-first portion of superparameters is the optimal superparameter, and acquiring a performance index of the optimal candidate superparameter; and
the judging step: determining whether the performance index of the optimal candidate superparameter meets a predetermined condition,
in response to the performance index of the optimal candidate superparameter not meeting the predetermined condition, adding the optimal candidate superparameter to the first portion of superparameters, adding the performance index of the optimal candidate superparameter to the candidate superparameter set, and proceeding to the confidence determining step,
and determining the optimal candidate superparameter as the optimal superparameter in response to the performance index of the optimal candidate superparameter meeting the predetermined condition.
2. The method of claim 1, wherein the confidence determining step further comprises:
performing a first transformation on the superparameters in the superparameter set, and determining superparameter transform values of the superparameters, wherein the superparameter transform values satisfy a normal distribution;
performing a second transformation on the performance indexes of the first portion of superparameters of the superparameter set, and determining performance index transform values of the first portion of superparameters, wherein the performance index transform values satisfy a normal distribution;
determining a first confidence of each superparameter transform value of the non-first portion of superparameters based on the superparameter transform values of the first portion of superparameters and their performance index transform values; and
determining the first confidence of the superparameter transform value of each superparameter as the confidence that the corresponding superparameter is the optimal superparameter.
3. The method of claim 2, wherein performing the first transformation on a superparameter of the superparameter set and determining the superparameter transform value of the superparameter comprises:
acquiring a normalized value of the superparameter, wherein the normalized value is between 0 and 1;
performing a first mapping on the normalized value of the superparameter, and determining a first mapped value of the normalized value of the superparameter; and
determining the first mapped value of the normalized value of the superparameter as the superparameter transform value of the superparameter,
wherein the normalized value of the superparameter and the first mapped value of the normalized value of the superparameter satisfy the equation:
where h_i is the normalized value of the superparameter, H_i is the first mapped value of the normalized value of the superparameter, a is the normalized value of the smallest superparameter in the superparameter set, and b is the normalized value of the largest superparameter in the superparameter set.
4. The method of claim 2, wherein performing the second transformation on the performance indexes of the first portion of superparameters of the superparameter set and determining the performance index transform values of the first portion of superparameters comprises:
acquiring performance indexes of the super parameters from the super parameter set;
performing a second mapping on the performance index of the superparameter, and determining a second mapped value of the performance index of the superparameter; and
determining the second mapped value of the performance index of the superparameter as the performance index transform value of the superparameter,
wherein the performance index of the hyper-parameter and the second mapped value of the performance index of the hyper-parameter satisfy the equation:
P_i = (p_i − p̄) / σ_p,
where P_i is the second mapped value of the performance index of the superparameter, p_i is the performance index of the superparameter, p̄ is the mean of the performance indexes in the superparameter set, and σ_p is the standard deviation of the performance indexes in the superparameter set.
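The second mapping described above corresponds to a standard z-score transformation. A minimal sketch, assuming the population standard deviation is used (the function name is illustrative, not from the disclosure):

```python
import statistics

def second_transform(performance_indexes):
    """Standardize performance indexes: P_i = (p_i - mean) / std, so the
    transformed values lie on a standard-normal-style scale."""
    mean = statistics.mean(performance_indexes)
    std = statistics.pstdev(performance_indexes)  # population std
    return [(p - mean) / std for p in performance_indexes]

transformed = second_transform([0.70, 0.80, 0.90])
```

The transformed values are centered at 0 and scaled by the spread of the observed performance indexes.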
5. The method of claim 2, wherein determining, for each superparameter transform value of the non-first portion of superparameters, a first confidence that the superparameter transform value has the maximum performance index transform value, based on the superparameter transform values of the first portion of superparameters and their performance index transform values, comprises:
establishing a statistical model based on the superparameter transform values of the first portion of superparameters and their performance index transform values, wherein the statistical model is used to determine the expectation and variance of the performance index transform value of a superparameter transform value;
determining, for each superparameter in the non-first portion of superparameters, the expectation and variance of the performance index transform value of its superparameter transform value based on the statistical model; and
determining, based on the expectation and variance of the performance index transform value of the superparameter transform value of each superparameter in the non-first portion of superparameters, a first confidence that the corresponding superparameter transform value has the maximum performance index transform value.
6. The method of claim 5, wherein determining, based on the expectation and variance of the performance index transform value of the superparameter transform value of each superparameter in the non-first portion of superparameters, a first confidence that the corresponding superparameter transform value has the maximum performance index transform value comprises:
obtaining a confidence function, wherein the confidence function is used for determining confidence based on expectations and variances;
substituting the expectation and variance of the performance index transform value of the superparameter transform value of each superparameter in the non-first portion of superparameters into the confidence function; and
determining the output of the confidence function as the first confidence that the corresponding superparameter transform value of the superparameter has the maximum performance index transform value.
7. The method of claim 6, wherein obtaining a confidence function comprises:
acquiring a first acquisition function, a second acquisition function, and a third acquisition function; and
determining the confidence function based on the first acquisition function, the second acquisition function, and the third acquisition function, wherein the confidence function and the first, second, and third acquisition functions satisfy the equation:
α(x) = w1·α1(x) + w2·α2(x) + w3·α3(x),
where α is the confidence function, α1 is the first acquisition function, α2 is the second acquisition function, α3 is the third acquisition function, w1 is the weight of the first acquisition function, w2 is the weight of the second acquisition function, w3 is the weight of the third acquisition function, and w1 + w2 + w3 = 1.
8. The method of claim 7, wherein the first acquisition function, the second acquisition function, and the third acquisition function each comprise one or more of an expected improvement (EI) acquisition function, a probability of improvement (PI) acquisition function, and an upper confidence bound (UCB) acquisition function, and the weights of the first acquisition function, the second acquisition function, and the third acquisition function are each between 0 and 1 and are randomly generated.
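As an illustrative sketch of such a convex combination (the names are hypothetical; claims 7 and 8 only require random weights in (0, 1) that sum to 1), the combined confidence might be computed as:

```python
import random

def combined_confidence(acq_values, rng=None):
    """Convex combination of acquisition-function values: random weights,
    each in (0, 1), normalized so they sum to 1."""
    rng = rng or random.Random(0)
    raw = [rng.uniform(1e-6, 1.0) for _ in acq_values]
    total = sum(raw)
    weights = [w / total for w in raw]
    score = sum(w * a for w, a in zip(weights, acq_values))
    return score, weights

# acq_values stands in for the EI, PI, and UCB values of one candidate.
score, weights = combined_confidence([0.3, 0.5, 0.7])
```

Because the weights form a convex combination, the combined score always lies between the smallest and largest of the three acquisition values.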
9. The method of claim 5, wherein the statistical model comprises one or both of a Gaussian mixture model and a Gaussian model.
10. The method of claim 1, wherein obtaining a candidate hyper-parameter set comprises:
obtaining a plurality of candidate hyper-parameters;
determining the first partial hyper-parameter of the plurality of candidate hyper-parameters;
acquiring performance indexes of the first portion of superparameters in the plurality of candidate superparameters; and
establishing the candidate superparameter set, wherein the candidate superparameter set comprises the plurality of candidate superparameters and the performance indexes of the first portion of superparameters in the plurality of candidate superparameters.
11. The method of claim 10, wherein determining a first partial-superparameter of the plurality of candidate superparameters comprises:
dividing the candidate superparameters into a first number of candidate superparameter groups, wherein the probability that each candidate superparameter group in the first number of candidate superparameter groups contains the optimal superparameter is equal; and
randomly acquiring a second number of candidate superparameters from each candidate superparameter group in the first number of candidate superparameter groups, respectively, as the first portion of superparameters.
12. The method of claim 10, wherein obtaining a performance indicator of a first partial-superparameter of the plurality of candidate superparameters comprises:
acquiring a training set for training the machine learning model;
for each superparameter in the first portion of superparameters, configuring the machine learning model therewith;
training the machine learning model with the training set to determine parameters of the machine learning model;
testing performance of the trained machine learning model to determine a performance index of the trained machine learning model; and
determining the performance index of the trained machine learning model as the performance index of the corresponding superparameter in the first portion of superparameters.
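As a toy, non-limiting sketch of obtaining a performance index (the "model" here is a trivial stand-in, not the disclosure's machine learning model; the scoring rule is an assumption):

```python
def performance_index(superparameter, train_set, test_set):
    """Configure a toy model with the superparameter, 'train' it on the
    training set, then test it; the test score is the performance index."""
    # Toy model: predicts the training mean scaled by the superparameter.
    trained_mean = sum(train_set) / len(train_set)
    prediction = superparameter * trained_mean
    # Score as negative mean absolute error on the test set
    # (higher is better).
    return -sum(abs(prediction - y) for y in test_set) / len(test_set)

good = performance_index(1.0, [1.0, 2.0, 3.0], [2.0, 2.0])
bad = performance_index(5.0, [1.0, 2.0, 3.0], [2.0, 2.0])
```

In practice the same configure-train-test pattern would wrap a real model and dataset; only the evaluation function changes.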
13. The method of claim 1, wherein the predetermined condition comprises: the performance index of the optimal candidate superparameter exceeds a predetermined threshold, or the number of executions of the judging step exceeds a predetermined number.
14. The method of claim 1, wherein obtaining a performance indicator of the optimal candidate hyper-parameter comprises:
acquiring a training set for training the machine learning model;
configuring the machine learning model with the optimal candidate superparameter;
Training the machine learning model with the training set to determine parameters of the machine learning model;
testing performance of the trained machine learning model to determine a performance index of the trained machine learning model; and
determining the performance index of the trained machine learning model as the performance index of the corresponding optimal candidate superparameter.
15. A super-parametric optimization device of a machine learning model, wherein the super-parametric optimization device is configured to perform an initialization step, a confidence determination step, an optimal candidate step, and a judgment step according to predetermined logic, the super-parametric optimization device comprising:
an initialization module configured to perform the initialization step: acquiring a candidate superparameter set, wherein the candidate superparameter set comprises a plurality of candidate superparameters and performance indexes of a first portion of superparameters in the plurality of candidate superparameters, and the performance indexes are used to indicate the performance of a machine learning model configured with the corresponding superparameter;
a confidence module configured to perform the confidence determining step: based on the performance indexes of the first portion of superparameters in the plurality of candidate superparameters, determining, for each superparameter in the non-first portion of superparameters, a confidence that the superparameter is the optimal superparameter, wherein the optimal superparameter refers to the superparameter with the optimal performance index;
an optimal candidate module configured to perform the optimal candidate step: determining the optimal candidate superparameter in the non-first portion of superparameters based on the confidence that each superparameter in the non-first portion of superparameters is the optimal superparameter, and acquiring a performance index of the optimal candidate superparameter; and
a judgment module configured to perform the judging step: determining whether the performance index of the optimal candidate superparameter meets a predetermined condition,
in response to the performance index of the optimal candidate superparameter not meeting the predetermined condition, adding the optimal candidate superparameter to the first portion of superparameters, adding the performance index of the optimal candidate superparameter to the candidate superparameter set, and proceeding to the confidence determining step,
and determining the optimal candidate superparameter as the optimal superparameter in response to the performance index of the optimal candidate superparameter meeting the predetermined condition.
16. A computing device, comprising:
a memory configured to store computer-executable instructions; and
a processor configured to perform the method according to any of claims 1-14 when the computer executable instructions are executed by the processor.
17. A computer readable storage medium storing computer executable instructions which when executed implement the method of any one of claims 1-14.
18. A computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to any one of claims 1 to 14.
CN202211206130.1A 2022-09-30 2022-09-30 Hyperparameter optimization methods, devices, computing equipment, storage media and program products Pending CN117035003A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202211206130.1A CN117035003A (en) 2022-09-30 2022-09-30 Hyperparameter optimization methods, devices, computing equipment, storage media and program products

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202211206130.1A CN117035003A (en) 2022-09-30 2022-09-30 Hyperparameter optimization methods, devices, computing equipment, storage media and program products

Publications (1)

Publication Number Publication Date
CN117035003A true CN117035003A (en) 2023-11-10

Family

ID=88628659

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211206130.1A Pending CN117035003A (en) 2022-09-30 2022-09-30 Hyperparameter optimization methods, devices, computing equipment, storage media and program products

Country Status (1)

Country Link
CN (1) CN117035003A (en)

Similar Documents

Publication Publication Date Title
US20220147405A1 (en) Automatically scalable system for serverless hyperparameter tuning
CN109166017B (en) Push method and device based on re-clustering, computer equipment and storage medium
EP3591586A1 (en) Data model generation using generative adversarial networks and fully automated machine learning system which generates and optimizes solutions given a dataset and a desired outcome
CN109360057B (en) Information pushing method, device, computer equipment and storage medium
US20250021842A1 (en) Monitoring generative model quality
WO2021135562A1 (en) Feature validity evaluation method and apparatus, and electronic device and storage medium
WO2014193399A1 (en) Influence score of a brand
CN109933601A (en) Database management method, system, computer device and readable storage medium
CN110968564B (en) Data processing method and training method of data state prediction model
CN107507073A (en) Based on the service recommendation method for trusting extension and the sequence study of list level
CN111461188B (en) Target service control method, device, computing equipment and storage medium
CN113743968B (en) Information delivery method, device and equipment
US20150142872A1 (en) Method of operating a server apparatus for delivering website content, server apparatus and device in communication with server apparatus
CN118134358A (en) A smart logistics distribution data management platform
CN117573973A (en) Resource recommendation methods, devices, electronic devices and storage media
US9477757B1 (en) Latent user models for personalized ranking
CN115481694B (en) Data enhancement method, device and equipment for training sample set and storage medium
CN117216382A (en) An interactive processing method, model training method and related devices
CN118396038B (en) Method, device and electronic device for determining parameter tuning information
CN117035003A (en) Hyperparameter optimization methods, devices, computing equipment, storage media and program products
CN110457448B (en) Question push method, device, computer-readable storage medium and computer equipment
CN116595428B (en) User classification method and system based on CNN (CNN) log spectrum analysis
CN112784165B (en) Training method of correlation prediction model and method of estimating file popularity
CN116361625A (en) Method and device for predicting emerging technology and terminal equipment
CN112308294A (en) Default probability prediction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination