
CN111401546A - Training method of neural network model and its medium and electronic device - Google Patents


Info

Publication number
CN111401546A
CN111401546A
Authority
CN
China
Prior art keywords
network layer
weights
initial
weight
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010086380.0A
Other languages
Chinese (zh)
Other versions
CN111401546B (en)
Inventor
刘默翰
周力
白立勋
石文元
俞清华
隋志成
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd
Priority to CN202010086380.0A
Publication of CN111401546A
Application granted
Publication of CN111401546B
Legal status: Active (granted)

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks


Abstract

The application relates to the technical field of neural networks, and discloses a training method for a neural network model, together with a medium and an electronic device. The training method comprises the following steps: the first of the n network layers acquires sample data and inputs it into the second network layer; for the i-th of the n network layers, the following operations are performed: when i = 2, the output data of the i-th network layer is obtained based on the initial input data and a plurality of initial weights of the i-th network layer; when 2 < i ≤ n, the output data of the i-th network layer is obtained based on the output data of the (i-1)-th network layer and a plurality of initial weights of the i-th network layer, where the plurality of initial weights of the i-th network layer are derived from m discrete values. By setting the initial weights of the neural network model to low-bit discrete values, the method effectively avoids the vanishing-gradient problem during low-bit weight training and accelerates convergence of the neural network model.

Description

Training method of neural network model, medium thereof, and electronic device
Technical Field
The present disclosure relates to the field of neural network technologies, and in particular, to a training method for a neural network model, a medium and an electronic device thereof.
Background
A neural network model is a computational model formed by a large number of interconnected nodes (or neurons). A common neural network model includes an input layer, an output layer, and a plurality of hidden layers. The input to each node of each layer is typically weighted, producing a weighted sum (or other weighted operation result) at the node. The weights of each layer may be adjusted during training.
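As a minimal illustration of such a weighted operation (the helper name and the numbers are ours, not from the patent):

```python
import numpy as np

def node_output(inputs, weights, bias=0.0):
    """Weighted sum at a single node; an activation function would
    normally be applied to this result afterwards."""
    return float(np.dot(inputs, weights) + bias)

# A node receiving three inputs through three trainable weights.
y = node_output([1.0, 2.0, 3.0], [0.5, -0.5, 1.0])  # 0.5 - 1.0 + 3.0 = 2.5
```

Training adjusts the `weights` vector so that the node outputs move toward the expected results.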
During training of a traditional neural network model, the weights are initialized randomly in each training run. The weights of a traditional neural network model are generally floating-point numbers within a certain value range, and random initialization means that training starts from arbitrary floating-point numbers within that range. The large number of floating-point values and the repeated training runs make training such a model require a long time.
Disclosure of Invention
Embodiments of the present application provide a training method for a neural network model, together with a medium and an electronic device.
In a first aspect, an embodiment of the present application provides a method for training a neural network model, where the neural network model includes n network layers, n being a positive integer greater than 1. The method comprises:

the first network layer of the n network layers acquires sample data and inputs the sample data into the second network layer, where the sample data comprises initial input data and expected result data;

for the i-th network layer of the n network layers, the following operations are performed:

when i = 2, the output data of the i-th network layer is obtained based on the initial input data and a plurality of initial weights of the i-th network layer;

when 2 < i ≤ n, the output data of the i-th network layer is obtained based on the output data of the (i-1)-th network layer and the plurality of initial weights of the i-th network layer;

where the plurality of initial weights of the i-th network layer are derived from m discrete values, the value range of each initial weight is [-1, 1], and m ∈ {2, 3}, i.e., there are either two or three discrete values;

the plurality of initial weights of the i-th network layer are adjusted based on the error between the output data of the n-th network layer and the expected result data in the sample data.
For example, the plurality of initial weights of the i-th network layer can be set to {-1, 1} or {-1, 0, 1}. That is, in this embodiment, in order to constrain the final weights to 1 and -1 and convert multiplication operations into bitwise XNOR operations, thereby reducing memory access and occupancy, the plurality of initial weights of the neural network model are set to the discrete values {-1, 1} or {-1, 0, 1}, accelerating the convergence of the model while avoiding vanishing gradients.
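A minimal sketch of this low-bit initialization (the function name and shapes are illustrative assumptions, not taken from the patent):

```python
import numpy as np

def init_discrete_weights(shape, values=(-1.0, 1.0), rng=None):
    """Draw every initial weight uniformly from a small set of discrete
    low-bit values, e.g. {-1, 1} (binary) or {-1, 0, 1} (ternary)."""
    rng = np.random.default_rng(0) if rng is None else rng
    return rng.choice(values, size=shape).astype(np.float32)

w_bin = init_discrete_weights((64, 64))                           # only -1 and 1
w_ter = init_discrete_weights((64, 64), values=(-1.0, 0.0, 1.0))  # -1, 0 and 1
```

Every initial weight is thus one of m discrete values rather than an arbitrary floating-point number.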
In a possible implementation of the first aspect, the method further includes: each of the plurality of initial weights of the i-th network layer is one of the m discrete values.
In a possible implementation of the first aspect, the method further includes: the m discrete values are -1 and 1, and the plurality of initial weights of the i-th network layer have a mean of 0 and a variance of 1.
In a possible implementation of the first aspect, the method further includes: the m discrete values are -1, 0 and 1, and the plurality of initial weights of the i-th network layer have a mean of 0 and a variance of 2/3.
In a possible implementation of the first aspect, the method further includes: the i-th network layer has p initial weights, each of which is calculated by the following formula:

W = α · W^b

where W^b is one of the m discrete values, with a value range of -1 ≤ W^b ≤ 1, one value W^b corresponding to each of the p initial weights, and α is a scaling factor. If a value were simply selected from the discrete values as an initial weight, the distributions of the input data and the output data of the network layers might not be consistent; therefore, in order to make these distributions substantially consistent, a scaling factor is set here, obtained by a normalization method, to scale the variance of the initial weights of the neural network model so that signals can propagate to deeper layers.
In a possible implementation of the first aspect, the method further includes: the variance of the p values W^b corresponding to the p initial weights is 1, and the m discrete values are -1 and 1.
In a possible implementation of the first aspect, the method further includes: the variance of the p values W^b corresponding to the p initial weights is 2/3, and the m discrete values are -1, 0 and 1.
In a possible implementation of the first aspect, the method further includes: the scaling factor is obtained by the following formula:

α = sqrt( p / ( l_i · Σ_{j=1}^{p} (W_j^b - mean(W^b))² ) )

where W_j^b is the discrete value corresponding to the j-th of the p initial weights, mean(W^b) is the average of the p values W^b of the i-th network layer, and l_i is the number of input channels of the i-th network layer.
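Under this reading, the scaling factor normalizes the sample variance of the discrete values against the layer's fan-in; a sketch (the formula reconstruction and names are our assumptions):

```python
import math

def scaling_factor(discrete_values, n_in):
    """alpha = sqrt(p / (l_i * sum_j (W_j - mean(W))^2)): rescales the
    discrete weights so their variance becomes roughly 1 / l_i."""
    p = len(discrete_values)
    mean = sum(discrete_values) / p
    sum_sq = sum((w - mean) ** 2 for w in discrete_values)
    return math.sqrt(p / (n_in * sum_sq))

# Balanced binary values: mean 0, sample variance 1, so alpha = 1/sqrt(l_i).
alpha = scaling_factor([-1.0, -1.0, 1.0, 1.0], n_in=16)  # 0.25
```

For binary weights with unit variance this reduces to 1/sqrt(l_i), i.e., a fan-in normalization.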
In a possible implementation of the first aspect, the method further includes: the plurality of initial weights of the i-th network layer are calculated by the following formula:

W = α · W^t

where W^t is any one of the plurality of weights determined by a previous training of the i-th network layer, with a value range of -1 ≤ W^t ≤ 1, and α is a scaling factor. If a value were simply selected from the discrete values as an initial weight, the distributions of the input data and the output data of the network layers might not be consistent; therefore, in order to make these distributions substantially consistent, a scaling factor is set here, obtained by a normalization method, to scale the variance of the initial weights of the neural network model so that signals can propagate to deeper layers.
In a possible implementation of the first aspect, the method further includes: the scaling factor is obtained by the following formula:

α = sqrt( p / Σ_{j=1}^{p} (W_j^t - mean(W^t))² )

where p is the number of weights determined by the previous training of the i-th network layer, W_j^t is the j-th of these weights, and mean(W^t) is the average of the p weights determined by the previous training.
In a possible implementation of the first aspect, the method further includes: the scaling factor is obtained by the following formula:

α = sqrt( 2 / (l_i + l_{i+1}) )

where l_i is the number of input channels of the i-th network layer and l_{i+1} is the number of input channels of the (i+1)-th network layer.
In a second aspect, an embodiment of the present application provides a method for training a neural network model, where the neural network model includes n network layers and has already converged, n being a positive integer greater than 1. The method applies low-bit quantization to the trained full-precision weights of the converged neural network model, so as to convert multiplication operations into bitwise XNOR operations and reduce memory access and occupancy. Specifically, the method comprises:

the first network layer of the n network layers acquires sample data and inputs the sample data into the second network layer, where the sample data comprises initial input data and expected result data;

for the i-th network layer of the n network layers, the following operations are performed:

when i = 2, the sign of each of the plurality of full-precision weights of the i-th network layer is taken to obtain a plurality of initial weights of the i-th network layer, and the output data of the i-th network layer is obtained based on the initial input data and the plurality of initial weights;

when 2 < i ≤ n, the sign of each of the plurality of full-precision weights of the i-th network layer is taken to obtain a plurality of initial weights of the i-th network layer, and the output data of the i-th network layer is obtained based on the output data of the (i-1)-th network layer and the plurality of initial weights;

where the plurality of initial weights of the i-th network layer are derived from m discrete values, the value range of each initial weight is [-1, 1], and m ∈ {2, 3}, i.e., there are either two or three discrete values;

the plurality of initial weights of the i-th network layer are adjusted based on the error between the output data of the n-th network layer and the expected result data in the sample data.
In a possible implementation of the second aspect, the method further includes: the m discrete values are -1 and 1, and the plurality of initial weights of the i-th network layer have a mean of 0 and a variance of 1; and taking the sign of each of the plurality of full-precision weights of the i-th network layer to obtain the plurality of initial weights comprises:

if a full-precision weight is less than or equal to 0, taking -1 as the initial weight corresponding to that full-precision weight;

if a full-precision weight is greater than 0, taking 1 as the initial weight corresponding to that full-precision weight.

That is, the sign of each trained full-precision weight in the converged neural network model is taken, converting it into one of 1 and -1.
In a possible implementation of the second aspect, the method further includes: the m discrete values are -1, 0 and 1, and the plurality of initial weights of the i-th network layer have a mean of 0 and a variance of 2/3; and taking the sign of each of the plurality of full-precision weights of the i-th network layer to obtain the plurality of initial weights comprises:

if a full-precision weight is less than 0, taking -1 as the initial weight corresponding to that full-precision weight;

if a full-precision weight is equal to 0, taking 0 as the initial weight corresponding to that full-precision weight;

if a full-precision weight is greater than 0, taking 1 as the initial weight corresponding to that full-precision weight.

That is, the sign of each trained full-precision weight in the converged neural network model is taken, converting it into one of 1, 0 and -1.
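The two sign-taking rules can be sketched as follows (the function names are ours; the thresholds follow the text above):

```python
import numpy as np

def binarize(w):
    """Binary rule: w <= 0 -> -1, w > 0 -> 1."""
    return np.where(w > 0, 1.0, -1.0)

def ternarize(w):
    """Ternary rule: w < 0 -> -1, w == 0 -> 0, w > 0 -> 1."""
    return np.sign(w)

full_precision = np.array([-0.7, 0.0, 0.3])
b = binarize(full_precision)   # [-1., -1.,  1.]
t = ternarize(full_precision)  # [-1.,  0.,  1.]
```

Note that the two rules differ only at exactly 0: the binary rule maps it to -1, while the ternary rule keeps it at 0.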
In a possible implementation of the second aspect, the method further includes: the m discrete values are -1 and 1, and taking the sign of each of the plurality of full-precision weights of the i-th network layer to obtain the plurality of initial weights comprises:

if a full-precision weight is less than or equal to 0, taking the product of -1 and a scaling factor as the initial weight corresponding to that full-precision weight;

if a full-precision weight is greater than 0, taking the product of 1 and the scaling factor as the initial weight corresponding to that full-precision weight;

where the scaling factor is a positive number smaller than 1 and is used to adjust the distribution of the output data of the i-th network layer.

That is, the sign of each trained full-precision weight in the converged neural network model is taken and converted into the product of one of 1 and -1 with the scaling factor. If a value were simply selected from 1 and -1 as the initial weight after taking the sign, the distributions of the input data and the output data of the network layer might not be consistent; therefore, in order to make these distributions substantially consistent, a scaling factor is set here, obtained by a normalization method, to scale the variance of the initial weights of the neural network model so that signals can propagate to deeper layers.
In a possible implementation of the second aspect, the method further includes: the m discrete values are -1, 0 and 1, and taking the sign of each of the plurality of full-precision weights of the i-th network layer to obtain the plurality of initial weights comprises:

if a full-precision weight is less than 0, taking the product of -1 and a scaling factor as the initial weight corresponding to that full-precision weight;

if a full-precision weight is equal to 0, taking 0 as the initial weight corresponding to that full-precision weight;

if a full-precision weight is greater than 0, taking the product of 1 and the scaling factor as the initial weight corresponding to that full-precision weight;

where the scaling factor is a positive number smaller than 1 and is used to adjust the distribution of the output data of the i-th network layer.

That is, the sign of each trained full-precision weight in the converged neural network model is taken and converted into the product of one of -1, 0 and 1 with the scaling factor (a weight equal to 0 remains 0). If a value were simply selected from -1, 0 and 1 as the initial weight after taking the sign, the distributions of the input data and the output data of the network layer might not be consistent; therefore, in order to make these distributions substantially consistent, a scaling factor is set here, obtained by a normalization method, to scale the variance of the initial weights of the neural network model so that signals can propagate to deeper layers.
In a possible implementation of the second aspect, the method further includes: the scaling factor is obtained by the following formula:

α = sqrt( 2 / (l_i + l_{i+1}) )

where α is the scaling factor, l_i is the number of input channels of the i-th network layer, and l_{i+1} is the number of input channels of the (i+1)-th network layer.
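This form matches a Xavier-style normalization over fan-in and fan-out; a sketch under that reading (the function name is ours):

```python
import math

def xavier_style_alpha(l_in, l_out):
    """alpha = sqrt(2 / (l_i + l_{i+1})): shrinks the +/-1 initial weights
    so the output variance stays roughly constant from layer to layer."""
    return math.sqrt(2.0 / (l_in + l_out))

alpha = xavier_style_alpha(128, 128)  # sqrt(2/256), about 0.0884
```

Because l_in + l_out is at least 2 for any real layer, the resulting α is a positive number smaller than 1, as the text requires.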
In a possible implementation of the second aspect, the method further includes: the scaling factor is obtained by the following formula:

α = sqrt( p / ( l_i · Σ_{j=1}^{p} (W_j^z - mean(W^z))² ) )

where α is the scaling factor, p is the number of initial weights of the i-th network layer, W_j^z is one of -1 and 1 and corresponds to the j-th of the p initial weights, the p values W_j^z have a mean of 0 and a variance of 1, mean(W^z) is the average of the p values W_j^z, and l_i is the number of input channels of the i-th network layer.
In a possible implementation of the second aspect, the method further includes: the scaling factor is obtained by the following formula:

α = sqrt( p / ( l_i · Σ_{j=1}^{p} (W_j^q - mean(W^q))² ) )

where α is the scaling factor, p is the number of initial weights of the i-th network layer, W_j^q is one of -1, 0 and 1 and corresponds to the j-th of the p initial weights, the p values W_j^q have a mean of 0 and a variance of 2/3, mean(W^q) is the average of the p values W_j^q, and l_i is the number of input channels of the i-th network layer.
In a possible implementation of the second aspect, the method further includes: the sample data comprises image data, and the neural network model is used for image recognition.
In a third aspect, an embodiment of the present application provides an electronic device for training a neural network model, including:

a first data acquisition module, configured to acquire sample data and input the sample data into the second network layer, where the sample data comprises initial input data and expected result data;

a first data processing module, configured to perform the following operations:

when i = 2, obtaining the output data of the i-th network layer based on the initial input data and a plurality of initial weights of the i-th network layer;

when 2 < i ≤ n, obtaining the output data of the i-th network layer based on the output data of the (i-1)-th network layer and the plurality of initial weights of the i-th network layer;

where the plurality of initial weights of the i-th network layer are derived from m discrete values, the value range of each initial weight is [-1, 1], and m ∈ {2, 3};

a first weight adjustment module, configured to adjust the plurality of initial weights of the i-th network layer based on the error between the output data of the n-th network layer and the expected result data in the sample data.
In a fourth aspect, an embodiment of the present application provides an electronic device for training a neural network model, including:

a second data acquisition module, configured to acquire sample data and input the sample data into the second network layer, where the sample data comprises initial input data and expected result data;

a second data processing module, configured to perform the following operations:

when i = 2, taking the sign of each of the plurality of full-precision weights of the i-th network layer to obtain a plurality of initial weights of the i-th network layer, and obtaining the output data of the i-th network layer based on the initial input data and the plurality of initial weights;

when 2 < i ≤ n, taking the sign of each of the plurality of full-precision weights of the i-th network layer to obtain a plurality of initial weights of the i-th network layer, and obtaining the output data of the i-th network layer based on the output data of the (i-1)-th network layer and the plurality of initial weights;

where the plurality of initial weights of the i-th network layer are derived from m discrete values, the value range of each initial weight is [-1, 1], and m ∈ {2, 3};

a second weight adjustment module, configured to adjust the plurality of initial weights of the i-th network layer based on the error between the output data of the n-th network layer and the expected result data in the sample data.
In a fifth aspect, an embodiment of the present application provides a computer-readable medium, where instructions are stored on the computer-readable medium, and when the instructions are executed on a computer, the instructions cause the computer to perform a method for training a neural network model according to any one of the first and second aspects.
In a sixth aspect, an embodiment of the present application provides an electronic device, including:
a memory, configured to store instructions to be executed by one or more processors of the system; and

a processor, which is one of the processors of the system, configured to perform the method for training a neural network model according to any one of the first and second aspects.
Drawings
FIG. 1 illustrates a block diagram of an electronic device, according to some embodiments of the present application;
FIG. 2 is a schematic diagram of a neural network model;
FIG. 3 illustrates a schematic diagram of a computational process of a node of a neural network model, according to some embodiments of the present application;
FIG. 4 is a graph of the output distributions of the activation functions of several layers near the output layer in a neural network model randomly initialized with a Gaussian distribution of mean 0 and variance 1;
FIG. 5(a) is a weight distribution graph illustrating 1-bit weight initialization of a convolutional neural network model, according to some embodiments of the present application;
FIG. 5(b) is a weight distribution graph at an intermediate point during convergence, after the convolutional neural network model is initialized with 1-bit weights, according to some embodiments of the present application;
FIG. 5(c) is a weight distribution graph after model convergence, for a convolutional neural network model trained after 1-bit weight initialization, according to some embodiments of the present application;
FIG. 6(a) is a graph illustrating training of a 1-bit fixed-point quantization model using the Xavier initialization function;
FIG. 6(b) is a graph illustrating training of a 1-bit fixed-point quantization model using the weight initialization method illustrated in FIG. 2, according to some embodiments of the present application;
FIG. 7 illustrates a schematic diagram of an electronic device for training a neural network model, according to some embodiments of the present application;
FIG. 8 illustrates a schematic structural diagram of another electronic device for training a neural network model, in accordance with some embodiments of the present application;
fig. 9 illustrates a schematic structural diagram of an electronic device, according to some embodiments of the present application.
Detailed Description
Illustrative embodiments of the present application include, but are not limited to, a weight initialization method, apparatus, medium, and electronic device for a neural network.
It is to be appreciated that as used herein, the term module may refer to or include an Application Specific Integrated Circuit (ASIC), an electronic circuit, a processor (shared, dedicated, or group) and/or memory that executes one or more software or firmware programs, a combinational logic circuit, and/or other suitable hardware components that provide the described functionality, or may be part of such hardware components.
It is to be appreciated that in various embodiments of the present application, the processor may be a microprocessor, a digital signal processor, a microcontroller, or the like, and/or any combination thereof. According to another aspect, the processor may be a single-core processor, a multi-core processor, the like, and/or any combination thereof.
Embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
It is to be understood that the neural network model provided in the present application may be any artificial neural network model that employs multiply-add operations, such as a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), a Recurrent Neural Network (RNN), or a Binary Neural Network (BNN).
It is to be appreciated that the method of weight initialization for neural networks provided herein can be implemented on a variety of electronic devices including, but not limited to, a server, a distributed server cluster of multiple servers, a cell phone, a tablet, a laptop, a desktop computer, a wearable device, a head mounted display, a mobile email device, a portable game console, a portable music player, a reader device, a personal digital assistant, a virtual reality or augmented reality device, a television or other electronic device having one or more processors embedded or coupled therein, and the like.
In particular, the weight initialization of the neural network provided by the present application is suitable for edge devices. Edge computing is a distributed open platform (architecture) that integrates core capabilities of networking, computation, storage, and applications at the edge of the network, close to the objects or data sources, and provides edge-intelligent services nearby; it can meet key requirements such as real-time business, data optimization, application intelligence, security, and privacy protection. For example, an edge device may be a device in a video surveillance system that performs edge computation on video data near the video data source (a network smart camera).
The following describes a weight initialization scheme of the neural network disclosed in the present application, taking the electronic device 100 as an example.
Fig. 1 illustrates a block diagram of an electronic device 100, according to some embodiments of the present application. Specifically, as shown in FIG. 1, the electronic device 100 includes one or more processors 104, system control logic 108 coupled to at least one of the processors 104, system memory 112 coupled to the system control logic 108, non-volatile memory (NVM) 116 coupled to the system control logic 108, and a network interface 120 coupled to the system control logic 108.
In some embodiments, the processor 104 may include one or more single-core or multi-core processors. In some embodiments, the processor 104 may include any combination of general-purpose processors and special-purpose processors (e.g., graphics processors, application processors, baseband processors, etc.). In embodiments where the electronic device 100 employs an enhanced Node B (eNB) or a Radio Access Network (RAN) controller, the processor 104 may be configured to perform various consistent embodiments.
In some embodiments, the processor 104 may be configured to invoke training information to train a neural network model. Specifically, for example, the processor 104 may obtain initialization information for the weights of the neural network model together with input data information (e.g., image information, voice information, etc.), and train the neural network model. The neural network model can be quantized into a binary network or a ternary network, and the weights of the neural network model can be set to preset discrete values. During the training of each layer of the neural network model, the processor 104 continuously adjusts the weights according to the obtained training information until the model converges. The processor 104 may also periodically update the neural network model to better accommodate changes in its various practical requirements.
In some embodiments, system control logic 108 may include any suitable interface controllers to provide any suitable interface to at least one of processors 104 and/or any suitable device or component in communication with system control logic 108.
In some embodiments, system control logic 108 may include one or more memory controllers to provide an interface to system memory 112. System memory 112 may be used to load and store data and/or instructions. In some embodiments, memory 112 of electronic device 100 may comprise any suitable volatile memory, such as Dynamic Random Access Memory (DRAM). In some embodiments, system memory 112 may be used to load or store instructions that implement the neural network model described above, or instructions that implement an application that utilizes the neural network model described above.
NVM/memory 116 may include one or more tangible, non-transitory computer-readable media for storing data and/or instructions. In some embodiments, NVM/memory 116 may include any suitable non-volatile memory, such as flash memory, and/or any suitable non-volatile storage device, such as at least one of a Hard Disk Drive (HDD), Compact Disc (CD) Drive, and Digital Versatile Disc (DVD) Drive. NVM/memory 116 may also be used to store the trained weights for the neural network model described above.
NVM/memory 116 may comprise a portion of a storage resource on a device on which electronic device 100 is installed, or it may be accessible by, but not necessarily a part of, the device. For example, NVM/storage 116 may be accessed over a network via network interface 120.
In particular, system memory 112 and NVM/storage 116 may each include: a temporary copy and a permanent copy of instructions 124. The instructions 124 may include: instructions that, when executed by at least one of the processors 104, cause the electronic device 100 to implement the method as shown in fig. 3. In some embodiments, the instructions 124, hardware, firmware, and/or software components thereof may additionally/alternatively be disposed in the system control logic 108, the network interface 120, and/or the processor 104.
Network interface 120 may include a transceiver to provide a radio interface for electronic device 100 to communicate with any other suitable device (e.g., front end module, antenna, etc.) over one or more networks. In some embodiments, the network interface 120 may be integrated with other components of the electronic device 100. For example, network interface 120 may be integrated with at least one of processor 104, system memory 112, NVM/storage 116, and a firmware device (not shown) having instructions that, when executed by at least one of the processors 104, cause electronic device 100 to implement the method shown in fig. 3.
The network interface 120 may further include any suitable hardware and/or firmware to provide a multiple-input multiple-output radio interface. For example, network interface 120 may be a network adapter, a wireless network adapter, a telephone modem, and/or a wireless modem.
In some embodiments, at least one of the processors 104 may be packaged together with logic for one or more controllers of the system control logic 108 to form a System In Package (SiP). In some embodiments, at least one of the processors 104 may be integrated on the same die with logic for one or more controllers of the system control logic 108 to form a system on a chip (SoC).
The electronic device 100 may further include: input/output (I/O) devices 132. I/O device 132 may include a user interface to enable a user to interact with electronic device 100, and a peripheral component interface designed so that peripheral components can also interact with the electronic device 100. In some embodiments, the electronic device 100 further comprises a sensor for determining at least one of environmental conditions and location information associated with the electronic device 100.
It is understood that the method for weight initialization provided by the embodiments of the present application is applicable to example applications of neural network models including, but not limited to, image recognition in the field of machine vision, voice recognition, and the like.
In the following, according to some embodiments of the present application, a technical solution for training the neural network model 200 shown in fig. 2 by using the electronic device 100 shown in fig. 1 is described in detail, taking image recognition as an example (for performing facial recognition, recognizing facial features such as mouth shape, eyebrow feature, and eye feature in a face image, for example).
Specifically, as shown in fig. 2, the neural network model 200 includes n network layers: an input layer, a plurality of hidden layers, and an output layer. The first layer is called the input layer, the last layer is called the output layer, and the other layers are called hidden layers. Each layer has a number of nodes (e.g., s nodes for the input layer in fig. 2), each node having a corresponding weight. The n network layers are fully cross-connected, and the output of each layer is the input of the next adjacent layer. The calculation formula of each node in a network layer is as follows:
y=f(Wx+b)
where W is the weight, b is the bias, x is the input, y is the output, and f is the activation function.
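For instance, the node formula above can be sketched in a few lines of Python (a hypothetical illustration; tanh stands in for the unspecified activation function f, and the function name is not the patent's):

```python
import math

def node_output(x, w, b):
    """One node's output y = f(W*x + b); tanh is used as the activation f."""
    z = sum(xi * wi for xi, wi in zip(x, w)) + b  # weighted sum Wx + b
    return math.tanh(z)

# A node with three inputs:
y = node_output([1.0, -1.0, 0.5], [0.2, 0.4, -0.6], 0.1)
```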
The following describes in detail a specific process of training the neural network model 200 shown in fig. 2 by using sample images when performing image recognition, for example, face recognition.
In training the neural network model 200, a large amount of sample image data and expected result data may be input into the model 200. The image data of each sample image is input into the s nodes of the input layer of the neural network model 200 shown in fig. 2, passes through the calculation of the hidden layers, and is finally calculated by the output layer to generate face recognition result data. It should be noted that, when a large number of sample images are used to train the model, each complete training pass uses only one image. For example, if there are 1000 images in total, the first image is trained first; after the training of the first image is completed, the second image is trained, and so on, until the neural network model 200 converges. After each image training is completed, the face recognition result data finally output by the neural network model 200 is compared with the expected result data to calculate an error, a partial derivative is calculated according to the error, and the weight of each node in the network layers other than the input layer is adjusted based on the calculated partial derivative. In this way, the neural network model 200 is trained by inputting image data and continuously adjusting the weights; when the error between the face recognition result data finally output by the neural network model 200 and the expected result data is smaller than the error threshold, the neural network model 200 is determined to have converged.
In particular, FIG. 3 illustrates the computational process of the network layers in the neural network model 200 in some embodiments. As shown in fig. 3, the calculation process of each network layer in the neural network model 200 includes:
1. Computation of the input layer
Image data of sample image A is input to the input layer as input data.
2. Computation of hidden layers
a) The input layer outputs the image data of sample image A to the first hidden layer. For example, the input image data may be the color information of each pixel in the image (e.g., numbers between 0 and 255 in an RGB color space), and the image data is input into the s nodes of the input layer (the first layer) of the model shown in fig. 2 (i.e., inputs x1, x2 to xs).
b) And initializing the weight of each hidden layer to obtain an initial weight.
In embodiments of the present application, a low-bit weight initialization model is used to obtain the initial weights of each hidden layer; for example, a 1-bit or 2-bit weight initialization model is used. The weight value set in 1-bit weight initialization is {1, -1}, and the weight value set in 2-bit weight initialization is {1, 0, -1}. That is, when the 1-bit weight initialization model is used to initialize the weights, the initial weight of a node in the hidden layer is set to one of the two values 1 and -1, and all the weights of the hidden layer need to satisfy a distribution with a mean of 0 and a variance of 1. When the 2-bit weight initialization model is used to initialize the weights, the initial weight of a node in the hidden layer is set to one of the three values -1, 0, and 1, and all the weights of the hidden layer need to satisfy a distribution with a mean of 0 and a variance of 2/3.
Specifically, for example, in some embodiments, the floating-point type weights of the untrained neural network model may be initialized using the following 1-bit weight initialization method:
the initial weights of the neural network model 200 are quantized to 1 bit; each quantized initial weight takes one of the discrete values 1 or -1. Since a distribution with a mean of 0 and a variance of 1 must be satisfied, the numbers of weights taking 1 and -1 among all weights in the same network layer are substantially the same (e.g., 1 and -1 are sampled with equal probability generated by a uniform distribution function).
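A minimal sketch of this 1-bit sampling, assuming plain Python with equal-probability draws of 1 and -1 (function and parameter names are illustrative, not the patent's):

```python
import random

def init_1bit(num_weights, seed=0):
    """Draw each initial weight uniformly from {1, -1}; with equal
    probabilities the distribution has mean 0 and variance 1."""
    rng = random.Random(seed)
    return [1 if rng.random() < 0.5 else -1 for _ in range(num_weights)]

weights = init_1bit(1000)
```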
In addition, in other embodiments, the input and output distributions of each layer of the neural network model are kept substantially consistent to alleviate the problem of gradient vanishing during propagation to deeper layers. The discrete value Wb selected from {1, -1} for a node in the quantization process can be compressed based on a preset scaling factor to obtain the initial weight Winit of the neural network model.
The preset scaling factor is obtained by a normalization method to scale the variance of the weights of the neural network model, so that signals can propagate to deeper layers of the network.
For example, in some embodiments, the discrete values Wb that have already been chosen may be compressed as: Winit = α·Wb, where α is a scaling factor, generally a positive number less than 1, used to adjust the distribution of the output data of the i-th network layer.
In some embodiments, the scaling factor α may be calculated as follows: α = √(2/(li + li+1)), where li represents the number of input channels of the i-th network layer of the neural network model, and li+1 represents the number of input channels of the (i+1)-th network layer. It will be appreciated that for the input layer, i is 1 here; i+1 denotes the network layer following the i-th network layer in the neural network model.
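Under that reading of the formula, the scaling and compression steps might look as follows (a sketch only; `xavier_like_alpha` and `compress` are illustrative names, and the fan-in-based α is the reconstruction given above, not a confirmed form):

```python
import math

def xavier_like_alpha(l_i, l_next):
    # alpha from the input-channel counts of layer i and layer i+1.
    return math.sqrt(2.0 / (l_i + l_next))

def compress(discrete_weights, alpha):
    # Winit = alpha * Wb for each discrete weight of the layer.
    return [alpha * w for w in discrete_weights]

alpha = xavier_like_alpha(64, 128)
w_init = compress([1, -1, 1], alpha)
```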
In some embodiments, assume that the i-th network layer has p initial weights, so that there are p selected discrete values Wb corresponding to the p initial weights. The scaling factor α may also be calculated as follows: α = 1/√(li·Var(Wb)), where Var(Wb) = (1/p)·Σj(Wjb - μ)², Wjb represents the discrete value, among the p values Wb, corresponding to the j-th of the p initial weights, μ is the average of the p values Wb of the i-th network layer, and li indicates the number of input channels of the i-th network layer.
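The variance-based variant reads, under the same assumptions (the 1/√(li·Var(Wb)) reconstruction above; names are illustrative), roughly as:

```python
import math

def variance_alpha(wb, l_i):
    """alpha = 1 / sqrt(l_i * Var(Wb)): scale so the layer's output
    variance is pulled toward 1 (illustrative reading of the formula)."""
    p = len(wb)
    mean = sum(wb) / p
    var = sum((w - mean) ** 2 for w in wb) / p
    return 1.0 / math.sqrt(l_i * var)

alpha = variance_alpha([1, -1] * 50, l_i=4)  # balanced {1,-1}: Var = 1, alpha = 0.5
```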
In this way, the scaling factor α of the weights of each network layer in the neural network model 200 is calculated, and α is multiplied by the weights of the corresponding network layer. In the training process of the neural network model, the input data of each layer and the compressed weights undergo a weighting operation, so that the input and output data of each layer are propagated forward; the distributions of the input and output data of each layer can thus be kept substantially consistent, alleviating the problem of gradient vanishing during propagation to deeper layers.
It is to be appreciated that the above method of calculating the scaling factor α is merely exemplary and not limiting, and in other embodiments, other normalization methods may be employed to calculate the scaling factor α.
In some embodiments, the weights of the neural network model may be initialized using a 2-bit weight initialization method as follows:
first, in some embodiments, the floating-point weights of the untrained neural network model may be quantized to 2 bits; each quantized initial weight takes one of the discrete values -1, 0, or 1. Since a distribution with a mean of 0 and a variance of 2/3 must be satisfied, the numbers of weights taking 1 and -1 among all weights in the same network layer are substantially the same (e.g., 1 and -1 are sampled with equal probability generated by a uniform distribution function).
In particular, in some other embodiments, to ensure that the input and output distributions of each layer of the neural network model remain substantially consistent and to alleviate the problem of gradient vanishing during propagation to deeper layers, the discrete value Wb selected from {-1, 0, 1} for a node in the quantization process can be compressed based on a preset scaling factor to obtain the initial weight of the neural network model. The preset scaling factor is obtained by a normalization method to scale the variance of the weights of the neural network model, so that signals can propagate to deeper layers. The quantized discrete values Wb may be compressed in a manner similar to the 1-bit weight initialization method described above to obtain the initial weights of the neural network model. For the specific compression method, please refer to the above; it is not repeated here.
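By analogy with the 1-bit sketch earlier, the 2-bit sampling over {-1, 0, 1} could be sketched as follows (illustrative names; equal probabilities over the three values give mean 0 and variance 2/3):

```python
import random

def init_2bit(num_weights, seed=0):
    """Draw each initial weight uniformly from {-1, 0, 1}; with equal
    probabilities the distribution has mean 0 and variance 2/3."""
    rng = random.Random(seed)
    return [rng.choice((-1, 0, 1)) for _ in range(num_weights)]

weights = init_2bit(1000)
```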
Because a full-precision neural network model occupies a large amount of storage space and floating-point multiply-add operations consume a large amount of computing resources, edge devices in particular, whose computing and storage resources are limited, generally cannot bear a large number of floating-point multiply-add operations or store a large number of floating-point numbers. 8-bit quantization is currently a common solution, but it supports at most 4x compression, and although integer arithmetic replaces floating-point arithmetic, the consumption of computing resources is still large. The 1-bit and 2-bit weight initialization methods described above apply low-bit quantization (1 and 2 bits) to the weights of a full-precision neural network model and far exceed the 8-bit quantization model in both storage space and computational efficiency, so they are very suitable for running on edge devices and can reduce device power consumption. Compared with existing initialization methods, using the 1-bit or 2-bit weight initialization model also allows the neural network model to converge more easily. As mentioned above, when training a binary neural network (BNN), it is desirable to limit the final weights to 1 and -1 and to convert multiplication operations into bitwise exclusive-OR operations to reduce memory access and occupancy; by directly setting the initial weights to {-1, 1} or {-1, 0, 1}, the solution of the present application can avoid the vanishing of the model gradient and accelerate convergence.
It will be appreciated that in some embodiments, in addition to 1-bit or 2-bit quantization of the initial weights of the neural network model, the inputs of the neural network model (e.g., the image data of sample image A) may also be quantized, so that the matrix multiplication between the original weights and the inputs can be equivalently replaced by an XNOR operation, which further accelerates convergence.
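The XNOR replacement rests on a simple identity: for two {+1, -1} vectors of length n, the dot product equals 2·popcount(XNOR(a, w)) - n. A small sketch of this equivalence (list-based for clarity; a real implementation would pack the bits into machine words):

```python
def xnor_dot(a, w):
    """Dot product of two {+1, -1} vectors via the XNOR/popcount identity.
    A match (XNOR bit = 1) contributes +1, a mismatch contributes -1."""
    n = len(a)
    matches = sum(1 for ai, wi in zip(a, w) if ai == wi)  # popcount of XNOR
    return 2 * matches - n

# Agrees with the ordinary dot product:
a, w = [1, -1, 1, 1], [1, 1, -1, 1]
assert xnor_dot(a, w) == sum(x * y for x, y in zip(a, w))
```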
It can be understood that the value range {-1, 1} of the above-mentioned 1-bit quantized discrete value Wb is exemplary only and not limiting. In some embodiments, the 1-bit quantized discrete value Wb may take other integer discrete values, for example {-2, 2} or {1, 100}. To satisfy the condition that the mean is 0 and the variance is 1, both {-2, 2} and {1, 100} are converted into {-1, 1} when calculating each network layer in the neural network model. After the calculation of each network layer in the neural network model is completed, the weights of each network layer in the model are restored according to a preset ratio, for example, 0.8 is restored to 90, and 0.5 is restored to 2.
It can be understood that the value range {-1, 0, 1} of the above-mentioned 2-bit quantized discrete value Wb is also exemplary only and not limiting. In some embodiments, the 2-bit quantized discrete value Wb may take other integer discrete values, such as {-2, 0, 2} or {0, 50, 100}. To satisfy the conditions of mean 0 and variance 2/3, both {-2, 0, 2} and {0, 50, 100} are converted into {-1, 0, 1} when computing the network layers in the neural network model. After the calculation of each network layer in the neural network model is completed, the weights of each network layer in the model are restored according to a preset ratio, for example, 1 is restored to 100.
c) A weighting operation is performed on the input data of each hidden layer and the corresponding initial weights. For example, the image data of sample image A is divided into s data blocks, the s data blocks are input as s input data to the s nodes of the input layer in FIG. 2, and a weighting operation is performed with the weights of the first hidden layer (for example, the weighting operation of the first node of the first hidden layer is: x1 × w11 + x2 × w12 + x3 × w13 + … + xs × w1s + b). d) The activation function performs an activation operation on the result of the weighting operation.
For example, in some embodiments, the activation function may be a Sigmoid function, a Tanh function, or the like. Specifically, the feature data of sample image A output by each node of the input layer and the initial weights of each node of the first hidden layer (h nodes shown in fig. 2) undergo a weighting operation, and the outputs of the nodes of the first hidden layer (i.e., the inputs of the second hidden layer) are generated by the activation function (e.g., Sigmoid function, Tanh function) of the first hidden layer. The inputs of the second hidden layer and the initial weights of the respective nodes of the second hidden layer (u nodes shown in fig. 2) undergo a weighting operation, and the outputs of the nodes of the second hidden layer (i.e., the inputs of the third hidden layer) are generated by the activation function of the second hidden layer. The computation proceeds in the same way, layer by layer, up to the (n-2)-th hidden layer (there are n-2 hidden layers in fig. 2): the inputs of the v nodes of the (n-2)-th hidden layer (the outputs of the (n-3)-th hidden layer) and the corresponding initial weights undergo a weighting operation, and the outputs of the nodes of the (n-2)-th hidden layer (i.e., the inputs of the output layer) are generated by the activation function of each node of the (n-2)-th hidden layer.
It will be appreciated that in some embodiments, after the initial weights of the neural network model and its inputs and outputs are quantized according to the 1-bit or 2-bit quantization method above, only the sign function or an equivalent function can be used as the activation function, i.e., the inputs are transformed into {-1, 1} or {-1, 0, 1}, so as to ensure that XNOR and popcount bit operations can be used, saving computing resources and storage resources.
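Putting the pieces together, the layer-by-layer forward pass with a sign activation can be sketched as follows (illustrative structure; `layers` as a list of per-layer weight rows and biases is an assumption of this sketch, not the patent's data layout):

```python
def sign(z):
    # Keep outputs in {-1, 1}; 0 is mapped to 1 here by convention.
    return 1 if z >= 0 else -1

def forward(x, layers):
    """Each layer is (weights, biases) with weights[node][input]; every
    output stays in {-1, 1}, so downstream layers can use bit operations."""
    for weights, biases in layers:
        x = [sign(sum(wi * xi for wi, xi in zip(row, x)) + b)
             for row, b in zip(weights, biases)]
    return x

out = forward([1, -1], [([[1, 0], [0, 1]], [0, 0]), ([[1, 1]], [0])])
```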
3. Computation of the output layer
The computation of the output layer is similar to that of the hidden layers described above. Specifically, the input of each node of the output layer (i.e., the output of each node of the (n-2)-th hidden layer) and the initial weight of each node of the output layer undergo a weighting operation, and the final output of one training pass of the neural network model using sample image A is generated through the activation function (such as ReLU, Tanh, Sigmoid, etc.) of each node of the output layer. The initial weights of the output layer may also be obtained by using the low-bit weight initialization scheme described above (using the 1-bit or 2-bit weight initialization model). For a detailed description, please refer to the above; it is not repeated here.
4. Adjusting weights
Each time the output layer outputs a face recognition result value, the face recognition result data is compared with the expected result data of the image data of the corresponding input sample image, and the error is calculated. A partial derivative is obtained from the error, and the weight of each node in the network layers other than the input layer is adjusted based on the obtained partial derivative. The weights are continually adjusted until the final error reaches the error threshold.
The output of the model (i.e., the training result of the model trained using sample image A) is then compared with the actual image characteristics of sample image A to determine an error (i.e., the difference between the two), a partial derivative is determined for the error, and the weights are updated based on the partial derivative. Other sample image data may subsequently be input to train the model, so that, over training on a large amount of sample image data and with the weights continuously adjusted, the neural network model 200 is considered converged when the output error becomes small enough (for example, meets a predetermined error threshold), and model training is complete.
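The outer training loop described above, reduced to its control flow, might be sketched as follows (all names here are illustrative, not the patent's API; the real `update` would backpropagate the partial derivatives):

```python
def train_until_converged(samples, expected, model_forward, update,
                          err_threshold, max_epochs=100):
    """Run every sample, compare the output with the expected result, adjust
    weights via `update`, and stop once the worst error is below threshold."""
    for _ in range(max_epochs):
        worst = 0.0
        for x, y_true in zip(samples, expected):
            err = abs(model_forward(x) - y_true)
            worst = max(worst, err)
            update(err)  # stand-in for the partial-derivative weight update
        if worst < err_threshold:
            return True  # model considered converged
    return False
```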
After each training of input sample image data, the weights of the network layers of the neural network model 200 are adjusted, and the adjusted weights may be directly used as initial weights for next training of input sample image data, or may be scaled by a scaling factor and used as initial weights for next training of input sample image data.
For example, in some embodiments, after the neural network model is trained with sample image A, the weights of the network layers determined after that training may be used as the initial weights of the neural network model in the next training pass (e.g., training the model, already trained with sample image A, using sample image B).
In other embodiments, in order to ensure that the input and output distributions of each layer of the neural network model are substantially consistent to alleviate the problem of gradient disappearance during the process of transferring to a deeper layer, after the neural network model is trained through the sample image a, the product of the weight of each network layer determined after the neural network model is trained through the sample image a and the scaling factor may be used as the initial weight of the neural network model during the next training (for example, the neural network model trained through the sample image a is trained through the sample image B).
For example, in some embodiments, each of the plurality of initial weights of the i-th network layer among the n network layers of the neural network model can be calculated by the following formula: Winit = α·Wt, where Wt is any one of the plurality of weights of the i-th network layer determined after training on sample image A, the value of Wt ranges from -1 to 1, and α is a scaling factor, a positive number less than 1, used to adjust the distribution of the output data of the i-th network layer. In some embodiments, the scaling factor α may be calculated as follows:
α = √(2/(li + li+1)), where li represents the number of input channels of the i-th network layer among the n network layers of the neural network model, and li+1 represents the number of input channels of the (i+1)-th network layer. It will be appreciated that for the input layer, i is 1 here; i+1 denotes the network layer following the i-th network layer in the neural network model.
In other embodiments, the scaling factor α may also be calculated according to the following formula: α = 1/√(li·Var(Wt)), where Var(Wt) = (1/p)·Σj(Wjt - μ)², p is the number of weights of the i-th network layer determined after training on sample image A, Wjt represents the j-th of the p weights determined after training on sample image A, μ is the average of the p weights determined after training on sample image A, and li is the number of input channels of the i-th network layer (as above).
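Combining the trained weights with the variance-based scaling factor, the re-initialization step might be sketched as follows (illustrative only; it assumes the 1/√(li·Var(Wt)) reading of the formula above):

```python
import math

def reinit_from_trained(wt, l_i):
    """Initial weights for the next pass: Winit = alpha * Wt with
    alpha = 1 / sqrt(l_i * Var(Wt))."""
    p = len(wt)
    mean = sum(wt) / p
    var = sum((w - mean) ** 2 for w in wt) / p
    alpha = 1.0 / math.sqrt(l_i * var)
    return [alpha * w for w in wt]

w_next = reinit_from_trained([1, -1, 1, -1], l_i=4)  # alpha = 0.5
```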
In addition, for the detailed process of training, with sample image B, the neural network model already trained with sample image A, please refer to the above description of training the neural network model with sample image A; it is not repeated here. It can also be understood that, in the present application, the scaling factor is provided in order to keep the input and output distributions of each layer of the neural network model substantially consistent and thereby alleviate the problem of gradient vanishing during propagation to deeper layers; the calculation of the scaling factor is not limited to the above formulas, and other calculation methods may also be adopted, which are not limited herein.
As described above, the weights generated by the weight initialization method of the present application can avoid the problem that the gradient of the neural network model easily vanishes during convergence, and can enable the neural network model to converge quickly. Fig. 4 and 5 respectively show the convergence of the model when the initial weights are obtained by the prior-art Gaussian random initialization method and by the initialization method of the present application. It can be seen that, for the model whose initial weights are discrete values obtained by the initialization method of the present application, the initial weight distribution is almost the same as the weight distribution after convergence, and the convergence speed and stability of the model are better.
It is understood that the above description of the technical solution for training the neural network model 200 shown in fig. 2 is only exemplary and not limiting, and in other embodiments, the weight initialization method of the present application may also be used for speech recognition and the like.
Specifically, as shown in fig. 4, when the neural network model uses the sign function as the activation function and the number of layers of the neural network model increases, since the neural network model generates a gradient only when its weights are between -k and k (for example, between -1 and 1), the output values of the activation functions of the deeper layers are almost all close to 0, easily causing the model gradient to vanish.
FIG. 5(a) shows the weight distribution of a convolutional neural network model of the present application at 1-bit weight initialization, according to some embodiments of the present application. FIG. 5(b) illustrates the weight distribution during a certain period of model convergence after the convolutional neural network model is initialized with 1-bit weights. FIG. 5(c) illustrates the converged weight distribution of the convolutional neural network model trained after initialization with 1-bit weights. The horizontal axis is the value of the weight, and the vertical axis is the number of sampled weights.
In the illustrated embodiment, the convolution kernel of the trained convolutional neural network model is 3x3x64x128. Referring to fig. 5(a), the number of weights equal to -1 and the number of weights equal to 1 at initialization are the same (35000 each). Referring to fig. 5(b), it can be seen that only a very small number of weights take values between -0.004 and 0.004 during a certain period of model convergence; the weights tend toward the discrete values -0.004 and 0.004. It should be noted that the weights participating in training are the compressed weights; for the specific compression method, please refer to the above, which is not repeated here. Referring to fig. 5(c), the weight distribution of the converged model is substantially the same as the distribution of the initial weights, but the number of -1s is greater than the number of 1s (the difference between the two is small). In other embodiments, the number of -1s may be less than the number of 1s. Therefore, after the weights of the target neural network model are initialized with the 1-bit weight initialization method and the model is trained, the initial weight distribution and the converged weight distribution of the model are almost the same, and the convergence speed and stability of the model are good.
It is understood that in some embodiments, the convolutional neural network model is trained after initialization with 2-bit weights (initial weights are-1, 0, and 1), and after the model converges, the weights of the network layers are also-1, 0, and 1.
Fig. 6(a) and fig. 6(b) respectively show the model convergence when an existing model is trained with the prior-art Xavier initialization function and with the weight initialization method provided by the present application. It can be seen that the 1-bit fixed-point quantization model trained with the weight initialization method provided by the present application reaches higher accuracy, and the model is more stable and converges more easily.
Specifically, as shown in fig. 6(a), the weights of a ResNet-32 (deep residual network, ResNet) model are initialized by the Xavier initialization function, and the 1-bit fixed-point quantized ResNet-32 model is trained on the CIFAR-10 data set. CIFAR-10 consists of 60000 32×32 RGB color images in 10 classes (airplane, car, bird, cat, deer, dog, frog, horse, boat, truck). Referring to fig. 6(a), where the abscissa is the number of training steps and the ordinate is the accuracy, it can be seen that the accuracy of the ResNet-32 model is at most about 82%, and the swing amplitude of the model accuracy is large, i.e., the noise is large and the model is unstable (training stops after 150 epochs, and the accuracy cannot be improved further).
Referring to fig. 6(b), where the abscissa is the number of training steps and the ordinate is the accuracy, it can be seen that, when the ResNet-32 model is initialized by the weight initialization method provided in the embodiments of the present application, the accuracy of the ResNet-32 model stabilizes at about 98% as the number of training steps increases, and the noise is small. Compared with the result of training the 1-bit fixed-point quantization model with the Xavier initialization function in fig. 6(a), the training result of the 1-bit fixed-point quantization model trained with the weight initialization method provided by the present application has higher accuracy, and the model is more stable and converges more easily (300 epochs). It can be understood that the embodiments of fig. 6(a) and fig. 6(b) take the ResNet-32 model only as an example; the weight initialization method provided by the present application is also applicable to the initialization of other neural network models.
For example, when the low-bit weight initialization method provided by the embodiments of the present application is used for image recognition, the acquired image information to be learned undergoes necessary preprocessing (such as sampling, analog-to-digital conversion, feature extraction, and the like) to form the data on which the neural network model operates. The data to be trained is input into the neural network model for training, and the low-bit weight initialization method provided by the embodiments of the present application is applied to the model during training, so that the convergence efficiency of the model can be improved with high stability while ensuring that the accuracy of the operation meets the requirement.
The following describes in detail a technical solution for performing weight initialization on a trained neural network model by using the terminal device 100 shown in fig. 1 according to some embodiments of the present application.
1. Input layer computation
The image data of the sample image C is input to the input layer as input data.
2. Computation of hidden layers
a) The input layer outputs the image data of the sample image C to the first hidden layer. For example, the input image data may be color information (e.g., numbers between 0 and 255 of an RGB color space) of each pixel point in the image, and the image data is input into s nodes (i.e., inputs x1, x2 to xs) of the input layer (first layer) of the model shown in fig. 2.
b) And initializing the weight of each hidden layer to obtain an initial weight.
In embodiments of the present application, a low-bit weight initialization model is used to obtain the initial weights of each hidden layer; for example, the following 1-bit or 2-bit weight initialization model is used to initialize the weights of a trained neural network model (for example, a converged 8-bit model). The weight value set in 1-bit weight initialization is {1, −1}, and the weight value set in 2-bit weight initialization is {1, 0, −1}. That is, when the weights are initialized using the 1-bit weight initialization model, the initial weight of a given node in the hidden layer is set to one of the two values 1 and −1, and all the weights of the hidden layer need to satisfy a distribution with a mean of 0 and a variance of 1. When the 2-bit weight initialization model is used to initialize the weights, the initial weight of a given node in the hidden layer is set to one of the three values −1, 0 and 1, and all the weights of the hidden layer need to satisfy a distribution with a mean of 0 and a variance of 2/3.
Specifically, for example, in some embodiments, assuming that the trained model has converged, the following 1-bit weight initialization method may be used to initialize the full-precision weights of the neural network model:
that is, the full-precision weight of the trained neural network model can be valued according to the sign of the full-precision numerical value thereof through a sign function sign (w), that is: when the full-precision weight is a value greater than 0 (e.g., 0.23), it is rotatedBy changing to 1, when the full-precision weight is a value less than 0 (e.g., -0.15) or equal to 0, it is converted to-1, and the initial weight after quantization is thus changed to
Figure BDA0002382198810000151
Taking one of discrete values 1 or-1 and initial weight
Figure BDA0002382198810000152
The distribution of (c) still satisfies a mean of 0 and a variance of 1.
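The 1-bit sign rule described above can be sketched as follows; this is an illustrative sketch, not the patent's implementation, and the weight values are made up for the example:

```python
def quantize_1bit(weights):
    """Map each full-precision weight to 1 if it is greater than 0,
    and to -1 if it is less than or equal to 0 (the sign rule above)."""
    return [1.0 if w > 0 else -1.0 for w in weights]

# Full-precision weights of one hidden layer (illustrative values).
full_precision = [0.23, -0.15, 0.41, -0.02, 0.37, -0.29]
quantized = quantize_1bit(full_precision)  # [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
```

For this sample the quantized weights happen to have a mean of 0 and a variance of 1, matching the distribution requirement stated above.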
In addition, in some embodiments, to ensure that the input and output distributions of each layer of the neural network model remain basically consistent and to mitigate gradient vanishing in propagation to deeper layers, the discrete values W^z selected from 1 and −1 for the nodes in the quantization process may be compressed by a preset scaling factor, obtaining the initial weights of the neural network model as

Ŵ = α · W^z

wherein the preset scaling factor is obtained by a normalization method. α is the scaling factor, usually a positive decimal smaller than 1, and is used to adjust the distribution of the output data of the ith network layer.
In some embodiments, the scaling factor α may be calculated as follows:

α = √(2 / (l_i + l_{i+1}))

where l_i is the number of input channels of the ith network layer and l_{i+1} is the number of input channels of the (i+1)th network layer. It will be appreciated that for the input layer, i is 1 here; i+1 denotes the next network layer after the ith network layer in the neural network model.
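Assuming a Xavier-style normalization of the form α = √(2 / (l_i + l_{i+1})) (the function name and channel counts below are illustrative, not from the patent), the scaling factor can be computed as:

```python
import math

def scaling_factor(fan_in, fan_out):
    """alpha = sqrt(2 / (l_i + l_{i+1})), where l_i and l_{i+1} are the
    input channel counts of the ith and (i+1)th network layers."""
    return math.sqrt(2.0 / (fan_in + fan_out))

# Example: layer i has 256 input channels, layer i+1 also has 256.
alpha = scaling_factor(256, 256)  # sqrt(2/512) = 0.0625
```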
In addition, in some other embodiments, the scaling factor α may be calculated according to the following formula:

α = 1 / √( l_i · (1/p) · Σ_j (W_j^z − W̄^z)² )

wherein α is the scaling factor, p is the number of initial weights of the ith network layer, W_j^z is one of −1 and 1 and corresponds to the jth initial weight of the p initial weights, the p initial weights W_j^z have a mean of 0 and a variance of 1, W̄^z is the average value of the p weights W_j^z, and l_i is the number of input channels of the ith network layer.
Thus, the scaling factor α for the weights of each layer in the neural network model 200 is calculated and multiplied by the discrete values W^z of the corresponding network layer. During training of the neural network model, the input data of each layer undergoes a weighting operation with the compressed weights and the result is propagated forward, so that the input and output distributions of the neural network model remain consistent, alleviating the gradient vanishing problem in propagation to deeper layers.
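As a hedged sketch of the forward weighting just described, one layer's weighted sum with compressed 1-bit weights α·W^z might look like this (all names and values are illustrative assumptions):

```python
import math

def layer_forward(inputs, quantized_weights, alpha):
    """Weighted sum of the layer inputs with the compressed weights
    alpha * W^z, as in the forward propagation described above."""
    return sum(x * (alpha * w) for x, w in zip(inputs, quantized_weights))

inputs = [0.5, -0.25, 1.0, 0.75]         # outputs of the previous layer
w_z = [1.0, -1.0, -1.0, 1.0]             # 1-bit quantized weights
alpha = math.sqrt(2.0 / (4 + 4))         # l_i = l_{i+1} = 4 -> alpha = 0.5
out = layer_forward(inputs, w_z, alpha)  # 0.5 * (0.5 + 0.25 - 1.0 + 0.75) = 0.25
```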
It is to be appreciated that the above method of calculating the scaling factor α is merely exemplary and not limiting, and in other embodiments, other normalization methods may be employed to calculate the scaling factor α.
In some embodiments, the full-precision weights of the trained neural network model may be initialized using a 2-bit weight initialization method as follows:
that is, the full-precision weight of the trained model can be valued according to the sign of the full-precision numerical value thereof through a sign function sign (w), that is: when the full-precision weight is a value greater than 0 (e.g., 0.31), it is converted to 1, when the full-precision weight is a value less than 0 (e.g., -0.17), it is converted to-1, and when the full-precision weight is 0,take it to 0, the initial weight after quantization
Figure BDA0002382198810000166
Taking one of discrete values-1, 0 or 1, and initial weight
Figure BDA0002382198810000167
The distribution of (c) still satisfies a mean of 0 and a variance of 2/3.
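The 2-bit (ternary) sign rule described above can be sketched similarly; again an illustrative sketch with made-up values, not the patent's implementation:

```python
def quantize_2bit(weights):
    """Map each full-precision weight to 1 if > 0, -1 if < 0, and 0 if
    exactly 0, as in the 2-bit (ternary) sign rule described above."""
    return [1.0 if w > 0 else (-1.0 if w < 0 else 0.0) for w in weights]

# Full-precision weights of one hidden layer (illustrative values).
full_precision = [0.31, -0.17, 0.0, 0.62, -0.44, 0.0]
quantized = quantize_2bit(full_precision)  # [1.0, -1.0, 0.0, 1.0, -1.0, 0.0]
```

For this sample the quantized weights happen to have a mean of 0 and a variance of 2/3, matching the distribution requirement stated above.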
In order to ensure that the input and output distributions of each layer of the neural network model remain basically consistent and to alleviate the gradient vanishing problem in propagation to deeper layers, the discrete values W^q selected from −1, 0 and 1 for the nodes in the quantization process may be compressed based on a preset scaling factor, obtaining the initial weights of the neural network model as

Ŵ = α · W^q

The preset scaling factor is obtained by a normalization method to scale the variance of the weights of the neural network model, so that the network can propagate to deeper layers. For example, in some embodiments, the scaling factor is calculated as follows:

α = √(2 / (l_i + l_{i+1}))

wherein l_i is the number of input channels of the ith network layer and l_{i+1} is the number of input channels of the (i+1)th network layer.
For another example, in some other embodiments, the scaling factor α may also be calculated according to the following formula:

α = 1 / √( l_i · (1/p) · Σ_j (W_j^q − W̄^q)² )

wherein α is the scaling factor, p represents the number of initial weights of the ith network layer, W_j^q is one of −1, 0 and 1 and corresponds to the jth initial weight of the p initial weights, the p initial weights W_j^q have a mean of 0 and a variance of 2/3, W̄^q is the average value of the p weights W_j^q, and l_i is the number of input channels of the ith network layer.
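Assuming the normalization takes the form α = 1/√(l_i · Var(W)), with Var(W) the empirical variance of the p quantized weights (an assumption made for illustration, not the patent's verbatim formula), the computation can be sketched as:

```python
import math

def empirical_scaling_factor(quantized_weights, fan_in):
    """alpha = 1 / sqrt(l_i * Var(W)), where Var(W) is the empirical
    variance (1/p) * sum_j (W_j - W_bar)^2 of the p quantized weights."""
    p = len(quantized_weights)
    mean = sum(quantized_weights) / p
    var = sum((w - mean) ** 2 for w in quantized_weights) / p
    return 1.0 / math.sqrt(fan_in * var)

# Ternary weights with empirical variance 2/3, layer with 6 input channels.
w_q = [1.0, -1.0, 0.0, 1.0, -1.0, 0.0]
alpha = empirical_scaling_factor(w_q, fan_in=6)  # 1/sqrt(6 * 2/3) = 0.5
```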
Therefore, compared with traditional full-precision floating-point operations and the commonly used 8-bit models in the prior art, quantizing the full-precision neural network model to 1 bit or 2 bits can greatly reduce the size of the model, the computing resources required, and the power consumption.
c) A weighting operation is performed on the input data of each hidden layer and the corresponding initial weights Ŵ.
For example, the image data of the sample image C is divided into s data blocks, the s data blocks are input as s input data to the s nodes of the input layer in FIG. 2, and the weighting operation is performed with the weights of the first hidden layer (for example, the weighting operation of the first node of the first hidden layer is x1 × w11+ x2 × w12+ x3 × w13+ … + xs × w1s + b).
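The per-node weighting operation quoted above (x1 × w11 + x2 × w12 + … + xs × w1s + b) can be sketched directly; the numbers below are illustrative, not taken from the patent:

```python
def node_weighted_sum(x, w, b):
    """Compute x1*w1 + x2*w2 + ... + xs*ws + b for one hidden-layer node."""
    return sum(xi * wi for xi, wi in zip(x, w)) + b

x = [0.2, 0.4, 0.6]    # s = 3 input data blocks
w1 = [1.0, -1.0, 1.0]  # initial (quantized) weights of the first node
b = 0.1                # bias of the first node
z = node_weighted_sum(x, w1, b)  # 0.2 - 0.4 + 0.6 + 0.1, approximately 0.5
```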
d) And performing activation operation on the result of the weighting operation by using an activation function.
For example, in some embodiments, the activation function may be a Sigmoid function, a Tanh function, or the like. Specifically, the feature data of the sample image C output by each node of the input layer and the initial weights Ŵ of each node (h nodes shown in fig. 2) of the first hidden layer undergo a weighting operation, and the outputs of the nodes of the first hidden layer (i.e., the inputs of the second hidden layer) are generated through the activation function (e.g., Sigmoid function, Tanh function) of the first hidden layer. The inputs of the second hidden layer and the initial weights Ŵ of the respective nodes (u nodes shown in fig. 2) of the second hidden layer undergo a weighting operation, and the outputs of the nodes of the second hidden layer (i.e., the inputs of the third hidden layer) are generated through the activation function of the second hidden layer. The remaining hidden layers are calculated in the same way in turn, until the inputs of the nodes of the (n−2)th hidden layer (there are n−2 hidden layers in fig. 2, so these inputs are the outputs of the (n−3)th hidden layer) and the corresponding initial weights Ŵ undergo a weighting operation, and the outputs of the nodes of the (n−2)th hidden layer (i.e., the inputs of the output layer) are generated through the activation function of each node of the (n−2)th hidden layer.
It is understood that the hidden layer calculation is similar to the hidden layer calculation method described in the above scheme for training the untrained neural network model 200; the difference is only that the weight initialization method uses the sign function sign(W) to take the values of the full-precision weights of the trained model according to the signs of the full-precision values. For a detailed description, please refer to the above, which is not repeated herein.
3. Computation of output layers
The calculation of the output layer is similar to the calculation method of the output layer described in the above scheme for training the untrained neural network model 200; the difference is only that the weight initialization method uses the sign function sign(W) to take the values of the full-precision weights of the trained model according to the signs of the full-precision values. For a detailed description, please refer to the above, which is not repeated herein.
4. The weights are adjusted, and the specific adjusting method is similar to the weight adjusting method described in the above-mentioned scheme for training the untrained neural network model 200, and for the detailed description, refer to the above, and are not repeated here.
Although the above embodiments exemplify face recognition of an image, the weight initialization model of the present application may be applied to any Neural Network model, such as a Convolutional Neural Network (CNN), a Deep Neural Network (DNN), a Recurrent Neural Network (RNN), and the like.
After each training of input sample image data, the weights of the network layers of the neural network model 200 are adjusted, and the adjusted weights may be directly used as initial weights for next training of input sample image data, or may be scaled by a scaling factor and used as initial weights for next training of input sample image data.
For example, in some embodiments, after the trained neural network model is trained through the sample image C, the weights of the network layers determined after the trained neural network model is trained through the sample image C may be used as initial weights of the neural network model in the next training (for example, the neural network model trained through the sample image C is trained through the sample image D).
In other embodiments, in order to ensure that the input and output distributions of each layer of the neural network model are substantially consistent to alleviate the problem of gradient disappearance during the process of transferring to a deeper layer, after the neural network model is trained through the sample image C, the product of the weight of each network layer determined after the neural network model is trained through the sample image C and the scaling factor may be used as the initial weight of the neural network model during the next training (for example, the neural network model trained through the sample image C is trained through the sample image D).
For example, in some embodiments, the plurality of initial weights Ŵ of the ith network layer of the n network layers of the neural network model can be calculated by the following formula:

Ŵ = α · W^r

wherein W^r is any one of the plurality of weights of the ith network layer determined after training with sample image C, the value range of W^r is −1 ≤ W^r ≤ 1, and α is a positive decimal smaller than 1 used to adjust the distribution of the output data of the ith network layer, calculated as follows:

α = √(2 / (l_i + l_{i+1}))

wherein l_i represents the number of input channels of the ith network layer of the n network layers of the neural network model, and l_{i+1} represents the number of input channels of the (i+1)th network layer. It will be appreciated that for the input layer, i is 1 here; i+1 denotes the next network layer after the ith network layer in the neural network model.
In other embodiments, the scaling factor α may also be calculated according to the following formula:

α = 1 / √( l_i · (1/p) · Σ_j (W_j^r − W̄^r)² )

where p is the number of weights of the ith network layer determined after training with sample image C, W_j^r represents the jth weight of the p weights determined after training with sample image C, and W̄^r is the average of the p weights determined after training with sample image C.
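A hedged sketch of the re-initialization between training samples described above: the weights W^r obtained after the previous sample are scaled by a factor α computed from their empirical variance (the normalization form α = 1/√(l_i · Var(W^r)) is an assumption made for illustration; names and values are made up):

```python
import math

def reinit_weights(trained_weights, fan_in):
    """Scale the weights determined after the previous training sample by
    alpha = 1 / sqrt(l_i * Var(W^r)) and use them as the next initial weights."""
    p = len(trained_weights)
    mean = sum(trained_weights) / p
    var = sum((w - mean) ** 2 for w in trained_weights) / p
    alpha = 1.0 / math.sqrt(fan_in * var)
    return [alpha * w for w in trained_weights]

# Weights after training on sample image C (illustrative, in [-1, 1]).
w_r = [0.8, -0.8, 0.4, -0.4]
w_init = reinit_weights(w_r, fan_in=4)  # initial weights for sample image D
```

With this choice of α, the scaled weights satisfy l_i · Var(Ŵ) = 1, which keeps the layer's output variance roughly constant from one training pass to the next.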
In addition, for a detailed process of training the neural network model trained by the sample image C by using the sample image D, please refer to the above description of the process of training the neural network model by using the sample image C, which is not described herein again.
In addition, it can be understood that, in the present application, in order to ensure that the distribution of the input and the output of each layer of the neural network model is kept substantially consistent, so as to alleviate the problem of gradient disappearance in the process of transferring to a deeper layer, a scaling factor is provided, wherein the calculation method of the scaling factor is not limited to the above formula, and other calculation methods may also be adopted, which are not limited herein.
FIG. 7 provides a block diagram of an electronic device 700 for training a neural network model, according to some embodiments of the present application. As shown in fig. 7, the electronic device 700 includes:
a first data obtaining module 702, configured to obtain sample data, and input the sample data to a second network layer, where the sample data includes initial input data and expected result data;
a first data processing module 704, configured to perform the following operations:

when i is 2, obtain the output data of the ith network layer based on the initial input data and a plurality of initial weights Ŵ of the ith network layer;

when 2 < i ≤ n, obtain the output data of the ith network layer based on the output data of the (i−1)th network layer and a plurality of initial weights Ŵ of the ith network layer, wherein the plurality of initial weights Ŵ of the ith network layer are obtained based on m discrete values, the value range of the plurality of initial weights Ŵ consists of the m discrete values (e.g., {−1, 1} or {−1, 0, 1}), and m = {2, 3};
a first weight adjusting module 706, configured to adjust a plurality of initial weights of the ith network layer based on an error between output data of the n network layers and expected result data in the sample data.
It can be understood that the electronic device 700 for training the neural network model shown in fig. 7 corresponds to the training method of the neural network model provided in the present application, and the technical details in the above detailed description about the training method of the neural network model provided in the present application are still applicable to the electronic device 700 for training the neural network model shown in fig. 7, and the detailed description is referred to above and is not repeated herein.
FIG. 8 provides a block diagram of an electronic device 800 for training a neural network model, according to some embodiments of the present application. As shown in fig. 8, the electronic device 800 includes:
a second data obtaining module 802, configured to obtain sample data, and input the sample data to a second network layer, where the sample data includes initial input data and expected result data;
a second data processing module 804, configured to perform the following operations:

when i is 2, perform sign-based value-taking on a plurality of full-precision weights of the ith network layer to obtain a plurality of initial weights Ŵ of the ith network layer, and obtain the output data of the ith network layer based on the initial input data and the plurality of initial weights Ŵ;

when 2 < i ≤ n, perform sign-based value-taking on the plurality of full-precision weights of the ith network layer to obtain a plurality of initial weights Ŵ of the ith network layer, and obtain the output data of the ith network layer based on the output data of the (i−1)th network layer and the plurality of initial weights, wherein the plurality of initial weights Ŵ of the ith network layer are obtained based on m discrete values, the value range of the plurality of initial weights Ŵ consists of the m discrete values (e.g., {−1, 1} or {−1, 0, 1}), and m = {2, 3};

a second weight adjustment module 806, configured to adjust the plurality of initial weights of the ith network layer based on the error between the output data of the n network layers and the expected result data in the sample data.
It can be understood that the electronic device 800 for training a neural network model shown in fig. 8 corresponds to the training method for a neural network model provided in the present application, and the technical details in the above detailed description about the training method for a neural network model provided in the present application are still applicable to the electronic device 800 for training a neural network model shown in fig. 8, and the detailed description is referred to above and is not repeated herein.
Fig. 9 shows a schematic structural diagram of an electronic device 900 according to an embodiment of the present application. The electronic device 900 is also capable of performing the training of the neural network model disclosed in the above-described embodiments of the present application. In fig. 9, like parts have the same reference numerals. As shown in fig. 9, the electronic device 900 may include a processor 910, a power module 940, a memory 980, a mobile communication module 930, a wireless communication module 920, a sensor module 990, an audio module 950, a camera 970, an interface module 960, buttons 901, a display 902, and the like.
It is to be understood that the illustrated architecture of the present invention is not to be construed as a specific limitation for the electronic device 900. In other embodiments of the present application, electronic device 900 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 910 may include one or more processing units, for example, processing modules or processing circuits that may include a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a Digital Signal Processor (DSP), a Micro-programmed Control Unit (MCU), an Artificial Intelligence (AI) processor, a Field Programmable Gate Array (FPGA), or the like. The different processing units may be separate devices or may be integrated into one or more processors. A storage unit may be provided in the processor 910 for storing instructions and data. In some embodiments, the storage unit in the processor 910 is a cache. The memory 980 mainly includes a program storage area 9801 and a data storage area 9802, wherein the program storage area 9801 can store an operating system and the application programs required for at least one function (such as sound playing, image recognition, and the like). The neural network model provided in the embodiments of the present application can be regarded as an application program that can implement functions such as image processing and voice processing in the program storage area 9801. The weights of each network layer of the neural network model are stored in the above-described data storage area 9802.
The power module 940 may include a power supply, power management components, and the like. The power source may be a battery. The power management component is used for managing the charging of the power supply and the power supply of the power supply to other modules. In some embodiments, the power management component includes a charge management module and a power management module. The charging management module is used for receiving charging input from the charger; the power management module is used for connecting a power supply, and the charging management module is connected with the processor 910. The power management module receives power and/or charge management module input and provides power to the processor 910, the display 902, the camera 970, and the wireless communication module 920.
The mobile communication module 930 may include, but is not limited to, an antenna, a power amplifier, a filter, a low noise amplifier (LNA), and the like. The mobile communication module 930 may provide solutions for wireless communication including 2G/3G/4G/5G applied to the electronic device 900. The mobile communication module 930 may receive electromagnetic waves from the antenna, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 930 may further amplify signals modulated by the modem processor and convert them into electromagnetic waves radiated by the antenna. In some embodiments, at least some of the functional modules of the mobile communication module 930 may be disposed in the processor 910. In some embodiments, at least some of the functional modules of the mobile communication module 930 may be disposed in the same device as at least some of the modules of the processor 910. The wireless communication technologies may include the Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Global Navigation Satellite Systems (GNSS) such as the Global Positioning System (GPS), and the like.
The wireless communication module 920 may include an antenna and may transmit and receive electromagnetic waves via the antenna. The wireless communication module 920 may provide solutions for wireless communication applied to the electronic device 900, including wireless local area networks (WLAN) (e.g., wireless fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The electronic device 900 may communicate with networks and other devices via wireless communication technologies.
In some embodiments, the mobile communication module 930 and the wireless communication module 920 of the electronic device 900 may also be located in the same module.
The display panel may be a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), an Active-Matrix Organic Light-Emitting Diode (AMOLED), a Flexible Light-Emitting Diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a Quantum Dot Light-Emitting Diode (QLED), or the like.
The sensor module 990 may include a proximity light sensor, a pressure sensor, a gyroscope sensor, an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, and the like.
The audio module 950 is used to convert digital audio information into an analog audio signal for output, or convert an analog audio input into a digital audio signal. The audio module 950 may also be used to encode and decode audio signals. In some embodiments, the audio module 950 may be disposed in the processor 910, or some functional modules of the audio module 950 may be disposed in the processor 910. In some embodiments, audio module 950 may include speakers, an earpiece, a microphone, and a headphone interface.
The camera 970 is used to capture still images or video. An object generates an optical image through the lens, which is projected onto the photosensitive element. The photosensitive element converts the optical signal into an electrical signal, which is then transmitted to the ISP (Image Signal Processor) to be converted into a digital image signal. The electronic device 900 may implement a shooting function through the ISP, the camera 970, a video codec, a GPU (Graphics Processing Unit), the display 902, the application processor, and the like.
The interface module 960 includes an external memory interface, a Universal Serial Bus (USB) interface, a Subscriber Identity Module (SIM) card interface, and the like. The external memory interface may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the electronic device 900. The external memory card communicates with the processor 910 through an external memory interface to implement a data storage function. The universal serial bus interface is used for communication between the electronic device 900 and other electronic devices. The SIM card interface is used to communicate with a SIM card installed to the electronic device 900, such as to read a phone number stored in the SIM card or to write a phone number into the SIM card.
In some embodiments, the electronic device 900 also includes keys 901, a motor, indicators, and the like. The keys 901 may include a volume key, an on/off key, and the like. The motor is used to produce a vibration effect on the electronic device 900, for example, when the user's electronic device 900 is being called, to alert the user to answer the incoming call. The indicators may include a laser indicator, a radio frequency indicator, an LED indicator, and the like.
Embodiments of the mechanisms disclosed herein may be implemented in hardware, software, firmware, or a combination of these implementations. Embodiments of the application may be implemented as computer programs or program code executing on programmable systems comprising at least one processor, a storage system (including volatile and non-volatile memory and/or storage elements), at least one input device, and at least one output device.
Program code may be applied to input instructions to perform the functions described herein and generate output information. The output information may be applied to one or more output devices in a known manner. For purposes of this application, a processing system includes any system having a processor such as, for example, a Digital Signal Processor (DSP), a microcontroller, an Application Specific Integrated Circuit (ASIC), or a microprocessor.
The program code may be implemented in a high level procedural or object oriented programming language to communicate with a processing system. The program code can also be implemented in assembly or machine language, if desired. Indeed, the mechanisms described in this application are not limited in scope to any particular programming language. In any case, the language may be a compiled or interpreted language.
In some cases, the disclosed embodiments may be implemented in hardware, firmware, software, or any combination thereof. The disclosed embodiments may also be implemented as instructions carried by or stored on one or more transitory or non-transitory machine-readable (e.g., computer-readable) storage media, which may be read and executed by one or more processors. For example, the instructions may be distributed via a network or via other computer-readable media. Thus, a machine-readable medium may include any mechanism for storing or transmitting information in a form readable by a machine (e.g., a computer), including, but not limited to, floppy diskettes, optical disks, compact disc read-only memories (CD-ROMs), magneto-optical disks, read-only memories (ROMs), random access memories (RAMs), erasable programmable read-only memories (EPROMs), electrically erasable programmable read-only memories (EEPROMs), magnetic or optical cards, flash memory, or tangible machine-readable memory used to transmit information over the Internet via electrical, optical, acoustical, or other forms of propagated signals (e.g., carrier waves, infrared signals, digital signals, etc.). Thus, a machine-readable medium includes any type of machine-readable medium suitable for storing or transmitting electronic instructions or information in a form readable by a machine (e.g., a computer).
In the drawings, some features of the structures or methods may be shown in a particular arrangement and/or order. However, it is to be understood that such specific arrangement and/or ordering may not be required. Rather, in some embodiments, the features may be arranged in a manner and/or order different from that shown in the illustrative figures. In addition, the inclusion of a structural or methodical feature in a particular figure is not meant to imply that such feature is required in all embodiments, and in some embodiments, may not be included or may be combined with other features.
It should be noted that, in the apparatus embodiments of the present application, each unit/module is a logical unit/module. Physically, one logical unit/module may be one physical unit/module, may be a part of one physical unit/module, or may be implemented by a combination of multiple physical units/modules; the physical implementation of the logical units/modules is not itself essential, and it is the combination of functions implemented by these units/modules that solves the technical problem addressed by the present application. Furthermore, in order to highlight the innovative part of the present application, the above apparatus embodiments do not introduce units/modules that are less closely related to solving the technical problem presented in the present application; this does not mean that no other units/modules exist in the above apparatus embodiments.
It is noted that, in the examples and descriptions of this patent, relational terms such as first and second are used solely to distinguish one entity or action from another, and do not necessarily require or imply any actual such relationship or order between those entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprises a" does not exclude the presence of additional identical elements in the process, method, article, or apparatus that comprises the element.
While the present application has been shown and described with reference to certain preferred embodiments thereof, it will be understood by those of ordinary skill in the art that various changes in form and details may be made therein without departing from the spirit and scope of the present application.

Claims (24)

1. A method for training a neural network model, wherein the neural network model comprises n network layers, n being a positive integer greater than 1, and the method comprises:

obtaining, by the first of the n network layers, sample data, and inputting the sample data to the second network layer, wherein the sample data comprises initial input data and expected result data;

for the i-th of the n network layers, performing the following operations:

when i = 2, obtaining the output data of the i-th network layer based on the initial input data and a plurality of initial weights W of the i-th network layer;

when 2 < i ≤ n, obtaining the output data of the i-th network layer based on the output data of the (i−1)-th network layer and a plurality of initial weights W of the i-th network layer,

wherein the plurality of initial weights W of the i-th network layer are obtained based on m discrete values, the value range of the initial weights W is −1 ≤ W ≤ 1, and m ∈ {2, 3}; and

adjusting the plurality of initial weights W of the i-th network layer based on the error between the output data of the n network layers and the expected result data in the sample data.
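The flow claimed above — a forward pass through n layers whose initial weights come from m discrete values, followed by weight adjustment from the output error — can be sketched roughly as follows. This is an illustrative reading only, not the patented implementation; the layer sizes, activation, and loss are placeholder choices of this sketch:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_discrete(shape, m=2):
    """Draw initial weights from m discrete values in [-1, 1]:
    m=2 -> {-1, 1}, m=3 -> {-1, 0, 1} (cf. claims 1-4)."""
    values = [-1.0, 1.0] if m == 2 else [-1.0, 0.0, 1.0]
    return rng.choice(values, size=shape)

# Placeholder 3-layer model: the first "layer" only feeds the data in;
# the subsequent layers compute with discrete initial weights.
sizes = [4, 8, 1]
W = [init_discrete((sizes[k], sizes[k + 1]), m=2) for k in range(len(sizes) - 1)]

x = rng.normal(size=(16, 4))   # initial input data (a sample batch)
y = rng.normal(size=(16, 1))   # expected result data

h = x
for Wi in W:                   # layer i consumes the output of layer i-1
    h = np.tanh(h @ Wi)

# Error between the final output and the expected result data;
# the claim's final step adjusts the discrete weights based on this error.
err = np.mean((h - y) ** 2)
print(err)
```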
2. The method of claim 1, wherein each of the plurality of initial weights W of the i-th network layer is one of the m discrete values.
3. The method of claim 2, wherein the m discrete values are −1 and 1, and the plurality of initial weights W of the i-th network layer have a mean of 0 and a variance of 1.
4. The method of claim 2, wherein the m discrete values are −1, 0 and 1, and the plurality of initial weights W of the i-th network layer have a mean of 0 and a variance of 2/3.
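The moments stated in claims 3 and 4 follow from a uniform draw over the discrete set: over {−1, 1}, E[W] = 0 and Var[W] = E[W²] = 1; over {−1, 0, 1}, Var[W] = (1 + 0 + 1)/3 = 2/3. A quick empirical check (a sketch of this reading, not part of the patent):

```python
import numpy as np

rng = np.random.default_rng(0)

binary = rng.choice([-1.0, 1.0], size=1_000_000)        # m = 2
ternary = rng.choice([-1.0, 0.0, 1.0], size=1_000_000)  # m = 3

# Uniform over {-1, 1}: mean 0, variance E[W^2] = 1.
print(round(abs(binary.mean()), 2), round(binary.var(), 2))    # ≈ 0.0 1.0
# Uniform over {-1, 0, 1}: mean 0, variance E[W^2] = 2/3.
print(round(abs(ternary.mean()), 2), round(ternary.var(), 2))  # ≈ 0.0 0.67
```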
5. The method of claim 1, wherein the i-th network layer has p initial weights W, and the p initial weights of the i-th network layer are calculated by the following formula:

W = α · W_b

wherein W_b is one of the m discrete values, the value range of W_b is −1 ≤ W_b ≤ 1, the p values of W_b corresponding to the p initial weights have a mean of 0 and a variance of 1 or 2/3, and α is a scaling factor, a positive number less than 1, used to adjust the distribution of the output data of the i-th network layer.
6. The method of claim 5, wherein the variance of the p values of W_b corresponding to the p initial weights is 1, and the m discrete values are −1 and 1.
7. The method of claim 5, wherein the variance of the p values of W_b corresponding to the p initial weights is 2/3, and the m discrete values are −1, 0 and 1.
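Claims 5 to 7 describe initial weights of the form W = α·W_b with W_b drawn from the discrete set. A minimal sketch under that reading; the α value below is an arbitrary placeholder, since the patent's own scaling-factor formulas (claims 8 and 11) appear only as images in this text and involve the layers' input-channel counts:

```python
import numpy as np

rng = np.random.default_rng(0)

def scaled_discrete_init(p, m=2, alpha=0.1):
    """Initial weights W = alpha * W_b (cf. claim 5).
    W_b is uniform over {-1, 1} (m=2) or {-1, 0, 1} (m=3);
    alpha is a positive scaling factor < 1 that shrinks the
    layer's output distribution."""
    values = [-1.0, 1.0] if m == 2 else [-1.0, 0.0, 1.0]
    Wb = rng.choice(values, size=p)
    return alpha * Wb

W = scaled_discrete_init(p=1000, m=3, alpha=0.05)
print(sorted(set(W.tolist())))   # values drawn from {-0.05, 0.0, 0.05}
```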
8. The method of any one of claims 5 to 7, wherein the scaling factor is obtained by the following formula:

[scaling-factor formula shown only as an image in the source]

wherein W_j^b is the discrete value, among the p values of W_b, corresponding to the j-th of the p initial weights, W̄^b is the mean of the p values of W_b of the i-th network layer, and l_i is the number of input channels of the i-th network layer.
9. The method of claim 1, wherein the plurality of initial weights W of the i-th network layer are calculated by the following formula:

W = α · W_t

wherein W_t is any one of the plurality of weights determined in the previous training of the i-th network layer, the value range of W_t is −1 ≤ W_t ≤ 1, and α is a scaling factor, a positive number less than 1, used to adjust the distribution of the output data of the i-th network layer.
10. The method of claim 9, wherein the scaling factor is obtained by the following formula:

[scaling-factor formula shown only as an image in the source]

wherein p is the number of weights determined in the previous training of the i-th network layer, W_j^t denotes the j-th of the p weights determined in the previous training, and W̄^t denotes the mean of the p weights determined in the previous training.
11. The method of claim 5 or 9, wherein the scaling factor is obtained by the following formula:

[scaling-factor formula shown only as an image in the source]

wherein l_i is the number of input channels of the i-th network layer, and l_{i+1} is the number of input channels of the (i+1)-th network layer.
12. A method for training a neural network model, wherein the neural network model comprises n network layers and the neural network model has converged, n being a positive integer greater than 1, and the method comprises:

obtaining, by the first of the n network layers, sample data, and inputting the sample data to the second network layer, wherein the sample data comprises initial input data and expected result data;

for the i-th of the n network layers, performing the following operations:

when i = 2, taking the sign of each of a plurality of full-precision weights of the i-th network layer to obtain a plurality of initial weights W of the i-th network layer, and obtaining the output data of the i-th network layer based on the initial input data and the plurality of initial weights W;

when 2 < i ≤ n, taking the sign of each of a plurality of full-precision weights of the i-th network layer to obtain a plurality of initial weights W of the i-th network layer, and obtaining the output data of the i-th network layer based on the output data of the (i−1)-th network layer and the plurality of initial weights W,

wherein the plurality of initial weights W of the i-th network layer are obtained based on m discrete values, the value range of the initial weights W is −1 ≤ W ≤ 1, and m ∈ {2, 3}; and

adjusting the plurality of initial weights W of the i-th network layer based on the error between the output data of the n network layers and the expected result data in the sample data.
13. The method of claim 12, wherein the m discrete values are −1 and 1, the plurality of initial weights W of the i-th network layer have a mean of 0 and a variance of 1, and taking the sign of each of the plurality of full-precision weights of the i-th network layer to obtain the plurality of initial weights of the i-th network layer comprises:

if the full-precision weight is less than or equal to 0, taking −1 as the initial weight corresponding to the full-precision weight; and

if the full-precision weight is greater than 0, taking 1 as the initial weight corresponding to the full-precision weight.
14. The method of claim 12, wherein the m discrete values are −1, 0 and 1, the plurality of initial weights W of the i-th network layer have a mean of 0 and a variance of 2/3, and taking the sign of each of the plurality of full-precision weights of the i-th network layer to obtain the plurality of initial weights W of the i-th network layer comprises:

if the full-precision weight is less than 0, taking −1 as the initial weight corresponding to the full-precision weight;

if the full-precision weight is equal to 0, taking 0 as the initial weight corresponding to the full-precision weight; and

if the full-precision weight is greater than 0, taking 1 as the initial weight corresponding to the full-precision weight.
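The sign-taking rules of claims 13 and 14 amount to a binary and a ternary quantizer. A direct transcription (illustrative; the function names are hypothetical):

```python
import numpy as np

def binarize(w):
    """Claim 13: w <= 0 -> -1, w > 0 -> 1."""
    return np.where(w > 0, 1.0, -1.0)

def ternarize(w):
    """Claim 14: w < 0 -> -1, w == 0 -> 0, w > 0 -> 1, i.e. sign(w)."""
    return np.sign(w)

w = np.array([-0.7, 0.0, 0.3, -0.1, 2.5])
print(binarize(w))   # -> [-1, -1, 1, -1, 1]
print(ternarize(w))  # -> [-1, 0, 1, -1, 1]
```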
15. The method of claim 12, wherein the m discrete values are −1 and 1, and taking the sign of each of the plurality of full-precision weights of the i-th network layer to obtain the plurality of initial weights W of the i-th network layer comprises:

if the full-precision weight is less than or equal to 0, taking the product of −1 and a scaling factor as the initial weight corresponding to the full-precision weight; and

if the full-precision weight is greater than 0, taking the product of 1 and the scaling factor as the initial weight corresponding to the full-precision weight,

wherein the scaling factor is a positive number less than 1, used to adjust the distribution of the output data of the i-th network layer.
16. The method of claim 12, wherein the m discrete values are −1, 0 and 1, and taking the sign of each of the plurality of full-precision weights of the i-th network layer to obtain the plurality of initial weights W of the i-th network layer comprises:

if the full-precision weight is less than 0, taking the product of −1 and a scaling factor as the initial weight corresponding to the full-precision weight;

if the full-precision weight is equal to 0, taking 0 as the initial weight corresponding to the full-precision weight; and

if the full-precision weight is greater than 0, taking the product of 1 and the scaling factor as the initial weight corresponding to the full-precision weight,

wherein the scaling factor is a positive number less than 1, used to adjust the distribution of the output data of the i-th network layer.
17. The method of claim 15 or 16, wherein the scaling factor is obtained by the following formula:

[scaling-factor formula shown only as an image in the source]

wherein α is the scaling factor, l_i is the number of input channels of the i-th network layer, and l_{i+1} is the number of input channels of the (i+1)-th network layer.
18. The method of claim 15, wherein the scaling factor is obtained by the following formula:

[scaling-factor formula shown only as an image in the source]

wherein α is the scaling factor; p denotes the number of the plurality of initial weights of the i-th network layer; W_j^z is one of −1 and 1 and corresponds to the j-th of the p initial weights, and the p values of W_j^z corresponding to the p initial weights have a mean of 0 and a variance of 1; W̄^z is the mean of the p values of W_j^z; and l_i is the number of input channels of the i-th network layer.
19. The method of claim 16, wherein the scaling factor is obtained by the following formula:

[scaling-factor formula shown only as an image in the source]

wherein α is the scaling factor; p denotes the number of the plurality of initial weights of the i-th network layer; W_j^q is one of −1, 0 and 1 and corresponds to the j-th of the p initial weights, and the p values of W_j^q corresponding to the p initial weights have a mean of 0 and a variance of 2/3; W̄^q is the mean of the p values of W_j^q; and l_i is the number of input channels of the i-th network layer.
20. The method of any one of claims 12 to 19, wherein the sample data comprises image data, and the neural network model is used for image recognition.

21. An electronic device for training a neural network model, comprising:

a first data acquisition module, configured to obtain sample data and input the sample data to the second network layer, wherein the sample data comprises initial input data and expected result data;

a first data processing module, configured to perform the following operations:

when i = 2, obtaining the output data of the i-th network layer based on the initial input data and a plurality of initial weights W of the i-th network layer;

when 2 < i ≤ n, obtaining the output data of the i-th network layer based on the output data of the (i−1)-th network layer and a plurality of initial weights W of the i-th network layer,

wherein the plurality of initial weights W of the i-th network layer are obtained based on m discrete values, the value range of the initial weights W is −1 ≤ W ≤ 1, and m ∈ {2, 3}; and

a first weight adjustment module, configured to adjust the plurality of initial weights of the i-th network layer based on the error between the output data of the n network layers and the expected result data in the sample data.
22. An electronic device for training a neural network model, comprising:

a second data acquisition module, configured to obtain sample data and input the sample data to the second network layer, wherein the sample data comprises initial input data and expected result data;

a second data processing module, configured to perform the following operations:

when i = 2, taking the sign of each of a plurality of full-precision weights of the i-th network layer to obtain a plurality of initial weights W of the i-th network layer, and obtaining the output data of the i-th network layer based on the initial input data and the plurality of initial weights W;

when 2 < i ≤ n, taking the sign of each of a plurality of full-precision weights of the i-th network layer to obtain a plurality of initial weights W of the i-th network layer, and obtaining the output data of the i-th network layer based on the output data of the (i−1)-th network layer and the plurality of initial weights W,

wherein the plurality of initial weights W of the i-th network layer are obtained based on m discrete values, the value range of the initial weights W is −1 ≤ W ≤ 1, and m ∈ {2, 3}; and

a second weight adjustment module, configured to adjust the plurality of initial weights W of the i-th network layer based on the error between the output data of the n network layers and the expected result data in the sample data.

23. A computer-readable medium having instructions stored thereon which, when executed on a computer, cause the computer to perform the method for training a neural network model of any one of claims 1 to 20.

24. An electronic device, comprising:

a memory, configured to store instructions to be executed by one or more processors of a system; and

a processor, being one of the processors of the system, configured to perform the method for training a neural network model of any one of claims 1 to 20.
CN202010086380.0A 2020-02-11 2020-02-11 Training method of neural network model and its media and electronic equipment Active CN111401546B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010086380.0A CN111401546B (en) 2020-02-11 2020-02-11 Training method of neural network model and its media and electronic equipment


Publications (2)

Publication Number Publication Date
CN111401546A true CN111401546A (en) 2020-07-10
CN111401546B CN111401546B (en) 2023-12-08

Family

ID=71428349

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010086380.0A Active CN111401546B (en) 2020-02-11 2020-02-11 Training method of neural network model and its media and electronic equipment

Country Status (1)

Country Link
CN (1) CN111401546B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113408197A (en) * 2021-06-11 2021-09-17 华帝股份有限公司 Training method of temperature field mathematical model
CN113467590A (en) * 2021-09-06 2021-10-01 南京大学 Many-core chip temperature reconstruction method based on correlation and artificial neural network
CN113642740A (en) * 2021-08-12 2021-11-12 百度在线网络技术(北京)有限公司 Model training method and device, electronic device and medium
CN115238871A (en) * 2022-08-12 2022-10-25 湖南国科微电子股份有限公司 Correcting method and system of quantization neural network and related components

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6269351B1 (en) * 1999-03-31 2001-07-31 Dryken Technologies, Inc. Method and system for training an artificial neural network
US10192327B1 (en) * 2016-02-04 2019-01-29 Google Llc Image compression with recurrent neural networks
CN109447532A (en) * 2018-12-28 2019-03-08 中国石油大学(华东) A kind of oil reservoir inter well connectivity based on data-driven determines method
US10422854B1 (en) * 2019-05-01 2019-09-24 Mapsted Corp. Neural network training for mobile device RSS fingerprint-based indoor navigation
WO2019180314A1 (en) * 2018-03-20 2019-09-26 Nokia Technologies Oy Artificial neural networks
CN110490295A (en) * 2018-05-15 2019-11-22 华为技术有限公司 A neural network model, data processing method and processing device
EP3591586A1 (en) * 2018-07-06 2020-01-08 Capital One Services, LLC Data model generation using generative adversarial networks and fully automated machine learning system which generates and optimizes solutions given a dataset and a desired outcome
CN110717585A (en) * 2019-09-30 2020-01-21 上海寒武纪信息科技有限公司 Training method of neural network model, data processing method and related product


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
蔡瑞初 et al.: "面向"边缘"应用的卷积神经网络量化与压缩方法" [Quantization and compression methods for convolutional neural networks for "edge" applications], 《计算机应用》 (Journal of Computer Applications), vol. 38, no. 9, 10 September 2018, pages 2449-2454 *


Also Published As

Publication number Publication date
CN111401546B (en) 2023-12-08

Similar Documents

Publication Publication Date Title
US11475298B2 (en) Using quantization in training an artificial intelligence model in a semiconductor solution
CN113326930B (en) Data processing method, neural network training method, related device and equipment
US12475388B2 (en) Machine learning model search method, related apparatus, and device
CN109816589B (en) Method and apparatus for generating manga style transfer model
CN111401546A (en) Training method of neural network model and its medium and electronic device
CN110084281A (en) Image generation method, neural network compression method, related device and equipment
CN109800865B (en) Neural network generation and image processing method and device, platform and electronic equipment
CN114065900B (en) Data processing methods and data processing devices
CN111670463B (en) Machine learning-based geometric mesh simplification
US20200320385A1 (en) Using quantization in training an artificial intelligence model in a semiconductor solution
CN112990440A (en) Data quantization method for neural network model, readable medium, and electronic device
JP2020004433A (en) Information processing apparatus and information processing method
CN120318591A (en) Image recognition and classification method, system, device and medium
CN113762297A (en) An image processing method and device
CN117975211A (en) Image processing method and device based on multi-mode information
CN113537470A (en) Model quantization method and device, storage medium and electronic device
WO2024060727A1 (en) Method and apparatus for training neural network model, and device and system
CN114510911B (en) Text processing method, device, computer equipment and storage medium
KR20210082993A (en) Quantized image generation method and sensor debice for perfoming the same
CN115936092A (en) Neural network model quantization method and device, storage medium and electronic device
CN113468935B (en) Face recognition method
CN114945105B (en) Wireless earphone audio hysteresis cancellation method combined with sound compensation
CN114239792B (en) Systems, devices and storage media for image processing using quantitative models
Chang et al. Ternary weighted networks with equal quantization levels
CN112435169A (en) Image generation method and device based on neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant