
CN112307815A - Image processing method and device, electronic equipment and readable storage medium

Info

Publication number
CN112307815A
Authority
CN
China
Prior art keywords
image
neural network
network model
sample
training
Prior art date
Legal status
Pending
Application number
CN201910685032.2A
Other languages
Chinese (zh)
Inventor
郭天楚
刘永超
刘夏冰
张辉
韩在濬
崔昌圭
郭荣竣
兪炳仁
Current Assignee
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Original Assignee
Beijing Samsung Telecommunications Technology Research Co Ltd
Samsung Electronics Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Samsung Telecommunications Technology Research Co Ltd, Samsung Electronics Co Ltd filed Critical Beijing Samsung Telecommunications Technology Research Co Ltd
Priority to CN201910685032.2A
Priority to KR1020200047423A (KR20210012888A)
Priority to US16/937,722 (US11347308B2)
Publication of CN112307815A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING OR CALCULATING; COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/33 Determination of transform parameters for the alignment of images, i.e. image registration, using feature-based methods
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 40/193 Eye characteristics, e.g. of the iris; Preprocessing; Feature extraction
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F 3/013 Eye tracking input arrangements
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/04 Neural networks; Architecture, e.g. interconnection topology
    • G06N 3/08 Neural networks; Learning methods
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G06T 2207/30201 Subject of image: Face


Abstract

The embodiment of the application provides an image processing method, an image processing device, electronic equipment and a readable storage medium, wherein the method comprises the following steps: acquiring a face image of a user; and obtaining the sight focus position of the user based on the face image by using a neural network model. Based on the method provided by the embodiment of the application, the accuracy of the sight focus position estimation of the user can be effectively improved.

Description

Image processing method and device, electronic equipment and readable storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus, an electronic device, and a readable storage medium.
Background
At present, with the development of science and technology, various electronic devices have become an indispensable part of people's lives. In many application scenarios it is necessary to estimate the focus of a user's gaze while the user is using an electronic device, i.e., the point the user's line of sight attends to, for example, to select and start an application program with the gaze (equivalent to using the gaze as a mouse), or to push an advertisement according to the gaze position, etc. In these application scenarios, the user's sight position on the screen of the electronic equipment needs to be estimated accurately in real time. However, the estimation accuracy of existing implementation schemes in practical application needs to be improved.
Disclosure of Invention
The application aims to provide an image processing method, an image processing device, electronic equipment and a readable storage medium, so as to improve the accuracy of estimating the user's sight focus position. The scheme provided by the embodiments of the application is as follows:
in a first aspect, an embodiment of the present application provides an image processing method based on a neural network model, where the method includes:
acquiring a face image of a user;
and obtaining the sight focus position of the user based on the face image by using a neural network model.
In a second aspect, an embodiment of the present application provides a method for training a neural network model, where the method includes:
acquiring a training sample set, wherein the training sample set comprises sample images;
and training the initial target neural network model based on each sample image until the loss function is converged to obtain the trained target neural network model.
In a third aspect, an embodiment of the present application provides an image processing apparatus, including:
the image acquisition module is used for acquiring a face image of a user;
and the sight focus position determining module is used for obtaining the sight focus position of the user based on the face image by using the neural network model.
In a fourth aspect, an embodiment of the present application provides an apparatus for training a neural network model, where the apparatus includes:
the system comprises a sample acquisition module, a data acquisition module and a data processing module, wherein the sample acquisition module is used for acquiring a training sample set which comprises sample images;
and the model training module is used for training the initial target neural network model based on each sample image until the loss function is converged to obtain the trained target neural network model.
In a fifth aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor; wherein the memory has stored therein a computer program; the processor is configured to invoke the computer program to perform the method provided in the first aspect or the second aspect of the present application.
In a sixth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method provided in the first or second aspect of the present application.
The advantages of the technical solutions provided in the present application will be described in detail with reference to the following embodiments and accompanying drawings, and are not repeated here.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart illustrating a method for training a neural network model according to an embodiment of the present disclosure;
FIG. 2 illustrates a flow diagram of a training method in an example of the present application;
FIG. 3 is a flow chart illustrating an image processing method according to an embodiment of the present disclosure;
FIG. 4 shows a schematic view of a screen marker in an example of the present application;
fig. 5 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or to elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, are intended only to explain the present application, and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
Generally, in scenarios requiring estimation of sight-line key-point positions, one important property of a sight-line estimation scheme is accuracy: the trained estimator needs to be highly accurate not only on the training set but also on the actual test set, i.e., the training algorithm must have good generalization performance. Another important property is stability: when the user fixates on the same point, or moves slightly near a certain point, the sight position estimated by the algorithm must not only be accurate but also be free of large jitter. However, the generalization performance of existing algorithms is poor. Regarding stability, most existing video-based or multi-frame methods apply post-processing after prediction and therefore suffer from delay, whereas sight-line estimation needs fast and accurate results in practical application.
For the problems existing in the prior art, this application on the one hand provides a method for training a neural network model, which addresses the problem fundamentally at the training stage: taking a single picture as input, it improves the stability of sight-line estimation without loss of real-time performance, and it can be used in combination with video-based methods. On the other hand, the provided method can further process the initial sight-line key-point position obtained from model prediction, yielding a sight-line key-point position with higher accuracy and stability.
The scheme provided by the present application is specifically described below.
An embodiment of the present application provides a training method of a neural network model, as shown in fig. 1, the method may include:
step S101: acquiring a training sample set, wherein the training sample set comprises sample images;
step S102: and training the initial target neural network model based on each sample image until the loss function is converged to obtain the trained target neural network model.
In an optional embodiment of the application, in the step S102, training the initial target neural network model based on each sample image specifically includes:
acquiring a first neural network model;
training the first neural network model at least twice based on each sample image to obtain the first neural network model after each training;
predicting each sample image through the neural network model after each training to obtain the prediction result of each sample image corresponding to the neural network model after each training;
and deleting the sample images in the training sample set based on the difference between the prediction result of each time corresponding to each sample image and the real result of the sample to obtain the processed sample images.
In an optional embodiment of the present application, when the first neural network model is trained at least twice based on the sample images, the sample images used in each training after the first are obtained by deleting, from the sample images used in the previous training, a set number or a set proportion of the sample images whose prediction results differ least from their real results.
That is to say, according to the scheme provided in the embodiment of the present application, before the model is trained based on the sample image, the optional manner may be first adopted to perform preprocessing on the sample image, that is, denoising processing on the training sample set, to filter out a part of poor sample images, and then train the target neural network model based on the filtered good sample images, so as to improve the accuracy of the trained model.
The preprocessing scheme for the sample image is further described below in connection with an example.
In this example, an iterative screening strategy is used, which aims to remove the noise in part of the data algorithmically and improve the quality of the data, so as to obtain a better trained model. Let the entire dataset (i.e., the training sample set) contain N samples x_i, i = 1, …, N, where the true value gt_i (i.e., the real result) corresponding to each sample is known. The number of training iterations in this example is M (M ≥ 2). Let the number of samples deleted each round be Nd = N/M, i.e., the number of samples used in the next iterative training is the current sample count minus Nd. The specific steps are as follows:
1. Initialize the training sample set as the "data set" (containing N samples). Initialize the neural network parameters. Select a loss function. Initialize a learning rate (e.g., 0.01). The neural network structure (i.e., the first neural network model) may be chosen from prior-art network structures, such as AlexNet. For the loss function used to train the neural network, an ordering (ranking) loss function is used here, but other loss functions may be chosen.
2. And training the neural network by using the training sample set to obtain the model and the model parameters after the training.
3. Using the neural network trained in step 2, compute the predicted value y_i of each sample x_i in the training sample set, and compute the error err_i = distance(y_i, gt_i) between the predicted value and the true value, where the error metric distance may be chosen as the Euclidean distance. Sort all errors in ascending order.
4. Select the first Nd samples (i.e., the Nd samples with the smallest error) and delete them from the training sample set, so that the size of the current training sample set becomes N - Nd·t (where t is the number of times step 2 has been executed so far). Save the current neural network parameter model, and adjust the learning rate (a general learning-rate adjustment algorithm may be selected).
5. If N - Nd·t is not zero and fewer than M rounds have been run, return to step 2 and repeat the above steps until N - Nd·t is zero or M rounds of iterative training have been carried out.
6. Using all M saved neural network parameter models (i.e., the first neural network model under each of the M saved parameter sets), compute the predicted values of all N samples of the entire data set and the errors between the predicted values and the true values, so that M error results are obtained per sample. Sort the N error results obtained under the same model in ascending order to obtain M sequences, where the N values of each sequence represent one model's errors over all samples. If a certain sample x is ranked in the rear r% of the sequence (the set proportion in this example is r%) in each of the M sequences, x is considered a noise sample and is deleted from the data set, finally yielding a clean data set, i.e., the training sample set with N × r% of the N samples deleted.
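The iterative screening procedure above can be summarized in a short sketch. The following Python code is a minimal illustration under stated assumptions: train_fn and predict_fn are hypothetical placeholders for one round of neural-network training and for model inference, and the reading that a noise sample must fall in the rear r% under all M models is one interpretation of step 6.

```python
import numpy as np

def iterative_screening(train_fn, predict_fn, X, gt, M, r_percent):
    """Sketch of the iterative screening strategy (steps 1-6 above).

    train_fn(X, gt)      -> a model trained for one round (hypothetical)
    predict_fn(model, X) -> predictions, same shape as gt (hypothetical)
    X: (N, ...) samples; gt: (N, d) true values.
    """
    N = len(X)
    Nd = N // M                       # number of samples deleted per round
    active = np.arange(N)             # indices still in the training set
    models = []

    for _ in range(M):                # steps 2-5
        model = train_fn(X[active], gt[active])
        models.append(model)
        # Euclidean error err_i = distance(y_i, gt_i) for each active sample
        err = np.linalg.norm(predict_fn(model, X[active]) - gt[active], axis=1)
        active = active[np.argsort(err)[Nd:]]   # drop the Nd smallest errors
        if len(active) == 0:
            break

    # step 6: rank every sample under every saved model; a sample ranked in
    # the rear r% under all M models is treated as noise and removed
    ranks = np.stack([
        np.argsort(np.argsort(np.linalg.norm(predict_fn(m, X) - gt, axis=1)))
        for m in models
    ])                                # shape (M, N); rank 0 = smallest error
    cutoff = N * (1 - r_percent / 100.0)
    noisy = (ranks >= cutoff).all(axis=0)
    return X[~noisy], gt[~noisy]
```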
In an alternative embodiment of the present application, training the initial target neural network model based on each sample image includes:
inputting each sample image into a teacher network model to obtain an output result of each sample image;
taking each output result as a real result of each corresponding sample image, and training a target neural network model based on each sample image;
the teacher network model is any one randomly selected teacher network model in the teacher queue;
taking each output result as a real result of each corresponding sample image, and after training the target neural network model once based on each sample image, the method further comprises the following steps:
adding the target neural network model after each training to a teacher queue;
the model when the teacher queue is initialized is empty, and the real result of each sample image during initialization is the real result corresponding to the label of the sample image.
In an alternative embodiment of the present application, training the initial target neural network model based on each sample image includes:
initializing one part of model parameters of the target neural network model after each training, taking the other part of model parameters and the initialized part of model parameters as new model parameters of the target neural network model, and carrying out the next training of the target neural network model.
In an optional embodiment of the present application, initializing a part of model parameters of the target neural network model after each training includes:
determining the importance degree of each filter in the target neural network model;
determining a target filter needing parameter initialization according to the importance degree of each filter;
model parameters of each target filter are initialized.
In an optional embodiment of the present application, initializing the model parameters of each target filter includes:
decomposing a filter parameter matrix of a neural network layer where a target filter is located to obtain an orthogonal matrix of the filter parameter matrix;
for the neural network layer where the target filter is located, determining a feature vector corresponding to each target filter in an orthogonal matrix corresponding to the neural network layer according to the position of each target filter in the neural network layer in the corresponding neural network layer;
determining two norms of the feature vectors of all target filters in the same neural network layer according to the feature vectors corresponding to all the target filters in the same neural network layer;
and for each target filter, determining initialized parameters of the target filter according to the feature vector corresponding to the target filter and the corresponding two-norm in the neural network layer to which the target filter belongs.
The above examples of the present application provide a training method that can effectively reduce overfitting. The method is based on a basic framework of knowledge distillation, and two modules (a pruning module based on cosine similarity and an alignment orthogonal initialization module) can be added to optimize the training process, so that the accuracy and the stability of the model are improved.
The content referred to in the above examples is further explained below with reference to an example.
In this example, let the neural network model be net, with its parameters being W. The iteration number is K, and the number of times of traversing the training data in each iteration is L. The pruning rate is p% (i.e. the proportion of the filters with the network parameters to be redetermined to the total number of filters of the model) and the maximum pruning rate of each layer is p _ max% (i.e. the proportion of the filters with the network parameters to be redetermined to the total number of filters of the layer does not exceed p _ max% for a layer of the network structure of the model). The algorithmic process of the training method can be represented as:
the initialization teacher queue is empty. Parameters of the net are initialized.
Figure BDA0002145986430000071
Figure BDA0002145986430000085
Upon finishing, output the last network model in the current teacher queue as the training result. The neural network structure may use, among other options, a prior-art structure such as AlexNet, and the loss function may use an existing technique such as the ordering loss.
The pruning algorithm and the reinitialization algorithm are described below separately.
Let the filter parameters of each layer of the neural network model be WF, with shape (Nout, C, Kw, Kh), where Nout is the number of filters in the layer, C is the number of input channels, and Kw and Kh are the width and height of the filters in the layer. Reshape WF into Nout one-dimensional vectors wf_i, i = 1, …, Nout, each of dimension C × Kw × Kh, i.e., a vector with 1 row and C × Kw × Kh columns; Nout represents the number of filters in one layer.
The pruning algorithm may specifically include:
1. Normalize according to formula (1) (for the norm in the formula, the Euclidean norm may be chosen in a specific implementation; other normalization methods may of course be adopted):

ŵf_i = wf_i / ||wf_i||    (1)

2. Compute Simf (the scores of all filters of all layers) according to formula (2), where Simf = {Simf_k, k = 1, … Layer_num}, i.e., the set over all layers:

Simf_{k,i} = Σ_{j=1}^{Nout} |⟨ŵf_i, ŵf_j⟩|    (2)

In the above formulas, Layer_num is the number of network layers of the model and Simf_k is the Simf of the filters of the k-th layer; Simf_k thus also corresponds to a set whose elements are the Simf values of the filters in that layer (for the k-th layer above with Nout filters, Simf_k has Nout elements). For the i-th filter of a layer, the dot product ⟨ŵf_i, ŵf_j⟩ of its normalized parameter vector ŵf_i with the vector ŵf_j of the j-th filter of the same layer gives the correlation between the two filters' network parameters; in this way the correlation between the network parameters of the i-th filter and every filter of its layer can be computed, and from the resulting Nout correlations the Simf of the i-th filter is obtained.
It should be noted that the Simf of each filter represents the importance of the filter, and the larger the Simf is, the lower the importance is.
3. Arrange the Simf of each filter in ascending order; the filters ranked in the last p% are the pruned filters W'. However, the proportion of filters clipped in any single layer should not be greater than p_max%.
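The scoring and selection above can be written compactly. Below is a PyTorch sketch under the reconstruction of formulas (1)-(2) used here (Simf_i as the sum of absolute pairwise dot products of the normalized filter vectors); the function names are illustrative, and the cross-layer bookkeeping of the global p% budget is omitted.

```python
import torch

def pruning_scores(WF: torch.Tensor) -> torch.Tensor:
    """Simf scores for one layer; WF has shape (Nout, C, Kw, Kh).
    A larger score means lower importance."""
    wf = WF.reshape(WF.shape[0], -1)              # Nout vectors of length C*Kw*Kh
    wf_hat = wf / wf.norm(dim=1, keepdim=True)    # formula (1): Euclidean normalization
    sim = wf_hat @ wf_hat.t()                     # pairwise dot products
    return sim.abs().sum(dim=1)                   # formula (2): Simf_i

def pruned_filter_indices(scores: torch.Tensor, p: float, p_max: float) -> torch.Tensor:
    """Filters ranked in the last p% by score, capped at p_max% for this layer."""
    n = scores.numel()
    k = min(int(n * p / 100.0), int(n * p_max / 100.0))
    return torch.argsort(scores)[n - k:]          # largest Simf = least important
```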
The specific steps of the reinitialization algorithm may include:
1. Perform QR decomposition on each layer's WF of W to obtain an orthogonal matrix Worth. Take the values at the positions corresponding to W' to obtain a matrix Worth' of the same size as W' (i.e., Worth' is computed independently for each layer).
2. Compute Wpra', the parameters of Worth' aggregated with Batch Normalization (BN), according to the following formula (computed independently for each filter):

Wpra'_i = Worth'_i · BNscale_i / √(BNvar_i)

where BNscale and BNvar are Batch Normalization parameters: BNscale is the network coefficient of the BN layer and BNvar is the variance of the BN layer's network parameters.
It will be appreciated that, in practice, this step may be omitted if no BN layer follows the filter, i.e., the convolutional layer.
3. Compute the two-norm ||Wpra'_{k,i}||₂ of each row of Wpra' (i.e., the two-norm of the i-th filter of the k-th layer), and record the maximum and minimum of all two-norms obtained for each layer, denoted max_norm and min_norm respectively.
4. Obtain the reinitialized weight Wr_{k,i} according to the following formula (computed for each filter of each layer):

Wr_{k,i} = scalar_aligned · Worth'_{k,i} / ||Wpra'_{k,i}||₂

where scalar_aligned may be sampled from a uniform distribution over (min_norm, max_norm).
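A PyTorch sketch of this reinitialization follows. It encodes one interpretation of the garbled formulas above (in particular the final rescaling of the orthogonal rows), assumes the flattened filter dimension is at least Nout so the QR rows are orthonormal, and uses illustrative names throughout.

```python
import torch

def reinitialize_pruned(WF, pruned_idx, bn_scale=None, bn_var=None):
    """Aligned orthogonal reinitialization for one layer (steps 1-4 above)."""
    Nout = WF.shape[0]
    flat = WF.reshape(Nout, -1)                       # (Nout, C*Kw*Kh)
    # step 1: QR decomposition; rows of Worth are orthonormal directions
    Q, _ = torch.linalg.qr(flat.t(), mode="reduced")  # (D, Nout)
    Worth = Q.t()                                     # (Nout, D)
    Wprime = Worth[pruned_idx]                        # Worth' at pruned positions
    # step 2: fold in BN parameters if a BN layer follows the convolution
    if bn_scale is not None:
        Wpra = Wprime * (bn_scale[pruned_idx] / bn_var[pruned_idx].sqrt()).unsqueeze(1)
    else:
        Wpra = Wprime
    # step 3: per-filter two-norms and their extremes for this layer
    norms = Wpra.norm(dim=1)
    min_norm, max_norm = norms.min().item(), norms.max().item()
    # step 4: Wr = scalar_aligned * Worth' / ||Wpra'||, scalar ~ U(min, max)
    scal = torch.empty_like(norms).uniform_(min_norm, max_norm)
    new_flat = flat.clone()
    new_flat[pruned_idx] = Wprime * (scal / norms).unsqueeze(1)
    return new_flat.reshape(WF.shape)
```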
In an alternative embodiment of the present application, training the initial target neural network model based on each sample image may include:
and determining the prediction loss of the target neural network model to each sample image during each training, correcting each sample image according to the prediction loss, and performing the next training of the target neural network model based on each corrected sample image.
In an optional embodiment of the present application, determining a prediction loss of the target neural network model to each sample image during each training, and correcting each sample image according to the prediction loss may specifically include:
determining the prediction loss of the target neural network model to each sample image during each training;
and determining the disturbance of the prediction loss on each sample image, and correcting each sample image based on the disturbance.
In an optional embodiment of the present application, determining a disturbance of a prediction loss on each sample image, and correcting each sample image based on the disturbance includes:
for each sample image, determining the gradient change of the prediction loss of the sample image to each pixel point in the sample image;
determining the disturbance of the prediction loss to each pixel point according to the gradient change corresponding to each pixel point;
and superposing the disturbance corresponding to each pixel point with the original pixel value of the pixel point corresponding to the sample image to obtain the corrected sample.
In an alternative embodiment of the present application, before determining the prediction loss of the target neural network model for each sample image in each training, the method includes:
for each sample image, cutting the sample image to obtain a global image and a local image of the sample image;
determining the prediction loss of the target neural network model to each sample image during each training, correcting each sample image according to the prediction loss, and performing the next training of the target neural network model based on each corrected sample image, wherein the training comprises the following steps:
taking the global image and the local image corresponding to each sample image as new sample images, and determining the prediction loss of the target neural network model to each new sample image during each training;
correcting each new sample image corresponding to the prediction loss of each new sample image;
and training the target neural network model next time based on each modified new sample image.
The above-described alternative embodiments of the present application further provide a training method that can effectively increase the robustness of the model. It is to be understood that this mode may be used in combination with the above-described method of reducing overfitting, may be used alone, or may be used on the basis of the result obtained by the above-described method of reducing overfitting, that is, the result obtained by the above-described method of reducing overfitting is used as the initial value of the part.
The training method for enhancing the robustness of the model is described below with reference to an example.
In this example, the target neural network model is exemplified by a model for predicting the positions of the gaze keypoints of the user in the face image. The training sample set, i.e. the data set, is a face image. A flowchart of the training method is shown in fig. 2, and the specific process may include:
1.1 Input a random picture X from the data set and, according to the face key-point detection result, crop it into a face picture X_f (global image), a left-eye picture X_l (local image) and a right-eye picture X_r (local image); adjust the sizes of the three pictures to preset fixed sizes using bilinear interpolation and output them. Assume the preset fixed sizes corresponding to the three pictures are 64×64, 48×64 and 48×64, respectively.
1.2 Determine whether adversarial pictures (the corrected images in this example) have already been generated in this training step: if not, output the original three pictures; if yes, output the latest three adversarial pictures.
1.3 Input the three pictures into the neural network model, and compute the network output P_x (a vector representing the sight-line key-point position) = f(X_f, X_l, X_r). Then compute and output the Loss through the sorting loss function. The calculation formulas are:

Loss_i = -Y'_{x,i} * log(P_{x,i}) - (1 - Y'_{x,i}) * log(1 - P_{x,i})

Loss = Σ_{i=1}^{bin_num} Loss_i

where i denotes the i-th component of the vector P_x or Y'_x, bin_num denotes the total number of components, and Y'_x denotes the vector representation of the correct output value (i.e., the ground truth) corresponding to the input picture, i.e., the actual sight-line key-point position.
1.4 Compute the gradient of the loss with respect to each of the three pictures to obtain three groups of adversarial perturbations η_f, η_l, η_r. Taking the face picture as an example, the calculation formulas are:

η_f = α · sgn(∇_{X_f} Σ_{i=p_bin-k}^{p_bin+k} Loss_i)

X_f^adv = X_f + η_f

where p_bin is the index of the last component of the P_x vector greater than a set value (e.g., 0.5). Adding the three groups of adversarial perturbations to the corresponding pictures gives the three adversarial pictures X_f^adv, X_l^adv, X_r^adv. α is a hyperparameter representing the step size, which may optionally be 1.0; ∇ denotes the gradient operator and sgn() is the sign function; k is a set value, which may for example be taken as 4, it being understood that 2k + 1 cannot be greater than the total number of neurons in the output layer of the model. The gradient ∇_{X_f} Loss gives the gradient change of the loss at each pixel point of the image.
1.5 Judge whether the number of adversarial steps has reached the preset value step: if not, return to step 1.2, use the three adversarial pictures as input, and continue through the subsequent steps; if yes, go to step 1.6. The preset value step can be configured according to actual requirements and may be chosen as 3.
1.6 Input the three adversarial pictures into the neural network model and compute the adversarial loss Loss_adv. The calculation method is the same as for Loss, with the input pictures replaced by X_f^adv, X_l^adv, X_r^adv.
1.7 Input the adversarial loss Loss_adv and the original loss Loss; according to a preset percentage c, take the weighted sum Loss_total = c * Loss + (1 - c) * Loss_adv as the overall loss, compute its gradient with respect to all parameters of the neural network model, and back-propagate the gradient. Optionally, the preset percentage c may be 80%.
1.8 Judge whether the number of training steps has reached the preset upper limit s: if not, repeat steps 1.1 to 1.7 and judge again; if yes, output the parameters of the neural network model and finish the training process. The upper limit s may be 200000 steps.
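The loop in steps 1.1-1.8 reduces to a few lines of PyTorch. The sketch below makes some simplifying assumptions: the model is assumed to output the probability vector P_x directly, ordering_loss stands in for the sorting loss of step 1.3, and the gradient is taken on the full loss rather than only the window around p_bin.

```python
import torch
import torch.nn.functional as F

def ordering_loss(Px, Y):
    # Loss = sum_i [-Y'_i*log(P_i) - (1-Y'_i)*log(1-P_i)]; assumes Px in (0, 1)
    return F.binary_cross_entropy(Px, Y, reduction="sum")

def adversarial_training_step(model, imgs, target, opt, alpha=1.0, steps=3, c=0.8):
    """One pass of steps 1.2-1.7: build adversarial crops, then mix the losses."""
    adv = [im.clone().detach() for im in imgs]        # (X_f, X_l, X_r)
    for _ in range(steps):                            # preset adversarial step count
        adv = [im.requires_grad_(True) for im in adv]
        loss = ordering_loss(model(*adv), target)
        grads = torch.autograd.grad(loss, adv)
        # step 1.4: X_adv = X_adv + alpha * sgn(grad)
        adv = [(im + alpha * g.sign()).detach() for im, g in zip(adv, grads)]

    loss_clean = ordering_loss(model(*imgs), target)  # original Loss
    loss_adv = ordering_loss(model(*adv), target)     # Loss_adv on the X^adv crops
    loss_total = c * loss_clean + (1 - c) * loss_adv  # step 1.7, c = 80%
    opt.zero_grad()
    loss_total.backward()
    opt.step()
    return loss_total.item()
```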
In an experiment, a neural network model for estimating the user's sight line (i.e., the user's sight-line key-point position) was trained based on the training method provided by the embodiments of the application. The model trained based on the scheme of the embodiments of the application and a model trained by an existing common training method were tested on the GAZE_CN_DB and GAZE_start_DB databases, with the experimental results shown in the following table:
[The results table appears as a figure in the original; for each database it lists the prediction error in pixels and the standard deviation of the prediction deviations for the compared training methods.]
In the table, an error of 241 pixels indicates that the deviation between the predicted coordinate (predicted sight-line key position) and the actual coordinate is 241 pixels, and a standard deviation of 63.88 means that the standard deviation computed over the prediction deviations of the experimental samples is 63.88. As can be seen from the table, compared with the existing common training method, the model trained based on the scheme of the embodiments of the application effectively improves the stability of the model's prediction results.
An embodiment of the present application provides an image processing method based on a neural network model, as shown in fig. 3, the method may mainly include:
step S110: acquiring a face image of a user;
step S120: and obtaining the sight focus position of the user based on the face image by using a neural network model.
By adopting the image processing method provided by the embodiment of the application, the sight focus position of the user in the image, namely the position of the focus point concerned by the eyes of the user in the image can be determined according to the face image of the user.
In an alternative embodiment of the present application, obtaining the gaze focus position of the user based on the face image using the neural network model includes:
acquiring a position adjustment parameter;
obtaining a predicted sight focus position of the user based on the face image by using a neural network model;
and adjusting the predicted sight focus position based on the position adjustment parameter to obtain the adjusted sight focus position.
Alternatively, after the predicted gaze focal position of the user is obtained based on the neural network model, the predicted position may be adjusted based on the position adjustment parameter to obtain the gaze focal position of the user, so as to improve the accuracy of the gaze focal position.
In an alternative embodiment of the present application, the position adjustment parameter may be obtained by:
displaying the calibration object to a user, and acquiring a current face image of the user;
obtaining a predicted sight focus position of the user corresponding to the current face image based on the current face image by using a neural network model;
and determining position adjusting parameters according to the predicted sight focus position of the user corresponding to the current face image and the position of the calibration object.
In this embodiment, the user may be guided to pay attention to the calibration portion by providing the user with the calibration object, the face image of the user at that time may be acquired, and the position adjustment parameter may be determined based on the predicted gaze focal position of the face image acquired at that time and the position of the calibration object.
In practical application, the number of the calibration objects can be configured according to practical needs, and can be one or more. The style of the calibration object is not limited in the embodiments of the present application, and may be a calibration point.
As an example, a schematic diagram of the calibration objects is shown in Fig. 4. The calibration objects in this example may be the 3 specific calibration points shown in the figure, and the steps of determining the position adjustment parameter F(x) based on the 3 calibration points may include:
1. Display one specific calibration point on the screen of an electronic device (a mobile phone in this example) at a time to guide the user to gaze at that point, and acquire n (n ≥ 1) pictures at the calibration point through the phone's visible-light camera; n pictures are thus acquired for each of the 3 specific calibration points. Record the actual position of each calibration point on the screen, i.e., the calibration points' own coordinates, denoted g1, g2 and g3. For the pictures of each calibration point, the user's predicted sight focus position on the screen can be predicted through the neural network model, giving the predicted sight focus position corresponding to each calibration point, i.e., the predicted coordinates, denoted p1, p2 and p3, with p1, p2 and p3 corresponding to g1, g2 and g3 respectively. In practical applications, when n > 1 (for example, n may be 3), p1, p2 and p3 may each be taken as the average of the predicted coordinates over the n pictures of the corresponding calibration point.
2. After obtaining g1, g2, g3, p1, p2, and p3, the position adjustment parameters may be determined based on g1, g2, g3, p1, p2, and p 3.
In this example, an expression for an optional position adjustment parameter F(x) is given. [The piecewise expression for F(x), defined in terms of p1, p2, p3, g1, g2, g3 and scr, appears as a figure in the original and is omitted here.] In the expression, x represents a coordinate, namely the predicted sight focus position to be adjusted when adjustment is performed based on this function, and scr is the preconfigured maximum position.
It should be noted that, in practical applications, coordinates of different dimensions of the sight focus position need to be adjusted separately. For example, if the focus position has two directions (e.g., horizontal direction X and vertical direction Y), the corresponding adjustment value is obtained from the predicted horizontal sight focus coordinate through the above function, and the adjusted horizontal coordinate is obtained from that adjustment value and the predicted horizontal coordinate; similarly, the adjustment value for the vertical direction is obtained from the predicted vertical coordinate through the above function, and the adjusted vertical coordinate is obtained from that adjustment value and the predicted vertical coordinate. Accordingly, p1, p2, p3, g1, g2 and g3 also need to be taken as coordinate values in each direction so that the corresponding calculation can be performed per direction.
Based on the scheme provided by the embodiments of the application, when the point on the electronic device that the user's gaze attends to needs to be determined, a face image of the user using the device is collected and input into the neural network model, which outputs the predicted sight focus position x; the corresponding adjustment F(x) is obtained from the adjustment function, F(x) is added to the predicted position to obtain the adjusted position F(x) + x, and the adjusted position is taken as the user's sight focus position on the electronic device.
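Since the exact expression for F(x) is only available as a figure, the sketch below shows one plausible realization consistent with the surrounding description: a per-axis piecewise-linear map through the calibration pairs (p_i, g_i), clamped to the screen range [0, scr]. All numeric values in the usage example are hypothetical.

```python
import numpy as np

def make_adjustment(p, g, scr):
    """Return F for one axis: F(x) is the adjustment added to a predicted
    coordinate x; p are the predicted and g the actual calibration coordinates."""
    p, g = np.asarray(p, dtype=float), np.asarray(g, dtype=float)
    order = np.argsort(p)
    p, g = p[order], g[order]                        # np.interp needs increasing p

    def F(x):
        calibrated = np.interp(x, p, g)              # predicted -> calibrated
        return np.clip(calibrated, 0.0, scr) - x     # adjustment, so F(x) + x

    return F

# hypothetical calibration data for the horizontal axis of a 720-px-wide screen
Fx = make_adjustment(p=[120.0, 360.0, 600.0], g=[100.0, 380.0, 640.0], scr=720.0)
x_pred = 400.0
x_adjusted = Fx(x_pred) + x_pred    # adjusted horizontal sight focus coordinate
```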
In an optional embodiment of the application, the obtaining, in the step S120, the focal position of the line of sight of the user based on the face image by using the neural network model may include:
obtaining a predicted sight focus position of the user based on the face image by using a neural network model;
determining a predicted loss of the predicted gaze focal position;
determining a confidence level of the predicted gaze focal position based on the predicted loss;
if the confidence coefficient is greater than the set threshold value, determining the predicted sight line focus position as the sight line focus position of the user;
if the confidence coefficient is not greater than the set threshold, the predicted sight line focus position is adjusted to obtain the adjusted sight line focus position, or the sight line focus position corresponding to the previous frame of face image is determined as the sight line focus position of the user.
In an alternative embodiment of the present application, determining a confidence level of the predicted gaze focal position based on the predicted loss comprises:
determining at least two perturbations of the prediction loss to the facial image;
respectively correcting the face image based on each disturbance to obtain at least two corrected images;
obtaining a predicted sight focus position corresponding to each corrected image through a neural network model;
and obtaining the confidence coefficient according to the predicted sight focus position corresponding to each corrected image.
In an optional embodiment of the present application, obtaining a confidence level according to a predicted gaze focus position corresponding to each corrected image includes:
and determining a standard deviation according to the predicted sight focus position corresponding to each corrected image, and taking the reciprocal of the standard deviation as a confidence coefficient.
In an alternative embodiment of the present application, the determining of at least two perturbations of the facial image by the prediction loss comprises at least one of:
determining a predicted loss of an initial gaze focus position corresponding to the facial image relative to at least two directions; determining the disturbance of the initial sight focus position to the face image in each direction based on the corresponding prediction loss in each direction;
at least two perturbations of the initial gaze focal position to the face image are determined based on the at least two perturbation coefficients.
In an alternative embodiment of the present application, obtaining the predicted gaze focus position of the user based on the facial image using a neural network model includes:
cutting the face image to obtain a global image and a local image of the face image;
inputting the global image and the local image into a neural network model to obtain a predicted sight focus position of the user;
determining at least two perturbations of the prediction loss to the facial image, including:
determining at least two kinds of disturbance of prediction loss to each image in a global image and a local image;
correcting the face image based on each kind of disturbance to obtain at least two corrected images, including:
respectively correcting corresponding images based on at least two kinds of disturbance corresponding to each image in the global image and the local image to obtain at least two corrected images corresponding to each image;
obtaining the predicted sight focus position corresponding to each corrected image through a neural network model, wherein the predicted sight focus position comprises the following steps:
inputting each group of corrected images into a neural network model to obtain a predicted sight focus position corresponding to each group of corrected images, wherein each group of corrected images comprises a corrected image corresponding to each image obtained by cutting the face image;
obtaining a confidence level according to the predicted sight line focus position corresponding to each corrected image, comprising:
and obtaining the confidence degree based on the corresponding predicted sight line focus position of each group of modified images.
In an alternative embodiment of the present application, determining a perturbation of the facial image by the prediction loss comprises:
determining the gradient change of the prediction loss to each pixel point in the face image;
determining the disturbance of the prediction loss to each pixel point according to the gradient change corresponding to each pixel point;
correcting the face image based on the disturbance to obtain a corrected image, comprising:
and superposing the disturbance corresponding to each pixel point with the original pixel value of the pixel point corresponding to the face image to obtain the corrected image.
The above alternatives provided by the embodiments of the present application are specifically described below with reference to a specific example. In this example, the input is image data captured by a visible light camera of an electronic device (e.g., a cell phone), such as a single frame picture containing a human face. The image processing method in this example mainly includes the following steps:
1. The input of this step is the face image to be processed, i.e., a single-frame picture X. According to the face key-point detection result, the picture is cropped into a face picture X_f, a left-eye picture X_l and a right-eye picture X_r, and the sizes of the three pictures are adjusted to preset fixed sizes using bilinear interpolation before output. For example, the preset fixed sizes corresponding to the face picture X_f, left-eye picture X_l and right-eye picture X_r may be 64×64, 48×64 and 48×64 pixels, respectively.
It can be understood that the face picture X_f is the global image in this example, while the left-eye picture X_l and right-eye picture X_r are the local images in this example.
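A minimal PyTorch sketch of this cropping and resizing step follows; the box format, the helper name and the use of torch.nn.functional.interpolate for the bilinear resize are assumptions, as the original does not fix an implementation.

```python
import torch
import torch.nn.functional as F

def crop_and_resize(frame: torch.Tensor, boxes: dict) -> dict:
    """frame: (C, H, W) tensor; boxes: name -> (top, left, height, width)."""
    target_sizes = {"face": (64, 64), "left_eye": (48, 64), "right_eye": (48, 64)}
    crops = {}
    for name, (t, l, h, w) in boxes.items():
        crop = frame[:, t:t + h, l:l + w].unsqueeze(0)      # add batch dimension
        crops[name] = F.interpolate(crop, size=target_sizes[name],
                                    mode="bilinear", align_corners=False)[0]
    return crops
```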
2. Input the three pictures into the neural network model to obtain the output P_x. The output in this example is the feature vector corresponding to the initial sight focus position of the single-frame picture X; the dimension of P_x is bin_num, i.e., the total number of components of the vector, corresponding to the number of neurons of the model's output layer. The value of the i-th component of P_x may be denoted P_{x,i}.
3. Based on P_{x,i}, determine multiple perturbations. In this example the face picture X_f is taken as the illustration; its perturbations are denoted η_f^{l,g,j}, where f indicates the face picture, l indexes the direction (in this example l takes two values, e.g., 1 and 2: 1 corresponds to a leftward perturbation and 2 to a rightward perturbation), and g and j index the probability and step-size coefficients, respectively, explained below.
For the face picture X_f, multiple perturbations η_f^{l,g,j} can be obtained based on P_{x,i}, and thus multiple adversarial pictures (i.e., corrected face pictures), which can be expressed as:

X_f^{l,g,j} = X_f + η_f^{l,g,j}

That is, each perturbation is superposed (pixel-value superposition) on the face picture X_f to obtain a corrected face picture X_f^{l,g,j}.
In this example, the specific calculation formulas are:

Loss_test_l_i = -log(1 - P_{x,i})

Loss_test_r_i = -log(P_{x,i})

Loss_test_l = Σ_{i=p_bin-k}^{p_bin+k} Loss_test_l_i

Loss_test_r = Σ_{i=p_bin-k}^{p_bin+k} Loss_test_r_i

η_f^{1,g,j} = α_j · rdm(sgn(∇_{X_f} Loss_test_l), pos_g)

η_f^{2,g,j} = α_j · rdm(sgn(∇_{X_f} Loss_test_r), pos_g)

In these expressions, Loss_test_l_i is the prediction loss of the i-th component in the leftward direction and Loss_test_r_i is the prediction loss of the i-th component in the rightward direction; Loss_test_l is the prediction loss of the whole predicted sight focus position in the leftward direction in this example, and Loss_test_r the corresponding loss in the rightward direction. k is a set value, for example 4, it being understood that 2k + 1 cannot be greater than the total number of neurons in the model's output layer. p_bin denotes the index (neuron number) of the last component in the P_x vector greater than a set value (e.g., 0.5). η_f^{1,g,j} denotes a leftward perturbation and η_f^{2,g,j} a rightward perturbation; ∇ denotes the gradient operator and sgn() the sign function. α_j denotes the step size, with different values of j corresponding to different step sizes, e.g., two step-size options α_1 = 1, α_2 = 2 (i.e., j may take the value 1 or 2), in which case η_f^{1,g,1} denotes a leftward perturbation with step size 1 and η_f^{1,g,2} a leftward perturbation with step size 2. pos_g represents a probability, with different values of g corresponding to different probabilities; in this example there are three probability options, e.g., pos_1 = 1, pos_2 = 0.8, pos_3 = 0.6 (i.e., g may take the value 1, 2 or 3), in which case η_f^{1,1,1} denotes a leftward perturbation with probability 1 and step size 1, η_f^{1,2,1} a leftward perturbation with probability 0.8 and step size 1, and so on.

rdm(M, pos_g) denotes randomly retaining the elements of the matrix M with probability pos_g: for each element, if random() ≤ pos_g the element value is kept, and if random() > pos_g the element value is set to 0. Applied to sgn(∇_{X_f} Loss_test_l), this means that for each pixel it is randomly decided, with probability pos_g, whether to generate a perturbation, yielding a perturbation matrix in which some pixel positions carry a perturbation and others do not; for example, if random() > pos_g at position (m, n), the value at position (m, n) of the resulting matrix is 0, i.e., the perturbation at that position is 0.
4. Compute the confidence (the reciprocal of the standard deviation std_adv).
The formulas for the adversarial standard deviation in this example are:

mean = (1/N) * Σ_{j,g,l} P_x^{j,g,l}

var = (1/N) * Σ_{j,g,l} (P_x^{j,g,l} - mean)²

std_adv = √var

where N is the total number of groups of adversarial pictures corresponding to the three pictures, i.e., the number of (j, g, l) combinations; P_x^{j,g,l} denotes the predicted sight focus position corresponding to each group of adversarial pictures (i.e., each group of corrected images), with different values of j, g and l corresponding to the predicted sight focus position of the respective group; mean is the average value and var the variance. In application, each P_x^{j,g,l} corresponds to one group of adversarial pictures, and each group contains one adversarial picture for each of the three pictures (the face picture, the left-eye picture and the right-eye picture); that is, every three adversarial pictures form one group of inputs to the neural network model, and each group of inputs corresponds to one corrected predicted sight focus position. In this example, N groups of inputs yield N values of P_x^{j,g,l}, based on which the standard deviation can be computed, and the confidence is obtained by taking the reciprocal of the standard deviation.
5. Apply different processing to the predicted sight focus position P_x of the single-frame picture X according to the adversarial standard deviation. The input of this step is the picture's predicted sight focus position P_x and the confidence 1/std_adv, and the output is the processed sight focus position.
Specifically, if 1/std_adv is greater than the threshold th1, the confidence of the picture's prediction result is high, and the predicted sight focus position P_x is output directly; if 1/std_adv is not greater than the threshold th1, the confidence of the picture's prediction result is low, and temporal smoothing, such as Kalman filtering, is performed on the predicted coordinates. The threshold th1 may be configured according to actual requirements; for example, it may be 1/63.88.
Of course, in practical applications the processing may also be performed directly based on the standard deviation: if std_adv is less than or equal to a threshold th2, the adversarial standard deviation of the picture's prediction result is small, the confidence is considered high, and the predicted sight focus position P_x is output directly; if std_adv is greater than the threshold th2, the confidence of the picture's prediction result is low, and temporal smoothing, such as Kalman filtering, is performed on the predicted coordinates. The threshold th2 may be configured according to actual requirements; for example, it may be 63.88.
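Steps 2-5 combine into a short confidence-gated prediction routine. In the PyTorch sketch below, the model is assumed to return the predicted sight focus coordinates directly (in the text they are decoded from the P_x vector), adv_groups is the list of perturbed (face, left-eye, right-eye) triples built as in step 3, and smooth_fn is the temporal smoother (e.g., a Kalman filter update); all names are illustrative.

```python
import torch

def gaze_with_confidence(model, crops, adv_groups, th1, smooth_fn):
    """Output P_x directly when 1/std_adv > th1, otherwise smooth it."""
    with torch.no_grad():
        Px = model(*crops)                                    # predicted position
        preds = torch.stack([model(*group) for group in adv_groups])
    std_adv = preds.std(dim=0, unbiased=False).mean().item()  # adversarial std
    confidence = 1.0 / std_adv
    return Px if confidence > th1 else smooth_fn(Px)
```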
It is understood that the neural network model for image processing in the embodiment of the present application may be trained based on the training method provided in any embodiment of the present application. That is to say, the target neural network model to be trained may be a model for outputting a gaze focal position of the user (or a vector representation corresponding to the focal position), when the model is trained, a prediction result of the model is a predicted gaze focal position of the sample image, and a real result is a real gaze focal position of the user in the sample image, that is, a real coordinate point of the user on a screen position of the electronic device in the sample image.
The scheme provided by the embodiments of the present application can be applied to various electronic devices. For example, it can be applied to mobile electronic devices that have only a single visible-light camera, such as mobile phones and tablets, so that the device can estimate the user's gaze based on videos/images. The scheme can effectively improve the interaction between the user's line of sight and the device (mobile phone): for example, user data such as pictures can be acquired through the phone camera, and the position on the phone screen at which the user is gazing can then be estimated from that data.
It should be noted that, in the embodiments of the present application, the corresponding parameter interpretations in the examples of the training method embodiments and of the image processing method embodiments may be referred to interchangeably.
The training method embodiments provide a method for denoising a data set and a training method for reducing overfitting, thereby improving the generalization performance of the trained model. The embodiments of the present application also provide a training method for a ranking loss function based on adversarial training (namely the training method for enhancing the robustness of the model), so that the gaze estimation results of the trained model are more stable, addressing the jitter problem at the training stage.
The image processing method embodiments provide a testing method for obtaining a stable prediction result: the standard deviation over the adversarial samples of a test picture is output together with the prediction result, and the prediction result is then processed according to this adversarial standard deviation. The embodiments of the present application also provide a three-point calibration method for a specific person; of course, a single-point or multi-point calibration method may also be adopted, so that the prediction result can be adjusted quickly and efficiently.
Based on the same principle, the embodiment of the application also provides an image processing device which comprises an image acquisition module and a sight line focus position determination module. Wherein:
the image acquisition module is used for acquiring a face image of a user;
and the sight focus position determining module is used for obtaining the sight focus position of the user based on the face image by using the neural network model.
Optionally, the gaze focus position determination module is specifically configured to:
acquiring a position adjustment parameter;
obtaining a predicted sight focus position of the user based on the face image by using a neural network model;
and adjusting the predicted sight focus position based on the position adjustment parameter to obtain the adjusted sight focus position.
Optionally, the position adjustment parameter is obtained by:
displaying the calibration object to a user, and acquiring a current face image of the user;
obtaining a predicted sight focus position of the user corresponding to the current face image based on the current face image by using a neural network model;
and determining position adjusting parameters according to the predicted sight focus position of the user corresponding to the current face image and the position of the calibration object.
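As a rough sketch of how such a position adjustment parameter could be derived and applied, assuming a simple mean-offset form; the function names and the use of NumPy are illustrative assumptions (three point pairs would correspond to the three-point calibration mentioned in the embodiments).

```python
import numpy as np

def calibration_offset(predicted_points, target_points):
    """Position adjustment parameter from a calibration session.

    predicted_points: predicted gaze focus positions while the user looks
    at each displayed calibration object; target_points: the objects'
    true on-screen positions.
    """
    pred = np.asarray(predicted_points, dtype=np.float64)
    tgt = np.asarray(target_points, dtype=np.float64)
    return (tgt - pred).mean(axis=0)

def adjust(predicted, offset):
    """Apply the adjustment parameter to a new predicted position."""
    return np.asarray(predicted, dtype=np.float64) + offset
```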
Optionally, the gaze focus position determination module is specifically configured to:
obtaining a predicted sight focus position of the user based on the face image by using a neural network model;
determining a predicted loss of the predicted gaze focal position;
determining a confidence level of the predicted gaze focal position based on the predicted loss;
if the confidence coefficient is greater than the set threshold value, determining the predicted sight line focus position as the sight line focus position of the user;
if the confidence coefficient is not greater than the set threshold, the predicted sight line focus position is adjusted to obtain the adjusted sight line focus position, or the sight line focus position corresponding to the previous frame of face image is determined as the sight line focus position of the user.
Optionally, when the gaze focus position determination module determines the confidence of the predicted gaze focus position based on the predicted loss, the gaze focus position determination module is specifically configured to:
determining at least two perturbations of the prediction loss to the facial image;
respectively correcting the face images based on each kind of disturbance to obtain at least two corrected images;
obtaining a predicted sight focus position corresponding to each corrected image through a neural network model;
and obtaining the confidence coefficient according to the predicted sight focus position corresponding to each corrected image.
Optionally, when obtaining the confidence level according to the predicted gaze focus position corresponding to each corrected image, the gaze focus position determining module is specifically configured to:
determining a standard deviation according to the predicted sight focus position corresponding to each corrected image;
the inverse of the standard deviation was taken as the confidence.
Optionally, when determining the at least two disturbances of the prediction loss to the face image, the gaze focus position determination module performs at least one of the following:
determining prediction losses of the initial gaze focus position corresponding to the face image with respect to at least two directions, and determining the disturbance of the initial gaze focus position to the face image in each direction based on the prediction loss corresponding to that direction;
determining at least two disturbances of the initial gaze focus position to the face image based on at least two disturbance coefficients.
Optionally, when the gaze focus position determination module obtains the predicted gaze focus position of the user based on the face image by using the neural network model, the gaze focus position determination module is specifically configured to:
cutting the face image to obtain a global image and a local image of the face image;
inputting the global image and the local image into a neural network model to obtain a predicted sight focus position of the user;
the gaze focal position determination module, when determining at least two perturbations of the facial image by the prediction loss, is specifically configured to:
determining at least two kinds of disturbance of prediction loss to each image in a global image and a local image;
correcting the face image based on each kind of disturbance to obtain at least two corrected images, including:
respectively correcting corresponding images based on at least two kinds of disturbance corresponding to each image in the global image and the local image to obtain at least two corrected images corresponding to each image;
the gaze focus position determination module, when obtaining the predicted gaze focus position corresponding to each modified image through the neural network model, is specifically configured to:
inputting each group of corrected images into a neural network model to obtain a predicted sight focus position corresponding to each group of corrected images, wherein each group of corrected images comprises a corrected image corresponding to each image obtained by cutting the face image;
the gaze focus position determination module is specifically configured to, when obtaining the confidence level according to the predicted gaze focus position corresponding to each corrected image:
and obtaining the confidence degree based on the corresponding predicted sight line focus position of each group of modified images.
Optionally, when determining the disturbance of the prediction loss on the face image, the gaze focus position determination module is specifically configured to:
determining the gradient change of the prediction loss to each pixel point in the face image;
determining the disturbance of the prediction loss to each pixel point according to the gradient change corresponding to each pixel point;
correcting the face image based on the disturbance to obtain a corrected image, comprising:
superimposing the disturbance corresponding to each pixel point on the original pixel value of the corresponding pixel point of the face image to obtain the corrected image.
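A sketch of this gradient-based correction in PyTorch, assuming a sign-gradient (FGSM-style) perturbation with a set of perturbation coefficients; the function signature, the epsilon values, the clamping to [0, 1], and the single-image model input (the embodiment feeds face, left-eye, and right-eye crops) are all assumptions.

```python
import torch

def corrected_images(model, image, target, loss_fn, epsilons=(0.5 / 255, 1.0 / 255)):
    """Builds corrected copies of `image` from the prediction loss.

    image: (1, C, H, W) float tensor; target: the gaze focus position used
    to compute the prediction loss. One corrected image is returned per
    perturbation coefficient.
    """
    image = image.clone().detach().requires_grad_(True)
    loss = loss_fn(model(image), target)        # prediction loss
    grad, = torch.autograd.grad(loss, image)    # gradient change per pixel point
    out = []
    for eps in epsilons:
        # Superimpose the disturbance on the original pixel values.
        out.append((image + eps * grad.sign()).clamp(0.0, 1.0).detach())
    return out
```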
Based on the same principle, the embodiment of the application also provides a training device of the neural network model, and the device comprises a sample obtaining module and a model training module. Wherein:
the system comprises a sample acquisition module, a data acquisition module and a data processing module, wherein the sample acquisition module is used for acquiring a training sample set which comprises sample images;
and the model training module is used for training the initial target neural network model based on each sample image until the loss function is converged to obtain the trained target neural network model.
Optionally, the model training module is specifically configured to:
acquiring a first neural network model;
training the first neural network model at least twice based on each sample image to obtain the first neural network model after each training;
predicting each sample image through the neural network model after each training to obtain the prediction result of each sample image corresponding to the neural network model after each training;
and deleting the sample images in the training sample set based on the difference between the prediction result of each time corresponding to each sample image and the real result of the sample to obtain the processed sample images.
Optionally, when the model training module trains the first neural network model at least twice based on the sample images, the sample images used in the current training are obtained by deleting, from the sample images used in the previous training, a set number or a set proportion of sample images whose difference between the prediction result and the real result of the sample image is small.
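For illustration, a sketch of the per-round deletion rule just described; the error metric (Euclidean distance between predicted and real positions), the drop ratio, and the function name are assumptions, not details of the embodiment.

```python
import numpy as np

def prune_samples(sample_ids, pred_positions, real_positions, drop_ratio=0.1):
    """Delete the set proportion of samples whose prediction error is
    smallest, returning the sample ids used for the next training."""
    pred = np.asarray(pred_positions, dtype=np.float64)
    real = np.asarray(real_positions, dtype=np.float64)
    errors = np.linalg.norm(pred - real, axis=1)   # difference per sample
    n_drop = int(len(sample_ids) * drop_ratio)
    keep = np.argsort(errors)[n_drop:]             # drop the smallest errors
    return [sample_ids[i] for i in keep]
```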
Optionally, the model training module is specifically configured to:
inputting each sample image into a teacher network model to obtain an output result of each sample image;
and taking each output result as a real result of each corresponding sample image, and training the target neural network model based on each sample image.
Optionally, the teacher network model is any one of the teacher network models randomly selected from the teacher queue;
optionally, the model training module is further configured to, after each training of the target neural network model based on each sample image is performed by taking each output result as a true result of each corresponding sample image:
adding the target neural network model after each training to a teacher queue;
the model when the teacher queue is initialized is empty, and the real result of each sample image during initialization is the real result corresponding to the label of the sample image.
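One way such a teacher queue could look, sketched in PyTorch; the queue length, the deep-copy snapshotting, and the class name are assumptions.

```python
import copy
import random
from collections import deque

import torch

class TeacherQueue:
    """Empty at initialization; until the first snapshot is added,
    the annotation labels serve as the real results."""

    def __init__(self, maxlen=5):
        self.models = deque(maxlen=maxlen)

    def targets(self, images, labels):
        if not self.models:
            return labels                           # initialization case
        teacher = random.choice(list(self.models))  # any teacher, chosen at random
        with torch.no_grad():
            return teacher(images)                  # outputs become real results

    def add(self, model):
        """Add the target model after one round of training to the queue."""
        self.models.append(copy.deepcopy(model).eval())
```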
Optionally, when the model training module trains the initial target neural network model based on each sample image, the model training module is specifically configured to:
initializing one part of the model parameters of the target neural network model after each training, and taking the other part of the model parameters together with the initialized part as the new model parameters of the target neural network model for the next training of the target neural network model.
Optionally, when initializing a part of model parameters of the target neural network model after each training, the model training module is specifically configured to:
determining the importance degree of each filter in the target neural network model;
determining a target filter needing parameter initialization according to the importance degree of each filter;
model parameters of each target filter are initialized.
Optionally, when initializing the model parameters of each target filter, the model training module is specifically configured to:
decomposing a filter parameter matrix of a neural network layer where a target filter is located to obtain an orthogonal matrix of the filter parameter matrix;
for the neural network layer where the target filter is located, determining a feature vector corresponding to each target filter in an orthogonal matrix corresponding to the neural network layer according to the position of each target filter in the neural network layer in the corresponding neural network layer;
determining the two-norm of the feature vectors of the target filters in the same neural network layer according to the feature vectors corresponding to the target filters in that layer;
and for each target filter, determining initialized parameters of the target filter according to the feature vector corresponding to the target filter and the corresponding two-norm in the neural network layer to which the target filter belongs.
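The following NumPy sketch shows one plausible reading of this initialization, with importance scores supplied externally (e.g., per-filter L1 norms), QR decomposition as the matrix decomposition, and scaling by the two-norm over the selected feature vectors; all of these concretizations are assumptions.

```python
import numpy as np

def reinit_target_filters(weight, importance, frac=0.2):
    """Re-initialize the least important filters of one layer.

    weight: (out_filters, fan_in) filter parameter matrix of the layer;
    importance: one score per filter. The target filters' rows are
    replaced by their feature vectors in the orthogonal factor of the
    matrix, scaled by the two-norm over those vectors.
    """
    weight = np.array(weight, dtype=np.float64)
    n_targets = max(1, int(weight.shape[0] * frac))
    targets = np.argsort(importance)[:n_targets]   # least important filters
    q, _ = np.linalg.qr(weight.T)                  # orthogonal matrix of the layer
    feats = q[:, targets].T                        # feature vector per target filter
    norm = np.linalg.norm(feats)                   # two-norm over the layer's targets
    weight[targets] = feats / max(norm, 1e-12)     # initialized parameters
    return weight
```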
It is understood that each module provided in the embodiments of the present application may have the function of implementing the corresponding step in the methods provided in the embodiments of the present application. The functions may be implemented by hardware, or by hardware executing corresponding software. The modules may be software and/or hardware, and may be implemented individually or by integrating a plurality of modules. For the functional description of each module of the image processing apparatus and of the training apparatus of the neural network model, reference may be made to the corresponding descriptions in the methods in the foregoing embodiments, and details are not repeated here.
In addition, in practical applications, each functional module of the apparatuses in the embodiments of the present application may run on a terminal device and/or a server according to the requirements of the practical application.
Based on the same principle, the embodiment of the application also provides an electronic device, which comprises a memory and a processor; the memory has a computer program stored therein; a processor for invoking a computer program to perform a method provided in any of the embodiments of the present application.
Based on the same principle, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the computer program implements the method provided in any embodiment of the present application.
Optionally, FIG. 5 shows a schematic structural diagram of an electronic device to which the embodiments of the present application are applicable. As shown in FIG. 5, the electronic device 4000 may include a processor 4001 and a memory 4003. The processor 4001 is connected to the memory 4003, for example, via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004. It should be noted that, in practical applications, the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not constitute a limitation on the embodiments of the present application.
The processor 4001 may be a CPU, a general-purpose processor, a DSP, an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various illustrative logical blocks, modules, and circuits described in connection with this disclosure. The processor 4001 may also be a combination that performs a computing function, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor.
Bus 4002 may include a path that carries information between the aforementioned components. Bus 4002 may be a PCI bus, EISA bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 5, but this is not intended to represent only one bus or type of bus.
Memory 4003 may be, but is not limited to, a ROM or other type of static storage device that can store static information and instructions, a RAM or other type of dynamic storage device that can store information and instructions, an EEPROM, a CD-ROM or other optical disk storage, an optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), a magnetic disk storage medium or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer.
The memory 4003 is used for storing computer programs for executing the present scheme, and is controlled by the processor 4001 for execution. Processor 4001 is configured to execute a computer program stored in memory 4003 to implement what is shown in any of the foregoing method embodiments.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the steps are not strictly limited to the order shown and may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be performed at different times, and are not necessarily performed in sequence but may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing is only a partial embodiment of the present invention. It should be noted that those skilled in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

Claims (20)

1. An image processing method based on a neural network model, comprising:
acquiring a face image of a user;
obtaining a gaze focus position of the user based on the face image by using a neural network model.

2. The method according to claim 1, wherein obtaining the gaze focus position of the user based on the face image by using the neural network model comprises:
acquiring a position adjustment parameter;
obtaining a predicted gaze focus position of the user based on the face image by using the neural network model;
adjusting the predicted gaze focus position based on the position adjustment parameter to obtain an adjusted gaze focus position.

3. The method according to claim 2, wherein the position adjustment parameter is obtained by:
displaying a calibration object to the user, and acquiring a current face image of the user;
obtaining a predicted gaze focus position of the user corresponding to the current face image based on the current face image by using the neural network model;
determining the position adjustment parameter according to the predicted gaze focus position of the user corresponding to the current face image and the position of the calibration object.

4. The method according to claim 1, wherein obtaining the gaze focus position of the user based on the face image by using the neural network model comprises:
obtaining a predicted gaze focus position of the user based on the face image by using the neural network model;
determining a prediction loss of the predicted gaze focus position;
determining a confidence of the predicted gaze focus position based on the prediction loss;
if the confidence is greater than a set threshold, determining the predicted gaze focus position as the gaze focus position of the user;
if the confidence is not greater than the set threshold, adjusting the predicted gaze focus position to obtain an adjusted gaze focus position, or determining the gaze focus position corresponding to a previous frame of face image as the gaze focus position of the user.

5. The method according to claim 4, wherein determining the confidence of the predicted gaze focus position based on the prediction loss comprises:
determining at least two perturbations of the prediction loss to the face image;
correcting the face image based on each perturbation to obtain at least two corrected images;
obtaining a predicted gaze focus position corresponding to each corrected image through the neural network model;
obtaining the confidence according to the predicted gaze focus position corresponding to each corrected image.

6. The method according to claim 4, wherein obtaining the confidence according to the predicted gaze focus position corresponding to each corrected image comprises:
determining a standard deviation according to the predicted gaze focus positions corresponding to the corrected images;
taking the reciprocal of the standard deviation as the confidence.

7. The method according to claim 5, wherein determining the at least two perturbations of the prediction loss to the face image comprises at least one of the following:
determining prediction losses of an initial gaze focus position corresponding to the face image with respect to at least two directions, and determining the perturbation of the initial gaze focus position to the face image in each direction based on the prediction loss corresponding to that direction;
determining at least two perturbations of the initial gaze focus position to the face image based on at least two perturbation coefficients.

8. The method according to claim 6 or 7, wherein obtaining the predicted gaze focus position of the user based on the face image by using the neural network model comprises:
cropping the face image to obtain a global image and a local image of the face image;
inputting the global image and the local image into the neural network model to obtain the predicted gaze focus position of the user;
wherein determining the at least two perturbations of the prediction loss to the face image comprises:
determining at least two perturbations of the prediction loss to each of the global image and the local image;
wherein correcting the face image based on each perturbation to obtain at least two corrected images comprises:
correcting each corresponding image based on the at least two perturbations corresponding to each of the global image and the local image, to obtain at least two corrected images corresponding to each image;
wherein obtaining the predicted gaze focus position corresponding to each corrected image through the neural network model comprises:
inputting each group of corrected images into the neural network model to obtain a predicted gaze focus position corresponding to each group of corrected images, each group of corrected images comprising one corrected image corresponding to each of the images obtained by cropping the face image;
wherein obtaining the confidence according to the predicted gaze focus position corresponding to each corrected image comprises:
obtaining the confidence based on the predicted gaze focus positions corresponding to the groups of corrected images.

9. The method according to any one of claims 5 to 8, wherein determining the perturbation of the prediction loss to the face image comprises:
determining a gradient change of the prediction loss for each pixel point in the face image;
determining the perturbation of the prediction loss to each pixel point according to the gradient change corresponding to that pixel point;
wherein correcting the face image based on the perturbation to obtain the corrected image comprises:
superimposing the perturbation corresponding to each pixel point on the original pixel value of the corresponding pixel point of the face image to obtain the corrected image.

10. A training method of a neural network model, comprising:
acquiring a training sample set, the training sample set comprising sample images;
training an initial target neural network model based on the sample images until a loss function converges, to obtain a trained target neural network model.

11. The method according to claim 10, wherein training the initial target neural network model based on the sample images comprises:
acquiring a first neural network model;
training the first neural network model at least twice based on the sample images to obtain the first neural network model after each training;
predicting each training sample through the neural network model after each training, to obtain a prediction result of each sample image corresponding to the neural network model after each training;
deleting sample images from the training sample set based on the differences between the prediction results corresponding to each sample image and the real result of the sample, to obtain processed sample images.

12. The method according to claim 11, wherein, when the first neural network model is trained at least twice based on the sample images, the sample images used in the current training are obtained by deleting, from the sample images used in the previous training, a set number or a set proportion of sample images whose difference between the prediction result and the real result of the sample image is small.

13. The method according to claim 10, wherein training the initial target neural network model based on the sample images comprises:
inputting each sample image into a teacher network model to obtain an output result of each sample image;
taking each output result as the real result of the corresponding sample image, and training the target neural network model based on the sample images;
wherein the teacher network model is any teacher network model randomly selected from a teacher queue;
wherein, after each training of the target neural network model based on the sample images with each output result taken as the real result of the corresponding sample image, the method further comprises:
adding the target neural network model after each training to the teacher queue;
wherein the teacher queue is empty at initialization, and the real result of each sample image at initialization is the real result corresponding to the annotation label of the sample image.

14. The method according to claim 10, wherein training the initial target neural network model based on the sample images comprises:
initializing one part of the model parameters of the target neural network model after each training, and taking the other part of the model parameters together with the initialized part as new model parameters of the target neural network model for the next training of the target neural network model.

15. The method according to claim 14, wherein initializing one part of the model parameters of the target neural network model after each training comprises:
determining the importance of each filter in the target neural network model;
determining target filters whose parameters need to be initialized according to the importance of each filter;
initializing the model parameters of each target filter;
wherein initializing the model parameters of each target filter comprises:
decomposing the filter parameter matrix of the neural network layer where the target filter is located to obtain an orthogonal matrix of the filter parameter matrix;
for the neural network layer where the target filter is located, determining the feature vector corresponding to each target filter in the orthogonal matrix corresponding to the neural network layer according to the position of each target filter in the corresponding neural network layer;
determining the two-norm of the feature vectors of the target filters in the same neural network layer according to the feature vectors corresponding to the target filters in that layer;
for each target filter, determining the initialized parameters of the target filter according to the feature vector corresponding to the target filter and the corresponding two-norm in the neural network layer to which the target filter belongs.

16. The method according to claim 10, wherein training the initial target neural network model based on the sample images comprises:
determining the prediction loss of the target neural network model for each sample image in each training, correcting each sample image according to the prediction loss, and performing the next training of the target neural network model based on the corrected sample images.

17. The method according to claim 16, wherein determining the prediction loss of the target neural network model for each sample image in each training and correcting each sample image according to the prediction loss comprises:
determining the prediction loss of the target neural network model for each sample image in each training;
for each sample image, determining a gradient change of the prediction loss of the sample image for each pixel point in the sample image;
determining the perturbation of the prediction loss to each pixel point according to the gradient change corresponding to that pixel point;
superimposing the perturbation corresponding to each pixel point on the original pixel value of the corresponding pixel point of the sample image to obtain the corrected sample image.

18. An image processing apparatus, comprising:
an image acquisition module, configured to acquire a face image of a user;
a gaze focus position determination module, configured to obtain a gaze focus position of the user based on the face image by using a neural network model.

19. An electronic device, comprising a memory and a processor;
wherein a computer program is stored in the memory;
and the processor is configured to invoke the computer program to perform the method according to any one of claims 1 to 17.

20. A computer-readable storage medium, wherein a computer program is stored in the storage medium, and the computer program, when executed by a processor, implements the method according to any one of claims 1 to 17.
CN201910685032.2A 2019-07-26 2019-07-26 Image processing method and device, electronic equipment and readable storage medium Pending CN112307815A (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201910685032.2A CN112307815A (en) 2019-07-26 2019-07-26 Image processing method and device, electronic equipment and readable storage medium
KR1020200047423A KR20210012888A (en) 2019-07-26 2020-04-20 Method and apparatus for gaze tracking and method and apparatus for training neural network for gaze tracking
US16/937,722 US11347308B2 (en) 2019-07-26 2020-07-24 Method and apparatus with gaze tracking

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910685032.2A CN112307815A (en) 2019-07-26 2019-07-26 Image processing method and device, electronic equipment and readable storage medium

Publications (1)

Publication Number Publication Date
CN112307815A true CN112307815A (en) 2021-02-02

Family

ID=74329723

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910685032.2A Pending CN112307815A (en) 2019-07-26 2019-07-26 Image processing method and device, electronic equipment and readable storage medium

Country Status (2)

Country Link
KR (1) KR20210012888A (en)
CN (1) CN112307815A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114445663A (en) * 2022-01-25 2022-05-06 百度在线网络技术(北京)有限公司 Method, apparatus and computer program product for detecting challenge samples
CN116091541A (en) * 2022-12-21 2023-05-09 哲库科技(上海)有限公司 Eye movement tracking method, eye movement tracking device, electronic device, storage medium, and program product
JP2023108563A (en) * 2022-01-25 2023-08-04 キヤノン株式会社 Gaze detection device, display device, control method, and program
CN118051772A (en) * 2024-01-23 2024-05-17 哈尔滨工程大学 A robust training method based on phase flipping
WO2024245263A1 (en) * 2023-05-29 2024-12-05 北京字跳网络技术有限公司 Method and apparatus for constructing line-of-sight prediction model, and device and storage medium

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104951084A (en) * 2015-07-30 2015-09-30 京东方科技集团股份有限公司 Eye-tracking method and device
CN106796449A (en) * 2014-09-02 2017-05-31 香港浸会大学 sight tracking method and device
CN108229284A (en) * 2017-05-26 2018-06-29 北京市商汤科技开发有限公司 Eye-controlling focus and training method and device, system, electronic equipment and storage medium
CN109271914A (en) * 2018-09-07 2019-01-25 百度在线网络技术(北京)有限公司 Detect method, apparatus, storage medium and the terminal device of sight drop point
US20190080474A1 (en) * 2016-06-28 2019-03-14 Google Llc Eye gaze tracking using neural networks
CN109698901A (en) * 2017-10-23 2019-04-30 广东顺德工业设计研究院(广东顺德创新设计研究院) Atomatic focusing method, device, storage medium and computer equipment
CN110008835A (en) * 2019-03-05 2019-07-12 成都旷视金智科技有限公司 Sight prediction technique, device, system and readable storage medium storing program for executing

Also Published As

Publication number Publication date
KR20210012888A (en) 2021-02-03

Similar Documents

Publication Publication Date Title
CN112307815A (en) Image processing method and device, electronic equipment and readable storage medium
CN111325851B (en) Image processing method and device, electronic device, and computer-readable storage medium
CN114707604B (en) Twin network tracking system and method based on space-time attention mechanism
CN106157307B (en) A kind of monocular image depth estimation method based on multiple dimensioned CNN and continuous CRF
CN112801890B (en) Video processing method, device and equipment
US12400349B2 (en) Joint depth prediction from dual-cameras and dual-pixels
CN105046659B (en) A kind of simple lens based on rarefaction representation is calculated as PSF evaluation methods
CN114973098B (en) A short video deduplication method based on deep learning
CN107871099A (en) Face detection method and apparatus
CN113066034A (en) Face image restoration method and device, restoration model, medium and equipment
CN115187474B (en) A two-stage dehazing method for dense fog images based on inference
CN112419191A (en) Image motion blur removing method based on convolution neural network
CN115511708B (en) Depth map super-resolution method and system based on uncertainty perception feature transmission
CN119131265A (en) Three-dimensional panoramic scene understanding method and device based on multi-view consistency
CN111445496B (en) Underwater image recognition tracking system and method
CN115439738A (en) A method of underwater target detection based on self-supervised collaborative reconstruction
CN111667495A (en) Image scene analysis method and device
CN112634331B (en) Optical flow prediction method and device
CN112418279B (en) Image fusion method, device, electronic equipment and readable storage medium
CN119323741B (en) Unmanned aerial vehicle video target detection method and system based on space-time correlation
CN109978928A (en) A kind of binocular vision solid matching method and its system based on Nearest Neighbor with Weighted Voting
CN120612354A (en) Self-supervised monocular depth estimation method for dynamic scenes based on efficient parameter fine-tuning
Qiu et al. A GAN-based motion blurred image restoration algorithm
WO2024221818A1 (en) Definition identification method and apparatus, model training method and apparatus, and device, medium and product
Wang et al. Image deblurring using fusion transformer-based generative adversarial networks

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination