CN112307815A - Image processing method and device, electronic equipment and readable storage medium - Google Patents
- Publication number
- CN112307815A (application number CN201910685032.2A)
- Authority
- CN
- China
- Prior art keywords
- image
- neural network
- network model
- sample
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/30—Determination of transform parameters for the alignment of images, i.e. image registration
- G06T7/33—Determination of transform parameters for the alignment of images, i.e. image registration using feature-based methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/18—Eye characteristics, e.g. of the iris
- G06V40/193—Preprocessing; Feature extraction
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/013—Eye tracking input arrangements
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/20—Special algorithmic details
- G06T2207/20084—Artificial neural networks [ANN]
-
- G—PHYSICS
- G06—COMPUTING OR CALCULATING; COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/30—Subject of image; Context of image processing
- G06T2207/30196—Human being; Person
- G06T2207/30201—Face
Abstract
The embodiments of the present application provide an image processing method and apparatus, an electronic device, and a readable storage medium. The method includes: acquiring a face image of a user; and obtaining the sight focus position of the user from the face image using a neural network model. The method provided by the embodiments of the present application can effectively improve the accuracy of estimating the user's sight focus position.
Description
Technical Field
The present application relates to the field of computer technologies, and in particular, to an image processing method and apparatus, an electronic device, and a readable storage medium.
Background
At present, with the development of science and technology, various electronic devices have become an indispensable part of people's lives. In many application scenarios it is necessary to estimate the user's gaze focus when using an electronic device, i.e., the point on which the user's sight is focused — for example, selecting and starting an application program with the gaze (equivalent to using the gaze as a mouse), or pushing an advertisement according to the gaze position. In such scenarios, the user's sight position on the screen of the electronic device must be estimated accurately in real time. However, the estimation accuracy of existing schemes in practical applications still needs to be improved.
Disclosure of Invention
The application aims to provide an image processing method, an image processing device, electronic equipment and a readable storage medium, so as to improve the accuracy of the estimation of the key point position of the sight of a user. The scheme provided by the embodiment of the application is as follows:
in a first aspect, an embodiment of the present application provides an image processing method based on a neural network model, where the method includes:
acquiring a face image of a user;
and obtaining the sight focus position of the user based on the face image by using a neural network model.
In a second aspect, an embodiment of the present application provides a method for training a neural network model, where the method includes:
acquiring a training sample set, wherein the training sample set comprises sample images;
and training the initial target neural network model based on each sample image until the loss function is converged to obtain the trained target neural network model.
In a third aspect, an embodiment of the present application provides an image processing apparatus, including:
the image acquisition module is used for acquiring a face image of a user;
and the sight focus position determining module is used for obtaining the sight focus position of the user based on the face image by using the neural network model.
In a fourth aspect, an embodiment of the present application provides an apparatus for training a neural network model, where the apparatus includes:
the system comprises a sample acquisition module, a data acquisition module and a data processing module, wherein the sample acquisition module is used for acquiring a training sample set which comprises sample images;
and the model training module is used for training the initial target neural network model based on each sample image until the loss function is converged to obtain the trained target neural network model.
In a fifth aspect, an embodiment of the present application provides an electronic device, which includes a memory and a processor; wherein the memory has stored therein a computer program; the processor is configured to invoke the computer program to perform the method provided in the first aspect or the second aspect of the present application.
In a sixth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements the method provided in the first or second aspect of the present application.
The advantages of the technical solutions provided in the present application will be described in detail in the following embodiments with reference to the accompanying drawings, and are not repeated here.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
Fig. 1 is a schematic flowchart illustrating a method for training a neural network model according to an embodiment of the present disclosure;
FIG. 2 illustrates a flow diagram of a training method in an example of the present application;
FIG. 3 is a flow chart illustrating an image processing method according to an embodiment of the present disclosure;
FIG. 4 shows a schematic view of a screen marker in an example of the present application;
fig. 5 shows a schematic structural diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
Reference will now be made in detail to embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or to elements having the same or similar functions throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the present application, and are not to be construed as limiting the present application.
As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element, or intervening elements may be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element of, and all combinations of, one or more of the associated listed items.
Generally, in scenarios that require estimating sight-line key point positions, one important property of a sight-line estimation scheme is accuracy: the trained estimator must be accurate not only on the training set but also on the actual test set, i.e., the training algorithm must have good generalization performance. Another important property is stability: when the user fixates on the same point, or moves slightly near a certain point, the sight position estimated by the algorithm must be not only accurate but also free of large jitter. However, the generalization performance of existing algorithms is poor. Regarding stability, most existing video-based or multi-frame methods apply post-processing after prediction and therefore suffer from delay, whereas sight-line estimation requires fast and accurate results in practical applications.
To address the problems in the prior art, the present application provides, on the one hand, a method for training a neural network model that tackles the problems at the training stage: taking a single picture as input, it improves the stability of sight-line estimation without loss of real-time performance, and it can be used in combination with video-based methods. On the other hand, the method can correspondingly process the initial sight-line key point positions obtained from the model's prediction, yielding sight-line key point positions with higher accuracy and stability.
The scheme provided by the present application is specifically described below.
An embodiment of the present application provides a training method of a neural network model, as shown in fig. 1, the method may include:
step S101: acquiring a training sample set, wherein the training sample set comprises sample images;
step S102: and training the initial target neural network model based on each sample image until the loss function is converged to obtain the trained target neural network model.
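Steps S101 and S102 above can be sketched, in a non-authoritative way, as a generic train-until-the-loss-converges loop. The `step_fn` callback, the convergence tolerance, and the toy fitting problem used to exercise it are illustrative assumptions, not details from this application:

```python
import numpy as np

def train_until_converged(samples, targets, step_fn, init_params,
                          tol=1e-6, max_steps=10000):
    """Generic sketch of steps S101-S102: iterate a training step on the
    sample images until the change in loss falls below tol, i.e., until
    the loss function has "converged".
    step_fn(params, samples, targets) -> (updated_params, loss)."""
    params, prev_loss = init_params, float("inf")
    for _ in range(max_steps):
        params, loss = step_fn(params, samples, targets)
        if abs(prev_loss - loss) < tol:   # loss converged
            break
        prev_loss = loss
    return params
```

For example, with `step_fn` implementing one gradient-descent step of a scalar least-squares fit, the loop converges to the underlying coefficient.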
In an optional embodiment of the application, in the step S102, training the initial target neural network model based on each sample image specifically includes:
acquiring a first neural network model;
training the first neural network model at least twice based on each sample image to obtain the first neural network model after each training;
predicting each sample image through the neural network model after each training to obtain the prediction result of each sample image corresponding to the neural network model after each training;
and deleting the sample images in the training sample set based on the difference between the prediction result of each time corresponding to each sample image and the real result of the sample to obtain the processed sample images.
In an optional embodiment of the present application, when the first neural network model is trained at least twice based on the sample images, the sample images used in each subsequent training are obtained by deleting, from the sample images used in the previous training, a set number or a set proportion of sample images whose prediction results differ least from their real results.
That is to say, in the scheme provided by this embodiment of the present application, before the model is trained on the sample images, the above optional preprocessing may first be applied to them — i.e., denoising the training sample set to filter out some poor sample images — and the target neural network model is then trained on the remaining good sample images, thereby improving the accuracy of the trained model.
The preprocessing scheme for the sample image is further described below in connection with an example.
In this example, an iterative screening strategy is used, which aims to algorithmically remove the noise in part of the data and improve its quality, thereby obtaining a better trained model. Let the entire dataset (i.e., the training sample set) contain N samples x_i, i = 1, …, N, each with a known ground-truth value gt_i, i.e., its real result. The number of training iterations in this example is M (M ≥ 2). The number of samples deleted each time is Nd = N/M; that is, the number of samples used in the next training iteration is the current number minus Nd. The specific steps are as follows:
1. Initialize the training sample set as the "data set" (containing N samples). Initialize the neural network parameters. Select a loss function. Initialize the learning rate (e.g., 0.01). The neural network structure (i.e., the first neural network model) may be any prior-art network structure, such as AlexNet. For training the neural network we use a ranking (ordering) loss function, but other loss functions may be chosen.
2. And training the neural network by using the training sample set to obtain the model and the model parameters after the training.
3. Use the neural network trained in step 2 to compute a predicted value y_i for each sample x_i in the training sample set, and compute the error between the predicted value and the true value, err_i = distance(gt_i, y_i), where the error metric distance may be the Euclidean distance. Sort all errors in ascending order.
4. Select the first Nd samples (i.e., the Nd samples with the smallest error) and delete them from the training sample set, so that after the t-th execution of step 2 the size of the current training sample set becomes N − Nd·t. Save the current neural network parameter model and adjust the learning rate (any common learning-rate adjustment algorithm may be selected).
5. If the remaining number of samples is not zero, return to step 2 and continue; i.e., repeat the above steps until the remaining number of samples reaches zero or M iterations of training have been performed.
6. Using all M saved neural network parameter models (i.e., the M sets of parameters of the first neural network model), compute the predicted values for all N samples of the entire dataset, and compute the error between each predicted value and its true value; each sample thus obtains M error results. For each model, sort its N error results in ascending order, producing M sequences, where the N values of each sequence represent that model's errors over all samples. If a sample x is ranked in the last r% of the sequence (the set proportion in this example is r%) in all M sequences, x is considered a noise sample and is deleted from the dataset, finally yielding a clean dataset — i.e., the training sample set with N × r% of the N samples deleted.
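The six steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `train_fn` and `predict_fn` are hypothetical placeholders for the actual network training and inference, the error metric is the absolute difference rather than a Euclidean distance over vectors, and learning-rate adjustment is omitted:

```python
import numpy as np

def iterative_screening(X, y, train_fn, predict_fn, M=3, r=0.1):
    """Iterative sample screening (steps 1-6): repeatedly train, drop the
    Nd smallest-error samples, save each model, then flag as noise any
    sample ranked in the worst r% of errors under every saved model."""
    N = len(X)
    Nd = N // M                      # samples deleted per iteration
    idx = np.arange(N)               # indices still in the training set
    models = []
    for _ in range(M):               # steps 2-5
        model = train_fn(X[idx], y[idx])
        models.append(model)         # save the current parameter model
        err = np.abs(predict_fn(model, X[idx]) - y[idx])
        keep = np.argsort(err)[Nd:]  # delete the Nd smallest-error samples
        idx = idx[keep]
        if len(idx) == 0:
            break
    # Step 6: cross-model filtering over the FULL dataset
    ranks = []
    for model in models:
        err_all = np.abs(predict_fn(model, X) - y)
        ranks.append(np.argsort(np.argsort(err_all)))  # error rank per sample
    ranks = np.stack(ranks)
    cutoff = int(N * (1 - r))
    noisy = np.all(ranks >= cutoff, axis=0)  # worst r% under every model
    return np.where(~noisy)[0], models
```

The returned index array selects the "clean" dataset; the saved `models` list corresponds to the M parameter models of step 6.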
In an alternative embodiment of the present application, training the initial target neural network model based on each sample image includes:
inputting each sample image into a teacher network model to obtain an output result of each sample image;
taking each output result as a real result of each corresponding sample image, and training a target neural network model based on each sample image;
the teacher network model is any one randomly selected teacher network model in the teacher queue;
taking each output result as a real result of each corresponding sample image, and after training the target neural network model once based on each sample image, the method further comprises the following steps:
adding the target neural network model after each training to a teacher queue;
the model when the teacher queue is initialized is empty, and the real result of each sample image during initialization is the real result corresponding to the label of the sample image.
In an alternative embodiment of the present application, training the initial target neural network model based on each sample image includes:
initializing one part of model parameters of the target neural network model after each training, taking the other part of model parameters and the initialized part of model parameters as new model parameters of the target neural network model, and carrying out the next training of the target neural network model.
In an optional embodiment of the present application, initializing a part of model parameters of the target neural network model after each training includes:
determining the importance degree of each filter in the target neural network model;
determining a target filter needing parameter initialization according to the importance degree of each filter;
model parameters of each target filter are initialized.
In an optional embodiment of the present application, initializing the model parameters of each target filter includes:
decomposing a filter parameter matrix of a neural network layer where a target filter is located to obtain an orthogonal matrix of the filter parameter matrix;
for the neural network layer where the target filter is located, determining a feature vector corresponding to each target filter in an orthogonal matrix corresponding to the neural network layer according to the position of each target filter in the neural network layer in the corresponding neural network layer;
determining two norms of the feature vectors of all target filters in the same neural network layer according to the feature vectors corresponding to all the target filters in the same neural network layer;
and for each target filter, determining initialized parameters of the target filter according to the feature vector corresponding to the target filter and the corresponding two-norm in the neural network layer to which the target filter belongs.
The above examples of the present application provide a training method that can effectively reduce overfitting. The method is based on the basic framework of knowledge distillation, to which two modules (a cosine-similarity-based pruning module and an aligned orthogonal initialization module) can be added to optimize the training process, thereby improving the accuracy and stability of the model.
The content referred to in the above examples is further explained below with reference to an example.
In this example, let the neural network model be net, with parameters W. The number of iterations is K, and the number of traversals of the training data per iteration is L. The pruning rate is p% (i.e., the proportion of filters whose network parameters are to be re-determined, relative to the model's total number of filters), and the maximum per-layer pruning rate is p_max% (i.e., for any one layer of the model's network structure, the proportion of re-determined filters relative to that layer's total filter count does not exceed p_max%). The algorithmic process of the training method can be represented as:
the initialization teacher queue is empty. Parameters of the net are initialized.
On finishing, output the last network model in the current teacher queue as the training result. The neural network structure may use a prior-art structure such as AlexNet; the loss function may use an existing technique such as edit loss.
The pruning algorithm and the reinitialization algorithm are described below separately.
Let the filter parameters of each layer of the neural network model be WF, with shape (Nout, C, Kw, Kh), where Nout is the number of filters in the layer, C is the number of input channels, and Kw and Kh are the width and height of the layer's filters. Reshape WF into Nout one-dimensional vectors Wf_i, i = 1, …, Nout, each of dimension C × Kw × Kh — i.e., a vector with 1 row and C × Kw × Kh columns — where Nout represents the number of filters in a layer.
The pruning algorithm may specifically include:
1. Normalize each filter vector according to formula (1) (the norm below may be the Euclidean norm; other normalization methods may also be adopted): Ŵf_i = Wf_i / ‖Wf_i‖.
2. Compute Simf (the scores of all filters of all layers) according to formula (2):
Simf = {Simf_k, k = 1, …, Layer_num}, i.e., the set over all layers.
In the above formulas, Layer_num is the number of layers in the model's network structure, and Simf_k denotes the scores of the filters of the k-th layer; Simf_k is thus also a set whose elements are the Simf of each filter in that layer (e.g., the k-th layer above with Nout filters). For the i-th filter of a layer, the correlation between its network parameters and those of the j-th filter of the same layer can be obtained by computing the dot product of Ŵf_i and Ŵf_j; in this way the correlations between the i-th filter and every filter of its layer can be computed, and aggregating the resulting Nout correlations yields the Simf of the i-th filter.
It should be noted that the Simf of each filter represents the importance of the filter, and the larger the Simf is, the lower the importance is.
3. Sort the Simf of all filters in ascending order; the filters ranked in the last p% are the pruned filters W′. However, the proportion of pruned filters in any single layer must not exceed p_max%.
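For a single layer, the pruning steps above can be sketched as follows. The aggregation of the Nout correlations by a sum of absolute dot products is an assumption (the extracted text does not state the aggregation explicitly):

```python
import numpy as np

def select_pruned_filters(WF, p=0.25, p_max=0.5):
    """Cosine-similarity pruning (sketch of steps 1-3) for one layer:
    normalize each flattened filter, score it by its summed absolute
    correlation with every filter of the layer (assumed aggregation),
    and select the highest-scoring (i.e., most redundant, least
    important) filters for re-initialization, capped at p_max per layer.
    WF has shape (Nout, C, Kw, Kh); returns indices of filters to prune."""
    Nout = WF.shape[0]
    flat = WF.reshape(Nout, -1)                              # Nout x (C*Kw*Kh)
    unit = flat / (np.linalg.norm(flat, axis=1, keepdims=True) + 1e-12)
    sim = unit @ unit.T                                      # pairwise dot products
    simf = np.abs(sim).sum(axis=1)                           # per-filter score
    n_prune = min(int(round(Nout * p)), int(Nout * p_max))   # per-layer cap
    order = np.argsort(simf)                                 # ascending score
    return np.sort(order[Nout - n_prune:])                   # last p% = pruned
```

Larger Simf means more redundancy with the rest of the layer, hence lower importance, matching the note above.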
The specific steps of the reinitialization algorithm may include:
1. Perform QR decomposition on each layer's WF of W to obtain an orthogonal matrix Worth. Take the values at the positions corresponding to W′ to obtain a matrix Worth′ of the same size as W′ (i.e., Worth′ is computed independently for each layer);
2. From Worth′, compute the parameters Wpara′ aggregated with Batch Normalization (BN) according to a formula (computed independently for each filter), e.g., Wpara′ = Worth′ · BNscale / √BNvar, where BNscale and BNvar are Batch Normalization parameters: BNscale is the BN layer's network coefficient (scale) and BNvar is the variance of the BN layer's network parameters.
It will be appreciated that in practice this step may be omitted if the BN layer is not connected after the filter, i.e. the convolutional layer.
3. Compute the two-norm of each row of Wpara′, i.e., ‖Wpara′_{k,i}‖ (the two-norm of the i-th pruned filter of the k-th layer), and record the maximum and minimum of all two-norms obtained for each layer as max_norm and min_norm respectively.
4. Obtain the re-initialized weight Wr_{k,i} according to a formula (computed for each filter of each layer), e.g., Wr_{k,i} = scale_aligned · Worth′_{k,i} / ‖Worth′_{k,i}‖, where scale_aligned may be sampled from a uniform distribution on (min_norm, max_norm).
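The four re-initialization steps can be sketched for one layer as below. Because the patent's own formulas did not survive extraction, the BN-aggregation expression and the final rescaling formula are assumptions, as flagged in the comments:

```python
import numpy as np

def reinit_filters(WF, prune_idx, bn_scale=None, bn_var=None, seed=0):
    """Aligned orthogonal re-initialization (sketch of steps 1-4):
    QR-decompose the layer's flattened filter matrix, take the orthogonal
    rows at the pruned positions, and rescale each to a norm drawn
    uniformly from the range of BN-aggregated filter norms. The BN
    aggregation and the rescaling formula are assumptions."""
    rng = np.random.default_rng(seed)
    Nout = WF.shape[0]
    flat = WF.reshape(Nout, -1).astype(float)
    # Step 1: QR decomposition -> orthonormal rows; pick pruned positions
    q, _ = np.linalg.qr(flat.T)           # columns of q are orthonormal
    worth = q.T                            # row i corresponds to filter i
    # Step 2: aggregate BN scale/variance into the filters (skipped if no BN,
    # matching the note that this step may be omitted without a BN layer)
    if bn_scale is not None:
        wpara = worth * (bn_scale / np.sqrt(bn_var))[:, None]
    else:
        wpara = worth
    # Step 3: per-layer min/max of the pruned filters' two-norms
    norms = np.linalg.norm(wpara[prune_idx], axis=1)
    lo, hi = norms.min(), norms.max()
    # Step 4: rescale each pruned orthogonal row to a sampled norm
    out = flat.copy()
    for i in prune_idx:
        scale = rng.uniform(lo, hi)        # scale_aligned ~ U(min_norm, max_norm)
        row = worth[i] / (np.linalg.norm(worth[i]) + 1e-12)
        out[i] = scale * row
    return out.reshape(WF.shape)
```

Non-pruned filters keep their original weights; only the rows listed in `prune_idx` are replaced.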
In an alternative embodiment of the present application, training the initial target neural network model based on each sample image may include:
and determining the prediction loss of the target neural network model to each sample image during each training, correcting each sample image according to the prediction loss, and performing the next training of the target neural network model based on each corrected sample image.
In an optional embodiment of the present application, determining a prediction loss of the target neural network model to each sample image during each training, and correcting each sample image according to the prediction loss may specifically include:
determining the prediction loss of the target neural network model to each sample image during each training;
and determining the disturbance of the prediction loss on each sample image, and correcting each sample image based on the disturbance.
In an optional embodiment of the present application, determining a disturbance of a prediction loss on each sample image, and correcting each sample image based on the disturbance includes:
for each sample image, determining the gradient change of the prediction loss of the sample image to each pixel point in the sample image;
determining the disturbance of the prediction loss to each pixel point according to the gradient change corresponding to each pixel point;
and superposing the disturbance corresponding to each pixel point with the original pixel value of the pixel point corresponding to the sample image to obtain the corrected sample.
In an alternative embodiment of the present application, before determining the prediction loss of the target neural network model for each sample image in each training, the method includes:
for each sample image, cutting the sample image to obtain a global image and a local image of the sample image;
determining the prediction loss of the target neural network model to each sample image during each training, correcting each sample image according to the prediction loss, and performing the next training of the target neural network model based on each corrected sample image, wherein the training comprises the following steps:
taking the global image and the local image corresponding to each sample image as new sample images, and determining the prediction loss of the target neural network model to each new sample image during each training;
correcting each new sample image corresponding to the prediction loss of each new sample image;
and training the target neural network model next time based on each modified new sample image.
The above-described alternative embodiments of the present application further provide a training method that can effectively increase the robustness of the model. It is to be understood that this method may be used in combination with the above method for reducing overfitting, may be used alone, or may build on the result obtained by that method — i.e., the result obtained by the overfitting-reduction method may be used as the initial values here.
The training method for enhancing the robustness of the model is described below with reference to an example.
In this example, the target neural network model is exemplified by a model for predicting the positions of the gaze keypoints of the user in the face image. The training sample set, i.e. the data set, is a face image. A flowchart of the training method is shown in fig. 2, and the specific process may include:
1.1 Input a random picture X from the data set and, according to the face key point detection result, crop it into a face picture X_f (global image), a left-eye picture X_l (local image), and a right-eye picture X_r (local image); adjust the three pictures to preset fixed sizes using bilinear interpolation and output them. Assume the preset fixed sizes corresponding to the three pictures are 64×64, 48×64, and 48×64, respectively.
1.2 Determine whether adversarial pictures (the modified images in this example) have already been generated in this training iteration: if not, output the original three pictures; if so, output the latest three adversarial pictures.
1.3 Input the three pictures into the neural network model and compute the network output P_x = f(X_f, X_l, X_r), a vector representing the sight-line key point positions. Then compute and output the Loss through the sorting (ranking) loss function. The calculation formula is:
Loss_i = −Y′_{x,i} · log(P_{x,i}) − (1 − Y′_{x,i}) · log(1 − P_{x,i})
where i denotes the i-th component of the vector P_x or Y′_x, bin_num denotes the total number of components, and Y′_x is the vector representation of the correct output value (i.e., the ground truth) corresponding to the input picture, i.e., the actual sight-line key point positions.
1.4 Compute the gradients of the loss with respect to each of the three pictures to obtain three groups of adversarial perturbations. Taking the face picture as an example, the perturbation may take the form δ_f = α · sgn(∇_{X_f} Loss), where p_bin is the index of the last component of the P_x vector that is greater than a set value (e.g., 0.5); α is a hyperparameter representing the step size, optionally 1.0; ∇ denotes the gradient operator; sgn() is the sign function; and k is a set value — it may, for example, be taken as 4, with the understanding that 2k + 1 cannot exceed the total number of neurons in the model's output layer. The gradient ∇_{X_f} Loss gives the gradient change of the loss at each pixel of the image. Adding the three groups of adversarial perturbations to the corresponding pictures yields three adversarial pictures X_f^adv, X_l^adv, X_r^adv.
1.5 Determine whether the adversarial step count has reached the preset value step: if not, return to step 1.2 with the three adversarial pictures as input and continue through the subsequent steps; if so, proceed to step 1.6. The preset value step can be configured according to actual requirements and may be 3.
1.6 inputting the three adversarial pictures into the neural network model and calculating the adversarial loss Loss_adv. The calculation method is the same as for Loss, with the input pictures replaced by Xf_adv, Xl_adv, Xr_adv.
1.7 computing the weighted sum of the adversarial loss Loss_adv and the original loss Loss according to a preset percentage c, Loss_total = c * Loss + (1 - c) * Loss_adv, taking it as the overall loss to compute the gradient of all parameters of the neural network model, and performing gradient backpropagation. Optionally, the preset percentage c may be 80%.
1.8 judging whether the number of training steps reaches a preset upper limit s: if not, repeating steps 1.1 to 1.7; if yes, outputting the parameters of the neural network model and finishing the training process. The upper limit s may be 200000 steps.
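The training loop of steps 1.1 to 1.8 can be sketched as follows. This is a toy illustration, not the patent's model: a linear layer with sigmoid outputs stands in for the CNN, a plain sign-of-gradient step stands in for the full perturbation formula (which restricts components around p_bin and applies a random mask), and all function names are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(W, x):  # P_x = f(x) for the toy linear model
    return sigmoid(W @ x)

def bce(P, Y):  # the Loss of step 1.3, summed over components
    P = np.clip(P, 1e-7, 1 - 1e-7)
    return float(np.sum(-Y * np.log(P) - (1 - Y) * np.log(1 - P)))

def grad_wrt_input(W, x, Y):  # d(loss)/d(input) for this toy model
    return W.T @ (forward(W, x) - Y)

def train_step(W, x, Y, steps=3, alpha=1.0, c=0.8, lr=0.01):
    """One pass of steps 1.1-1.7: build adversarial input, mix losses, update W."""
    x_adv = x.copy()
    for _ in range(steps):  # steps 1.2-1.5: iterated sign-of-gradient perturbation
        x_adv = x_adv + alpha * np.sign(grad_wrt_input(W, x_adv, Y))
    loss = bce(forward(W, x), Y)          # original loss
    loss_adv = bce(forward(W, x_adv), Y)  # step 1.6: adversarial loss
    total = c * loss + (1 - c) * loss_adv # step 1.7: weighted overall loss
    # gradient of the weighted loss w.r.t. W (chain rule for the toy model)
    gW = (c * np.outer(forward(W, x) - Y, x)
          + (1 - c) * np.outer(forward(W, x_adv) - Y, x_adv))
    return W - lr * gW, total

W = rng.normal(size=(4, 16)) * 0.1
x = rng.normal(size=16)
Y = np.array([1.0, 1.0, 0.0, 0.0])
W, total = train_step(W, x, Y)
print(total)
```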
In an experiment, a neural network model for estimating the user's sight line (i.e. the positions of the user's sight-line keypoints) was trained based on the training method provided by the embodiment of the application, and the model trained with this scheme and a model trained with an existing common training method were tested on the GAZE_CN_DB and GAZE_start_DB databases; the experimental results are shown in the following table:
In the table, an error of 241 pixels means that the deviation between the predicted coordinate (predicted sight-line keypoint position) and the actual coordinate (actual sight-line keypoint position) is 241 pixels, and a standard deviation of 63.88 means that the standard deviation calculated from the prediction deviation of each experimental sample is 63.88. As can be seen from the table, compared with the existing common training method, the model trained based on the scheme of the embodiment of the application effectively improves the stability of the model's prediction results.
An embodiment of the present application provides an image processing method based on a neural network model, as shown in fig. 3, the method may mainly include:
step S110: acquiring a face image of a user;
step S120: and obtaining the sight focus position of the user based on the face image by using a neural network model.
By adopting the image processing method provided by the embodiment of the application, the sight focus position of the user, namely the position of the focus point at which the user's eyes are directed, can be determined according to the user's face image.
In an alternative embodiment of the present application, obtaining the gaze focus position of the user based on the face image using the neural network model includes:
acquiring a position adjustment parameter;
obtaining a predicted sight focus position of the user based on the face image by using a neural network model;
and adjusting the predicted sight focus position based on the position adjustment parameter to obtain the adjusted sight focus position.
Alternatively, after the predicted gaze focal position of the user is obtained based on the neural network model, the predicted position may be adjusted based on the position adjustment parameter to obtain the gaze focal position of the user, so as to improve the accuracy of the gaze focal position.
In an alternative embodiment of the present application, the position adjustment parameter may be obtained by:
displaying the calibration object to a user, and acquiring a current face image of the user;
obtaining a predicted sight focus position of the user corresponding to the current face image based on the current face image by using a neural network model;
and determining position adjusting parameters according to the predicted sight focus position of the user corresponding to the current face image and the position of the calibration object.
In this embodiment, the user can be guided to gaze at the calibration object by displaying the calibration object to the user; the user's face image at that moment is acquired, and the position adjustment parameter is determined based on the predicted sight focus position of the face image acquired at that moment and the position of the calibration object.
In practical application, the number of the calibration objects can be configured according to practical needs, and can be one or more. The style of the calibration object is not limited in the embodiments of the present application, and may be a calibration point.
As an example, a schematic diagram of a calibration object is shown in fig. 4, and as shown in the diagram, the calibration object in this example may be 3 specific calibration points shown in the diagram, and the step of determining the position adjustment parameter f (x) based on the 3 calibration points may include:
1. A specific calibration point is displayed on the screen of an electronic device (a mobile phone in this example) each time to guide the user to gaze at that point, and n (n ≥ 1) pictures are collected at that calibration point through the phone's visible-light camera; for the 3 specific calibration points, n pictures are collected for each, i.e. n pictures are acquired while each calibration point is displayed. The actual position of each calibration point on the screen, i.e. the calibration point's own coordinates, is recorded as g1, g2, g3. For the pictures of each calibration point, the user's predicted sight focus position on the screen corresponding to each picture can be predicted through the neural network model, obtaining the predicted sight focus position corresponding to each calibration point, i.e. the predicted coordinates, denoted p1, p2, p3, where p1, p2, p3 correspond to g1, g2, g3 respectively. In practical applications, when n > 1 (for example, n may be 3), p1, p2 and p3 may each be taken as the average of the predicted coordinates of the n pictures of the corresponding calibration point.
2. After obtaining g1, g2, g3, p1, p2 and p3, the position adjustment parameters may be determined based on g1, g2, g3, p1, p2 and p3.
In this example, an expression of an optional position adjustment parameter f (x) is given, which is specifically as follows:
where x in the expression represents a coordinate, namely the predicted sight focus position to be adjusted when the position is adjusted based on this function, and scr is a preconfigured maximum position.
It should be noted that, in practical applications, the coordinates of the sight focus position in different dimensions need to be adjusted separately. For example, if the focus position has two directions (e.g. horizontal direction X and vertical direction Y), the corresponding adjustment value in the horizontal direction is obtained from the predicted horizontal sight focus coordinate through the above function, and the adjusted horizontal sight focus coordinate is obtained from that adjustment value and the predicted horizontal coordinate; similarly, the corresponding adjustment value in the vertical direction is obtained from the predicted vertical sight focus coordinate through the above function, and the adjusted vertical sight focus coordinate is obtained from that adjustment value and the predicted vertical coordinate. Accordingly, p1, p2, p3, g1, g2 and g3 also need to be split into their coordinate values in each direction so that the corresponding calculation can be performed per direction.
Based on the scheme provided by the embodiment of the application, when the focus point of the user's sight line on the electronic device needs to be determined, a face image of the user using the device is collected, the image is input into the neural network model, the predicted sight focus position x is output, the corresponding adjustment amount F(x) is obtained through the adjustment function, F(x) is added to the predicted sight focus position to obtain the adjusted position F(x) + x, and the adjusted position is taken as the user's sight focus position on the electronic device.
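The calibration flow above can be sketched in code. Since the explicit f(x) expression appears only in the patent's figure, the sketch below assumes a simple per-axis linear correction fitted by least squares to the three (p_i, g_i) pairs, applied as F(x) + x; the calibration coordinates and function names are hypothetical:

```python
import numpy as np

def fit_adjustment(p, g):
    """Fit, per axis, a linear correction F(x) = a*x + b so that p_i + F(p_i) ~ g_i.

    p, g : (3, 2) arrays of predicted and actual calibration-point coordinates.
    The linear form of F is an assumption; the patent gives f(x) only in a figure.
    """
    params = []
    for axis in range(p.shape[1]):
        A = np.stack([p[:, axis], np.ones(len(p))], axis=1)
        target = g[:, axis] - p[:, axis]  # residual the correction must supply
        coef, *_ = np.linalg.lstsq(A, target, rcond=None)
        params.append(coef)
    return params  # [(a, b) for the X axis, (a, b) for the Y axis]

def adjust(x, params):
    """Adjusted position = F(x) + x, applied per axis as described in the text."""
    return np.array([x[i] + params[i][0] * x[i] + params[i][1]
                     for i in range(len(x))])

# Hypothetical predicted (p1..p3) and actual (g1..g3) calibration coordinates
p = np.array([[100.0, 200.0], [400.0, 210.0], [250.0, 600.0]])
g = np.array([[110.0, 190.0], [420.0, 205.0], [260.0, 620.0]])
params = fit_adjustment(p, g)
print(adjust(np.array([100.0, 200.0]), params))
```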
In an optional embodiment of the application, the obtaining, in the step S120, the focal position of the line of sight of the user based on the face image by using the neural network model may include:
obtaining a predicted sight focus position of the user based on the face image by using a neural network model;
determining a predicted loss of the predicted gaze focal position;
determining a confidence level of the predicted gaze focal position based on the predicted loss;
if the confidence coefficient is greater than the set threshold value, determining the predicted sight line focus position as the sight line focus position of the user;
if the confidence coefficient is not greater than the set threshold, the predicted sight line focus position is adjusted to obtain the adjusted sight line focus position, or the sight line focus position corresponding to the previous frame of face image is determined as the sight line focus position of the user.
In an alternative embodiment of the present application, determining a confidence level of the predicted gaze focal position based on the predicted loss comprises:
determining at least two perturbations of the prediction loss to the facial image;
respectively correcting the face image based on each disturbance to obtain at least two corrected images;
obtaining a predicted sight focus position corresponding to each corrected image through a neural network model;
and obtaining the confidence coefficient according to the predicted sight focus position corresponding to each corrected image.
In an optional embodiment of the present application, obtaining a confidence level according to a predicted gaze focus position corresponding to each corrected image includes:
and determining a standard deviation according to the predicted sight focus position corresponding to each corrected image, and taking the reciprocal of the standard deviation as a confidence coefficient.
In an alternative embodiment of the present application, the determining of at least two perturbations of the facial image by the prediction loss comprises at least one of:
determining a predicted loss of an initial gaze focus position corresponding to the facial image relative to at least two directions; determining the disturbance of the initial sight focus position to the face image in each direction based on the corresponding prediction loss in each direction;
at least two perturbations of the initial gaze focal position to the face image are determined based on the at least two perturbation coefficients.
In an alternative embodiment of the present application, obtaining the predicted gaze focus position of the user based on the facial image using a neural network model includes:
cutting the face image to obtain a global image and a local image of the face image;
inputting the global image and the local image into a neural network model to obtain a predicted sight focus position of the user;
determining at least two perturbations of the prediction loss to the facial image, including:
determining at least two kinds of disturbance of prediction loss to each image in a global image and a local image;
correcting the face image based on each kind of disturbance to obtain at least two corrected images, including:
respectively correcting corresponding images based on at least two kinds of disturbance corresponding to each image in the global image and the local image to obtain at least two corrected images corresponding to each image;
obtaining the predicted sight focus position corresponding to each corrected image through a neural network model, wherein the predicted sight focus position comprises the following steps:
inputting each group of corrected images into a neural network model to obtain a predicted sight focus position corresponding to each group of corrected images, wherein each group of corrected images comprises a corrected image corresponding to each image obtained by cutting the face image;
obtaining a confidence level according to the predicted sight line focus position corresponding to each corrected image, comprising:
and obtaining the confidence degree based on the corresponding predicted sight line focus position of each group of modified images.
In an alternative embodiment of the present application, determining a perturbation of the facial image by the prediction loss comprises:
determining the gradient change of the prediction loss to each pixel point in the face image;
determining the disturbance of the prediction loss to each pixel point according to the gradient change corresponding to each pixel point;
correcting the face image based on the disturbance to obtain a corrected image, comprising:
and superposing the disturbance corresponding to each pixel point with the original pixel value of the pixel point corresponding to the face image to obtain the corrected image.
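The two steps above (deriving a per-pixel perturbation from the loss gradient, then superposing it onto the original pixel values) can be sketched as follows; taking the sign of the gradient, consistent with the sgn() formulas elsewhere in the text, is an assumption of this minimal illustration:

```python
import numpy as np

def perturb_and_correct(image, loss_grad, alpha=1.0):
    """Per-pixel perturbation from the loss gradient, superposed onto the image.

    image     : H x W array of pixel values.
    loss_grad : H x W array, d(prediction loss)/d(pixel) at each pixel.
    alpha     : step-size hyperparameter (cf. alpha in the text).
    """
    perturbation = alpha * np.sign(loss_grad)  # perturbation for each pixel
    corrected = image + perturbation           # superposition with original pixels
    return corrected, perturbation

img = np.zeros((2, 2))
grad = np.array([[0.3, -0.7], [0.0, 1.2]])
corrected, pert = perturb_and_correct(img, grad)
print(corrected)  # sign of the gradient at each pixel
```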
The above alternatives provided by the embodiments of the present application are specifically described below with reference to a specific example. In this example, the input is image data captured by a visible light camera of an electronic device (e.g., a cell phone), such as a single frame picture containing a human face. The image processing method in this example mainly includes the following steps:
1. The input of this step is the face image to be processed, namely a single-frame picture X. According to the face keypoint detection result, the picture can be cut into a face picture Xf, a left-eye picture Xl and a right-eye picture Xr, and the three pictures are resized to preset fixed sizes using bilinear interpolation and output. For example, the preset fixed sizes corresponding to the face image Xf, the left-eye image Xl and the right-eye image Xr may be 64p×64p, 48p×64p and 48p×64p respectively, where p denotes a pixel.
It can be understood that the face image Xf is the global image in this example, and the left-eye image Xl and the right-eye image Xr are the local images in this example.
2. Inputting the three pictures into the neural network model to obtain the output Px; in this example the output is the feature vector corresponding to the initial sight focus position of the single-frame picture X. The dimension of Px is bin_num, i.e. the total number of components of the vector, corresponding to the number of neurons in the model output layer; the value of the i-th component of Px may be denoted Pxi.
3. Based on Pxi, a plurality of perturbations are determined. In this example, the face picture Xf is taken as the example. In the notation for the perturbations corresponding to the face picture Xf, f indicates the corresponding face picture; l corresponds to the direction, and in this example l takes two values, e.g. 1 and 2, where 1 corresponds to the leftward perturbation and 2 to the rightward perturbation; g and j correspond to the perturbation coefficients, whose explanation follows later.
For the face picture Xf, a plurality of perturbations can be obtained based on Pxi, and thus a plurality of adversarial pictures (i.e. corrected face pictures) are obtained. That is, each perturbation is superposed (pixel-value superposition) with the face picture Xf to obtain a modified face picture.
In this example, the specific calculation formula is:
Loss_test_l_i = -Log(1 - P_xi)
Loss_test_r_i = -Log(P_xi)
In these expressions, Loss_test_l_i denotes the prediction loss of the i-th component in the leftward direction, and Loss_test_r_i denotes the prediction loss of the i-th component in the rightward direction; Loss_test_l denotes the prediction loss of the whole predicted sight focus position in the leftward direction in this example, and Loss_test_r denotes the prediction loss in the rightward direction. k is a set value, for example 4; it can be understood that 2k + 1 cannot exceed the total number of neurons in the output layer of the model. p_bin denotes the index of the last component (neuron index) in the Px vector that is greater than a set value (e.g., 0.5). The perturbation symbols denote the leftward and rightward perturbations respectively, and ∇ denotes the gradient operator. α_j denotes the step size: for different values of j, α_j corresponds to different step sizes, e.g. two options α_1 = 1 and α_2 = 2, i.e. j may take the value 1 or 2; in this case one perturbation corresponds to the leftward perturbation with step size 1, and another to the leftward perturbation with step size 2. sgn() denotes the sign function. pos_g denotes a probability: for different values of g, pos_g denotes different probabilities, e.g. three options pos_1 = 1, pos_2 = 0.8 and pos_3 = 0.6, i.e. g may take the value 1, 2 or 3; in this case one perturbation denotes the leftward perturbation with probability 1 corresponding to step size 1, another denotes the leftward perturbation with probability 0.8 corresponding to step size 1, and so on.
rdm(M, pos_g)_nm denotes randomly retaining the elements of the matrix M with probability pos_g. As can be understood, rdm(M, pos_g)_nm serves only to illustrate the meaning of the rdm() function: when random() ≤ pos_g the function yields the corresponding element value, and when random() > pos_g the element value is 0. In particular, applied to the perturbation, this means randomly deciding with probability pos_g, at each pixel position, whether to generate a perturbation, yielding a perturbation matrix in which some pixel positions carry a perturbation and some do not; for example, if random() > pos_g at position (m, n), then the value at position (m, n) in the resulting matrix is 0, i.e. the perturbation at that position is 0.
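The rdm() behaviour described above can be sketched directly; the explicit random-generator argument is an addition of this sketch for reproducibility:

```python
import numpy as np

def rdm(M, pos_g, rng):
    """Randomly retain elements of M with probability pos_g, zero out the rest.

    Mirrors the rdm() description in the text: where random() <= pos_g the
    element is kept, otherwise it becomes 0.
    """
    keep = rng.random(M.shape) <= pos_g
    return np.where(keep, M, 0.0)

rng = np.random.default_rng(42)
M = np.ones((4, 4))
masked = rdm(M, 0.8, rng)
print(masked.mean())  # roughly pos_g of the elements survive
```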
4. Calculating the confidence (the inverse of the adversarial standard deviation std_adv).
The formula for the adversarial standard deviation in this example is as follows:
where N is the total number of groups of adversarial pictures corresponding to the three pictures, i.e. the number of combinations of j, g and l. For different values of j, g and l, each group of adversarial pictures (i.e. each group of corrected images) corresponds to one predicted sight focus position; mean is the average value and var is the variance. In application, each combination corresponds to one group of adversarial pictures, and each group comprises one adversarial picture for each of the three pictures (face picture, left-eye picture and right-eye picture), i.e. every three adversarial pictures form one group of inputs to the neural network model, and each group of inputs corresponds to one corrected predicted sight focus position. In this example, N groups of inputs are obtained; based on the N predicted positions the standard deviation can be calculated, and the confidence is obtained by taking the reciprocal of the standard deviation.
5. The predicted sight focus position Px of the single-frame picture X is processed differently according to the adversarial standard deviation. In this step, the inputs are the picture's predicted sight focus position Px and the confidence 1/std_adv, and the output is the processed sight focus position.
Specifically, if 1/std_adv is greater than the threshold th1, the confidence of the picture's prediction result is high, and the predicted sight focus position Px is output directly; if 1/std_adv is not greater than the threshold th1, the confidence of the picture's prediction result is low, and temporal smoothing, such as Kalman filtering, is applied to the predicted coordinates. The threshold th1 may be configured according to actual requirements, for example 1/63.88.
Of course, in practical applications, the processing may also be performed directly based on the standard deviation. Specifically, if std_adv is less than or equal to the threshold th2, the adversarial standard deviation of the picture's prediction result is small, the confidence is considered high, and the predicted sight focus position Px is output directly; if std_adv is greater than the threshold th2, the confidence of the picture's prediction result is low, and temporal smoothing, such as Kalman filtering, is applied to the predicted coordinates. The threshold th2 may be configured according to actual requirements, for example 63.88.
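The two steps above can be sketched together: compute the confidence as the reciprocal of the standard deviation over the adversarial predictions, then either output Px directly or smooth it. Simple exponential smoothing (with an assumed weight) stands in here for the Kalman filtering mentioned in the text:

```python
import numpy as np

def process_prediction(P_x, adv_positions, th1, previous=None):
    """Confidence = 1/std over adversarial predictions; output P_x directly when
    confident, otherwise fall back to temporal smoothing.

    Exponential smoothing with weight 0.7 replaces Kalman filtering here, and
    the fallback to P_x when no previous frame exists is an assumption.
    """
    std_adv = float(np.std(adv_positions))
    confidence = 1.0 / std_adv if std_adv > 0 else float("inf")
    if confidence > th1:
        return P_x                       # high confidence: use prediction as-is
    if previous is None:
        return P_x
    return 0.7 * previous + 0.3 * P_x    # low confidence: smooth with prior frame

# Tight cluster of adversarial predictions -> high confidence -> Px unchanged
print(process_prediction(120.0, [119.5, 120.2, 120.1], th1=1 / 63.88, previous=100.0))
```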
It can be understood that the neural network model for image processing in the embodiment of the present application may be trained based on the training method provided in any embodiment of the present application. That is to say, the target neural network model to be trained may be a model for outputting the user's sight focus position (or the vector representation corresponding to the focus position); when the model is trained, the prediction result of the model is the predicted sight focus position of the sample image, and the real result is the user's real sight focus position in the sample image, namely the real coordinate point on the screen of the electronic device gazed at by the user in the sample image.
The scheme provided by the embodiment of the application can be applied to various electronic devices, for example, mobile electronic devices such as mobile phones and tablets that have only a single visible-light camera, so that the user's sight line can be estimated on the device based on videos/images. The scheme of the embodiment of the application can effectively improve the interaction between the user's sight line and the device (mobile phone); for example, user data such as pictures can be acquired through the phone camera, and the position on the phone screen watched by the user can then be estimated from the user data.
It should be noted that, in the embodiment of the present application, for the example in the embodiment of the training method and the example in the image processing method, some corresponding parameter interpretations may be referred to each other.
In the embodiment of the training method, a method for denoising a data set and a training method for reducing overfitting are provided, so that the generalization performance of a trained model is improved. The embodiment of the application also provides a training method aiming at the ranking loss function based on the countertraining (namely the training method for enhancing the robustness of the model), so that the sight estimation result of the trained model is more stable, and the problem of jitter is solved in the training stage.
In the embodiment of the image processing method, a testing method for obtaining a stable prediction result is provided: the adversarial-sample standard deviation and the prediction result of the test picture are output, and the prediction result is processed through the adversarial standard deviation. The embodiment of the application also provides a person-specific three-point calibration method (of course, a single-point or multi-point calibration method may also be adopted), so that the prediction result can be adjusted quickly and efficiently.
Based on the same principle, the embodiment of the application also provides an image processing device which comprises an image acquisition module and a sight line focus position determination module. Wherein:
the image acquisition module is used for acquiring a face image of a user;
and the sight focus position determining module is used for obtaining the sight focus position of the user based on the face image by using the neural network model.
Optionally, the gaze focus position determination module is specifically configured to:
acquiring a position adjustment parameter;
obtaining a predicted sight focus position of the user based on the face image by using a neural network model;
and adjusting the predicted sight focus position based on the position adjustment parameter to obtain the adjusted sight focus position.
Optionally, the position adjustment parameter is obtained by:
displaying the calibration object to a user, and acquiring a current face image of the user;
obtaining a predicted sight focus position of the user corresponding to the current face image based on the current face image by using a neural network model;
and determining position adjusting parameters according to the predicted sight focus position of the user corresponding to the current face image and the position of the calibration object.
Optionally, the gaze focus position determination module is specifically configured to:
obtaining a predicted sight focus position of the user based on the face image by using a neural network model;
determining a predicted loss of the predicted gaze focal position;
determining a confidence level of the predicted gaze focal position based on the predicted loss;
if the confidence coefficient is greater than the set threshold value, determining the predicted sight line focus position as the sight line focus position of the user;
if the confidence coefficient is not greater than the set threshold, the predicted sight line focus position is adjusted to obtain the adjusted sight line focus position, or the sight line focus position corresponding to the previous frame of face image is determined as the sight line focus position of the user.
Optionally, when the gaze focus position determination module determines the confidence of the predicted gaze focus position based on the predicted loss, the gaze focus position determination module is specifically configured to:
determining at least two perturbations of the prediction loss to the facial image;
respectively correcting the face images based on each kind of disturbance to obtain at least two corrected images;
obtaining a predicted sight focus position corresponding to each corrected image through a neural network model;
and obtaining the confidence coefficient according to the predicted sight focus position corresponding to each corrected image.
Optionally, when obtaining the confidence level according to the predicted gaze focus position corresponding to each corrected image, the gaze focus position determining module is specifically configured to:
determining a standard deviation according to the predicted sight focus position corresponding to each corrected image;
the inverse of the standard deviation was taken as the confidence.
Optionally, the gaze focus position determination module, when determining at least two perturbations of the prediction loss to the facial image, performs at least one of:
determining a predicted loss of an initial gaze focus position corresponding to the facial image relative to at least two directions; determining the disturbance of the initial sight focus position to the face image in each direction based on the corresponding prediction loss in each direction;
at least two perturbations of the initial gaze focal position to the face image are determined based on the at least two perturbation coefficients.
Optionally, when the gaze focus position determination module obtains the predicted gaze focus position of the user based on the face image by using the neural network model, the gaze focus position determination module is specifically configured to:
cutting the face image to obtain a global image and a local image of the face image;
inputting the global image and the local image into a neural network model to obtain a predicted sight focus position of the user;
the gaze focal position determination module, when determining at least two perturbations of the facial image by the prediction loss, is specifically configured to:
determining at least two kinds of disturbance of prediction loss to each image in a global image and a local image;
correcting the face image based on each kind of disturbance to obtain at least two corrected images, including:
respectively correcting corresponding images based on at least two kinds of disturbance corresponding to each image in the global image and the local image to obtain at least two corrected images corresponding to each image;
the gaze focus position determination module, when obtaining the predicted gaze focus position corresponding to each modified image through the neural network model, is specifically configured to:
inputting each group of corrected images into a neural network model to obtain a predicted sight focus position corresponding to each group of corrected images, wherein each group of corrected images comprises a corrected image corresponding to each image obtained by cutting the face image;
the gaze focus position determination module is specifically configured to, when obtaining the confidence level according to the predicted gaze focus position corresponding to each corrected image:
and obtaining the confidence degree based on the corresponding predicted sight line focus position of each group of modified images.
Optionally, when determining the disturbance of the prediction loss on the face image, the gaze focus position determination module is specifically configured to:
determining the gradient change of the prediction loss to each pixel point in the face image;
determining the disturbance of the prediction loss to each pixel point according to the gradient change corresponding to each pixel point;
correcting the face image based on the disturbance to obtain a corrected image, comprising:
and superposing the disturbance corresponding to each pixel point with the original pixel value of the pixel point corresponding to the face image to obtain the corrected image.
Based on the same principle, the embodiment of the application also provides a training device of the neural network model, and the device comprises a sample obtaining module and a model training module. Wherein:
the system comprises a sample acquisition module, a data acquisition module and a data processing module, wherein the sample acquisition module is used for acquiring a training sample set which comprises sample images;
and the model training module is used for training the initial target neural network model based on each sample image until the loss function is converged to obtain the trained target neural network model.
Optionally, the model training module is specifically configured to:
acquiring a first neural network model;
training the first neural network model at least twice based on each sample image to obtain the first neural network model after each training;
predicting each sample image through the neural network model after each training to obtain the prediction result of each sample image corresponding to the neural network model after each training;
and deleting the sample images in the training sample set based on the difference between the prediction result of each time corresponding to each sample image and the real result of the sample to obtain the processed sample images.
Optionally, when the model training module performs at least two trainings of the first neural network model based on the sample images, the sample images used in each training after the first are obtained by deleting, from the sample images used in the previous training, a set number or a set proportion of sample images whose difference between the prediction result and the real result is small.
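The pruning between trainings described above can be sketched as follows. Per the text, the samples removed each round are those whose prediction-vs-ground-truth difference is smallest; collapsing each sample's error to a single scalar is an assumption of this sketch:

```python
import numpy as np

def prune_samples(samples, labels, predictions, drop_ratio=0.1):
    """Drop the set proportion of samples with the smallest prediction error.

    predictions may aggregate errors over several trainings (e.g. a per-round
    sum); that aggregation is an assumption of this sketch.
    """
    diffs = np.abs(predictions - labels)
    n_drop = int(len(samples) * drop_ratio)
    keep = np.sort(np.argsort(diffs)[n_drop:])  # remove the n_drop smallest diffs
    return [samples[i] for i in keep], labels[keep]

samples = ["img%d" % i for i in range(10)]
labels = np.arange(10, dtype=float)
preds = labels + np.array([0.0, 0.5, 0.1, 2.0, 0.2, 0.05, 1.5, 0.3, 0.7, 0.02])
kept, kept_labels = prune_samples(samples, labels, preds, drop_ratio=0.2)
print(kept)  # the two samples with the smallest error are gone
```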
Optionally, the model training module is specifically configured to:
inputting each sample image into a teacher network model to obtain an output result for each sample image;
and taking each output result as the true result of the corresponding sample image, and training the target neural network model based on each sample image.
Optionally, the teacher network model is any one of the teacher network models randomly selected from the teacher queue;
optionally, the model training module is further configured to, after each time the target neural network model is trained based on each sample image with each output result taken as the true result of the corresponding sample image:
add the target neural network model after that training to the teacher queue;
wherein the teacher queue is empty at initialization, and at initialization the true result of each sample image is the true result corresponding to the label of that sample image.
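The teacher-queue scheme above could be sketched as follows. Teacher models are assumed to expose a `predict(image)` method; this interface and the class name are assumptions for illustration, not fixed by the application:

```python
import random

class TeacherQueue:
    """Self-distillation with a teacher queue.

    The queue starts empty, in which case the labelled ground truth is
    used as the true result. After each training round the trained
    model is pushed onto the queue, and later rounds draw a teacher
    from it at random.
    """

    def __init__(self):
        self.queue = []

    def targets(self, images, labels):
        if not self.queue:                    # queue empty at initialization
            return list(labels)               # fall back to the image labels
        teacher = random.choice(self.queue)   # randomly selected teacher
        return [teacher.predict(img) for img in images]

    def push(self, trained_model):
        self.queue.append(trained_model)
```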
Optionally, when training the initial target neural network model based on each sample image, the model training module is specifically configured to:
after each training, initialize one part of the model parameters of the target neural network model, take the other part of the model parameters together with the initialized part as the new model parameters of the target neural network model, and carry out the next training of the target neural network model.
Optionally, when initializing a part of model parameters of the target neural network model after each training, the model training module is specifically configured to:
determining the importance degree of each filter in the target neural network model;
determining, according to the importance degree of each filter, the target filters whose parameters need to be initialized;
and initializing the model parameters of each target filter.
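The selection step could be sketched as follows. The application does not fix the importance measure, so the L1 norm of each filter's weights is used here purely as a stand-in, and re-initializing the least important filters is one plausible reading; the function and parameter names are illustrative:

```python
def select_target_filters(filter_weights, num_reinit):
    """Rank filters by an importance score (L1 norm of the weights,
    used here as a stand-in) and return the indices of the
    `num_reinit` least important filters for re-initialization."""
    importance = [sum(abs(w) for w in f) for f in filter_weights]
    ranked = sorted(range(len(filter_weights)), key=lambda i: importance[i])
    return ranked[:num_reinit]
```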
Optionally, when initializing the model parameters of each target filter, the model training module is specifically configured to:
decomposing the filter parameter matrix of the neural network layer where a target filter is located to obtain an orthogonal matrix of the filter parameter matrix;
for the neural network layer where each target filter is located, determining the feature vector corresponding to that target filter in the orthogonal matrix of the layer according to the position of the target filter in the layer;
determining the two-norm of the feature vector of each target filter in the same neural network layer;
and for each target filter, determining the initialized parameters of the target filter according to the corresponding feature vector and the corresponding two-norm in the neural network layer to which the target filter belongs.
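The orthogonal re-initialization could be sketched as follows, assuming the layer's filter parameter matrix holds one flattened filter per row and has at least as many input dimensions as filters. QR decomposition is used here as one way to obtain an orthogonal matrix, and rescaling by dividing by the two-norm (a unit-norm initialization) is an assumption, since the application does not fix the exact rule:

```python
import numpy as np

def reinit_target_filters(layer_weights, target_indices):
    """Re-initialize the target filters of one layer.

    The filter parameter matrix is QR-decomposed to obtain an
    orthogonal matrix; each target filter takes the orthogonal vector
    at its own position, rescaled by that vector's two-norm.
    """
    w = np.asarray(layer_weights, dtype=np.float64)
    q, _ = np.linalg.qr(w.T)        # columns of q are orthonormal
    ortho = q.T                     # row i corresponds to filter i
    out = w.copy()
    for i in target_indices:
        vec = ortho[i]
        norm = np.linalg.norm(vec)  # two-norm of the feature vector
        out[i] = vec / norm
    return out
```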
It can be understood that each module provided in the embodiments of the present application may implement the corresponding step of the method provided in the embodiments of the present application. The functions may be implemented by hardware, or by hardware executing corresponding software. The modules may be software and/or hardware, and each module may be implemented individually or by integrating a plurality of modules. For the functional description of each module of the image processing apparatus, reference may be made to the corresponding description of the methods in the foregoing embodiments, and details are not repeated here.
In addition, in practical application, each functional module of the apparatus in the embodiments of the present application may run on a terminal device and/or a server according to the requirements of the practical application.
Based on the same principle, an embodiment of the present application further provides an electronic device comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor is configured to invoke the computer program to perform the method provided in any embodiment of the present application.
Based on the same principle, an embodiment of the present application further provides a computer-readable storage medium, where a computer program is stored in the storage medium, and when the computer program is executed by a processor, the computer program implements the method provided in any embodiment of the present application.
Alternatively, fig. 5 shows a schematic structural diagram of an electronic device to which the embodiments of the present application are applicable. As shown in fig. 5, the electronic device 4000 may include a processor 4001 and a memory 4003, the processor 4001 being coupled to the memory 4003, for example via a bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004. In practical applications the number of transceivers 4004 is not limited to one, and the structure of the electronic device 4000 does not limit the embodiments of the present application.
The memory 4003 is used for storing a computer program for executing the present scheme, and its execution is controlled by the processor 4001. The processor 4001 is configured to execute the computer program stored in the memory 4003 to implement what is shown in any of the foregoing method embodiments.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may comprise multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and whose execution order is not necessarily sequential; they may be performed in turn or alternately with other steps, or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that those skilled in the art can make various improvements and refinements without departing from the principle of the present invention, and such improvements and refinements shall also fall within the protection scope of the present invention.
Claims (20)
Priority Applications (3)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910685032.2A CN112307815A (en) | 2019-07-26 | 2019-07-26 | Image processing method and device, electronic equipment and readable storage medium |
| KR1020200047423A KR20210012888A (en) | 2019-07-26 | 2020-04-20 | Method and apparatus for gaze tracking and method and apparatus for training neural network for gaze tracking |
| US16/937,722 US11347308B2 (en) | 2019-07-26 | 2020-07-24 | Method and apparatus with gaze tracking |
Applications Claiming Priority (1)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201910685032.2A CN112307815A (en) | 2019-07-26 | 2019-07-26 | Image processing method and device, electronic equipment and readable storage medium |
Publications (1)
| Publication Number | Publication Date |
|---|---|
| CN112307815A true CN112307815A (en) | 2021-02-02 |
Family
ID=74329723
Family Applications (1)
| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201910685032.2A Pending CN112307815A (en) | 2019-07-26 | 2019-07-26 | Image processing method and device, electronic equipment and readable storage medium |
Country Status (2)
| Country | Link |
|---|---|
| KR (1) | KR20210012888A (en) |
| CN (1) | CN112307815A (en) |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114445663A (en) * | 2022-01-25 | 2022-05-06 | 百度在线网络技术(北京)有限公司 | Method, apparatus and computer program product for detecting challenge samples |
| CN116091541A (en) * | 2022-12-21 | 2023-05-09 | 哲库科技(上海)有限公司 | Eye movement tracking method, eye movement tracking device, electronic device, storage medium, and program product |
| JP2023108563A (en) * | 2022-01-25 | 2023-08-04 | キヤノン株式会社 | Gaze detection device, display device, control method, and program |
| CN118051772A (en) * | 2024-01-23 | 2024-05-17 | 哈尔滨工程大学 | A robust training method based on phase flipping |
| WO2024245263A1 (en) * | 2023-05-29 | 2024-12-05 | 北京字跳网络技术有限公司 | Method and apparatus for constructing line-of-sight prediction model, and device and storage medium |
Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN104951084A (en) * | 2015-07-30 | 2015-09-30 | 京东方科技集团股份有限公司 | Eye-tracking method and device |
| CN106796449A (en) * | 2014-09-02 | 2017-05-31 | 香港浸会大学 | sight tracking method and device |
| CN108229284A (en) * | 2017-05-26 | 2018-06-29 | 北京市商汤科技开发有限公司 | Eye-controlling focus and training method and device, system, electronic equipment and storage medium |
| CN109271914A (en) * | 2018-09-07 | 2019-01-25 | 百度在线网络技术(北京)有限公司 | Detect method, apparatus, storage medium and the terminal device of sight drop point |
| US20190080474A1 (en) * | 2016-06-28 | 2019-03-14 | Google Llc | Eye gaze tracking using neural networks |
| CN109698901A (en) * | 2017-10-23 | 2019-04-30 | 广东顺德工业设计研究院(广东顺德创新设计研究院) | Atomatic focusing method, device, storage medium and computer equipment |
| CN110008835A (en) * | 2019-03-05 | 2019-07-12 | 成都旷视金智科技有限公司 | Sight prediction technique, device, system and readable storage medium storing program for executing |
- 2019-07-26: CN application CN201910685032.2A filed, published as CN112307815A (status: Pending)
- 2020-04-20: KR application KR1020200047423A filed, published as KR20210012888A (status: Pending)
Patent Citations (7)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN106796449A (en) * | 2014-09-02 | 2017-05-31 | 香港浸会大学 | sight tracking method and device |
| CN104951084A (en) * | 2015-07-30 | 2015-09-30 | 京东方科技集团股份有限公司 | Eye-tracking method and device |
| US20190080474A1 (en) * | 2016-06-28 | 2019-03-14 | Google Llc | Eye gaze tracking using neural networks |
| CN108229284A (en) * | 2017-05-26 | 2018-06-29 | 北京市商汤科技开发有限公司 | Eye-controlling focus and training method and device, system, electronic equipment and storage medium |
| CN109698901A (en) * | 2017-10-23 | 2019-04-30 | 广东顺德工业设计研究院(广东顺德创新设计研究院) | Atomatic focusing method, device, storage medium and computer equipment |
| CN109271914A (en) * | 2018-09-07 | 2019-01-25 | 百度在线网络技术(北京)有限公司 | Detect method, apparatus, storage medium and the terminal device of sight drop point |
| CN110008835A (en) * | 2019-03-05 | 2019-07-12 | 成都旷视金智科技有限公司 | Sight prediction technique, device, system and readable storage medium storing program for executing |
Cited By (5)
| Publication number | Priority date | Publication date | Assignee | Title |
|---|---|---|---|---|
| CN114445663A (en) * | 2022-01-25 | 2022-05-06 | 百度在线网络技术(北京)有限公司 | Method, apparatus and computer program product for detecting challenge samples |
| JP2023108563A (en) * | 2022-01-25 | 2023-08-04 | キヤノン株式会社 | Gaze detection device, display device, control method, and program |
| CN116091541A (en) * | 2022-12-21 | 2023-05-09 | 哲库科技(上海)有限公司 | Eye movement tracking method, eye movement tracking device, electronic device, storage medium, and program product |
| WO2024245263A1 (en) * | 2023-05-29 | 2024-12-05 | 北京字跳网络技术有限公司 | Method and apparatus for constructing line-of-sight prediction model, and device and storage medium |
| CN118051772A (en) * | 2024-01-23 | 2024-05-17 | 哈尔滨工程大学 | A robust training method based on phase flipping |
Also Published As
| Publication number | Publication date |
|---|---|
| KR20210012888A (en) | 2021-02-03 |
Similar Documents
| Publication | Publication Date | Title |
|---|---|---|
| CN112307815A (en) | Image processing method and device, electronic equipment and readable storage medium | |
| CN111325851B (en) | Image processing method and device, electronic device, and computer-readable storage medium | |
| CN114707604B (en) | Twin network tracking system and method based on space-time attention mechanism | |
| CN106157307B (en) | A kind of monocular image depth estimation method based on multiple dimensioned CNN and continuous CRF | |
| CN112801890B (en) | Video processing method, device and equipment | |
| US12400349B2 (en) | Joint depth prediction from dual-cameras and dual-pixels | |
| CN105046659B (en) | A kind of simple lens based on rarefaction representation is calculated as PSF evaluation methods | |
| CN114973098B (en) | A short video deduplication method based on deep learning | |
| CN107871099A (en) | Face detection method and apparatus | |
| CN113066034A (en) | Face image restoration method and device, restoration model, medium and equipment | |
| CN115187474B (en) | A two-stage dehazing method for dense fog images based on inference | |
| CN112419191A (en) | Image motion blur removing method based on convolution neural network | |
| CN115511708B (en) | Depth map super-resolution method and system based on uncertainty perception feature transmission | |
| CN119131265A (en) | Three-dimensional panoramic scene understanding method and device based on multi-view consistency | |
| CN111445496B (en) | Underwater image recognition tracking system and method | |
| CN115439738A (en) | A method of underwater target detection based on self-supervised collaborative reconstruction | |
| CN111667495A (en) | Image scene analysis method and device | |
| CN112634331B (en) | Optical flow prediction method and device | |
| CN112418279B (en) | Image fusion method, device, electronic equipment and readable storage medium | |
| CN119323741B (en) | Unmanned aerial vehicle video target detection method and system based on space-time correlation | |
| CN109978928A (en) | A kind of binocular vision solid matching method and its system based on Nearest Neighbor with Weighted Voting | |
| CN120612354A (en) | Self-supervised monocular depth estimation method for dynamic scenes based on efficient parameter fine-tuning | |
| Qiu et al. | A GAN-based motion blurred image restoration algorithm | |
| WO2024221818A1 (en) | Definition identification method and apparatus, model training method and apparatus, and device, medium and product | |
| Wang et al. | Image deblurring using fusion transformer-based generative adversarial networks |
Legal Events
| Date | Code | Title | Description |
|---|---|---|---|
| PB01 | Publication | ||
| SE01 | Entry into force of request for substantive examination | ||