Method for establishing an interest point database based on eye-movement fixation point trajectories
Technical Field
The invention relates to a method for establishing a point-of-interest database for three-dimensional models: an eye tracker records the trajectory of a person's eye-movement fixation points while the person observes a three-dimensional model, and the recorded trajectories are then analyzed and processed.
Background
The point-of-interest database most widely referenced for three-dimensional models at present is the one described by Helin Dutagaci et al. in "Evaluation of 3D interest point detection techniques via human-generated ground truth", published in The Visual Computer in 2012. In that experiment, 24 three-dimensional models were displayed in a web page window; 23 experimenters could obtain views of each model from different angles through buttons arranged on the window, and after observing the views they marked the regions of the model that interested them with a mouse. The marked data were then collected, pseudo interest points were eliminated by an algorithm, and the results were integrated into an interest point data set for the three-dimensional models.
The points of interest obtained in this way are not objective. Because each experimenter observes a model for a long time, the selected points are strongly shaped by deliberate reasoning. This differs from the points that first draw human visual attention when a model comes into view. The present invention establishes the interest point database from eye-movement fixation point trajectories, and can therefore truly reflect where a person's interest falls when observing a model.
Disclosure of Invention
In view of the fact that current three-dimensional model interest point data sets cannot reflect how humans really observe a model, the invention provides a method for establishing an interest point database based on eye-movement fixation point trajectories. A database established by this method reflects the actual behavior of the human eye when observing a model, so the data are more authentic and reliable.
To achieve this purpose, the invention is realized by the following technical scheme, which comprises the following four steps:
Step (1): collect the three-dimensional models and make them into the video material required by the experiment.
Step (2): play the video on an eye tracker for the experimenters to watch, obtain the eye-movement fixation point data, and synthesize the video with the eye-movement fixation points using the corresponding software.
Step (3): from the video with the eye-movement fixation points, generate a three-dimensional model carrying the eye-movement fixation points by means of a moving point extraction algorithm and a three-dimensional mapping algorithm.
Step (4): analyze the experimenters' eye-movement fixation points, sort the fixation points of all experimenters to obtain a three-dimensional model interest point set, discard inappropriate and abnormal data, merge the interest points, and establish the interest point database.
The video material in step (1) is made as follows:
Twenty-four models were selected from the Stanford three-dimensional model library and the SHREC2007 model database; these models are widely used as standard material in three-dimensional model research. Two groups of data are saved for each three-dimensional model using MATLAB. In one group the model is rotated about the X axis in successive 60-degree steps, giving the views at 0, 60, 120, 180, 240, and 300 degrees; in the other group the model is rotated about the Y axis in the same 60-degree steps, again giving the views at 0, 60, 120, 180, 240, and 300 degrees. The Z-axis direction is selected as the viewpoint, and the model at these 12 angles is projected onto the XOY plane, so that 12 two-dimensional projection pictures are obtained for each model.
The formulas for rotating the three-dimensional model and for projecting it onto the XOY plane are as follows.
Rotation around the X axis (the x coordinate is unchanged):
x′ = x
y′ = y cos θ + z sin θ
z′ = z cos θ − y sin θ
Rotation around the Y axis (the y coordinate is unchanged):
x′ = x cos θ + z sin θ
y′ = y
z′ = z cos θ − x sin θ
Formula for parallel projection (viewpoint along the Z axis):
x′=x
y′=y
z′=0
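Equivalently, the two rotations and the parallel projection can be written in matrix form; the LaTeX block below is a direct restatement of the per-coordinate formulas above (the matrix names R_X, R_Y, and P are introduced here for clarity only):

```latex
\begin{pmatrix} x' \\ y' \\ z' \end{pmatrix} = R \begin{pmatrix} x \\ y \\ z \end{pmatrix},
\qquad
R_X(\theta) = \begin{pmatrix} 1 & 0 & 0 \\ 0 & \cos\theta & \sin\theta \\ 0 & -\sin\theta & \cos\theta \end{pmatrix},
\qquad
R_Y(\theta) = \begin{pmatrix} \cos\theta & 0 & \sin\theta \\ 0 & 1 & 0 \\ -\sin\theta & 0 & \cos\theta \end{pmatrix},
\qquad
P = \begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{pmatrix}
```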
Using the video editor Movie Maker, the pictures are switched every 1.5 seconds, so the 12 two-dimensional pictures of each model are synthesized into one short video; a ten-second blank picture is inserted between every two models as rest time, and every 6 models are synthesized into one long video. The 24 models thus yield 4 long videos in total.
The preconditions for the experiment in step (2) are as follows:
a) The image display device is placed on the left side. The experimenter sits directly in front of the display at a distance of 70 cm, with the eyes level with the screen so that the center of the screen can be viewed straight on.
b) The operator sits on the right side and controls video playback on the display device with a computer; the operator is separated from the experimenter by a light-shielding baffle.
c) Isolating baffles around the experimenter and isolating curtains around the laboratory prevent other light sources from interfering with the experiment.
d) The ambient sound level of the laboratory is kept below 30 dB, creating an ideally quiet environment and preventing other sound sources from interfering with the experimenter.
The experiment in step (2) proceeds as follows:
The eye position of the experimenter is first adjusted using the iView X software. Once the experimenter's pupil image appears in the Eye Image frame, the operator adjusts the relative position of the screen and the experimenter so that the pupil image is displayed stably and centered on the screen. Slight movements of the experimenter's head then do not affect the projection, and image loss caused by blinking recovers quickly.
The trajectory of the eye-movement fixation point while the experimenter watches the video is collected using the Experiment Center software. The experimenter's gaze must first be calibrated; after calibration, calibration feedback is given as deviations in the X and Y directions. When both deviations are smaller than 1.0, the experiment can proceed and the model video is played.
Finally, the material video and the eye-movement fixation point trajectory are synthesized using the BeGaze analysis software, yielding the model video with the experimenter's eye-movement fixation point trajectory.
In step (3), the model video with the experimenter's eye-movement fixation point trajectory acquired by the eye tracker is cut into frames; a moving point extraction algorithm obtains the eye-movement fixation point coordinates on each picture, and a three-dimensional mapping algorithm converts the two-dimensional coordinates of the moving point on the picture into three-dimensional coordinates in space.
First, the coordinates of the two-dimensional eye-movement fixation point are extracted. Software built on FFmpeg cuts the synthesized video into frames, giving a set of two-dimensional pictures. The trajectory of the fixation point appears as a marker moving over the model; in the previous step this marker was set to orange in the software. We call this the moving point color and set a color tolerance for it. A two-dimensional array is then created, initialized to 1, and compared against each picture: a pixel satisfies the condition if all three of its RGB values lie within the tolerance of the moving point color; otherwise it is not the eye-movement point we are looking for. If no pixel satisfies the condition, the interest point flag is set to 0; otherwise it is set to 1. The pixels satisfying the condition are put into a point array, the maximum and minimum row and column among them are recorded, and the midpoints of the extreme row and column coordinates are taken as the interest point coordinate (row, column), which is output as the interest point for that frame.
A three-dimensional model with the eye-movement fixation point is then obtained through the three-dimensional mapping algorithm. Because each two-dimensional picture of the model is a projection onto the XOY plane, the nature of the projection links the two-dimensional and three-dimensional coordinate points. The three-dimensional coordinates (x, y, z) are determined as follows: the projection fixes the x and y coordinate values, and for the z coordinate the vertex of the model surface closest to the viewpoint is taken, its z value being assigned to the eye fixation point on the three-dimensional model. If no vertex lies exactly at that position on the model surface, the coordinates of the model vertex within a threshold distance are selected as the coordinates of the eye fixation point. Furthermore, since the model was rotated about a coordinate axis when the picture material was prepared, it must now be rotated back by the corresponding angle in the reverse direction. This finally yields the interest point data collected by the eye tracker.
In step (4), the experimenters' data are analyzed: all extracted eye-movement fixation points are sorted to obtain an interest point set, inappropriate and abnormal data are discarded, and nearby interest points are merged. The integration method is as follows:
When constructing an evaluation library for evaluating interest point operators, two criteria are selected: the radius of the region of interest and the number of fixation points within the region. The radius of the region of interest is set to σdM, where dM denotes the model diameter, i.e., the maximum Euclidean distance between vertices of the model M, and σ is a constant coefficient. All interest points whose mutual distance is less than 2σdM are assigned to the same region, and if the number of distinct experimenters contributing points in a region is less than n, the interest points of that region are discarded. One point is selected from each region as its representative and serves as a standard interest point; the criterion is that the selected point minimizes the sum of its geometric distances to all other interest points in its region. Note that the interest points of two regions may overlap; this is still handled reasonably: if the distance between two regions is less than 2σdM, the representative of the region with fewer points is discarded from the evaluation criterion interest point set, and the representative of the region with more points is kept as the standard interest point. We denote the interest point criteria library by the parameters n and σ: G_M(n, σ) represents the interest point data set for a particular model M, and the values of these two parameters determine the criteria library. For a correspondingly higher value of n, more moving points fall within the interest region as σ increases; this is reasonable, because not all volunteers select the fine details of a model as interest points, and a larger σ accepts more local variation in the marked points. However, as σ increases further, the regions it defines often contain several distinct regions of interest, so that closely spaced interest points marked on different structures begin to merge. The average number of points in the given evaluation criteria library therefore varies with n and σ.
The invention has the following beneficial effects:
The interest point database established by this method is well suited to model reconstruction, because the eye tracker distinguishes the regions of a model that attract the most human visual attention from those that do not. With the interest point criteria library, reconstruction accuracy can be increased in the regions humans find interesting and relaxed in the uninteresting regions, reducing both the workload and the storage requirements of model reconstruction.
Drawings
Fig. 1 is the database design flow diagram.
Fig. 2 is the video timing diagram.
Fig. 3 is a diagram of the experimental environment.
Fig. 4 is a flow chart of the two-dimensional eye-movement fixation point extraction.
Fig. 5 is a flow chart of the data point integration.
Detailed Description
The invention is further described below with reference to the accompanying drawings.
The design flow of the method for establishing an interest point database based on eye-movement fixation point trajectories is shown in fig. 1; the method comprises the following four steps:
Step (1): first, the three-dimensional models are collected and the materials needed for the experiment are made. The specific operation is as follows:
the 24 stanford three-dimensional model libraries and models in the SHREC2007 model database were selected, and these three-dimensional models are widely used in the standard libraries for three-dimensional model research. Storing two groups of data for each three-dimensional model by using MATLAB, wherein one group of data is the three-dimensional model which is sequentially rotated by 60 degrees around an X axis and is respectively rotated by 0 degree, 60 degrees, 120 degrees, 180 degrees, 240 degrees, 300 degrees and 360 degrees; one group is three-dimensional models of which the three-dimensional models rotate by 60 degrees around the Y axis in sequence, and the three-dimensional models rotate by 0 degree, 60 degrees, 120 degrees, 180 degrees, 240 degrees, 300 degrees and 360 degrees respectively; the Z-axis direction is selected as a viewpoint, and the model at 12 angles is projected on the XOY plane, so that 12 two-dimensional projection pictures are obtained by one model. The following are the equations for the rotation of the three-dimensional model and the equations projected onto the XOY plane.
Rotation around the X axis (the x coordinate is unchanged):
x′ = x
y′ = y cos θ + z sin θ
z′ = z cos θ − y sin θ
Rotation around the Y axis (the y coordinate is unchanged):
x′ = x cos θ + z sin θ
y′ = y
z′ = z cos θ − x sin θ
Formula for parallel projection (viewpoint along the Z axis):
x′=x
y′=y
z′=0
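As an illustration only (the invention performs this step in MATLAB; the NumPy/Matplotlib implementation below, the function names, the (N, 3) vertex array layout, and the file-naming scheme are assumptions of this sketch), the 12 projection pictures of one model could be generated as follows, with rot_x and rot_y matching the rotation formulas above:

```python
import numpy as np
import matplotlib.pyplot as plt

def rot_x(theta):
    """Rotation about the X axis, matching the formulas in the text."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[1, 0, 0],
                     [0, c, s],
                     [0, -s, c]])

def rot_y(theta):
    """Rotation about the Y axis, matching the formulas in the text."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, 0, s],
                     [0, 1, 0],
                     [-s, 0, c]])

def save_projections(vertices, prefix):
    """vertices: (N, 3) array of model vertices (assumed layout).
    Saves one picture per 60-degree step about X, then about Y.
    The parallel projection onto the XOY plane keeps only x and y."""
    angles = np.deg2rad(np.arange(0, 360, 60))        # 6 views per axis
    for axis, rot in (("x", rot_x), ("y", rot_y)):
        for i, theta in enumerate(angles):
            v = vertices @ rot(theta).T               # rotate all vertices
            plt.figure(figsize=(4, 4))
            plt.scatter(v[:, 0], v[:, 1], s=0.5)      # drop z: parallel projection
            plt.axis("equal")
            plt.axis("off")
            plt.savefig(f"{prefix}_{axis}_{i*60:03d}.png", dpi=150)
            plt.close()
```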
Using the video editor Movie Maker, the pictures are switched every 1.5 seconds, so the 12 two-dimensional pictures of each model are synthesized into one short video; a ten-second blank picture is inserted between every two models as rest time, and every 6 models are synthesized into one long video. The 24 models thus yield 4 long videos in total; the timing diagram is shown in fig. 2.
Step (2): the video is played on the eye tracker for the experimenters to watch, and the experimental data are acquired using the eye tracker and the corresponding software, as shown in fig. 3.
The specific experimental conditions are as follows:
a) The image display device is placed on the left side. The experimenter sits directly in front of the display at a distance of 70 cm, with the eyes level with the screen so that the center of the screen can be viewed straight on.
b) The operator sits on the right side and controls video playback on the display device with a computer; the operator is separated from the experimenter by a light-shielding baffle.
c) Isolating baffles around the experimenter and isolating curtains around the laboratory prevent other light sources from interfering with the experiment.
d) The ambient sound level of the laboratory is kept below 30 dB, creating an ideally quiet environment and preventing other sound sources from interfering with the experimenter.
The eye position of the experimenter is adjusted using the iView X software. Once the experimenter's pupil image appears in the Eye Image frame, the operator adjusts the relative position of the screen and the experimenter so that the pupil image is displayed stably and centered on the screen. Slight movements of the experimenter's head then do not affect the projection, and image loss caused by blinking recovers quickly. The trajectory of the eye-movement fixation point while the experimenter watches the video is collected using the Experiment Center software. The experimenter's gaze must first be calibrated; after calibration, calibration feedback is given as deviations in the X and Y directions. When both deviations are smaller than 1.0, the experiment can proceed and the model video is played on the screen. Finally, the material video and the eye-movement fixation point trajectory are synthesized using the BeGaze analysis software, yielding the video with the experimenter's eye-movement fixation point trajectory.
Step (3): the video with the experimenter's eye-movement fixation point trajectory acquired by the eye tracker is cut into frames, the eye-movement fixation point coordinates are extracted from each picture, and the two-dimensional coordinates of the moving point on the picture are converted into three-dimensional coordinates in space through mapping. The specific operation comprises the following two steps:
1. Coordinate extraction of the two-dimensional eye-movement fixation point
The flow of extracting the eye-movement fixation point is shown in fig. 4. First, software built on FFmpeg cuts the synthesized video into frames, giving a set of two-dimensional pictures. The trajectory of the fixation point appears as a marker moving over the model; in the previous step this marker was set to orange in the software. We call this the moving point color and set a color tolerance for it. A two-dimensional array is then created, initialized to 1, and compared against each picture: a pixel satisfies the condition if all three of its RGB values lie within the tolerance of the moving point color; otherwise it is not the eye-movement point we are looking for. If no pixel satisfies the condition, the interest point flag is set to 0; otherwise it is set to 1. The pixels satisfying the condition are put into a point array, the maximum and minimum row and column among them are recorded, and the midpoints of the extreme row and column coordinates are taken as the interest point coordinate (row, column), which is output as the interest point for that frame.
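The following is a minimal sketch of this extraction step for a single frame, assuming the frames have been exported as image files; the exact orange RGB value and the tolerance below are placeholders, not values specified by the invention:

```python
import numpy as np
from PIL import Image

POINT_COLOR = np.array([255, 128, 0])    # assumed orange of the fixation marker
TOLERANCE = 30                           # assumed per-channel color tolerance

def extract_fixation(frame_path):
    """Return ((row, col), flag): the fixation point coordinate on one frame,
    or flag = 0 when no pixel matches the moving point color."""
    img = np.asarray(Image.open(frame_path).convert("RGB"), dtype=np.int16)
    # A pixel matches when all three RGB channels lie within the tolerance.
    match = np.all(np.abs(img - POINT_COLOR) <= TOLERANCE, axis=-1)
    rows, cols = np.nonzero(match)
    if rows.size == 0:
        return None, 0                   # interest point flag = 0
    # Midpoint of the extreme rows/columns, as described above.
    row = (int(rows.min()) + int(rows.max())) // 2
    col = (int(cols.min()) + int(cols.max())) // 2
    return (row, col), 1
```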
2. Three-dimensional mapping
Each two-dimensional picture of the model is a projection onto the XOY plane, so by the nature of the projection the two-dimensional and three-dimensional coordinate points are linked. The three-dimensional coordinates (x, y, z) are determined as follows: the projection fixes the x and y coordinate values, and for the z coordinate the vertex of the model surface closest to the viewpoint is taken, its z value being assigned to the eye fixation point on the three-dimensional model. If no vertex lies exactly at that position on the model surface, the coordinates of the model vertex within a threshold distance are selected as the coordinates of the eye fixation point. Furthermore, since the model was rotated about a coordinate axis when the picture material was prepared, it must now be rotated back by the corresponding angle in the reverse direction. This finally yields the interest point data collected by the eye tracker.
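A minimal sketch of this mapping, under the assumptions that the model vertices are available as an (N, 3) array already rotated to the pose shown in the frame, that the fixation point has been rescaled from pixel to model coordinates, and that the viewpoint lies on the positive Z axis:

```python
import numpy as np

def map_to_3d(point_xy, vertices, threshold):
    """point_xy: (x, y) of the fixation point in model coordinates.
    Among vertices whose (x, y) projection lies within `threshold` of the
    fixation point, pick the one closest to the viewpoint (largest z,
    assuming the viewpoint lies on the +Z axis)."""
    d = np.hypot(vertices[:, 0] - point_xy[0], vertices[:, 1] - point_xy[1])
    candidates = np.nonzero(d <= threshold)[0]
    if candidates.size == 0:
        return None                       # no surface vertex near the gaze point
    best = candidates[np.argmax(vertices[candidates, 2])]
    return vertices[best]                 # (x, y, z) of the eye fixation point

def unrotate(point, axis, theta):
    """Undo the rotation applied when making the picture material
    (rotate back by -theta about the same axis)."""
    c, s = np.cos(-theta), np.sin(-theta)
    if axis == "x":
        R = np.array([[1, 0, 0], [0, c, s], [0, -s, c]])
    else:
        R = np.array([[c, 0, s], [0, 1, 0], [-s, 0, c]])
    return R @ point
```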
Step (4): the experimenters' data are analyzed: all extracted eye-movement fixation points are sorted to obtain an interest point set, inappropriate and abnormal data are discarded, and nearby interest points are merged. The data integration flow is shown in fig. 5; this yields the interest point database based on the eye-movement fixation point trajectories. The specific operation is as follows:
When constructing an evaluation library for evaluating interest point operators, two criteria are selected: the radius of the region of interest and the number of fixation points within the region. The radius of the region of interest is set to σdM, where dM denotes the model diameter, i.e., the maximum Euclidean distance between vertices of the model M, and σ is a constant coefficient. All interest points whose mutual distance is less than 2σdM are assigned to the same region, and if the number of distinct experimenters contributing points in a region is less than n, the interest points of that region are discarded. One point is selected from each region as its representative and serves as a standard interest point; the criterion is that the selected point minimizes the sum of its geometric distances to all other interest points in its region. Note that the interest points of two regions may overlap; this is still handled reasonably: if the distance between two regions is less than 2σdM, the representative of the region with fewer points is discarded from the evaluation criterion interest point set, and the representative of the region with more points is kept as the standard interest point. We denote the interest point criteria library by the parameters n and σ: G_M(n, σ) represents the interest point data set for a particular model M, and the values of these two parameters determine the criteria library. For a correspondingly higher value of n, more moving points fall within the interest region as σ increases; this is reasonable, because not all volunteers select the fine details of a model as interest points, and a larger σ accepts more local variation in the marked points. However, as σ increases further, the regions it defines often contain several distinct regions of interest, so that closely spaced interest points marked on different structures begin to merge. The average number of points in the given evaluation criteria library therefore varies with n and σ.
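A minimal sketch of this integration step; the greedy seed-based grouping, the use of Euclidean distance, and the reading of the per-region count as the number of distinct experimenters are interpretations of the text above, not prescriptions of the invention:

```python
import numpy as np

def build_criteria_library(points, subject_ids, d_M, sigma, n):
    """points: (K, 3) fixation points pooled over all experimenters;
    subject_ids: experimenter id per point. Groups points closer than
    2*sigma*d_M into one region, drops regions contributed to by fewer
    than n experimenters, and keeps one representative per region: the
    point with minimum summed distance to the others in its region."""
    radius = 2 * sigma * d_M
    unassigned = list(range(len(points)))
    reps = []
    while unassigned:
        seed = unassigned.pop(0)
        region = [seed]
        # Greedily absorb all remaining points within the region radius.
        for i in unassigned[:]:
            if np.linalg.norm(points[i] - points[seed]) < radius:
                region.append(i)
                unassigned.remove(i)
        if len({subject_ids[i] for i in region}) < n:
            continue                      # too few experimenters: discard region
        pts = points[region]
        # Representative: minimum sum of distances to all other points.
        cost = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1).sum(axis=1)
        reps.append((pts[int(np.argmin(cost))], len(region)))
    return reps                           # candidate standard interest points
```

The overlap rule described above (when two regions lie closer than 2σdM, the representative of the region with fewer points is discarded) can then be applied pairwise to the returned representatives.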