Detailed Description
The technical solutions of the embodiments of the present specification will be described clearly and completely below with reference to the drawings of the embodiments of the present specification. It is apparent that the described embodiments are only some, not all, of the embodiments of the present specification. All other embodiments obtained by a person of ordinary skill in the art from the present disclosure without undue burden shall fall within the scope of the present disclosure.
For a better understanding of the inventive concepts of the present application, an image acquisition system according to an embodiment of the present specification will be described first. As shown in Fig. 1, the image acquisition system 100 includes a camera 110, a processor 120, an eyeball photographing device 130, and a display device 140.
The camera 110 is a device for capturing images. For example, when a doctor needs to view the organs and tissues of a patient's surgical area, the camera 110 may be controlled to move to a specific position to capture a corresponding image, enabling the doctor to grasp the condition inside the patient.
Preferably, when the image acquisition system 100 is applied to a minimally invasive surgical scene, the camera 110 may be an endoscope. An endoscope is generally an elongated tubular structure with a lens at the front end and a hand-held end at the back end. The hand-held end may be held by a robotic arm, so that the spatial position of the camera 110 is controlled by the robotic arm. During surgery, the endoscope may be inserted into the patient through an incision in the patient's body surface to photograph the surgical site. The camera 110 may also be another medical imaging device, such as a fluoroscopic or ultrasound imaging device, which is not limited here.
The display device 140 may be a display on a doctor-side control device, an image trolley, or other equipment having a display function, and is used for displaying a captured image. Preferably, in the embodiment of the present disclosure, the display device 140 displays an AR image (augmented reality image), so as to better display the three-dimensional space state of the surgical site.
The eyeball photographing device 130 is used to acquire the line-of-sight parameters of the user. When a user views an image, the eyeballs exhibit different pupil dilation states and pupil center positions depending on the gaze point, and the line-of-sight parameters can be used to characterize this eyeball state. By capturing an eyeball image of the user with the eyeball photographing device 130 and analyzing it, the line-of-sight parameters of the user may be determined. Using the line-of-sight parameters, in combination with the image currently displayed by the display device 140, the gazing position of the user in the image can be determined, so as to realize automatic adjustment of the pose of the camera 110.
The user is preferably a doctor who performs a surgical operation currently, but may also be other medical staff or equipment operators, which is not limited.
Preferably, the eyeball photographing device 130 may be disposed on the display device 140, so that the gazing state of the eyeballs of the user on the image can be directly obtained when the user views the image displayed on the display device 140.
In some embodiments, the eyeball photographing device 130 may include a camera for photographing an eyeball image and a light source for providing illumination to ensure photographing quality of the eyeball image.
Fig. 2 is a schematic diagram of a user operating a doctor-side control device. The doctor-side control device is provided with a display, so that the user can grasp the condition inside the patient by watching the image displayed on the display. An eyeball photographing device 130 is provided around the display of the doctor-side control device and is capable of photographing an eyeball image of the user; the line of sight of the user is determined by analyzing the eyeball image. The doctor-side control device is also provided with a control device, through which the mechanical arm on the patient-side control device can be operated to achieve the effect of manually moving the endoscope or a surgical instrument.
The processor 120 may be implemented in any suitable manner. For example, the processor 120 may take the form of a microprocessor or processor together with a computer-readable medium storing computer-readable program code (e.g., software or firmware) executable by the (micro)processor, logic gates, switches, an application-specific integrated circuit (ASIC), a programmable logic controller, an embedded microcontroller, and the like.
The processor 120 may analyze the gaze parameters of the user based on preset logic to determine the three-dimensional spatial location of the gaze point of the user. The processor 120 may also process the images to generate corresponding AR images in combination with the stored three-dimensional model of the target area.
The processor 120 may also communicate with the mechanical arm control device, and send an instruction to the mechanical arm control device, so as to achieve the effect of controlling the mechanical arm to drive the photographic device to move.
The image acquisition system described above is further described with reference to a specific scene example. As shown in Fig. 3, a practical minimally invasive surgical application environment generally includes an image trolley, a patient operating end, and a doctor operating end. The patient operating end includes a plurality of robotic arms that can hold corresponding surgical instruments and endoscopes, which can be extended into the patient through incisions in the patient's body surface for observation and surgical procedures. The doctor operating end is mainly used to control the mechanical arms on the patient operating end, so as to adjust the observation viewing angle and perform specific operations with the surgical instruments. The image trolley is connected to an endoscope, which can be extended to the patient operating end and clamped by a corresponding mechanical arm. The image trolley can display the images captured by the endoscope in real time, and after processing the images, can display them in the form of a heat point diagram, a trajectory diagram, and the like, so that other medical staff can acquire the corresponding operation information and assist in the execution of the operation.
Based on the above image acquisition system, an embodiment of the present disclosure proposes an image acquisition method for automatically capturing a corresponding image based on a user demand. The main body of execution of the image acquisition method may be a processor in the image acquisition system described above. As shown in fig. 4, the image acquisition method includes the following specific implementation steps.
S410: a presentation image is generated based on the target region three-dimensional model and the initial image.
The target area may be a focal area of the patient. For example, where the procedure performed is abdominal surgery, the target area may be the abdomen of the patient. Fig. 5 is a schematic view of a surgical scenario performed using a laparoscope. The laparoscopic field of view needs to include the surgical instruments and the intra-cavity organs, so that the doctor performing the operation can fully understand the operation condition and the operation can proceed smoothly.
The target region three-dimensional model is a three-dimensional model constructed for the target region of the patient. Since the embodiment of the present specification is to perform three-dimensional spatial positioning of a camera device based on a user's demand, by constructing a three-dimensional model in advance, a corresponding three-dimensional spatial position can be determined in the subsequent analysis processing.
The specific method for obtaining the three-dimensional model of the target area may be to perform preoperative focus modeling on the patient, scan the operative area of the patient by using a tomography technology to obtain the three-dimensional model of the target area, for example, scan the focus area of the patient by means of preoperative CT, MRI, etc. to determine the morphology and position of organs and tissues of the focus area and the boundaries of each tissue structure, thereby completing the construction of the three-dimensional model of the target area. The specific process of constructing the three-dimensional model of the target area can be set according to the actual application requirements, and is not limited.
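As an illustrative aside to the modeling step above, the following is a minimal sketch, assuming scikit-image and NumPy are available and that the preoperative scan has already been reconstructed into a voxel volume, of how a surface mesh of the target area could be extracted with the marching cubes algorithm. The function name, iso-surface level, and voxel spacing are hypothetical placeholders rather than values prescribed by this embodiment.

```python
import numpy as np
from skimage import measure

def build_target_region_mesh(volume: np.ndarray, iso_level: float, spacing=(1.0, 1.0, 1.0)):
    """Extract an iso-surface (vertices, faces, normals) from a reconstructed scan volume."""
    verts, faces, normals, _ = measure.marching_cubes(volume, level=iso_level, spacing=spacing)
    return verts, faces, normals

# Illustrative usage: a CT volume in Hounsfield units, thresholded near a tissue boundary.
# ct = np.load("ct_volume.npy")                      # hypothetical file
# verts, faces, normals = build_target_region_mesh(ct, iso_level=300.0, spacing=(0.7, 0.7, 1.25))
```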
The initial image is an image directly captured by a camera. In general, an image directly captured by a camera is a planar image, that is, although each object observed at a capture point is present in the image, the spatial position of each object cannot be directly determined from the image. If eye tracking for the user is implemented based on the planar image, only the gaze direction of the user can be determined, but the spatial position of the actual gaze point cannot be determined, and thus the actual application effect is affected.
Accordingly, a presentation image may be generated based on the target region three-dimensional model and the initial image. The display image can be used for determining the spatial position of each object in the scene by combining the spatial structure of the three-dimensional model of the target area while displaying the scene corresponding to the initial image.
In some implementations, the presentation image can be an AR image (augmented reality image). An augmented reality image is an image obtained by combining a real scene with virtual information: the real scene information is augmented with the virtual information, so that the display effect is optimized. In the embodiment of the present disclosure, the display image may be obtained by performing enhancement processing on the initial image based on the spatial structure in the three-dimensional model of the target area, so that the generated display image corresponds to the three-dimensional structure of the lesion area.
The process of generating the display image according to the initial image and the three-dimensional model of the target area may be that the image feature in the initial image and the model feature in the three-dimensional model of the target area are respectively identified first, and then the mapping relationship between the initial image and the three-dimensional model of the target area is determined according to the image feature and the model feature. And after determining registration parameters of the initial image on the three-dimensional model of the target area according to the mapping relation, generating a display image by utilizing the registration parameters and the three-dimensional model of the target area.
The image features and model features are feature points in the initial image and the three-dimensional model of the target region, respectively. Since the initial image corresponds to an image corresponding to the target region acquired from a certain specific view angle, it is necessary to determine a correspondence relationship between the initial image and the target region three-dimensional model before the initial image is enhanced by using the target region three-dimensional model. And after the corresponding relation between each image feature and model feature is determined by acquiring the image feature and the model feature and comparing the image feature and the model feature, the mapping relation between the initial image and the three-dimensional model of the target area is determined. The mapping relation can be used for representing the relative position relation between the initial image and the three-dimensional model of the target area.
In practical applications, in order to simplify the amount of computation, a modeling image may be generated for the target region three-dimensional model, which may be an image generated by capturing the target region three-dimensional model from a specific view angle. By performing a static registration for feature points in the initial image and the modeled image, a positional relationship of the initial image and the modeled image may be determined. And determining the position corresponding relation between the initial image and the three-dimensional model of the target area by combining the coordinate system of the modeling image relative to the three-dimensional model of the target area.
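By way of illustration only, the following is a minimal sketch, assuming OpenCV is available, of the static registration just described: image features in the initial image are matched against model features in a modeling image rendered from the target-region three-dimensional model, and a mapping (here estimated as a homography) between the two views is computed. The function names, feature type, and thresholds are assumptions, not the prescribed implementation.

```python
import cv2
import numpy as np

def estimate_static_mapping(initial_image: np.ndarray, modeling_image: np.ndarray):
    """Match feature points between the initial image and a rendered modeling image,
    then estimate the mapping (homography) relating the two views."""
    orb = cv2.ORB_create(nfeatures=2000)
    kp1, des1 = orb.detectAndCompute(initial_image, None)   # image features
    kp2, des2 = orb.detectAndCompute(modeling_image, None)  # model features
    matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
    matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)[:200]
    src = np.float32([kp1[m.queryIdx].pt for m in matches]).reshape(-1, 1, 2)
    dst = np.float32([kp2[m.trainIdx].pt for m in matches]).reshape(-1, 1, 2)
    # RANSAC rejects mismatched pairs; H describes the relative positional relationship
    H, inlier_mask = cv2.findHomography(src, dst, cv2.RANSAC, ransacReprojThreshold=3.0)
    return H, inlier_mask
```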
The registration parameters are parameters for correcting the three-dimensional model of the target region. On the one hand, because different view angle relations and mapping relations exist between the three-dimensional model of the target area and the initial image, the generation of the display image needs to be realized based on a coordinate system corresponding to the three-dimensional model of the target area and the initial image. On the other hand, since the three-dimensional model of the target region and the initial image are generated at different times, and the target region is in the patient, wherein organs, tissues and the like may dynamically change with time, in the case that the display image is an augmented reality image generated based on the three-dimensional model of the target region, in order to ensure timeliness of the display image, dynamic registration of the three-dimensional model of the target region needs to be performed based on the dynamic change condition of the current target region.
Thus, in some embodiments, the registration parameters may include static correspondence parameters and dynamically changing parameters. Accordingly, the process of determining the registration parameters may be: based on the three-dimensional model of the target area and the mapping relation, determining a static area and a dynamic area in the target image, determining static corresponding parameters for the static area and the three-dimensional model of the target area, and determining dynamic transformation parameters for the dynamic area and the three-dimensional model of the target area.
The static area and the dynamic area are respectively an area which is determined to be unchanged and an area which is determined to be changed after the initial image and the target area three-dimensional model are compared. The static correspondence parameter and the dynamic transformation parameter are used to describe the above-described region where no change is generated and the region where a change is generated, respectively.
After the registration parameters are acquired, the augmented reality image can be constructed according to the registration parameters and by combining the three-dimensional model of the target area.
In some embodiments, the manner of constructing the augmented reality image may be to first determine the virtual scene image corresponding to the three-dimensional model of the target region. The process of determining the virtual scene image may be to determine parameters such as the viewing angle and distance of the virtual scene image according to the spatial coordinate system corresponding to the initial image; specifically, the virtual scene image having a correspondence with the three-dimensional model of the target area may be determined through the static correspondence parameters. Then, a virtual dynamic image area corresponding to the dynamic area is determined in the virtual scene image, and the virtual dynamic image area is dynamically compensated using the dynamic change parameters to obtain the final augmented reality image, i.e., the display image. The dynamic compensation process makes the display image better match the display effect of the actual scene, guarantees the timeliness of the display image, and thus optimizes the execution of the operation.
When aligning the image with the three-dimensional model of the target area, methods such as optical flow and feature matching tracking may specifically be adopted to compensate the generated display image and guarantee its display effect. The specific process may be set based on the needs of the actual application, and is not limited herein.
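As a non-authoritative illustration of the optical-flow option mentioned above, the following sketch, assuming OpenCV is available, estimates the dynamic change between two consecutive endoscope frames with dense Farneback optical flow and shifts the virtual dynamic image area accordingly. The warping here is a simple approximation; names and parameter values are illustrative.

```python
import cv2
import numpy as np

def estimate_dynamic_change(prev_frame: np.ndarray, curr_frame: np.ndarray) -> np.ndarray:
    """Dense per-pixel displacement field between two consecutive frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    return cv2.calcOpticalFlowFarneback(prev_gray, curr_gray, None,
                                        pyr_scale=0.5, levels=3, winsize=15,
                                        iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

def compensate_virtual_region(virtual_region: np.ndarray, flow: np.ndarray) -> np.ndarray:
    """Shift the virtual dynamic image area by the estimated displacement field."""
    h, w = flow.shape[:2]
    grid_x, grid_y = np.meshgrid(np.arange(w), np.arange(h))
    map_x = (grid_x + flow[..., 0]).astype(np.float32)
    map_y = (grid_y + flow[..., 1]).astype(np.float32)
    return cv2.remap(virtual_region, map_x, map_y, interpolation=cv2.INTER_LINEAR)
```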
In some embodiments, in order to optimize the display effect of the augmented reality image, the image may be further processed in a manner of video overlapping. Specifically, the display images of a plurality of continuous picture frames can be registered, and then the frame rate corresponding to the images is adjusted, so that the dynamic effect of video superposition is presented. Through the real-time dynamic display of the display image, a user can grasp the dynamic change condition of the operation area, and the understanding of the user on the real environment of the target area is enhanced.
Fig. 6 is a schematic view of a scene for generating the display image: the real-time image captured by the endoscope and the virtual image generated by the computing device are input into a digital synthesizer to obtain the corresponding AR image, which is then displayed on the AR display device, so as to ensure the display effect provided to the user and support the subsequent determination of the user's spatial gaze point.
S420: and acquiring the sight line parameters when the user watches the display image.
Based on the execution in step S410, after the presentation image is generated, the presentation image may be presented to the user. Specifically, based on the foregoing description, the display image may be directly displayed on the display device, so that the user may view the current state of the target area to perform the corresponding operation. In combination with the aforementioned minimally invasive surgery scenario, the display image may be displayed on a display of the doctor-side control device, or may be displayed on an image dolly, which is not limited thereto.
When the user views the display image, the line-of-sight parameters of the user at that time can be acquired. The line-of-sight parameters are used to represent information such as the viewing direction and focus of attention expressed by the user's eyeballs. Since the eyeball must rotate when the user shifts the line of sight, the position of the pupil center in the captured eyeball image changes accordingly, and the coordinate position at which the user gazes on the screen can be calculated from the pupil coordinates.
Specifically, the line-of-sight parameter may be acquired based on an eyeball photographing device in the image acquisition system. As shown in fig. 7, since the pupil center position and the cornea reflection center position of the eyeball itself are changed when the eyeball looks at different directions, the corresponding parameters can be obtained by photographing with the eyeball photographing device. The description of the eyeball photographing device may refer to the foregoing description, and will not be repeated here.
In some embodiments, the eyeball image of the user may be captured using the eyeball-capturing device, and the pupil center position and the cornea reflection center position in the eyeball image may be determined as the line-of-sight parameters. In the case where the position and angle of view of the eyeball photographing device itself are fixed, a coordinate system may be constructed based on the photographing field of view of the eyeball photographing device, and the pupil center position and the cornea reflection center position may be determined from the positions of the pupil and the cornea in the coordinate system.
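The following is a minimal sketch, assuming OpenCV is available and a single grayscale eyeball image, of one simple way to extract the two line-of-sight parameters named above: the pupil center as the centroid of the largest dark blob, and the corneal reflection center as the centroid of the brightest blob produced by the light source. The threshold values are hypothetical and would need tuning for the actual eyeball photographing device.

```python
import cv2
import numpy as np

def extract_line_of_sight_parameters(eye_gray: np.ndarray):
    """Return (pupil_center, corneal_reflection_center) in the eyeball-camera coordinate system."""
    def largest_blob_centroid(mask: np.ndarray):
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        blob = max(contours, key=cv2.contourArea)
        m = cv2.moments(blob)
        return (m["m10"] / m["m00"], m["m01"] / m["m00"])

    # Pupil: the darkest large region -> inverse threshold
    _, pupil_mask = cv2.threshold(eye_gray, 40, 255, cv2.THRESH_BINARY_INV)
    pupil_center = largest_blob_centroid(pupil_mask)

    # Corneal reflection (glint): the brightest small region -> high threshold
    _, glint_mask = cv2.threshold(eye_gray, 220, 255, cv2.THRESH_BINARY)
    corneal_reflection_center = largest_blob_centroid(glint_mask)

    return pupil_center, corneal_reflection_center
```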
The above embodiments are described by way of example only, and other types of parameters may be set as line-of-sight parameters according to requirements in practical applications, such as pupil dilation, etc., which are not limited.
S430: and determining the three-dimensional space position of the gaze point of the user according to the sight line parameters.
After the line-of-sight parameters of the user are acquired, the gaze point three-dimensional spatial position of the user can be determined from them. When the user changes the gaze point, line-of-sight parameters such as the pupil position change accordingly, so the gaze point three-dimensional spatial position of the user can be determined effectively according to the correspondence between the line-of-sight parameters and the actual gaze point.
The gaze point three-dimensional spatial position is the spatial position in the surgical area to which the user's gaze point corresponds. Because what the user directly observes is the display image shown by the display device, the image fixation position of the user on the display image can be determined first, and then converted into the gaze point three-dimensional spatial position according to the correspondence between the display image and the three-dimensional model of the target area.
Therefore, the image fixation position corresponding to the user needs to be determined first. In some embodiments, the conversion parameters corresponding to the user may be obtained first. The conversion parameters are used to describe the conversion relationship between the eye state and the gaze position of the user. The conversion parameters of different users may differ, so the conversion parameters of the user need to be collected before executing the scheme. Specifically, the pupil positions and gazing positions of the user may be calibrated in advance, and the conversion parameters may then be calculated from the calibrated positions.
Pupil position coordinates may then be determined based on the pupil center position and the corneal reflection center position in the gaze parameter. The pupil position coordinates may be coordinates corresponding to a coordinate system constructed based on the field of view of the eyeball photographing device. After the pupil position coordinates are obtained, the pupil position coordinates can be converted into image fixation positions by using conversion parameters, and then the image fixation positions of the user in the display images are obtained.
Describing a specific example: assuming that the coordinates of the image fixation position are (x, y) and the coordinates of the pupil position are (x₁, y₁), the conversion between the image fixation position and the pupil position can be realized through the formula x = a₀ + a₁×x₁ + a₂×y₁ + a₃×x₁×y₁ + a₄×x₁² + a₅×y₁² and the formula y = b₀ + b₁×x₁ + b₂×y₁ + b₃×x₁×y₁ + b₄×x₁² + b₅×y₁², where a₀, a₁, a₂, a₃, a₄, a₅, b₀, b₁, b₂, b₃, b₄, b₅ are respectively preset conversion parameters.
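The polynomial mapping above can be applied, and its conversion parameters fitted from a per-user calibration session, as in the sketch below, which assumes NumPy; the least-squares fitting step is one possible way to obtain a₀…a₅ and b₀…b₅ and is not mandated by the text.

```python
import numpy as np

def _design_matrix(pupil_xy: np.ndarray) -> np.ndarray:
    """Columns: 1, x1, y1, x1*y1, x1^2, y1^2 — the terms of the mapping formula above."""
    x1, y1 = pupil_xy[:, 0], pupil_xy[:, 1]
    return np.column_stack([np.ones_like(x1), x1, y1, x1 * y1, x1 ** 2, y1 ** 2])

def fit_conversion_parameters(pupil_xy: np.ndarray, screen_xy: np.ndarray):
    """Fit a0..a5 and b0..b5 from calibrated (pupil position, image fixation position) pairs."""
    A = _design_matrix(pupil_xy)
    a, *_ = np.linalg.lstsq(A, screen_xy[:, 0], rcond=None)
    b, *_ = np.linalg.lstsq(A, screen_xy[:, 1], rcond=None)
    return a, b

def pupil_to_image_fixation(pupil_point, a, b):
    """Convert one pupil position (x1, y1) to the image fixation position (x, y)."""
    row = _design_matrix(np.asarray([pupil_point], dtype=float))[0]
    return float(row @ a), float(row @ b)
```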
Because different users correspond to different conversion parameters, the identity of the user needs to be determined before the technical scheme is realized, and the conversion parameters corresponding to the current user are acquired. The user identity may be determined by the user actively entering identity information or by the device itself.
Preferably, the iris of the user can be recognized to determine the identity information of the user, and the conversion parameters corresponding to that identity information can then be acquired. Iris recognition and eyeball image capture can share the eyeball photographing device in hardware, which improves the utilization of the equipment. After the eyeball image is captured, it can be sent to the processor; when the identity of the user has not yet been determined, the processor can match the iris information corresponding to the eyeball image against pre-stored iris templates, and after the matching iris template is determined, acquire the user identity information and conversion parameters corresponding to that template, so that the foregoing steps can be executed directly to determine the image fixation position of the user.
Since the image fixation position can only reflect the gaze point of one of the user's eyeballs on the display screen, and the display screen is a two-dimensional plane, the spatial position the user is gazing at is difficult to determine directly; the gaze point three-dimensional spatial position therefore needs to be determined from the image fixation position.
In some implementations, the spatial gaze point position of the user may be determined based on the principle of binocular vision. Because the display image is an AR image, when the user views it, the display effect causes the eyes to focus at a position other than the plane of the display screen, namely the spatial position corresponding to the three-dimensional model underlying the display image. The gaze point three-dimensional spatial position may thus be determined by acquiring the left eye gaze position and the right eye gaze position of the user.
Specifically, the left eye gaze location and the right eye gaze location of the user may be determined based on the gaze parameters, respectively. The specific calculation method may refer to the foregoing process of obtaining the image fixation position, which is not described herein.
The gaze point spatial coordinates corresponding to the user may be determined by combining the left eye gaze position and the right eye gaze position. Because there is a certain interocular distance between the left eye and the right eye, the two eyes have different gazing directions when gazing at the same spatial gaze point. In the embodiment of the present disclosure, when the augmented reality image is displayed, the user's eyeball focusing depth varies with the display effect of the image; accordingly, the gaze point of the user in the target three-dimensional model corresponding to the display image can be determined based on the line-of-sight parameters of the left eye and the right eye, so as to determine the corresponding spatial position.
Describing a specific example: let the same gaze point correspond to left eye gaze position coordinates (xₗ, yₗ) and right eye gaze position coordinates (xᵣ, yᵣ). Combining the spatial diagram in Fig. 8, the similar-triangle relation b/z = (b − (xₗ − xᵣ))/(z − f) can be deduced, where z is the distance between the user and the gaze point in the image space, f is the focal length of the eye, and b is the distance between the left and right eyes. Based on the above formula, after transformation z = b×f/(xₗ − xᵣ) can be obtained, and further x = z×xₗ/f and y = z×yₗ/f, i.e. the three-dimensional spatial position coordinates of the gaze point in the image coordinate system ᶜP(x, y, z) are determined.
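A minimal sketch of the binocular relation above (as reconstructed here): with interocular distance b, eye focal length f, and left/right gaze coordinates for the same gaze point, the depth z follows from the disparity xₗ − xᵣ and the remaining coordinates follow by back-projection. The numeric values in the usage comment are purely illustrative.

```python
def gaze_point_3d(xl: float, yl: float, xr: float, yr: float, b: float, f: float):
    """Return cP(x, y, z): the gaze point position in the image coordinate system."""
    disparity = xl - xr
    if abs(disparity) < 1e-9:
        raise ValueError("zero disparity: gaze directions are parallel, depth undefined")
    z = b * f / disparity      # distance between the user and the spatial gaze point
    x = z * xl / f             # back-projection of the left-eye gaze coordinates
    y = z * yl / f
    return x, y, z

# Illustrative usage (hypothetical units and values):
# x, y, z = gaze_point_3d(1.8, 0.9, 1.2, 0.9, b=63.0, f=17.0)
```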
S440: the camera is controlled to move to the target space position and the target image is shot.
After the gaze point three-dimensional spatial position is determined, the camera can be controlled to move to the target spatial position. Since the gaze point three-dimensional spatial position is already expressed in the coordinate system established with respect to the camera, the target spatial position to which the camera is to be moved can be determined directly from the gaze point three-dimensional spatial position and the current position of the camera. Specifically, the target spatial position may be the position at which the camera is located when the center point of the camera's field of view coincides with the gaze point three-dimensional spatial position.
In some embodiments, since the gaze point three-dimensional spatial position reflects the desired position of the center of the camera's field of view, and the relative positions between the field-of-view center and the camera itself, and between the camera and the robotic arm, are essentially fixed, the camera can be moved according to these fixed spatial positional relationships.
As shown in Fig. 9, the AR image coordinate system C is the coordinate system corresponding to the display image, and the actual gaze point coordinates in the AR image coordinate system C are ᶜP(x, y, z); the robot coordinate system S is the coordinate system corresponding to the mechanical arm and the camera, and for the same gaze point the coordinates in the robot coordinate system S are ˢP(x, y, z). Based on specific analysis and calculation, the conversion relation ˢP = ˢTᶜ·ᶜP between the two coordinate systems can be obtained. According to the conversion relation ˢTᶜ, the spatial position to which the camera needs to be moved, and thus the specific direction and distance of the movement, can be determined directly.
The specific manner of controlling the camera to move to the target spatial position may be to generate a corresponding control instruction based on the vector between the gaze point three-dimensional spatial position and the current position of the camera. The control instruction is used to indicate the moving direction and moving distance of the camera. The control instruction is sent to a corresponding motion module, to which the camera is fixed. For example, based on the scene example of Fig. 3, where the camera is an endoscope, the motion module may be the mechanical arm holding the endoscope. The motion module then moves, driving the camera to the target spatial position.
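The following sketch, assuming NumPy and that the transformation ˢTᶜ is available as a 4×4 homogeneous matrix, illustrates converting the gaze point ᶜP into the robot coordinate system and deriving the moving direction and distance carried by the control instruction. The names and the structure of the returned command are assumptions for illustration.

```python
import numpy as np

def gaze_point_in_robot_frame(T_sc: np.ndarray, p_c: np.ndarray) -> np.ndarray:
    """sP = sTc . cP, using homogeneous coordinates (T_sc is 4x4, p_c has length 3)."""
    return (T_sc @ np.append(p_c, 1.0))[:3]

def build_move_instruction(gaze_point_s: np.ndarray, camera_position_s: np.ndarray) -> dict:
    """Direction and distance the motion module should move the camera so that the
    center of its field of view reaches the gaze point."""
    delta = gaze_point_s - camera_position_s
    distance = float(np.linalg.norm(delta))
    direction = (delta / distance).tolist() if distance > 0 else [0.0, 0.0, 0.0]
    return {"direction": direction, "distance": distance}
```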
In an actual application scene, the user cannot be expected to gaze at the position of interest at all times; deviations of the line of sight will inevitably occur during the operation, and blindly adjusting the position of the camera according to every change of the user's line of sight would obviously disturb the normal surgical procedure.
Thus, in some embodiments, historical gaze position data may also be acquired before the camera is controlled to move to the target spatial position. The historical gaze position data are the image gaze point data of the user over a period of preset duration ending at the acquisition time of the image fixation position currently being processed. The amount of historical gaze position data is determined by the acquisition frequency. As a specific example, assuming the acquisition time of the image fixation position, i.e. of the user's line-of-sight parameters, is 9:05:17 and the preset duration is 3 seconds, the data corresponding to all image fixation positions of the user from 9:05:14 to 9:05:17 can be acquired as the historical gaze position data; at an acquisition frequency of 10 times per second, this yields data for 30 image fixation positions.
From the historical gaze position data, the duration for which the user has been gazing at the current image fixation position can be determined. The duration may be the period, derived from the time instants of the respective gaze positions in the historical gaze position data, during which the gaze position has remained the same as the current image fixation position. For example, continuing the preceding example with a preset duration of 3 seconds, if the user gazes at position A for the first second and at position B for the following two seconds, the image fixation position determined during execution of the method is position B and the duration is 2 seconds.
After the duration is determined, it may be compared with a gaze duration threshold, and when the duration is not less than the gaze duration threshold, the camera is controlled to move to the target spatial position and capture an image. A duration not less than the gaze duration threshold indicates that the user is intentionally gazing at a certain position rather than having merely glanced at it, and the camera can then be controlled to move and capture an image, meeting the corresponding requirement of the user.
By determining the duration of the user's gaze image gaze location, the effectiveness of the target space location to be moved can be effectively ensured, and the user experience is optimized.
In practical applications, other parameters may also be combined to determine whether the user is gazing at a certain position and therefore requires the gaze point of the camera to be changed. For example, the gaze movement speed of the user may be determined from the historical gaze position data, and whether the camera needs to be controlled to move to the target spatial position may be decided based on the magnitude of that speed. Other specific judgment modes can be adjusted based on the actual application requirements and are not described herein.
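As a sketch of the gating logic described above, where the window length, sampling rate, distance tolerance, and gaze duration threshold are all hypothetical values, the camera would only be repositioned once the current image fixation position has persisted long enough within the recent historical gaze position data:

```python
from collections import deque
import math

class GazeDwellFilter:
    def __init__(self, sample_rate_hz=10, window_s=3.0, dwell_threshold_s=2.0, tolerance_px=30.0):
        self.history = deque(maxlen=int(sample_rate_hz * window_s))  # historical gaze position data
        self.sample_dt = 1.0 / sample_rate_hz
        self.dwell_threshold_s = dwell_threshold_s
        self.tolerance_px = tolerance_px

    def update(self, gaze_xy) -> bool:
        """Record the latest image fixation position and return True when the camera
        should be moved, i.e. the gaze duration reaches the gaze duration threshold."""
        self.history.append(gaze_xy)
        duration = 0.0
        for past in reversed(self.history):                 # walk back while the gaze stays put
            if math.dist(past, gaze_xy) <= self.tolerance_px:
                duration += self.sample_dt
            else:
                break
        return duration >= self.dwell_threshold_s
```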
After the target image is obtained through shooting, in order to enable the target image to obtain the same display effect as that of the display image, the target image and the target area three-dimensional model can be combined to obtain the augmented reality image.
The process of generating the display image according to the target image and the target region three-dimensional model may be that the image feature in the target image and the model feature in the target region three-dimensional model are respectively identified first, and then the mapping relationship between the target image and the target region three-dimensional model is determined according to the image feature and the model feature. And after determining registration parameters of the target image to the target region three-dimensional model according to the mapping relation, generating a display image by utilizing the registration parameters and the target region three-dimensional model.
The image features and model features are feature points in the three-dimensional model of the target image and target region, respectively. Since the target image corresponds to an image corresponding to the target region acquired from a certain specific view angle, it is necessary to determine a correspondence relationship between the target image and the target region three-dimensional model before the target image is enhanced by using the target region three-dimensional model. And the mapping relation between the target image and the three-dimensional model of the target area is determined after the corresponding relation between each image feature and the model feature is determined by acquiring the image feature and the model feature and comparing the image feature and the model feature. The mapping relation can be used for representing the relative position relation between the target image and the target region three-dimensional model.
In practical applications, in order to simplify the amount of computation, a modeling image may be generated for the target region three-dimensional model, which may be an image generated by capturing the target region three-dimensional model from a specific view angle. By performing static registration on feature points in the target image and the modeling image, the positional relationship of the target image and the modeling image can be determined. And determining the position corresponding relation between the target image and the target region three-dimensional model by combining the coordinate system of the modeling image relative to the target region three-dimensional model.
The registration parameters are parameters for correcting the three-dimensional model of the target region. On the one hand, because different view angle relations and mapping relations exist between the three-dimensional model of the target area and the target image, the generation of the display image needs to be realized based on the coordinate system corresponding to the three-dimensional model of the target area and the target image. On the other hand, since the three-dimensional model of the target region and the target image are generated at different times, and the target region is in the patient, wherein organs, tissues and the like may dynamically change with time, in the case that the display image is an augmented reality image generated based on the three-dimensional model of the target region, in order to ensure timeliness of the display image, dynamic registration of the three-dimensional model of the target region needs to be performed based on the dynamic change condition of the current target region.
Thus, in some embodiments, the registration parameters may include static correspondence parameters and dynamically changing parameters. Accordingly, the process of determining the registration parameters may be: based on the three-dimensional model of the target area and the mapping relation, determining a static area and a dynamic area in the target image, determining static corresponding parameters for the static area and the three-dimensional model of the target area, and determining dynamic transformation parameters for the dynamic area and the three-dimensional model of the target area.
The static area and the dynamic area are respectively an area which is determined to be unchanged and an area which is determined to be changed after the target image and the target area three-dimensional model are compared. The static correspondence parameter and the dynamic transformation parameter are used to describe the above-described region where no change is generated and the region where a change is generated, respectively.
After the registration parameters are acquired, the augmented reality image can be constructed according to the registration parameters and by combining the three-dimensional model of the target area.
In some embodiments, the manner of constructing the augmented reality image may be to first determine the virtual scene image corresponding to the three-dimensional model of the target region. The process of determining the virtual scene image may be to determine parameters such as the viewing angle and distance of the virtual scene image according to the spatial coordinate system corresponding to the target image; specifically, the virtual scene image having a correspondence with the three-dimensional model of the target area may be determined through the static correspondence parameters. Then, a virtual dynamic image area corresponding to the dynamic area is determined in the virtual scene image, and the virtual dynamic image area is dynamically compensated using the dynamic change parameters to obtain the final augmented reality image, i.e., the display image. The dynamic compensation process makes the display image better match the display effect of the actual scene, guarantees the timeliness of the display image, and thus optimizes the execution of the operation.
When aligning the image with the three-dimensional model of the target area, methods such as optical flow and feature matching tracking may specifically be adopted to compensate the generated display image and guarantee its display effect. The specific process may be set based on the needs of the actual application, and is not limited herein.
In some embodiments, in order to optimize the display effect of the augmented reality image, the image may be further processed in a manner of video overlapping. Specifically, the display images of a plurality of continuous picture frames can be registered, and then the frame rate corresponding to the images is adjusted, so that the dynamic effect of video superposition is presented. Through the real-time dynamic display of the display image, a user can grasp the dynamic change condition of the operation area, and the understanding of the user on the real environment of the target area is enhanced.
The process of generating the augmented reality image based on the target image is consistent with the process of generating the display image based on the initial image, namely, in an actual execution environment, a circulation process of acquiring the image shot by the camera device and generating the augmented reality image is always executed, so that the watching effect of a user is ensured.
In some embodiments, after the line-of-sight parameter of the user is acquired, at least one of an eye movement heat point map, a line-of-sight trajectory map, and a region-of-interest map may also be generated based on the line-of-sight parameter.
The eye movement heat point diagram may be an image obtained by counting the user's gazing frequency at different points in the display image and marking the image with colors of different intensities based on those frequencies. The eye movement heat point diagram intuitively shows which areas of the display image are the user's key focus areas.
The sight line track diagram can be an image obtained by drawing a corresponding movement track according to the movement condition of the sight line of the user and marking the movement track in the display image. The sight line movement condition of the user can be directly known through the sight line track diagram.
The region of interest map may be obtained by determining, according to the user's gaze time at different positions, the areas of the display image with longer gaze time, taking those areas as regions of interest, and marking them in the display image. The region of interest map makes it directly apparent which regions of the display image the user gazed at for longer and is more interested in.
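One possible way to render the eye movement heat point diagram described above is sketched below, assuming OpenCV and NumPy and a BGR display image; the smoothing width and blending weights are illustrative choices.

```python
import cv2
import numpy as np

def eye_movement_heatmap(display_image: np.ndarray, gaze_points, sigma_px: float = 25.0):
    """Accumulate gaze frequency per pixel, smooth it, and overlay it as a colour map."""
    h, w = display_image.shape[:2]
    counts = np.zeros((h, w), dtype=np.float32)
    for x, y in gaze_points:                                  # count fixations per pixel
        if 0 <= int(y) < h and 0 <= int(x) < w:
            counts[int(y), int(x)] += 1.0
    heat = cv2.GaussianBlur(counts, (0, 0), sigmaX=sigma_px)  # spread each fixation
    if heat.max() > 0:
        heat /= heat.max()
    colour = cv2.applyColorMap((heat * 255).astype(np.uint8), cv2.COLORMAP_JET)
    return cv2.addWeighted(display_image, 0.6, colour, 0.4, 0.0)
```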
After the eye movement heat point diagram, the sight line track diagram and the region of interest diagram are generated, the eye movement heat point diagram, the sight line track diagram and the region of interest diagram can be sent to a display device for display. The display device may be a display independent of other display devices, for example, based on the scene example corresponding to fig. 3, the display device may be a display disposed on the image trolley, so that other medical staff can know the vision fluctuation condition and the vision focusing condition of the user of the doctor-side control device.
Fig. 10 is a schematic diagram of the corresponding images being shown on the display device. The user may view the corresponding eye movement hot spots and eye movement tracks on the display device, trace back the fixation areas by dragging a time axis, and obtain specific eye movement data analysis, so as to better complete the execution of the operation. Other gaze point analysis data may also be displayed in practical applications, which is not limited to the above examples and is not described further herein.
In addition, the image acquisition method based on eye tracking can be turned on and off based on the requirements of users, and the specific switching process can be realized by means of manual switching keys, pedal clutch and the like. The specific application process is not described here in detail.
As can be seen from the description of the above embodiment, the method acquires the line-of-sight parameters when the user views the display image, determines the gaze point three-dimensional spatial position of the user from them, and then controls the camera to move to the target spatial position and capture the target image, thereby realizing tracking of the user's gaze point in three-dimensional space. The method optimizes the display effect of the display image so that the user can more intuitively determine the spatial position of the operation area, and by realizing gaze point tracking in three-dimensional space, the camera can be controlled to perform spatial displacement in three-dimensional space and capture images, which facilitates the surgical operation in practical applications.
Based on the image acquisition method corresponding to fig. 4, the embodiment of the present disclosure further provides an image acquisition method. The execution subject of the image acquisition method is a photographic device. As shown in fig. 11, the image acquisition method includes the following specific implementation steps.
S1110: receiving a control instruction; the control instruction comprises an instruction which is generated after the sight line parameter of a user watching the display image is obtained, the three-dimensional space position of the gaze point of the user is determined according to the sight line parameter, and the target space position is determined according to the three-dimensional space position of the gaze point; the target space position is a position where the camera is located when a center point of a visual field of the camera coincides with the gaze point three-dimensional space position.
The description of this step may refer to the descriptions in steps S410, S420, and S430, and will not be repeated here.
S1120: and moving to the target space position based on the control instruction and shooting an image.
The description of this step may be referred to in step S440, and will not be repeated here.
Based on the image acquisition method corresponding to Fig. 4, the present embodiment provides a computer-readable storage medium having a computer program/instructions stored thereon. The computer-readable storage medium may be read by a processor over an internal bus of a device, and the program instructions in the computer-readable storage medium are executed by the processor.
In this embodiment, the computer-readable storage medium may be implemented in any suitable manner. The computer-readable storage medium includes, but is not limited to, Random Access Memory (RAM), Read-Only Memory (ROM), Cache, Hard Disk Drive (HDD), Memory Card, and the like. The computer storage medium stores computer program instructions. When the computer program instructions are executed, the program instructions or modules of the embodiment corresponding to Fig. 4 of the present specification are implemented.
While the process flows described above include a plurality of operations occurring in a particular order, it should be apparent that the processes may include more or fewer operations, which may be performed sequentially or in parallel (e.g., using a parallel processor or a multi-threaded environment).
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the specification. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory, Random Access Memory (RAM), and/or nonvolatile memory, such as Read-Only Memory (ROM) or flash memory (flash RAM), in a computer-readable medium. Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium, which can be used to store information that can be accessed by a computing device. As defined herein, computer-readable media does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.
It will be appreciated by those skilled in the art that embodiments of the present description may be provided as a method, system, or computer program product. Accordingly, the present specification embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present description embodiments may take the form of a computer program product on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
The present embodiments may be described in the general context of computer-executable instructions, such as program modules, being executed by a computer. Generally, program modules include routines, programs, objects, components, data structures, etc. that perform particular tasks or implement particular abstract data types. The embodiments of the specification may also be practiced in distributed computing environments where tasks are performed by remote processing devices that are linked through a communications network. In a distributed computing environment, program modules may be located in both local and remote computer storage media including memory storage devices.
In this specification, each embodiment is described in a progressive manner; identical and similar parts of the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, since the system embodiments are substantially similar to the method embodiments, their description is relatively simple, and reference may be made to the relevant parts of the description of the method embodiments. In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the present specification. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, the different embodiments or examples described in this specification, and the features of the different embodiments or examples, may be combined by those skilled in the art without contradiction.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.