
WO2024098240A1 - Gastrointestinal endoscopy visual reconstruction navigation system and method - Google Patents

Gastrointestinal endoscopy visual reconstruction navigation system and method

Info

Publication number
WO2024098240A1
WO2024098240A1 (PCT/CN2022/130535)
Authority
WO
WIPO (PCT)
Prior art keywords
map
topological
module
network
navigation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/CN2022/130535
Other languages
English (en)
Chinese (zh)
Inventor
熊璟
谭敏
夏泽洋
谢高生
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS filed Critical Shenzhen Institute of Advanced Technology of CAS
Priority to PCT/CN2022/130535 priority Critical patent/WO2024098240A1/fr
Publication of WO2024098240A1 publication Critical patent/WO2024098240A1/fr
Anticipated expiration legal-status Critical
Ceased legal-status Critical Current


Classifications

    • A: HUMAN NECESSITIES
    • A61: MEDICAL OR VETERINARY SCIENCE; HYGIENE
    • A61B: DIAGNOSIS; SURGERY; IDENTIFICATION
    • A61B 1/00: Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, e.g. endoscopes; Illuminating arrangements therefor
    • A61B 1/31: Instruments for performing medical examinations of the interior of cavities or tubes of the body by visual or photographical inspection, for the rectum, e.g. proctoscopes, sigmoidoscopes, colonoscopes
    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00: Image analysis
    • G06T 7/70: Determining position or orientation of objects or cameras
    • G06T 7/73: Determining position or orientation of objects or cameras using feature-based methods

Definitions

  • the embodiments of the present application relate to the field of medical image processing technology, and in particular to a digestive endoscopy visual reconstruction navigation system and method.
  • Colorectal cancer is the most common digestive system cancer in China. Colonoscopy is the best way to detect malignant polyps. During colonoscopy, doctors rely on their clinical experience to observe endoscopic images while operating the colonoscope's control handle to advance it. However, the tissue interiors captured by digestive endoscopy have weak textures, many repeated textures, and large changes in scene lighting. In addition, camera movement produces motion blur, making it difficult to extract image features. As a result, when the endoscopic image stream contains "no-information frames", the lumen is lost and the correct direction of advance cannot be identified.
  • the embodiments of the present application provide a digestive endoscope visual reconstruction navigation system and method to solve the problem that feature points are difficult to extract from endoscopic images under low light and sparse texture, so that the direction cannot be accurately identified.
  • an embodiment of the present application provides a digestive endoscopy visual reconstruction navigation system, comprising: a data acquisition module, a map construction module and a path planning module connected in sequence; the data acquisition module is used to collect virtual camera pose data and image depth data, and send the collected virtual camera pose data and image depth data to the map construction module; the map construction module is used to build an optical flow self-supervised network and an improved residual network according to the virtual camera pose data and the image depth data, and, according to the optical flow self-supervised network and the improved residual network, to perform camera pose estimation and depth map estimation respectively to construct an environment map; the path planning module is used to extract a topological centerline according to the environment map, and perform path planning and navigation around the topological centerline.
  • the map construction module includes a camera pose estimation module and a depth map estimation module; the camera pose estimation module is used to obtain an estimated camera pose based on the optical flow self-supervised network; the depth map estimation module is used to obtain an estimated endoscopic image depth based on the improved residual network.
  • the map construction module constructs an environment map through three-dimensional reconstruction according to the estimated camera pose and the estimated endoscopic image depth.
  • the path planning module includes a topological centerline acquisition module and a navigation module; the topological centerline acquisition module is used to obtain the topological centerline of the intestinal cavity in combination with the pipeline characteristics of the intestinal cavity; the navigation module is used to extract the topological centerline and perform path planning and navigation around the topological centerline.
  • an embodiment of the present application also provides a digestive endoscope visual reconstruction and navigation method, which uses the above-mentioned digestive endoscope visual reconstruction and navigation system for navigation, including the following steps: obtaining virtual camera pose data and image depth data; building an optical flow self-supervised network based on the virtual camera pose data, and obtaining an estimated camera pose based on the optical flow self-supervised network; building an improved residual network based on the image depth data, and obtaining an estimated endoscopic image depth based on the improved residual network; constructing an environmental map based on the estimated camera pose and the estimated endoscopic image depth; extracting a topological centerline based on the environmental map, and performing path planning and navigation around the topological centerline.
  • the estimated camera pose is obtained based on the optical flow self-supervised network, including: taking at least two pictures as input, performing network training, and obtaining a feature descriptor corresponding to each picture; matching the feature descriptors according to a sorting rule to obtain corresponding pixel points between different pictures; constructing a confidence score loss function to extract feature points from the pixel points; and obtaining an estimated camera pose based on the feature points and the geometric relationship between different pictures.
  • two images are used as input to perform network training to obtain two feature descriptors; the feature descriptors of the two images are matched according to a sorting rule to obtain corresponding pixel points between the two images; the confidence score loss function is given by formula (1);
  • the feature points are extracted through the average precision loss function;
  • the average precision loss function is given by formula (2);
  • (x_i, y_i) and (x_i', y_i') are the position coordinates of the corresponding pixel points in the two images with overlapping areas.
  • an estimated endoscopic image depth is obtained through convolution and batch normalization processing;
  • the improved residual network includes an encoder module and a decoder module, and the decoder module uses a convolution block with an activation function and a loss function for decoding;
  • the activation function is an exponential linear unit function, given by formula (3);
  • the loss function includes a first loss function, a second loss function and a third loss function; the first loss function is given by formula (4), in which:
  • D_i(p) represents the ground-truth depth image;
  • D_i'(p) represents the predicted depth map;
  • h_i(p) = log D_i'(p) - log D_i(p);
  • T represents the number of valid values left after filtering, p ∈ T;
  • the second loss function is given by formula (5), in which:
  • I_i(p) represents the color image;
  • the color image and the depth image are differentiated in the x and y directions to obtain the gradient images of the color image and the depth image.
  • a topological centerline is extracted based on the environmental map, and path planning and navigation are performed around the topological centerline, including: obtaining the topological centerline of the intestinal cavity based on the pipeline characteristics of the intestinal cavity; constructing a topological map in the movable cavity in the intestinal cavity based on the topological centerline; and performing path planning from the current position of the camera to the target position based on the topological map.
  • a topological map is constructed in a movable cavity in the intestinal cavity based on the topological centerline, including: traversing all voxels in the free space in the metric map; comparing the parent direction of each voxel with the parent direction of the voxel adjacent to it; the parent direction is the direction from the current voxel to the nearest occupied point voxel; based on the angle of the topological centerline, the voxels are filtered and the key points are retained as nodes of the topological map; the nodes are connected to obtain a topological map.
  • the embodiment of the present application is mainly aimed at the problem that it is difficult to extract feature points from endoscopic images under low light and low texture, so the direction cannot be accurately identified, and proposes a digestive endoscope visual reconstruction navigation system and method. The system includes: a data acquisition module, a map construction module and a path planning module; the data acquisition module is used to collect virtual camera pose data and image depth data, and send the collected virtual camera pose data and image depth data to the map construction module; the map construction module is used to build an optical flow self-supervised network and an improved residual network according to the virtual camera pose data and image depth data, and, according to the optical flow self-supervised network and the improved residual network, camera pose estimation and depth map estimation are performed respectively to construct an environment map; the path planning module is used to extract the topological centerline according to the environment map, and perform path planning and navigation around the topological centerline.
  • the digestive endoscopy visual reconstruction navigation system and method provided in the embodiment of the present application can globally perceive the endoscope environment, and the visual reconstruction can record the historical trajectory of the endoscope.
  • the feature point network based on optical flow self-supervision constructed in the present application is better suited to the weak-texture, smooth-surface characteristics of endoscopic images, and can solve the problem that feature points of endoscopic images are difficult to extract under low light and sparse texture.
  • the embodiment of the present application builds a data acquisition module to solve the problem that clinical images have no ground-truth labels. Therefore, the present application does not require pose ground-truth labels for network training; the labels are only used to calculate accuracy indicators and errors in the verification stage.
  • FIG1 is a schematic diagram of the structure of a digestive endoscopy visual reconstruction navigation system provided in an embodiment of the present application;
  • FIG2 is a schematic diagram of the framework structure of a digestive endoscope visual reconstruction navigation system provided in an embodiment of the present application;
  • FIG3 is a schematic diagram of the data acquisition process performed on a virtual colon simulation platform by the data acquisition module provided in an embodiment of the present application;
  • FIG4 is a flow chart of a digestive endoscopy visual reconstruction navigation method provided in an embodiment of the present application;
  • FIG5 is a schematic diagram of the process of estimating a camera pose based on an optical flow self-supervised network according to an embodiment of the present application;
  • FIG6 is a schematic diagram of the process of estimating the depth of an endoscopic image based on an improved residual network provided in an embodiment of the present application.
  • the existing digestive endoscopy visual navigation methods have the problem that it is difficult to extract feature points from endoscopic images under low light and sparse texture, so the direction cannot be accurately identified.
  • the existing visual navigation methods of digestive endoscopy are divided into traditional image processing algorithms and deep learning related algorithms.
  • the traditional image processing algorithm uses the significant contours and dark areas of the intestinal cavity, which are specifically divided into dark area extraction method, contour recognition method, etc.
  • the existing technology often combines the two as the basis for navigation. Since the endoscope advances in the closed intestinal cavity and the light is from far to near, the dark area is the most important and most significant feature for doctors to judge the direction of advancement. In addition, there are usually obvious muscle rings inside the colon. When the cavity is clearly visible, the semi-closed muscle curve shape of the intestine can be seen.
  • the contour recognition method, based on the structural characteristics of the colon itself, uses the direction of the radius of curvature to locate the deepest part of the intestine for navigation.
  • Deep learning related algorithms, in order to map the environment and locate the robot, require the algorithm to estimate the camera pose and the depth map from the input image stream.
  • the pose of the camera is equivalent to the transformation from the world coordinate system to the camera coordinate system, which is also called external parameters in three-dimensional vision.
  • an internal parameter matrix related to the properties of the camera itself is also required.
  • the camera pose estimation is also known as the front end in the SLAM (Simultaneous Localization and Mapping) framework, called visual odometry. After obtaining the camera pose, if the pixel depth corresponding to the color image frame can be obtained, a map of the environment can be reconstructed.
  • This self-constraint is that, in the process of transforming from the pixel coordinate system of the image to the camera coordinate system and finally to the world coordinate system, the three-dimensional point position can be recovered from the image if the depth map and the camera pose are known, satisfying the geometric consistency constraint.
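  • as a concrete statement of this constraint (the standard pinhole back-projection relation, stated here for clarity rather than reproduced from the patent text): a pixel (u, v) with depth d(u, v), intrinsic matrix K and camera-to-world transform T_wc recovers the world point

$$ P_w = T_{wc} \begin{bmatrix} d(u,v)\, K^{-1} (u, v, 1)^{\top} \\ 1 \end{bmatrix} $$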
  • the embodiment of the present application provides a digestive endoscope visual reconstruction navigation system and method
  • the system includes: a data acquisition module, a map construction module and a path planning module connected in sequence;
  • the data acquisition module is used to collect virtual camera pose data and image depth data, and send the collected virtual camera pose data and image depth data to the map construction module;
  • the map construction module is used to build an optical flow self-supervised network and an improved residual network according to the virtual camera pose data and image depth data; and according to the optical flow self-supervised network and the improved residual network, the camera pose estimation and depth map estimation are performed respectively to build an environmental map;
  • the path planning module is used to extract the topological centerline according to the environmental map, and perform path planning and navigation around the topological centerline.
  • in the digestive endoscope visual reconstruction navigation system provided in the embodiment of the present application, first, a data acquisition module is built to solve the problem that clinical images have no ground-truth labels; secondly, a pose estimation network and a depth map prediction network are built using deep networks to construct an environment map; finally, a topological centerline navigation algorithm is used to complete path planning.
  • the digestive endoscopy visual reconstruction navigation system and method provided in the embodiments of the present application can solve the problems of existing digestive endoscopy visual navigation methods, namely the difficulty of extracting feature points from endoscopic images under low light and sparse texture and the inability to accurately identify the direction.
  • an embodiment of the present application provides a digestive endoscopy visual reconstruction navigation system, including: a data acquisition module 101, a map construction module 102 and a path planning module 103 connected in sequence; the data acquisition module 101 is used to collect virtual camera pose data and image depth data, and send the collected virtual camera pose data and image depth data to the map construction module 102; the map construction module 102 is used to build an optical flow self-supervised network and an improved residual network according to the virtual camera pose data and image depth data, and, according to the optical flow self-supervised network and the improved residual network, perform camera pose estimation and depth map estimation respectively to construct an environment map; the path planning module 103 is used to extract the topological centerline according to the environment map, and perform path planning and navigation around the topological centerline.
  • this application designs a digestive endoscope visual reconstruction and navigation system based on self-supervised deep learning.
  • the system can better assist doctors in navigation, and the constructed environmental map can also assist in diagnosis and record the location of lesions. If the algorithm is provided to a robot, the robot can complete autonomous navigation.
  • the data acquisition module 101 is used to collect virtual camera pose data and image depth data, and send the collected virtual camera pose data and image depth data to the map construction module 102. Since real clinical colonoscopy video streams contain many blurry frames, and the depth map and pose ground-truth labels required for training the deep network are difficult to annotate manually, the training data set collection of the data acquisition module 101 is carried out on a virtual colon simulation platform.
  • Figure 3 shows a flow chart of data acquisition by the data acquisition module 101 on the virtual colon simulation platform.
  • the virtual camera pose and depth data collected by the data acquisition module 101 are used as ground-truth labels during deep network training, as well as for subsequent related evaluations. Figure 3 shows the internal simulation environment of the colon collected by the data acquisition module 101 on the virtual colon simulation platform.
  • the colon model obtained by CT scanning is imported into the virtual simulation platform, and then the environment and highlight rendering can be performed through the simulation platform.
  • the corresponding camera pose and depth image can be collected by writing a custom script. These two parts of data will be used as the truth labels for the subsequent training network, so as to perform verification and other work.
  • the map construction module 102 includes a camera pose estimation module 1021 and a depth map estimation module 1022; the camera pose estimation module 1021 is used to obtain an estimated camera pose based on an optical flow self-supervised network; the depth map estimation module 1022 is used to obtain an estimated endoscopic image depth based on an improved residual network.
  • the map construction module 102 is the core module, and the map construction module 102 can be divided into pose estimation based on the optical flow self-supervised network and monocular endoscopic image depth map estimation based on the improved residual network, wherein the camera pose estimation module 1021 is used to realize pose estimation based on the optical flow self-supervised network; the depth map estimation module 1022 is used to realize monocular endoscopic image depth map estimation based on the improved residual network; based on the camera pose and intestinal cavity depth map output by the two networks, the environmental map can be reconstructed, and the path planning and navigation algorithms can be developed on this basis.
  • the main steps of the map construction module 102 to construct the environment map are as follows: (1) Build an optical flow self-supervised network and an improved residual network, which are used for camera pose estimation and depth map estimation respectively, and train them. (2) Use the optical flow self-supervised network to obtain the estimated camera pose. (3) Use the monocular endoscopic image depth map estimation network based on the improved residual network to obtain the estimated endoscopic image depth. (4) Use the camera pose and endoscopic image depth obtained in steps (2) and (3) to construct the environment map.
  • the map construction module 102 constructs an environment map through three-dimensional reconstruction according to the estimated camera pose and the estimated endoscopic image depth.
  • the path planning module 103 includes a topological centerline acquisition module 1031 and a navigation module 1032; the topological centerline acquisition module 1031 is used to obtain the topological centerline of the intestinal cavity in combination with the pipeline characteristics of the intestinal cavity; the navigation module 1032 is used to extract the topological centerline and perform path planning and navigation around the topological centerline. Specifically, the navigation module 1032 constructs a topological map for describing the cavity in the movable cavity according to the topological centerline, and then performs path planning from the current position of the camera to the target position around the topological centerline. It should be noted that the target position is a special position seen during the endoscope advancement stage, and the path planning to the special position is completed during the retreat process; the special position includes one or both of the lesion position and the polyp position.
  • the main function of the path planning module 103 is to directly extract the topological center line of the cavity in combination with the pipeline characteristics of the intestinal cavity, construct a simple topological map for describing the cavity in the movable cavity, and then plan the path from the current position of the camera to the target position (also called the target point) based on the topological map.
  • the target point is defined as a special position such as a lesion or polyp seen during the endoscope's advancement stage, and the path planning to these special positions is completed during the retreat process.
  • the special location may be not only a lesion or polyp location, but also any other location with special markings.
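  • to make the planning step concrete, the following is a minimal sketch of shortest-path planning over such a topological map, assuming its nodes and weighted edges are already available; the graph contents, node names and the use of the networkx library are illustrative assumptions, not the patent's implementation:

```python
import networkx as nx

# Hypothetical topological map: nodes are centerline key points,
# edge weights are distances along the lumen (assumed units: mm).
G = nx.Graph()
G.add_weighted_edges_from([
    ("rectum", "sigmoid", 150.0),
    ("sigmoid", "descending", 200.0),
    ("descending", "splenic_flexure", 120.0),
    ("splenic_flexure", "polyp_site", 80.0),  # target marked during advancement
])

# Plan a path from the camera's current node back to the marked target
# (e.g. a polyp position recorded during the advancement stage).
path = nx.shortest_path(G, source="descending", target="polyp_site", weight="weight")
length = nx.shortest_path_length(G, "descending", "polyp_site", weight="weight")
print(path, length)  # ['descending', 'splenic_flexure', 'polyp_site'] 200.0
```

Because the topological map is sparse (only centerline key points), a shortest-path search of this kind is cheap compared with planning over the full voxel map.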
  • the embodiment of the present application further provides a digestive endoscope visual reconstruction navigation method, which uses the above-mentioned digestive endoscope visual reconstruction navigation system for navigation, and includes the following steps:
  • Step S1: obtain virtual camera pose data and image depth data.
  • Step S2: based on the virtual camera pose data, build an optical flow self-supervised network, and obtain an estimated camera pose based on the optical flow self-supervised network; based on the image depth data, build an improved residual network, and obtain an estimated endoscopic image depth based on the improved residual network.
  • Step S3: construct an environment map based on the estimated camera pose and the estimated endoscopic image depth.
  • Step S4: extract the topological centerline based on the environment map, and perform path planning and navigation around the topological centerline.
  • the digestive endoscope visual reconstruction navigation method provided in this application is also called an endoscope navigation algorithm, which includes: a pose estimation algorithm based on an optical flow self-supervised network, a monocular endoscope image depth map estimation algorithm based on an improved residual network, and a topological centerline path planning and navigation algorithm to perform digestive endoscope visual reconstruction navigation.
  • step S1 the virtual camera pose data and image depth data are obtained through the data acquisition module 101.
  • the virtual camera pose and depth data collected by the data acquisition module 101 are used as true value labels during deep network training, as well as subsequent related evaluations.
  • step S2 is executed.
  • Step S2 can be divided into two steps: step S201, pose estimation based on optical flow self-supervised network; step S202, monocular endoscope image depth map estimation based on improved residual network.
  • the following explains the specific steps of step S201, pose estimation based on the optical flow self-supervised network.
  • an estimated camera pose is obtained based on the optical flow self-supervised network, including: taking at least two pictures as input, performing network training, and obtaining a feature descriptor corresponding to each picture; matching the feature descriptors according to a sorting rule to obtain corresponding pixel points between different pictures; constructing a confidence score loss function to extract feature points from the pixel points; and obtaining an estimated camera pose based on the feature points and the geometric relationship between different pictures.
  • two images are used as input to perform network training to obtain two feature descriptors; the feature descriptors of the two images are matched according to a sorting rule to obtain corresponding pixel points between the two images; the confidence score loss function is given by formula (1);
  • the feature points are extracted through the average precision loss function;
  • the average precision loss function is given by formula (2).
  • the optical flow self-supervised network framework uses optical flow to construct a confidence score loss function to extract more robust feature points. Subsequently, the epipolar constraint is used to estimate the essential matrix from the two views, from which the pose is inversely solved.
  • the parameters of each node of the optical flow self-supervised network are shown in Figure 5.
  • the network takes two images as input, which are represented by img1 and img2 respectively.
  • after convolution (Conv), ReLU activation function and batch normalization (BN) layer operations, two feature descriptors are finally output.
  • the feature descriptors of the two images are matched according to the sorting rules to find the corresponding pixels between the two images (called corresponding points).
  • the matching and sorting rules are self-supervised, driven by the pre-computed optical flow.
  • the additional confidence score can help select more stable and reliable feature points from the corresponding points and filter out those with lower scores.
  • the design of the loss function introduces the optical flow of two additional images, which are generated during the data loading phase.
  • the specific values in the optical flow vector represent the position coordinates (x, y) of each pixel of img1 in img2.
  • the confidence evaluation loss function is shown in formula (1), where the confidence score R_ij lies in [0, 1]; the larger R_ij is, the greater the probability that the feature descriptor is a feature point; k ∈ [0, 1] is a threshold hyperparameter, which is usually set to 0.5 in the network.
  • when the calculated average precision AP(i,j) of a pixel's position coordinates in the image is less than k, R_ij should be smaller.
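  • the patent's formula (1) is not reproduced in this text; a plausible reconstruction consistent with the behaviour described above (an R2D2-style reliability loss, offered as an assumption) is

$$ \mathcal{L}_{conf}(i,j) = 1 - \big[ AP(i,j)\, R_{ij} + k\,(1 - R_{ij}) \big], $$

which is minimised by driving R_ij toward 1 where AP(i,j) > k and toward 0 where AP(i,j) < k, exactly the filtering behaviour described.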
  • Average Precision is an evaluation index for measuring classification results in multi-label classification. It is used here as a loss function to minimize the matching error between two feature descriptors.
  • the matching of feature description vectors can be modeled as a sorting optimization problem, that is, in two images I and I' with overlapping areas, the distance (such as Euclidean distance) between each feature description vector in image I and the feature description vector in image I' is calculated. After obtaining the distance, sort them from small to large, and the one with the smallest distance is the matching feature.
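  • formula (2) is likewise not reproduced here; a common form of average-precision loss over the matched descriptor pairs, given as a hedged reconstruction, is

$$ \mathcal{L}_{AP} = \frac{1}{N} \sum_{i=1}^{N} \big[ 1 - AP(i) \big], $$

where AP(i) is the average precision of ranking the true corresponding descriptor in image I' ahead of all other candidates when sorted by (e.g. Euclidean) descriptor distance.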
  • the ground-truth label is obtained by sparse sampling of the optical flow in Figure 5, which is equivalent to knowing the matching relationship between the two frames in advance. After the feature points are extracted, pose estimation is performed using the classic two-view geometric relationship.
  • the present invention proposes a deep learning network based on self-supervised feature point extraction, which directly self-supervises the feature description extracted by the network through information such as optical flow, thereby extracting more robust feature points in the image.
  • This self-supervised learning route can solve the problem of feature point extraction of endoscopic images under low light and low texture characteristics. After extracting the feature points, the feature descriptors of the two images are matched, and finally the camera pose is solved by the traditional multi-view geometric algorithm.
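  • as an illustration of this last step, the following sketch recovers the relative camera pose from matched feature points with OpenCV's essential-matrix estimator under the epipolar constraint; the intrinsic values and the placeholder matches are assumptions for the example, not values from the patent:

```python
import cv2
import numpy as np

# Matched pixel coordinates from two endoscopic frames (N x 2 arrays),
# e.g. produced by matching the network's feature descriptors.
pts1 = np.random.rand(100, 2).astype(np.float64) * 480  # placeholder matches
pts2 = pts1 + np.random.randn(100, 2)                   # placeholder matches

# Assumed pinhole intrinsics of the (virtual) endoscope camera.
K = np.array([[400.0, 0.0, 240.0],
              [0.0, 400.0, 240.0],
              [0.0, 0.0, 1.0]])

# Estimate the essential matrix under the epipolar constraint (RANSAC).
E, inliers = cv2.findEssentialMat(pts1, pts2, K, method=cv2.RANSAC,
                                  prob=0.999, threshold=1.0)

# Decompose E with a cheirality check to obtain rotation R and translation t
# (t is recovered only up to scale, as in any monocular two-view setup).
_, R, t, inliers = cv2.recoverPose(E, pts1, pts2, K, mask=inliers)
print(R, t.ravel())
```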
  • the embodiment of the present application adopts a self-supervised optical flow deep network output algorithm to obtain feature points. It is understandable that feature points can also be obtained through other algorithms.
  • the process of solving the feature points of two images can also be replaced by traditional SIFT (scale-invariant feature transform) feature points or ORB (Oriented FAST and Rotated BRIEF) feature points.
  • the self-supervised optical flow deep network used in this application outputs feature points that are more stable than those of the above-mentioned algorithms, and the pose is then solved from these points.
  • the monocular endoscopic image depth map prediction process can be replaced by other custom supervised networks.
  • the following explains step S202, monocular endoscopic image depth map estimation based on the improved residual network.
  • an estimated endoscopic image depth is obtained through convolution and batch normalization processing;
  • the improved residual network includes an encoder module and a decoder module, and the decoder module uses a convolution block with an activation function and a loss function for decoding;
  • the activation function is an exponential linear unit function, given by formula (3);
  • the loss function includes a first loss function, a second loss function and a third loss function; the first loss function is given by formula (4), in which:
  • D_i(p) represents the ground-truth depth image;
  • D_i'(p) represents the predicted depth map;
  • h_i(p) = log D_i'(p) - log D_i(p);
  • T represents the number of valid values left after filtering, p ∈ T;
  • the second loss function is given by formula (5), in which:
  • I_i(p) represents the color image;
  • the color image and the depth image are differentiated in the x and y directions to obtain the gradient images of the color image and the depth image.
  • the depth estimation of monocular endoscopic images builds on and improves the classic 18-layer residual network (ResNet-18).
  • the deep network architecture based on the improved residual network is shown in Figure 6. It mainly uses convolution combined with batch normalization (BN) to extract features.
  • the deep network is mainly composed of an encoder (Encoder) and a decoder (Decoder).
  • the complete ResNet is used in the encoder part, and convolution blocks with the exponential linear unit (ELU) activation function are used directly in the decoding part.
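  • the patent's formula (3) is not reproduced in this text, but the exponential linear unit has the standard definition

$$ \mathrm{ELU}(x) = \begin{cases} x, & x > 0 \\ \alpha\,(e^{x} - 1), & x \le 0 \end{cases} $$

with the scale α typically set to 1.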
  • in the encoding stage, downsampling operations are performed in each Basic Block to gradually increase the number of channels of the feature vector, and an average pooling layer (Avg Pool) and a fully connected layer (FC) are appended to extract a 512-dimensional feature vector.
  • in the decoding stage, 3×3 convolution blocks with the ELU activation function are used directly to achieve upsampling.
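  • a minimal PyTorch sketch of this encoder-decoder arrangement follows; the channel widths, the nearest-neighbour upsampling, the sigmoid depth head, the dropping of the pooling/FC head so that a spatial feature map reaches the decoder, and the omission of skip connections are simplifying assumptions for illustration, not details taken from the patent:

```python
import torch
import torch.nn as nn
from torchvision.models import resnet18

class DepthNet(nn.Module):
    """Encoder-decoder depth sketch: ResNet-18 features, ELU conv decoder."""
    def __init__(self):
        super().__init__()
        base = resnet18(weights=None)
        # Keep the convolutional trunk; drop avg-pool and FC so the decoder
        # receives a spatial 512-channel feature map (downsampled 32x).
        self.encoder = nn.Sequential(*list(base.children())[:-2])

        def up_block(c_in, c_out):
            # 3x3 convolution + ELU, then 2x upsampling, as in the decoder text.
            return nn.Sequential(
                nn.Conv2d(c_in, c_out, kernel_size=3, padding=1),
                nn.ELU(),
                nn.Upsample(scale_factor=2, mode="nearest"),
            )

        self.decoder = nn.Sequential(
            up_block(512, 256), up_block(256, 128), up_block(128, 64),
            up_block(64, 32), up_block(32, 16),
        )
        # Map to a single-channel depth map in (0, 1).
        self.head = nn.Sequential(nn.Conv2d(16, 1, kernel_size=3, padding=1),
                                  nn.Sigmoid())

    def forward(self, x):
        return self.head(self.decoder(self.encoder(x)))

net = DepthNet()
depth = net(torch.randn(1, 3, 256, 256))  # -> torch.Size([1, 1, 256, 256])
```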
  • the present application designs three loss functions, namely the first loss function, the second loss function and the third loss function, as shown in formula (4), formula (5) and formula (6).
  • T represents the number of valid values left after the validity mask filtering, p ∈ T.
  • both the first loss function and the second loss function are difference losses used to directly compare the differences between two depth maps.
  • D_i(p) represents the ground-truth depth image;
  • D_i'(p) represents the predicted depth map. Since the depth map predicted by ResNet is too smooth and loses some detailed texture information, the present application improves the smoothness loss of the depth map predicted by ResNet and proposes a third loss function, as shown in formula (6).
  • I_i(p) represents the color image;
  • the color RGB image and the depth image are differentiated in the x and y directions to obtain the gradient images of the color image and the depth image.
  • the basis is that the gradients of pixels at curved edges in digestive endoscopy images are usually larger and change more drastically.
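  • formulas (4) to (6) themselves are not reproduced in this text; plausible reconstructions consistent with the definitions above, offered as assumptions (a scale-invariant log-depth loss, a direct difference loss, and an edge-aware smoothness loss in the Monodepth style), are

$$ \mathcal{L}_1 = \frac{1}{T} \sum_{p \in T} h_i(p)^2 \;-\; \frac{\lambda}{T^2} \Big( \sum_{p \in T} h_i(p) \Big)^{\!2}, \qquad h_i(p) = \log D_i'(p) - \log D_i(p), $$

$$ \mathcal{L}_2 = \frac{1}{T} \sum_{p \in T} \big| D_i'(p) - D_i(p) \big|, $$

$$ \mathcal{L}_3 = \frac{1}{T} \sum_{p \in T} \Big( \big|\partial_x D_i'(p)\big|\, e^{-|\partial_x I_i(p)|} + \big|\partial_y D_i'(p)\big|\, e^{-|\partial_y I_i(p)|} \Big), $$

where λ ∈ [0, 1] balances the scale-invariant term, and the exponential weighting in the third loss lets depth gradients survive at strong color edges, matching the stated motivation.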
  • step S3 is executed to construct an environment map through three-dimensional reconstruction according to the estimated camera pose and the estimated endoscope image depth.
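  • as an illustration of this reconstruction step, the sketch below back-projects one depth frame into a world-frame point cloud using the pinhole relation given earlier; the function and array names are assumptions for the example:

```python
import numpy as np

def backproject_frame(depth, K, T_wc):
    """Back-project a depth map (H x W) into an (H*W, 3) world-frame point cloud.

    depth : per-pixel depth estimated by the improved residual network
    K     : 3x3 camera intrinsic matrix
    T_wc  : 4x4 camera-to-world pose estimated by the pose network
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3)
    # Pixel -> camera frame: scale the normalized ray by the depth value.
    pts_cam = (pix @ np.linalg.inv(K).T) * depth.reshape(-1, 1)
    # Camera frame -> world frame with the homogeneous pose matrix.
    pts_h = np.concatenate([pts_cam, np.ones((pts_cam.shape[0], 1))], axis=1)
    return (pts_h @ T_wc.T)[:, :3]

# Fusing the per-frame clouds along the trajectory yields the environment map.
```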
  • step S4 is executed to construct a topology map based on the environment map and perform path planning around the topology map.
  • step S4, extracting a topological centerline based on the environment map and performing path planning and navigation around the topological centerline, includes the following steps:
  • Step S401: based on the pipeline characteristics of the intestinal cavity, obtain the topological centerline of the intestinal cavity.
  • Step S402: construct a topological map in the traversable cavity in the intestinal cavity based on the topological centerline.
  • Step S403: based on the topological map, plan a path from the current position of the camera to the target position.
  • step S4 is mainly an operation of path planning and navigation based on the topological centerline.
  • the basis of path planning is the topological centerline of the map.
  • the topological centerline can also be called the skeleton of the 3D generalized Voronoi diagram (GVD).
  • the generation of the GVD depends on a Euclidean signed distance function (ESDF) metric map.
  • after post-processing such as skeleton refinement and pruning, a complete sparse topological map description is obtained as the forward route for navigation.
  • step S402, constructing a topological map in the traversable cavity in the intestinal cavity based on the topological centerline, comprises the following steps:
  • Step S4021: traverse all voxels in the free space in the metric map.
  • Step S4022: compare the parent direction of each voxel with the parent direction of its adjacent voxels; the parent direction is the direction from the current voxel to the nearest occupied voxel.
  • Step S4023: based on the angle of the topological centerline, filter the voxels and retain the key points as nodes of the topological map; connect the nodes to obtain the topological map.
  • the GVD extraction process includes: first, traversing all voxels in the free space of the ESDF; next, comparing each voxel's parent direction with the parent directions of its 6-connected neighbors, where the parent direction is defined as the direction from the current voxel to the nearest occupied voxel; then, filtering the voxels according to the preset GVD angle between parent directions. Finally, after the redundant voxels are filtered out, the remaining key points serve as the nodes of the topological map, and the nodes are connected to obtain the topological map.
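  • a minimal sketch of this voxel filter follows, assuming the ESDF and a precomputed parent-direction field are stored as dense NumPy arrays; the array layout, the angle threshold, and the keep-versus-discard convention (the sketch keeps voxels whose neighboring parent directions diverge beyond the threshold, the usual GVD criterion) are assumptions, since the patent text is ambiguous on this point:

```python
import numpy as np

def extract_gvd_nodes(esdf, parent_dir, angle_thresh_deg=60.0):
    """Return voxel indices that lie on the generalized Voronoi diagram.

    esdf       : (X, Y, Z) signed distances; > 0 marks free space
    parent_dir : (X, Y, Z, 3) unit vectors toward the nearest occupied voxel
    """
    neighbors = [(1, 0, 0), (-1, 0, 0), (0, 1, 0),
                 (0, -1, 0), (0, 0, 1), (0, 0, -1)]
    cos_thresh = np.cos(np.deg2rad(angle_thresh_deg))
    nodes = []
    X, Y, Z = esdf.shape
    for x in range(1, X - 1):
        for y in range(1, Y - 1):
            for z in range(1, Z - 1):
                if esdf[x, y, z] <= 0:          # skip occupied/unknown voxels
                    continue
                d = parent_dir[x, y, z]
                for dx, dy, dz in neighbors:    # 6-connected neighborhood
                    dn = parent_dir[x + dx, y + dy, z + dz]
                    # A large angle between parent directions means the two
                    # voxels "see" different walls: a centerline candidate.
                    if np.dot(d, dn) < cos_thresh:
                        nodes.append((x, y, z))
                        break
    return nodes
```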
  • the digestive endoscopy visual reconstruction navigation system and method provided by the present invention have been verified to be feasible through experimental simulation.
  • a total of 21,671 colonoscopy images collected on the virtual platform were used in the training phase, and two sets of virtual data, of 400 and 447 images respectively, were used in the test phase.
  • two sets of clinical data, of 82 and 109 images respectively, were also used, and experiments preliminarily verified the approach to be feasible.
  • the calculated average precisions a1 and a2 of the predicted depth map are 0.7637 and 0.9471 respectively, and the average RMSE of the error is 0.0929, with units expressed in the grayscale of the depth value.
  • the above errors are all within the controllable range, and the calculation method of the evaluation indicators is as follows:
  • d* represents the predicted depth map and d represents the true depth map.
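  • the indicator formulas are not reproduced in this text; the standard monocular-depth evaluation metrics matching these names, stated here as an assumption, are

$$ a_n = \frac{1}{|P|} \,\Big| \Big\{ p : \max\!\Big( \tfrac{d^*(p)}{d(p)}, \tfrac{d(p)}{d^*(p)} \Big) < 1.25^{\,n} \Big\} \Big|, \quad n \in \{1, 2\}, $$

$$ \mathrm{RMSE} = \sqrt{ \frac{1}{|P|} \sum_{p \in P} \big( d^*(p) - d(p) \big)^2 }, $$

where P is the set of pixels with valid ground-truth depth, so a1 and a2 are the fractions of pixels whose prediction falls within factors of 1.25 and 1.25² of the ground truth.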
  • the present application proposes a digestive endoscope visual reconstruction navigation system and method, which includes: a data acquisition module, a map construction module and a path planning module; the data acquisition module is used to collect virtual camera pose data and image depth data, and send the collected virtual camera pose data and image depth data to the map construction module; the map construction module is used to build an optical flow self-supervised network and an improved residual network according to the virtual camera pose data and image depth data, and, according to the optical flow self-supervised network and the improved residual network, camera pose estimation and depth map estimation are performed respectively to construct an environment map; the path planning module is used to extract the topological centerline according to the environment map, and perform path planning and navigation around the topological centerline.
  • the digestive endoscope visual reconstruction navigation system and method provided in the embodiments of the present application can globally perceive the endoscopic environment, and the visual reconstruction can record the historical trajectory of the endoscope.
  • the feature point network based on optical flow self-supervision constructed in the present application is better suited to the weak-texture, smooth-surface characteristics of endoscopic images, and can solve the problem that feature points of endoscopic images are difficult to extract under low light and sparse texture.
  • the embodiment of the present application builds a data acquisition module to solve the problem that clinical images have no ground-truth labels. Therefore, the present application does not require pose ground-truth labels for network training; the labels are only used to calculate accuracy indicators and errors in the verification stage.

Landscapes

  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Engineering & Computer Science (AREA)
  • Surgery (AREA)
  • Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Animal Behavior & Ethology (AREA)
  • Radiology & Medical Imaging (AREA)
  • Optics & Photonics (AREA)
  • Nuclear Medicine, Radiotherapy & Molecular Imaging (AREA)
  • Biophysics (AREA)
  • Heart & Thoracic Surgery (AREA)
  • Medical Informatics (AREA)
  • Molecular Biology (AREA)
  • Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • Veterinary Medicine (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Endoscopes (AREA)
  • Image Analysis (AREA)

Abstract

The invention relates to a gastrointestinal endoscopy visual reconstruction navigation system and method. The system comprises a data acquisition module (101), a map construction module (102) and a path planning module (103). The data acquisition module (101) is configured to acquire virtual camera pose data and image depth data and send the acquired virtual camera pose data and image depth data to the map construction module (102). The map construction module (102) is configured to build an optical flow self-supervised network and an improved residual network from the virtual camera pose data and the image depth data, and to perform camera pose estimation and depth map estimation according to the optical flow self-supervised network and the improved residual network, respectively, to construct an environment map. The path planning module (103) is configured to extract a topological centerline from the environment map and perform path planning and navigation around the topological centerline. The problems of difficult feature point extraction and inability to accurately discern direction in endoscopic images under low-light, low-texture conditions can thereby be solved.
PCT/CN2022/130535 2022-11-08 2022-11-08 Gastrointestinal endoscopy visual reconstruction navigation system and method Ceased WO2024098240A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/130535 WO2024098240A1 (fr) 2022-11-08 2022-11-08 Gastrointestinal endoscopy visual reconstruction navigation system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/130535 WO2024098240A1 (fr) 2022-11-08 2022-11-08 Gastrointestinal endoscopy visual reconstruction navigation system and method

Publications (1)

Publication Number Publication Date
WO2024098240A1 true WO2024098240A1 (fr) 2024-05-16

Family

ID=91031707

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/130535 Ceased WO2024098240A1 (fr) 2022-11-08 2022-11-08 Gastrointestinal endoscopy visual reconstruction navigation system and method

Country Status (1)

Country Link
WO (1) WO2024098240A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118766589A (zh) * 2024-06-25 2024-10-15 复旦大学 Nasal endoscope surgical navigation system based on scene reconstruction
CN118840388A (zh) * 2024-06-20 2024-10-25 广东省科学院智能制造研究所 Three-dimensional motion estimation method for a capsule robot based on visual sensing
CN119090964A (zh) * 2024-11-07 2024-12-06 合肥工业大学 Instrument 6D pose estimation method and system based on 3D Gaussian splatting
CN119180852A (zh) * 2024-11-22 2024-12-24 泉州装备制造研究所 Dense depth estimation method for the human intestine
CN119273855A (zh) * 2024-12-10 2025-01-07 之江实验室 Three-dimensional reconstruction method and apparatus for endoscopic images
CN119679513A (zh) * 2024-12-18 2025-03-25 四川大学 Navigation method and system for a digestive tract intubation robot based on multi-modal information fusion

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170084027A1 (en) * 2015-09-18 2017-03-23 Auris Surgical Robotics, Inc. Navigation of tubular networks
CN108685560A (zh) * 2017-04-12 2018-10-23 香港生物医学工程有限公司 Automated steering system and method for a robotic endoscope
CN111325784A (zh) * 2019-11-29 2020-06-23 浙江省北大信息技术高等研究院 Unsupervised pose and depth computation method and system
CN111739078A (zh) * 2020-06-15 2020-10-02 大连理工大学 Monocular unsupervised depth estimation method based on a contextual attention mechanism
WO2020242949A1 (fr) * 2019-05-28 2020-12-03 Google Llc Systems and methods for video-based positioning and navigation in gastroenterological procedures
CN113450410A (zh) * 2021-06-29 2021-09-28 浙江大学 Monocular depth and pose joint estimation method based on epipolar geometry
CN114022527A (zh) * 2021-10-20 2022-02-08 华中科技大学 Monocular endoscope depth and pose estimation method and device based on unsupervised learning
CN114399527A (zh) * 2022-01-04 2022-04-26 北京理工大学 Method and device for unsupervised depth and motion estimation of a monocular endoscope


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
WU Haibin; ZHAO Jianbo; XU Kaiyang; ZHANG Yan; XU Ruotong; WANG Aili; IWAHORI Yuji: "Semantic SLAM Based on Deep Learning in Endocavity Environment", Symmetry (MDPI), vol. 14, no. 3, p. 614, XP093170401, ISSN: 2073-8994, DOI: 10.3390/sym14030614 *


Similar Documents

Publication Publication Date Title
WO2024098240A1 (fr) Gastrointestinal endoscopy visual reconstruction navigation system and method
Moreau et al. Crossfire: Camera relocalization on self-supervised features from an implicit representation
CN112766416B (zh) Digestive endoscope navigation method and system
CN115797448A (zh) Digestive endoscope visual reconstruction navigation system and method
US20180174311A1 Method and system for simultaneous scene parsing and model fusion for endoscopic and laparoscopic navigation
US20250151977A1 Computer-implemented systems and methods for analyzing examination quality for an endoscopic procedure
CN111798451A (zh) 3D guidewire tracking method and device based on 3D/2D vessel matching
CN108830150A (zh) Three-dimensional human pose estimation method and device
CN111524170B (zh) Lung CT image registration method based on unsupervised deep learning
CN119580985B (zh) Robot-assisted wound treatment method and system based on multi-modal image analysis
WO2006127713A2 (fr) Method for fast 2D-3D image registration with application to continuously guided endoscopy
US12118737B2 Image processing method, device and computer-readable storage medium
CN116485851A (zh) Three-dimensional mesh model registration and fusion system for laparoscopic surgery navigation
Chen et al. FRSR: Framework for real-time scene reconstruction in robot-assisted minimally invasive surgery
CN111080778A (zh) Online three-dimensional reconstruction method for binocular endoscopic soft tissue images
Wang et al. Deep convolutional network for stereo depth mapping in binocular endoscopy
WO2022170562A1 (fr) Digestive endoscope navigation method and system
CN115841602A (zh) Construction method and device for a multi-view three-dimensional pose estimation dataset
CN117710279A (zh) Endoscope positioning method, electronic device, and non-transitory computer-readable storage medium
CN116421311A (зh) Intraoperative danger zone generation system based on preoperative and intraoperative three-dimensional mesh fusion
Shan et al. ENeRF-SLAM: A dense endoscopic SLAM with neural implicit representation
CN119055358A (зh) Surgical force feedback guidance method based on virtual marker tracking and instrument pose
Lu et al. S2P-Matching: Self-supervised patch-based matching using transformer for capsule endoscopic images stitching
CN116071407A (зh) Bronchial image registration method and device, computer equipment, and storage medium
WO2025214308A1 (fr) Method and apparatus for three-dimensional model reconstruction of the inner wall of the digestive cavity, device, and medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22964716

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 22964716

Country of ref document: EP

Kind code of ref document: A1