HK1171852B - Image registration - Google Patents
Image registration
- Publication number
- HK1171852B (application HK12112621.8A)
- Authority
- HK
- Hong Kong
- Prior art keywords
- image
- images
- semantic information
- regression
- probability map
- Prior art date
Description
Technical Field
The present invention relates to image registration.
Background
Image registration is used in many application areas, such as medical applications, computer-aided manufacturing, robotics, and others. For example, two or more images of the same scene may have been captured at different times, from different viewpoints, and/or using different image capture devices. It may be desirable to register the images in order to detect differences between them that are not a result of changes in the pose of objects in the scene or of the different viewpoints of the image capture devices used to capture the images.
Image registration is particularly useful in the field of medical applications, for example in clinical scenarios such as patient follow-up and surgery or therapy planning. In order for medical professionals to be able to compare medical images in an accurate manner, the images need to be registered so as to remove variations in the pose of the patient. For example, the registration may result in an alignment of the corresponding anatomical structures in the two medical images.
There is a continuing need to improve the accuracy and robustness of image registration systems and to reduce the time taken and computational resources required for image registration.
The embodiments described below are not limited to implementations that address any or all of the disadvantages of known image registration systems.
Disclosure of Invention
The following presents a simplified summary of the invention in order to provide a basic understanding to the reader. This summary is not an extensive overview of the invention, and it does not identify key/critical elements of the invention or delineate the scope of the invention. Its sole purpose is to present some concepts disclosed herein in a simplified form as a prelude to the more detailed description that is presented later.
Image registration is described. In an embodiment, an image registration system performs automatic registration of images (e.g., medical images). In an example, semantic information is computed for each of the images to be registered, the semantic information including information about the types of objects in the image and the certainty of that information. In one example, a mapping is found that registers the images, taking into account the intensities of the image elements and the semantic information, the latter weighted according to its certainty. The semantic information is calculated, for example, by estimating the posterior distribution of the positions of the respective anatomical structures using a regression forest and transforming the posterior distribution into a probability map. In an example, the mapping is found as a global extremum of an energy function having terms related to the semantic information.
Many of the attendant features will be more readily appreciated as the same becomes better understood by reference to the following detailed description considered in connection with the accompanying drawings.
Drawings
The invention will be better understood from the following detailed description when read in light of the accompanying drawings, in which:
fig. 1 is a schematic diagram of a medical image registration system;
FIG. 2 is a schematic illustration of a pair of medical images to be registered, together with two types of energy functions shown schematically;
FIG. 3 is a flow diagram of an example method of image registration at a medical image registration system;
FIG. 4 is a schematic diagram of an example portion of a decision forest;
FIG. 5 is an example of a training image for estimating organ position;
FIG. 6 is a flow diagram of an example method of training a regression forest;
FIG. 7 is a flow diagram of an example method of predicting organ positions using a trained regression forest;
FIG. 8 is a flow diagram of an example method of computing a probability map;
FIG. 9 is an example of a probability map computed from an input image;
FIG. 10 is a flow diagram of an example method of image registration;
FIG. 11 is an example of a registered image;
FIG. 12 illustrates an exemplary computing-based device in which embodiments of a medical image registration system may be implemented;
like reference numerals are used to refer to like parts throughout the various drawings.
Detailed Description
The detailed description provided below in connection with the appended drawings is intended as a description of the present examples and is not intended to represent the only forms in which the present examples may be constructed or utilized. The description sets forth the functions of the example of the invention, as well as the sequence of steps for constructing and operating the example of the invention. However, the same or equivalent functions and sequences may be accomplished by different examples.
Although the present examples are described and illustrated herein as being implemented in a medical image registration system, the system is provided by way of example and not limitation. Those skilled in the art will appreciate that the present examples are suitable for application in a variety of different types of image registration systems.
The following techniques are described with reference to medical images, which may be two-dimensional or higher-dimensional volume images (or sequences of such images) representing the internal structure of a human or animal body. A volume image is formed of voxels. A voxel in a three-dimensional volumetric image is analogous to a pixel in a two-dimensional image and represents a unit of volume.
Although described with reference to medical images, the techniques described below may also be applied in other areas, such as registration of satellite data, registration of sonar data, topological modeling, or other computer vision processes.
The term "image element" is used herein to refer to a pixel in a two-dimensional image, a voxel in a three-or higher-dimensional image or a sequence of images that change over time, or a group of pixels or voxels such as a blob, patch, or other set of two or more pixels or voxels. Each image element has one or more values associated with it. Each value represents an attribute such as intensity or color. The attributes may depend on the type of medical image device that generated the image. In an example, the image intensity may be related to the density of tissue or the proportion of water present in the material at a given portion of the image.
Fig. 1 is a schematic diagram of a medical image registration system including a computer-implemented image registration system 106. In an example, a user 110 (e.g., a physician) may wish to review multiple images 102, 104, for example, to perform a patient follow-up or to plan a procedure.
The image registration system 106 is arranged to take a reference image 102 of two or more dimensions and a target image 104, also of two or more dimensions, and perform a transformation to register the second medical image with the first medical image or vice versa. In an example, the reference image and the target image may depict the same patient. Images may be taken using different devices (e.g., multimodal) or at different times or from different angles. The image registration system enables these changes to be removed before the images are compared by medical personnel, other users, or automated processes.
In an example, a plurality of images may be registered with a reference image. For example, a physician may wish to perform a multi-modal examination of a patient using, for example, X-rays, a computed tomography (CT) scan, and a positron emission tomography (PET) scan. Once these images have been registered with the reference image, the user 110 can compare the registered images 108 output from the image registration system 106 for any differences. In an example, the registered images 108 may show changes in the size and shape of an organ over time, which may be indicative of injury, disease, or other problems with the organ. The multi-modal registered images may also be used for further automated analysis, for example scientific data mining or rapid visualization. An automated organ identification process may be applied to the registered multi-modal images to achieve greater accuracy than applying such an organ identification process before image registration.
In some examples described below, the image is at least three-dimensional, although this is not essential. The image registration system 106 may also be used with two-dimensional images.
The image registration process may display the output images in a number of different ways. In an example, the registered image 108 may display the reference image 102 as a black-and-white or color image with the target image 104 overlaid on it as a contour map or a partially transparent image. In another example, the registered images may be displayed as a sequence of images. The registered images 108 may be displayed in a pre-specified manner, or the user 110 may select a different manner in which to display the images. For example, the user 110 may choose to first display the target image as a contour overlay on the reference image, and then choose to display two or more of the images in sequence, for example causing the images to blink. In some embodiments, the estimated uncertainty in the image registration may also be displayed, for example by changing color or transparency. This uncertainty value may be provided as, for example, a variance or standard deviation value, and provides an indication of how confident the image registration system 106 is that the registration of the target image 104 to the reference image 102 is correct.
In an example, the medical image may be from a CT scanner 100. However, the images may be produced by many types of medical devices, such as Magnetic Resonance Imaging (MRI) scanners, Computed Tomography (CT) scanners, Single Photon Emission Computed Tomography (SPECT) scanners, Positron Emission Tomography (PET) scanners, and ultrasound scanners.
Fig. 2 is a schematic illustration of a pair of medical images to be registered and two types of energy functions shown schematically. Many common techniques for registering images formulate the registration problem as the minimization of a global energy function based on intrinsic image properties, such as an energy function E(I) based on image intensity. In an example of this technique, the target image 202 may be registered with the reference image 200 by minimizing the energy function 204. However, the minimization step of algorithms that minimize an energy function based only on image intensity, without using semantic information about the image content, may suffer from local minima, leading to unsatisfactory results.
In another example, the images 200, 202 may be registered by optimizing an energy function 206, E(I + semantic information), based on intrinsic image properties and additional semantic information. For example, the semantic information may be information about the type of anatomical structure or organ in the medical image, together with the certainty of that information. The semantic information may be a probability measure, such as the probability that an organ is located in a certain part of the image. In other examples, the semantic information may be classification information, segmentation information, or other information about the content of the image. Adding semantic information as a term in the energy function yields an energy function with a more clearly defined global extremum (minimum or maximum) to be found, and improves the robustness of the image registration.
In an example, an automatic image registration system includes an input arranged to receive a first image and a second image, both images having a plurality of objects and at least a portion of one object being common to both images. For example, the images may be two medical images of the thorax of a patient, both showing the lungs of the patient but from slightly different points of view. The system may have a processor arranged to calculate semantic information for each of the first and second images, the semantic information comprising information about the types of objects in the image and the certainty of that information. In some cases, this semantic information about the image content is provided as a probability map (described in more detail below). However, other formats may be used for the semantic information. The processor may be further arranged to find a mapping registering the first and second images, the mapping taking into account the intensities of the image elements of the respective images and also taking into account the semantic information, weighted according to its certainty. This improves the quality of the image registration.
In an example, a method of automatic image registration includes receiving a first image and a second image, both images having a plurality of objects and at least a portion of one object being common to both images. For each of the first and second images, a probability map is computed, the probability map comprising, for each image element, the probability that the image element contains a particular object. The mapping registering the first and second images can then be found by optimizing an energy function, which is a function of the intensities of the first and second images and of the probability maps.
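As an illustration of this kind of formulation, the sketch below combines an intensity term with a semantic term weighted by the certainty of the probability maps. This is a minimal sketch only: the function names, the squared-difference form of both terms, and the alpha trade-off parameter are assumptions for illustration, not taken from this description.

```python
import numpy as np

def registration_energy(ref, tgt, prob_ref, prob_tgt, alpha=0.5):
    """Energy combining an intensity term with a semantic term.

    ref, tgt: intensity images resampled onto a common grid.
    prob_ref, prob_tgt: per-element probability maps (the semantic
    information); their product is used here as a certainty weight,
    an illustrative assumption.
    alpha: trade-off between the two terms (assumed).
    """
    # Intensity term: e.g. mean squared intensity difference
    intensity_term = np.mean((ref - tgt) ** 2)
    # Semantic term: disagreement of organ probabilities, weighted
    # by the certainty of the semantic information
    certainty = prob_ref * prob_tgt
    semantic_term = np.mean(certainty * (prob_ref - prob_tgt) ** 2)
    return (1 - alpha) * intensity_term + alpha * semantic_term
```

An optimizer would evaluate this energy for candidate transformations of the target image and keep the transformation giving the lowest value.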
Fig. 3 is a flow chart of an example method of image registration at a medical image registration system, such as the image registration system of fig. 1. The images 300 to be registered are input to an organ predictor 302. The organ predictor may predict the location of an organ or other anatomical structure in an image. For example, the organ predictor 302 may be arranged to provide an estimated region or bounding box for each organ it detects, together with an associated uncertainty. In an example where the input image is a three-dimensional image, the estimated region is a volume that closely contains or encompasses an organ, which may be a box or any other 3D form. In other examples, the organ predictor may be configured to label or segment the image.
In one embodiment, the organ predictor comprises a regression forest. A regression forest is an ensemble classifier that includes a plurality of trained regression trees. An ensemble of many randomly trained regression trees (a regression forest) yields improved generalization over a single tree, which may suffer from over-fitting. A regression tree differs from a classification tree in that it provides a continuous, real-valued output rather than the class to which an object belongs. The number of trees is fixed during the training process. In one example, the number of trees is three, although other values may be used. The number of trees and the depth of the trees are chosen depending on the application domain and on factors such as the available computing resources.
Predicting organ positions using regression forests is described in more detail with reference to figs. 4-7. In a non-exhaustive list of examples, the organ predictor may alternatively be: another type of trained decision tree system, such as a classification forest; a vector machine, such as a support vector machine (SVM); or a relevance vector machine (RVM). In addition, adaptive boosting techniques (e.g., AdaBoost) may be used.
The prediction of the position of the organ in the image 300 received from the organ predictor 302 is used as input to the transformer 304. Transformer 304 is used to bring the semantic information into a form that can be easily used as part of the process of finding a mapping between images. For example, the transformer 304 may transform the output of the organ predictor into a probability map 306 for each image. The probability map can be thought of as an image (which may have two dimensions, three dimensions, or higher): each image element position in such an image holds one or more values, each value corresponding to a probability that the image element depicts a particular organ. Each element in the probability map may correspond to an image element in the input image or may be of different resolution. For example, each element in the probability map may be an average of the probabilities at several image elements. In some examples herein, the probability map is a voxel-wise (voxel-wise) probability map. A color table of the probability map may be used to visualize the probability values. An example of a probability map is shown in fig. 9.
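One way to picture the transformer's output is the sketch below, which turns a predicted bounding box (wall positions plus an uncertainty) into a voxel-wise probability map using a smooth sigmoid falloff. The sigmoid model and all names are illustrative assumptions; the description does not prescribe a particular transformation from posterior distribution to probability map.

```python
import numpy as np

def probability_map(shape, box_mean, box_std):
    """Voxel-wise probability map from a predicted bounding box.

    shape: shape of the output map (one entry per image axis).
    box_mean: per-axis (lo, hi) predicted wall positions.
    box_std: per-axis uncertainty of the wall predictions.
    Probability is high inside the box and falls off smoothly at a
    rate set by the predicted uncertainty (an assumed model).
    """
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    prob = np.ones(shape)
    for axis, ((lo, hi), std) in enumerate(zip(box_mean, box_std)):
        coords = np.arange(shape[axis], dtype=float)
        # probability that this coordinate lies between the two walls
        p_axis = sigmoid((coords - lo) / std) * sigmoid((hi - coords) / std)
        # broadcast the 1-D profile along the remaining axes
        view = [1] * len(shape)
        view[axis] = -1
        prob = prob * p_axis.reshape(view)
    return prob
```

Larger wall uncertainty produces a softer boundary in the map; a confident prediction produces a near-binary mask, which is the certainty weighting the optimizer later exploits.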
In an example, the probability maps 306 for the images are used by an optimizer 308 to compute a spatial transformation 310 that registers the input images. For example, the optimizer 308 attempts to find an optimal solution of the energy function described above with reference to FIG. 2. This energy optimization problem can be formulated with image intensity and semantic information so as to have a well-defined global extremum. In one example, the registration problem is an energy optimization problem. An example of the energy optimization is described with reference to fig. 10.
FIG. 4 is a schematic diagram of an example portion of a random forest, which may be a random regression forest, a random decision forest, or another type of random forest. The illustrative decision forest of FIG. 4 includes three decision trees: a first tree 400 (denoted Ψ1), a second tree 402 (denoted Ψ2), and a third tree 404 (denoted Ψ3). Each decision tree includes a root node (e.g., root node 406 of the first decision tree 400), a plurality of internal nodes referred to as split nodes (e.g., split node 408 of the first decision tree 400), and a plurality of leaf nodes (e.g., leaf node 410 of the first decision tree 400).
Highly non-linear mappings are handled by splitting the original problem into a set of smaller problems that can be solved with simple predictors. Fig. 4 shows an illustrative 1D example in which the goal is to learn an analytic function predicting a real-valued output. A single linear function fits this data poorly; for example, a single linear function 412 output by the root node 406 does not fit the data. A set of training pairs (x, y) 410 is given to a tree (e.g., tree 400) at its root node (e.g., root node 406). Each node in the tree is designed to split the data so as to form clusters in which accurate predictions can be made with simpler models (e.g., linear in this example). Using more tree levels may yield a more accurate fit of the regression model. For example, the training pairs 410 may be modeled 414 using a combination of linear regressors from nodes 416 to 422 that fits the data well. In operation, each root and split node of each tree performs a binary test on the input data and directs the data to the left or right child node based on the result. In formulation, each node performs a test ξ > f(x) > τ, where ξ and τ are scalars. Based on this result, each data point is sent to the left or right child node 408.
In the example where the decision forest is a regression forest, the leaf nodes store continuous parameters η_S characterizing each regressor. The regressor may be linear, constant, Gaussian, quadratic, or of any other functional form. This is in contrast to classification forests, where the predicted variables are discrete. Complex non-linear mappings can be modeled via a hierarchical combination of many simple linear regressors. More tree levels produce smaller clusters and smaller fitting errors, but may over-fit the input data. The manner in which the parameters used by each split node are selected, and how the leaf node probabilities are calculated, is described with reference to figures 5 and 6.
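The node test ξ > f(x) > τ and the per-leaf linear regressors can be sketched for the 1D example as follows. The dictionary-based tree layout and all names are illustrative assumptions, not the patent's representation.

```python
# Minimal 1-D regression-tree sketch mirroring the test "xi > f(x) > tau":
# a data point goes left when the test is true, right otherwise, and each
# leaf holds a simple linear regressor (slope, intercept).

def node_test(x, f, xi, tau):
    """Binary test: true iff tau < f(x) < xi."""
    return tau < f(x) < xi

def predict(tree, x):
    """Walk the tree until a leaf, then apply its linear regressor."""
    node = tree
    while "leaf" not in node:
        branch = "left" if node_test(x, node["f"], node["xi"], node["tau"]) else "right"
        node = node[branch]
    slope, intercept = node["leaf"]
    return slope * x + intercept

# A toy tree: inputs with 0 < x < 5 go left and are fitted by y = 2x,
# all other inputs go right and are fitted by the constant y = 10.
toy_tree = {
    "f": lambda x: x, "xi": 5.0, "tau": 0.0,
    "left": {"leaf": (2.0, 0.0)},
    "right": {"leaf": (0.0, 10.0)},
}
```

A forest prediction would average the outputs of several such trees, each trained on a randomized subset of features.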
The tree-based regressor allows for the simultaneous processing of multiple anatomical structures, which encourages feature sharing among anatomical structures and thus better generalization. For example, the presence of a lung may indicate the presence of a heart. Unlike classification-based schemes that can locate the center of an organ but not the extent of the organ, the continuous model estimates the position of the walls of the bounding box containing each organ, thus enabling simultaneous organ localization and extent estimation.
As described in the examples below, the regression forest algorithm is able to incorporate atlas information into a compact tree-based model. Atlas-based techniques have conceptual simplicity. An atlas is a manually classified image that is mapped onto a subject image by deforming the atlas until it closely resembles the subject. This technique therefore depends on the availability of good atlases. Robustness may be improved by techniques based on multiple atlases, although this may be computationally inefficient. Furthermore, the accuracy of such algorithms depends on robust cross-image registration, which is itself a difficult problem. By incorporating the atlas information into the regression model, greater efficiency and robustness are achieved, and anatomical localization is faster than when using multiple atlases.
In the example described with reference to FIG. 4, a regression forest is used to estimate a two-dimensional continuous vector representing a line. However, a regression forest may be used to estimate a vector having multiple dimensions. In the example where the positions of organs in a volumetric image are being estimated, the output is a six-dimensional vector b_c of continuous parameters for each organ c. Examples of anatomical structures that can be identified are discussed with reference to fig. 5.
Fig. 5 is an example of a training image for estimating organ position. Note that the schematic diagram of fig. 5 is shown in two dimensions only for clarity, however an exemplary volumetric image is three-dimensional. The medical image 500 includes a representation of several organs, including a kidney 502, a liver 504, and a spine 506, but these are merely examples for illustrative purposes. Other typical organs that may be shown in the figures and identified using the techniques described herein include, but are not limited to, the head, heart, eyes, lungs, and major blood vessels. The bounding box 508 is shown (in dashed lines) drawn around the kidney 502. Note that in the illustration of fig. 5, the bounding box 508 is shown in only two dimensions, however in the volumetric image the bounding box 508 surrounds the kidney 502 in three dimensions.
For each class c, each voxel v in the training volume (e.g., v1 510 and v2 512) can be associated with an offset d_c(v) relative to the bounding box b_c 508. Fig. 5 shows example offsets for the two voxels v1 510 and v2 512: the offset of v1 510 is given by d_c(v1), and the offset of v2 512 is given by d_c(v2). The remaining components of the offset vectors that would exist in a 3-D volumetric image are not shown in the two-dimensional representation of fig. 5.
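A minimal sketch of this offset computation, under the assumption that the six components of d_c(v) are the signed distances from the voxel to the six bounding-box walls (two walls per axis); the function name and box representation are illustrative:

```python
def bounding_box_offset(voxel, box):
    """Offset d_c(v) from a voxel v to the six walls of bounding box b_c.

    voxel: (x, y, z) coordinates of the voxel.
    box: ((x_lo, x_hi), (y_lo, y_hi), (z_lo, z_hi)) wall positions.
    Returns the six signed distances, the continuous quantities the
    regression forest is trained to predict from each voxel.
    """
    offsets = []
    for coord, (lo, hi) in zip(voxel, box):
        offsets.append(coord - lo)   # distance to the lower wall
        offsets.append(coord - hi)   # distance to the upper wall
    return tuple(offsets)
```

Inverting this relation at test time, a voxel's predicted offsets cast a vote for where the six walls of each organ's bounding box lie.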
In an example, all voxels in a test CT volume contribute, with different confidence, to estimating the positions of the bounding box walls. In other embodiments, only a selection of voxels may contribute. Some voxel clusters may predict the location of an organ with high confidence even if the voxels are not part of the organ; for example, a cluster of voxels forming part of a rib or a vertebra may predict the position of the heart with high confidence. These clusters may be used as references for the positions of anatomical structures during testing. Voxels may be clustered based on their appearance, their spatial context, their intensity values, their confidence levels, or any other suitable metric. Feature selection and parameter regression can be performed by using a multi-class random regression forest (e.g., the ensemble of regression trees) to predict the locations and sizes of all desired anatomical structures simultaneously.
FIG. 6 is a flow diagram of an example method of training a regression forest. Other methods may be used, such as breadth-first training, in which a complete level of the tree is trained at a time. An associated ground-truth organ bounding box may be created for each labeled volume in the training set. In an example, the training set is created from a subset of all the volumes. The training set may be created by manually drawing a bounding box (600), which may be a cube in the case of a 3D image or a rectangle in the case of a 2D image, centered on the organ of interest. In an example, there may be more than one organ of interest per image. In the case of an image sequence, the bounding box may also extend in the temporal direction.
The bounding box may be drawn using a dedicated annotation tool, which is a software program capable of quickly drawing bounding boxes from different views of the image (e.g., axial, coronal, sagittal, and 3D views). The bounding box may be drawn manually. In some embodiments, a radiologist may confirm that the labeling is anatomically correct.
The number of regression trees to be used in the regression forest is set (602). A regression forest is a collection of regression trees, represented by Ψ1, ..., Ψt, ..., ΨT, where t indexes each tree. The number of trees is fixed during the training process. In one example, the number of trees is three (e.g., the regression forest of fig. 4), but other values may also be used. All trees may be trained in parallel; however, the trees may also be trained separately.
At each tree in the forest, a root node (e.g., root node 406) is selected (604). All image elements from each of the training images are then selected (606). An image element in image V is defined by its coordinates v = (x, y, z).
The manner in which the parameters used by each split node are selected and how the successive parameters are stored and calculated is now described.
A random set of test parameters is generated for use in the binary test performed at the root node 406 (608). In one example, the binary test is of the form: ξ > f (v; θ) > τ, so that f (v; θ) is a function applied to the image element at location v and its surrounding image elements with parameter θ, and the output of this function is compared with thresholds ξ and τ. If the result of f (v; θ) is in the range between ξ and τ, the result of the binary test is true, otherwise the result of the binary test is false. In other examples, only one of the thresholds may be used, such that the binary test is true if the result of f (v; θ) is greater than (or less than) the threshold. In the example described herein, the parameter θ defines the visual characteristic of the image.
Anatomical structures may be difficult to identify in medical images, as different organs may share similar intensity values, e.g. similar tissue density in case of CT and X-ray scans. Thus, the local intensity information does not have sufficient discrimination to identify the organ, and as described above, this may cause the optimization of the energy function to suffer from local minima.
The parameters theta of the function f (v; theta) are randomly generated during training. The process of generating the parameter θ includes generating, for example, randomly sized boxes (cubic boxes for 3D images, or rectangles for 2D images, both of which may extend in the time dimension in the case of a sequence of images) and spatial offset values. All dimensions of the box are randomly generated. The spatial offset value is in the form of a two-dimensional or three-dimensional displacement. In other examples, the parameter θ may further include one or more additional randomly generated boxes and spatial offset values. In alternative examples, differently shaped regions (rather than boxes) or offset points may be used. In one example, only a single signal channel (e.g., only intensity) is used for all cartridges. In other examples, the channel may be the magnitude of the intensity gradient, or a more complex filter may be used.
Given the above parameter θ, the result of the function f(v; θ) is calculated by placing the randomly generated box F relative to the image element of interest v, displaced from v by the spatial offset value, and summing the intensities: f(v; θ) = Σ_q∈F I(q), where q is an image element within box F. This sum is normalized by the number of image elements in the box. In the case of two boxes, f(v; θ) is given by: f(v; θ) = (1/|F1|) Σ_q∈F1 I(q) − (1/|F2|) Σ_q∈F2 I(q), where F1 is the first box, and F2 may be a second box or, in other examples, an empty set giving a unary feature. Again, the two sums are normalized by the respective numbers of image elements in each box.
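The box feature f(v; θ) just described can be sketched as follows, in 2-D for brevity (the 3-D case adds a third coordinate). The function names and the clamping at the image border are implementation assumptions.

```python
import numpy as np

def box_feature(image, v, offset, half_size, box2=None):
    """Mean-intensity box feature f(v; theta).

    A box of the given half-size is centered at v + offset and its mean
    intensity is computed; if a second (offset, half_size) pair is
    supplied, its mean is subtracted, otherwise the feature is unary.
    Boxes are clamped at the image border (an assumption).
    """
    def box_mean(center, half):
        (y, x), (hy, hx) = center, half
        patch = image[max(0, y - hy): y + hy + 1,
                      max(0, x - hx): x + hx + 1]
        return patch.mean()

    f = box_mean((v[0] + offset[0], v[1] + offset[1]), half_size)
    if box2 is not None:
        off2, half2 = box2
        f -= box_mean((v[0] + off2[0], v[1] + off2[1]), half2)
    return f
```

During training, many random (offset, size) pairs are proposed for θ and the binary test thresholds ξ, τ are chosen against the resulting feature values.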
The result of the binary test performed at the root node or the split node determines to which child node the image element is passed. For example, if the result of the binary test is true, the image element is passed to the first child node, and if the result is false, the image element is passed to the second child node.
The generated random set of test parameters includes a plurality of random values for the function parameter θ and the threshold parameter values ξ and τ. To inject randomness into the decision tree, the function parameters θ for each split node are optimized only with respect to a randomly sampled subset Θ of all possible parameters. For example, the size of the subset may be 100. This is an effective way to increase generalization.
Each combination of test parameters is applied to each image element in the training images (610). In other words, for each image element in each training image, all available values θ ∈ Θ are tried one after another, in combination with all available values of ξ and τ. For each combination, the information gain (also referred to as relative entropy) is calculated, and the parameters that maximize the information gain are selected (612).
For example, as described above, for each class c, each voxel v in the training volume is associated with an offset d_c(v) from the bounding box b_c. Node optimization may be performed by optimizing an information gain measure. An example information gain measure is IG = H(S) − Σ_i∈{L,R} (|S^i|/|S|) H(S^i), where H represents the entropy, S is the set of training points that reach the node, and L, R denote the left and right children.
In the example where the decision forest is a classification forest, entropy is measured over discrete class labels. In contrast, in the example where the decision forest is a regression forest, the purity of the probability density of the real-valued predictions is measured. For a single class c, the vector d_c can be modeled as a multivariate Gaussian at each node; for example, p(d_c) = N(d_c; d̄_c, Λ_c), where the matrix Λ_c encodes the covariance of d_c over all points in S. In other examples, the regression forest may alternatively be used with non-parametric distributions.
The differential entropy of a multivariate Gaussian can be shown to be H(S) = (1/2) log((2πe)^n |Λ_c(S)|), where n is the number of dimensions. In the example where the image is a volume image, n = 6, and in the example where a two-dimensional image is described, n = 4. However, n may take any suitable value. The regression information gain in this example is thus IG = log|Λ_c(S)| − Σ_i∈{L,R} (|S^i|/|S|) log|Λ_c(S^i)|.
In examples where the information gain is adapted to handle multiple organs simultaneously, the information gain may be adapted to IG = Σ_c ( log|Λ^c(S)| − Σ_{i∈{L,R}} ω_i log|Λ^c(S^i)| ). This formula can be rewritten as: IG = log|Γ(S)| − Σ_{i∈{L,R}} ω_i log|Γ(S^i)|   (1), where Γ = diag(Λ^1, …, Λ^C) is the block-diagonal matrix collecting the covariances of all classes.
Maximizing (1) minimizes the determinant of the covariance matrix Γ, thereby reducing the uncertainty in the probabilistic vote that each voxel cluster casts for each organ pose. As an alternative to the information gain, other criteria may be used, such as the Gini entropy or the 'two-ing' criterion.
In some embodiments, if the value of the maximized information gain is less than a fixed threshold (614), this indicates that further expansion of the tree will not provide significant benefit. This results in asymmetric trees that naturally stop growing when no further nodes are needed. In other examples, a node stops growing when the maximum tree depth is reached or when too few points reach the node. In one example, the maximum depth of the trees is 7 levels, although other values may be used. In an example, the maximum depth of the trees is chosen to minimize the average error. When further nodes would not provide significant benefit, the node is set as a leaf node (616). At each leaf node, a prediction of the continuous localization parameters is stored (618), e.g. the learned mean d̄ (where d̄ = (d̄^1, …, d̄^C)) and covariance.
If the value of the maximized information gain is greater than or equal to the threshold, and the depth of the tree is less than the maximum value, the current node is set as a split node (620). If the current node is a split node, it has child nodes, and the process moves on to training these child nodes. Each child node is trained using a subset of the training image elements at the current node. The parameters θ*, ξ* and τ* that maximize the information gain are used to determine the subset of image elements that is sent to each child node. These parameters are used in a binary test, and the binary test is performed on all image elements at the current node (622). Image elements that pass the binary test form a first subset that is sent to the first child node, while image elements that fail the binary test form a second subset that is sent to the second child node. After training, the jth split node remains associated with the feature θ_j and the thresholds ξ_j, τ_j.
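The binary test at a split node can be sketched as follows. The two-sided interval form ξ < f(v; θ) ≤ τ is an illustrative assumption consistent with the two thresholds named above; the feature response f(v; θ) itself is left abstract here.

```python
def binary_test(feature_response, xi, tau):
    """Two-sided split test: pass if the feature response falls in (xi, tau]."""
    return xi < feature_response <= tau

def partition(responses, xi, tau):
    """Send elements that pass the test to the first subset, the rest to the second."""
    first = [r for r in responses if binary_test(r, xi, tau)]
    second = [r for r in responses if not binary_test(r, xi, tau)]
    return first, second
```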
For each of the child nodes, the process outlined in blocks 608 to 622 of FIG. 6 is recursively performed on the subset of image elements directed to the respective child node (624). In other words, for each child node, new random test parameters are generated (608) and applied to the respective subset of image elements (610), the parameters that maximize the information gain are selected (612), and the type of node (split node or leaf node) is determined (614). If it is a leaf node, the current recursive branch stops. If it is a split node (620), a binary test is performed (622) to determine further subsets of image elements, and another recursive branch begins. Thus, this process moves through the tree in a recursive manner, training each node until a leaf node is reached on each branch. When a leaf node is reached, the process waits (626) until all nodes in all branches have been trained. In other examples, alternative recursive techniques may be used to achieve the same functionality. Once the leaf nodes of all trees in the forest have been reached, the training process is complete and the process terminates (628).
In another example, assuming a diagonal (or block-diagonal) covariance Λ^c may result in uncorrelated output predictions. In a further example, correlations between a selected subset of classes may be kept sparse but allowed, in order to capture, for example, class hierarchies or other forms of spatial context.
FIG. 7 is a flow chart of an example method of predicting the position of an organ in a previously unseen image. In one example, a regression forest that has been trained as described above with reference to FIG. 6 may be used. An unseen image is received at the organ predictor 302 (700). The image is referred to as "unseen" to distinguish it from the training images, in which organs and/or bounding boxes have been identified.
An image element from the unseen image is selected (702), and a trained regression tree from the regression forest is also selected (704). The selected image element is pushed through the selected regression tree (in a manner similar to that described above with reference to FIG. 6) (706), such that it is tested against the trained parameters at each node and then passed to the appropriate child node according to the outcome of the test. This process is repeated until the image element reaches a leaf node l(v), where l indexes the leaves across the entire forest.
If it is determined that there are more regression trees in the forest (710), a new regression tree is selected (704), and the image element is pushed through that tree until a leaf node is reached. This process is repeated until it has been performed for all regression trees in the forest (710). In an example, an image element may be pushed through multiple trees in the regression forest in parallel. It is then determined whether there are further unanalyzed image elements in the unseen image (712), and if so, another image element is selected and the process is repeated. In one example, the process may be repeated for each voxel in the volume.
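The traversal described above can be sketched as follows, using a hypothetical dictionary representation of trained trees (the patent does not prescribe a data structure, and the interval-form test mirrors the sketch given for training):

```python
def push_through_tree(tree, element):
    """Descend from the root until a leaf node is reached.

    A split node is a dict {'theta': feature_fn, 'xi': ..., 'tau': ...,
    'left': ..., 'right': ...}; a leaf node is a dict {'leaf': prediction}.
    """
    node = tree
    while 'leaf' not in node:
        r = node['theta'](element)  # feature response f(v; theta)
        node = node['left'] if node['xi'] < r <= node['tau'] else node['right']
    return node['leaf']

def push_through_forest(forest, element):
    """Collect one leaf prediction per tree in the forest."""
    return [push_through_tree(t, element) for t in forest]
```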
In an example, the prediction p(d^c|l) stored at a leaf node l for class c also defines the posterior p(b_c|l) for the absolute bounding box position b_c starting from v. A set of leaves L̃ can be selected. In one example, L̃ may be a subset of all the forest leaves. For example, L̃ may be the set of leaves with the smallest uncertainty (for each class c) that together contain a certain threshold proportion of all tested image elements. In one example, the particular threshold level is 1%. However, another level may be used. The posterior probability of b_c can also be calculated. In one example, the posterior probability is p(b_c) = Σ_{l∈L̃} p(b_c|l) p(l).
In the above example, the most confident leaves are used for the prediction output. For example, p(l) = 1/|L̃| if l ∈ L̃, and 0 otherwise, regardless of where in the forest the leaves come from.
In an example, whether an image element lies within a bounding box may be predicted from this probability (714). In one example, an image element is predicted to lie within the bounding box if p(b_c) > β. In one example β = 0.5; however, any suitable threshold may be used. In an example, the prediction for an image element is assigned to the image element for future use.
The example methods described herein have reduced errors and are comparatively faster and more robust in computing bounding box predictions than atlas-based techniques. As described above, in many cases atlas-based techniques suffer from local minima, which may lead to inaccurate results. In some examples, a local registration step may be added to an atlas-based technique; however, this may not remedy cases that suffer from local minima, and may significantly increase the runtime of the algorithm. Furthermore, regression-forest-based techniques require significantly less memory than atlas-based techniques. Regression-based methods can compute the position of each bounding box wall instead of just the center of the organ, allowing an approximate extent estimation. The regression techniques described in the examples herein are also more accurate than classification-based methods.
FIG. 8 is a flow diagram of an example method of computing a probability map. A probability map is one example of semantic information that may be used in formulating the energy function. The estimated organ positions are received from the organ predictor 302 (800). In an example, these organ positions may be predictions of the absolute bounding box position of the cth organ, given by the expected values computed using a trained regression forest algorithm as described above with reference to FIG. 7. The output from the organ predictor may be transformed into a map of semantic information (802). In an example, the semantic information is the probability that each image element is located within an organ bounding box. In one embodiment, the probability map may be calculated from an equation of the form p(x ∈ B_i) = Π_{j=1..3} Φ((x_j − b_ij^min)/σ_ij^min) · Φ((b_ij^max − x_j)/σ_ij^max), where B_i denotes the ith bounding box, x = (x_1, x_2, x_3) denotes the position of the image element, b_ij^min and b_ij^max are the lower and upper faces of B_i in direction j, σ_ij^min and σ_ij^max are the standard deviations of these coordinates as estimated by the organ predictor, and Φ is the standard normal cumulative distribution function.
The semantic information may be output in the form of a probability map. In some embodiments, the probability map may be a volumetric probability map. In an example, a plurality of probability maps may be output for each input image. For example, a probability map may be output showing the probability that each image element is located within one particular bounding box (e.g., the bounding box that encloses the left kidney). In one example, a probability map may be output for each predicted organ bounding box. In another example, an aggregate probability map may be output, showing the probability of an image element being located within any of the predicted organ bounding boxes.
FIG. 9 is an example of probability maps calculated from an input image. In an example, the input image 900 may be a CT image showing a liver 902 and a right kidney 904. However, the image 900 may be any suitable image. In an example, the output of the transformer 304 is a first probability map 906 and a second probability map 910, the first probability map 906 showing the likelihood that each image element in the input image 900 is inside the liver bounding box 908, and the second probability map 910 showing the likelihood that each image element in the input image 900 is inside the right kidney bounding box 912. The probability maps are represented schematically using ellipses to indicate regions in which the probability values increase towards the center of the region. The transformer 304 may output more than two probability maps. For example, the transformer may output a probability map for each bounding box previously predicted in the image.
In other embodiments, the probability map may be an aggregated probability map showing the likelihood of an image element being inside any of the estimated bounding boxes. For example, the probability map may show the probability that an image element lies inside the liver bounding box, the kidney bounding box, or any other bounding box in the input image 900.
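One way to form such an aggregated map, treating the per-organ probabilities as independent (an assumption made here for illustration only; the text does not specify the aggregation rule), is the complement of the product of complements:

```python
import numpy as np

def aggregate_probability_map(per_organ_maps):
    """P(inside any box) = 1 - prod_c (1 - P(inside box c)), per image element.

    per_organ_maps: iterable of equally shaped arrays of probabilities in [0, 1].
    Assumes the per-organ events are independent, which is a simplification.
    """
    maps = np.stack(list(per_organ_maps))
    return 1.0 - np.prod(1.0 - maps, axis=0)
```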
In an example, the color table of the probability map is scaled according to the likelihood of the image element within the bounding box. For example, in an example where 256 colors exist in the color table, an image element having a probability of 1.0 inside the bounding box is set to the color 256, and an image element having a probability of 0 inside the bounding box is set to the color 1. In other examples, the colors may be scaled differently, such as logarithmically or according to gamma.
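The color-table scaling can be sketched as follows; the linear, gamma, and logarithmic variants correspond to the scalings mentioned above, with the 1-to-256 index range of the example. The epsilon in the logarithmic branch is an assumption to avoid log(0).

```python
import math

def color_index(p, n_colors=256, mode="linear", gamma=2.2):
    """Map a probability p in [0, 1] to a 1-based color-table index."""
    if mode == "gamma":
        v = p ** (1.0 / gamma)
    elif mode == "log":
        eps = 1e-3  # assumed floor so that p = 0 maps to index 1
        v = math.log1p(p / eps) / math.log1p(1.0 / eps)
    else:
        v = p
    return 1 + round(v * (n_colors - 1))
```

With linear scaling, p = 0 maps to color 1 and p = 1 maps to color 256, as in the example above.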
Although the probability maps depicted in FIG. 9 may be displayed to a user, this is not essential. In other embodiments, the probability map may be calculated as described in FIG. 8 without displaying the output of the transformer to the user. For example, the computed probability map or other semantic information may be temporarily held in cache memory, stored at a data store, or otherwise retained without displaying the information to the user.
FIG. 10 is a flow chart of an example method of image registration. Semantic data is received at the optimizer 308 from the transformer 304 (1000). In an embodiment, the problem of how to register the reference image with the target image is treated as an energy optimization problem. In an embodiment, the energy optimization problem comprises finding a global extremum, e.g. a minimum or a maximum, of an energy function, such as the energy function 206 described above with reference to FIG. 2. An example approach is to formulate the energy optimization problem as an energy minimization task (1002).
To calculate the energy optimization function, mutual information between the reference image and the target image may be calculated (1004). This mutual information is a quantity that measures the interdependence of two variables (e.g., an image element in the reference image I and an image element in the target image J). In an example, high mutual information indicates a large reduction in uncertainty; low mutual information indicates a small reduction; and zero mutual information between two random variables means that the variables are independent.
In one example, x is a random variable at a coordinate location in the reference image. In some embodiments, samples are drawn from x to approximate mutual information. In one example, the mutual information may be defined as:
MI(I(x),J(T(x)))≡h(I(x))+h(J(T(x)))-h(I(x),J(T(x)))
where h(I(x)) is the marginal entropy of the reference image, h(J(T(x))) is the marginal entropy of the part of the target image onto which the reference image is projected, and h(I(x), J(T(x))) is the joint entropy of the reference image and the target image.
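A common non-parametric way to estimate the mutual information above (a sketch, not necessarily the estimator used in the embodiments) is from a joint histogram of corresponding image elements:

```python
import numpy as np

def entropy(p):
    """Shannon entropy (base e) of a normalized probability array."""
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_information(i_vals, j_vals, bins=32):
    """MI(I, J) = h(I) + h(J) - h(I, J), estimated from a joint histogram."""
    joint, _, _ = np.histogram2d(i_vals, j_vals, bins=bins)
    p_joint = joint / joint.sum()
    p_i = p_joint.sum(axis=1)  # marginal of I
    p_j = p_joint.sum(axis=0)  # marginal of J
    return entropy(p_i) + entropy(p_j) - entropy(p_joint.ravel())
```

Identical inputs give high mutual information (large reduction in uncertainty), while independent inputs give mutual information near zero, matching the interpretation above.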
In some embodiments, a Kullback-Leibler (KL) divergence may be calculated (1006). The KL divergence is an asymmetric measure of the difference between two probability distributions. An example of computing a probability distribution is described above with reference to block 802 of FIG. 8. For discrete distributions p and q, the KL divergence may be given by: KL(p ‖ q) = Σ_x p(x) log(p(x)/q(x)).
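For discrete distributions, a sketch of the computation might look like this (the normalization and the epsilon guard against zero bins are assumptions added for robustness):

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_x p(x) log(p(x) / q(x)) for discrete distributions."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    p = p / p.sum()  # normalize to valid distributions
    q = q / q.sum()
    return float(np.sum(p * np.log((p + eps) / (q + eps))))
```

Note the asymmetry: KL(p ‖ q) generally differs from KL(q ‖ p), which is why the text calls it an asymmetric measure.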
In an embodiment, the energy optimization problem is an energy minimization task that can be computed from the mutual information and the KL divergence. For example, the energy minimization problem may take the form T* = argmin_T E(T), with E(T) = −MI(I(x), J(T(x))) + Σ_{x∈Ω} KL(p_I(x) ‖ p_J(T(x))), where I is the reference (fixed) image, J is the moving image, T is the spatial transformation, MI is the mutual information, KL is the Kullback-Leibler divergence between the semantic probability maps p_I and p_J of the two images, and the point probability distribution p(x) is assumed to be uniform over the image domain Ω. In other examples, the energy optimization problem may be an energy maximization task.
In other embodiments, other methods may be used to calculate the energy optimization. For example, other entropy measures may be used. An example of an alternative entropy measure is Shannon (Shannon) entropy.
Global optimization of the energy function E(T) enables a linear spatial transformation to be computed (1008) for registering the images, and the target image to be registered with the reference image (1010). In other embodiments, a non-linear spatial transformation may be computed.
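As a toy illustration of intensity-based alignment (1-D signals, integer translations only, and scoring by mutual information alone, so omitting the KL term of the full energy), a grid search might look like:

```python
import numpy as np

def mi_score(a, b, bins=16):
    """Histogram estimate of the mutual information of two 1-D signals."""
    joint, _, _ = np.histogram2d(a, b, bins=bins)
    p = joint / joint.sum()
    def h(x):
        x = x[x > 0]
        return -np.sum(x * np.log(x))
    return h(p.sum(1)) + h(p.sum(0)) - h(p.ravel())

def best_shift(reference, moving, max_shift=5):
    """Grid search over integer shifts; maximize MI of the overlapping samples."""
    best, best_score = 0, -np.inf
    for s in range(-max_shift, max_shift + 1):
        if s >= 0:
            a, b = reference[s:], moving[:len(moving) - s]
        else:
            a, b = reference[:s], moving[-s:]
        score = mi_score(a, b)
        if score > best_score:
            best, best_score = s, score
    return best
```

A full implementation would search over the parameters of a linear (or non-linear) spatial transformation and include the semantic KL term, but the principle of scoring candidate transformations against an energy is the same.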
FIG. 11 is an example of a registered image. The image 1100 may be a CT scan or any other suitable image. In this example, the image shows a reference image of the right kidney 1102, with a registered target image of the right kidney overlaid on it as a contour map 1104. In one example, the contours may be intensity contours, e.g., contours at 90, 70, and 60 percent of the intensity of the reference image. In other examples, the registered target image may be overlaid or displayed in sequence as a partially transparent image.
Fig. 12 illustrates components of an exemplary computing-based device 1200 that can be implemented as any form of a computing and/or electronic device, and in which embodiments of an image registration system can be implemented.
Computing-based device 1200 includes one or more inputs 1202 of any suitable type for receiving media content, Internet Protocol (IP) input, volumetric image data, input from a media device, input from an image capture device, input from a user, and/or input from other computing-based devices. The device also includes a communication interface 1204 that may be used to enable the device to communicate with other devices over a communication network.
The computing-based device 1200 also includes one or more processors 1206, which may be microprocessors, controllers, or any other suitable type of processor for processing computing-executable instructions to control the operation of the device in order to perform image registration. The processor 1206 may include one or more specialized image processing processors, such as a graphics processing unit. In some examples, such as in examples using a system-on-a-chip architecture, the processor 1206 may include one or more fixed function blocks (also known as accelerators) that implement portions of the image registration method in hardware (rather than software or firmware). Platform software including an operating system 1208 or any other suitable platform software may be provided at the computing-based device to enable execution of application software 1210 on the device.
Further software that may be provided at the computer-based device 1200 includes tree training logic 1212 (which implements the techniques described for training the regression tree, or other suitable tree training techniques), image analysis logic 1214 (which implements techniques for traversing unseen images through the regression tree as described herein), image transformation logic 1216 (which computes semantic information as described herein), and image optimization logic 1218 (which solves the registration problem). A data store 1220 is provided to store data such as training parameters, probability distributions, and analysis results.
Computer-executable instructions may be provided using any computer-readable media accessible by computing-based device 1200. Computer-readable media may include, for example, computer storage media such as memory 1222 and communication media. Computer storage media, such as memory 1222, includes volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storage of information such as computer readable instructions, data structures, program modules or other data. Computer storage media includes, but is not limited to, RAM, ROM, EPROM, EEPROM, flash memory or other memory technology, CD-ROM, Digital Versatile Disks (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information for access by a computing device. In contrast, communication media may embody computer readable instructions, data structures, program modules, or other data in a modulated data signal, such as a carrier wave or other transport mechanism. As defined herein, computer storage media does not include communication media. While computer storage media (memory 1222) is shown in computing-based device 1200, it is to be understood that the storage can be distributed or remotely located and accessed via a network or other communication link (e.g., using communication interface 1204).
An output 1224 is also provided, such as an audio and/or video output to a display system integral with or in communication with the computing-based device. The display system may provide a graphical user interface, or any other suitable type of user interface, but this is not required. In some examples, the display system includes a touch-sensitive display screen.
The term 'computer' as used herein refers to any device with processing capabilities such that it can execute instructions. Those skilled in the art will recognize that such processing capabilities are integrated into many different devices, and thus, the term 'computer' includes PCs, servers, mobile phones, personal digital assistants and many other devices.
The methods described herein may be performed by software in machine-readable form on a tangible storage medium, e.g. in the form of a computer program comprising computer program code means adapted to perform all the steps of any of the methods described herein when the program is run on a computer, and where the computer program may be embodied on a computer-readable medium. Examples of tangible (or non-transitory) storage media include disks, thumb drives, memory, etc., and do not include propagated signals. The software can be suitable for execution on a parallel processor or a serial processor such that the method steps may be carried out in any suitable order, or simultaneously.
This acknowledges that software can be a valuable, separately tradable commodity. It is intended to encompass software which runs on, or controls, "dumb" or standard hardware to carry out the desired functions. It is also intended to encompass software which "describes" or defines the configuration of hardware, such as HDL (hardware description language) software, as is used for designing silicon chips, or for configuring universal programmable chips, to carry out desired functions.
Those skilled in the art will realize that storage devices utilized to store program instructions may be distributed across a network. For example, the remote computer may store an example of the process described as software. A local or terminal computer may access the remote computer and download a part or all of the software to run the program. Alternatively, the local computer may download pieces of the software, as needed, or execute some software instructions at the local terminal and other software instructions at the remote computer (or computer network). Those skilled in the art will also realize that by utilizing conventional techniques known to those skilled in the art that all, or a portion of the software instructions may be carried out by a dedicated circuit, such as a DSP, programmable logic array, or the like.
As will be clear to those skilled in the art, any of the ranges or device values given herein may be extended or altered without losing the effect sought.
It is to be appreciated that the advantages described above may relate to one embodiment or may relate to multiple embodiments. The embodiments are not limited to embodiments that solve any or all of the problems or embodiments having any or all of the benefits and advantages described. It will further be understood that references to 'an' item refer to one or more of those items.
The steps of the methods described herein may be performed in any suitable order, or simultaneously, where appropriate. In addition, individual blocks may be deleted from any of the methods without departing from the spirit and scope of the subject matter described herein. Aspects of any of the examples described above may be combined with aspects of any of the other examples described to form further examples without losing the effect sought.
The term "comprises/comprising" when used herein is intended to cover the identified blocks or elements of the method, but does not constitute an exclusive list and a method or apparatus may contain additional blocks or elements.
It will be understood that the above description of a preferred embodiment is given by way of example only and that various modifications may be made by those skilled in the art. The above specification, examples and data provide a complete description of the structure and use of exemplary embodiments of the invention. Although various embodiments of the present invention have been described above with a certain degree of particularity, or with reference to one or more individual embodiments, those skilled in the art could make numerous alterations to the disclosed embodiments without departing from the spirit or scope of this invention.
Claims (10)
1. A computer-implemented method of automatic image registration, comprising:
receiving (300) a first image and a second image, both images having a plurality of objects, and at least a portion of at least one object of the plurality of objects being common to both images;
for each of the first and second images, computing a probability map (306), the probability map comprising, for each image element, a probability that the image element contains a specified object, wherein computing the probability map comprises predicting a location of the specified object using a regression forest comprising a plurality of regression trees (400, 402, 404) that have each been trained, predicting the location of the specified object comprising estimating a posterior distribution of the location of the specified object using the regression forest;
finding a mapping registering (310) the first and second images by optimizing an energy function (206), the energy function being a function of the intensities of the first and second images and also of the probability maps.
2. The method of claim 1, wherein the first and second images are medical images.
3. A method as claimed in claim 1 or claim 2, wherein the first and second images are of different modalities.
4. A method as claimed in claim 1 or 2, wherein the first and second images have three or more dimensions.
5. The method of claim 1, wherein computing the probability map comprises: for each image element, a plurality of probabilities are calculated such that there is a probability for each of a plurality of specified objects associated with each image element.
6. A method according to claim 1 or 2, wherein calculating the probability map comprises clustering image elements based on their appearance, their spatial context and the confidence of their predictions of the bounding box of at least one specified object.
7. The method of claim 1 or 2, wherein computing the probability map comprises classifying image elements into specified classes using a classification forest comprising a plurality of classification trees that have been trained.
8. The method of claim 1 or 2, wherein finding the mapping comprises globally optimizing the energy function.
9. An automatic image registration system, comprising:
means for receiving a first image and a second image, both images having a plurality of objects, at least a portion of at least one object of the plurality of objects being common to both images;
means for calculating semantic information for each of the first and second images, the semantic information including information about a type of object in the image and a certainty of that information, wherein calculating semantic information includes generating semantic information using a regression forest including a plurality of regression trees each having been trained, wherein the regression forest is used to estimate a posterior distribution of locations of objects in the image and the posterior distribution is transformed into a probability map; and
means for finding a mapping registering the first and second images, the mapping taking into account the intensities of the image elements of the images and the semantic information, weighted according to the certainty of the semantic information.
10. An automatic image registration method, comprising:
receiving input (1202) of a first image and a second image, both images having a plurality of objects, at least a portion of at least one object of the plurality of objects being common to both images;
calculating semantic information for each of the first and second images, the semantic information including information about a type of the object in the image and a certainty of that information, wherein calculating semantic information includes generating semantic information using a regression forest including a plurality of regression trees each having been trained, wherein the regression forest is used to estimate a posterior distribution of locations of the object in the image and the posterior distribution is transformed into a probability map; and
finding a mapping that registers the first and second images, the mapping taking into account the intensities of the image elements of the images and the semantic information, weighted according to the certainty of the semantic information.
Applications Claiming Priority (2)
| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| US13/025,500 US9710730B2 (en) | 2011-02-11 | 2011-02-11 | Image registration |
| US13/025,500 | 2011-02-11 |
Publications (2)
| Publication Number | Publication Date |
|---|---|
| HK1171852A1 HK1171852A1 (en) | 2013-04-05 |
| HK1171852B true HK1171852B (en) | 2016-08-26 |