
WO2013019856A1 - Automated malignancy detection in breast histopathological images - Google Patents

Automated malignancy detection in breast histopathological images

Info

Publication number
WO2013019856A1
Authority
WO
WIPO (PCT)
Prior art keywords
feature
features
nuclei
mean
subset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Ceased
Application number
PCT/US2012/049155
Other languages
French (fr)
Inventor
Andrei-chakib CHEKKOURY-IDRISSI
Parmeshwar Khurd
Jeffrey P. Johnson
Claus Bahlmann
Amar H. PATEL
Jie Ni
Ali Kamen
Leo Grady
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Siemens Healthcare Diagnostics Inc
Original Assignee
Siemens Healthcare Diagnostics Inc
Application filed by Siemens Healthcare Diagnostics Inc

Classifications

    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/69Microscopic objects, e.g. biological cells or cellular parts
    • G06V20/698Matching; Classification
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING OR CALCULATING; COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features

Definitions

  • the present disclosure relates to malignancy detection and, more specifically, to automated malignancy detection in breast histopathological images.
  • histopathology is the examination of tissue in the study of the manifestations of disease.
  • a histological section of a specimen is placed onto a glass slide for study. In some cases this section may be imaged to generate a virtual slide.
  • the analysis of virtual slides by pathologists and computer algorithms is often limited by the technologies currently available for digital pathology workstations as described by Patterson et al., "Barriers and facilitators to adoption of soft copy interpretation from the user perspective: Lessons learned from filmless radiology for slideless pathology," J. Pathol. Inform. 2(1), 2011; E. Krupinski, "Virtual slide telepathology workstation-of-the-future: lessons learned from teleradiology," Sem Diag. Path. 26, pp. 194-205, 2009; and Johnson et al., "Using a visual discrimination model for the detection of compression artifacts in virtual pathology images," IEEE Trans. Med. Imaging 30(2), pp. 306-314, 2011.
  • the histopathological diagnosis is the foundation of modern oncology, and plays a major role in the treatment of many other types of disease. Errors in these reports can critically affect patient care and may become the subject of media concern. Detection of malignancy from histopathological images of breast cancer is a labor-intensive and error-prone process.
  • biopsy samples are obtained by extracting tissue from a region of suspicion. This may be achieved, for example, using a needle biopsy.
  • a trained practitioner, for example, a pathologist, may then visually inspect the extracted tissue. The pathologist may then make a determination as to whether the sample is benign or malignant based on its appearance. However, this approach may be prone to human error.
  • a method for automatically classifying tissue includes obtaining training data including a plurality of microscope images that have been manually classified.
  • a plurality of features is calculated from the training data, each of which is a texture feature, a network feature, or a morphometric feature.
  • a subset of features is selected from the calculated plurality of features based on both maximum relevance and minimum redundancy.
  • a classifier is trained based on the selected subset of features and the manual classifications.
  • a diagnostic microscope image is classified in a computer-aided diagnostic system using the trained classifier.
  • the subset of features may include at least one texture feature, at least one network feature, and at least one morphometric feature.
  • the at least one texture feature may be an H texton feature.
  • the at least one network feature may include a mean cycle weighted Euclidean length, a number of connected components, or an average shortest path between nuclei.
  • the at least one morphometric feature may include a mean ratio of Eigenvalues, a standard deviation in nuclei size, or a mean of angular feature.
  • the classifier may be a support vector machine.
  • the diagnostic microscope image may be a breast histopathological image.
  • the classifying of the diagnostic microscope image may include determining whether the image is benign or malignant.
  • the classifying of the diagnostic microscope image may include determining a grade of malignancy.
  • Calculating the plurality of features from the training data may include transforming the training images from an RGB color space to a CMY color space.
  • Calculating the plurality of features from the training data may include obtaining H & E component vectors from the training images.
  • Calculating the plurality of features from the training data may include detecting and segmenting nuclei from the training images. The nuclei detection may be based on fast radial symmetry. The nuclei segmentation may be based on the Random Walker approach.
  • a method for automatically classifying tissue includes obtaining training data including a plurality of microscope images that have been manually classified.
  • a classifier is trained based on the plurality of features and the manual classifications.
  • a diagnostic microscope image is classified in a computer-aided diagnostic system using the trained classifier.
  • the at least one texture feature may be an H texton feature.
  • the at least one network feature may include a mean cycle weighted Euclidean length, a number of connected components, or an average shortest path between nuclei.
  • the at least one morphometric feature includes a mean ratio of Eigenvalues, a standard deviation in nuclei size, or a mean of angular feature.
  • a method for automatically classifying tissue includes obtaining training data including a plurality of microscope images that have been manually classified. At least one feature, each of which is a texture feature, a network feature, or a morphometric feature, is calculated. A classifier is trained based on the at least one feature and the manual classifications.
  • a diagnostic microscope image is classified in a computer-aided diagnostic system using the trained classifier.
  • the at least one feature includes a feature directed to one or more of the following: mean ratio of Eigenvalues of a Hessian Matrix, standard deviation in nuclei size, mean of angular feature; H texton, mean cycle weighted Euclidean length, number of connected components, or average shortest path between nuclei.
  • a method for automatically classifying tissue includes obtaining training data including a plurality of microscope images that have been manually classified. A plurality of features is calculated from the training data. A subset of features is selected from the calculated plurality of features. A classifier is trained based on the selected subset of features and the manual classifications. A diagnostic microscope image in a computer-aided diagnostic system is classified using the trained classifier. The method is characterized by basing the selection of the subset of features on both maximum relevance and minimum redundancy.
  • a computer system includes a processor and a non-transitory, tangible, program storage medium, readable by the computer system, embodying a program of instructions executable by the processor to perform method steps for automatically classifying tissue.
  • the method includes obtaining a microscope image, calculating at least one feature, each of which is either a texture feature, a network feature, or a morphometric feature, from the obtained microscope image, and classifying the diagnostic microscope image based on the calculated at least one feature and a trained classifier.
  • the at least one feature includes a feature directed to one or more of the following: mean ratio of Eigenvalues of a Hessian Matrix, standard deviation in nuclei size, mean of angular feature; H texton, mean cycle weighted Euclidean length, number of connected components, or average shortest path between nuclei.
  • FIG. 1 is a flow chart illustrating a method for automatically classifying
  • FIG. 2(a) is an example of an original image from which the texton map is created
  • FIG. 2(b) is an example of a texton map that may be created from the original image in accordance with exemplary embodiments of the present invention
  • FIG. 2(c) is an example of a texton histogram generated from the original image in accordance with exemplary embodiments of the present invention
  • FIGS. 3(a) and 3(b) are images illustrating Random Walker segmentation of nuclei in accordance with exemplary embodiments of the present invention
  • FIGS. 4(a)-(c) are images graphically demonstrating segmentation of nuclei using the random walker technique in accordance with exemplary embodiments of the present invention.
  • FIG. 5 is a flow chart illustrating steps performed in computing Fourier shape descriptors in accordance with exemplary embodiments of the present invention
  • FIGS. 6(a)-(c) is a set of images and associated Fourier descriptors, as generated in accordance with exemplary embodiments of the present invention.
  • FIG. 7 is a diagram illustrating nuclei parallelism used as a basis for establishing morphometric features in accordance with exemplary embodiments of the present invention.
  • FIG. 8 shows an example of a computer system capable of implementing the method and apparatus according to embodiments of the present disclosure.
  • Exemplary embodiments of the present invention seek to provide an approach for the automatic detection of malignancy in histopathological images. By accurately and automatically performing this analysis, exemplary embodiments of the present invention may be used to complement the specialized opinion of the pathologist, by using an objective judgment, making use of quantitative measures.
  • exemplary embodiments of the present invention provide an efficient Computer Aided Diagnostic (CAD) system that can differentiate between cancerous and non-cancerous tissues that have been stained using a hematoxylin and eosin (H&E) stain.
  • exemplary embodiments of the present invention may utilize a set of novel textural, topological and morphometric features that exploit special patterns of the nuclei cells in breast cancer histopathological images. Support Vector Machine classifiers may then be used on these features to diagnose malignancy. While the full set of features may be used for this purpose, to save computational complexity, feature selection may be performed to determine a subset of features with a high potential for providing an accurate classification. This feature selection may utilize a combination of maximum relevance and minimum redundancy so that high sensitivity and specificity may be achieved. Exemplary embodiments of the present invention may also allow for the use of image compression in classification without significant detriment to performance.
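The feature selection described above can be sketched as a greedy search. This is a minimal illustration, not the disclosed implementation: it assumes discrete-valued features, uses mutual information for both relevance and redundancy, and scores candidates by relevance minus mean redundancy, none of which is specified in the text. The names `mutual_information` and `mrmr_select` are hypothetical.

```python
from collections import Counter
from math import log


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * log((c / n) / ((px[x] / n) * (py[y] / n)))
        for (x, y), c in pxy.items()
    )


def mrmr_select(features, labels, k):
    """Greedy max-relevance, min-redundancy selection.

    features: list of columns, one list of discrete values per feature.
    Returns the indices of the k selected features, in selection order.
    """
    relevance = [mutual_information(col, labels) for col in features]
    selected = []
    remaining = list(range(len(features)))
    while remaining and len(selected) < k:
        scores = {}
        for j in remaining:
            redundancy = 0.0
            if selected:
                # Mean mutual information with the features already chosen.
                redundancy = sum(
                    mutual_information(features[j], features[s]) for s in selected
                ) / len(selected)
            scores[j] = relevance[j] - redundancy
        best = max(remaining, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (assumed): f1 duplicates f0, f2 carries complementary information.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
f0 = [0, 0, 0, 1, 1, 1, 1, 1]
f1 = list(f0)
f2 = [0, 0, 1, 0, 1, 1, 0, 1]
chosen = mrmr_select([f0, f1, f2], labels, k=2)
```

In this toy case the duplicate feature is skipped in favor of the weaker but non-redundant one, which is the behavior the maximum-relevance, minimum-redundancy criterion is meant to produce.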
  • While exemplary embodiments of the present invention may be described herein with reference to distinguishing between malignant and benign breast cancer samples, based on specially designed textural, topological and morphometric features that can capture representative structures in cancer tissue, the invention is not limited to breast biopsies and the techniques discussed herein may be applied to automatic classification of other forms of tissue. Moreover, the invention is not limited to simply classifying between benign and malignant tissue; exemplary embodiments of the present invention may also use the techniques described herein to report grades of malignancy.
  • FIG. 1 is a flow chart illustrating a method for automatically classifying
  • the tissue sample may be acquired and prepared (Step S100).
  • Sample acquisition may be performed, for example, by needle biopsy or lumpectomy.
  • Preparation may include, for example, mounting the tissue sample on a slide and applying the H&E stain.
  • the H&E stain may be applied to enhance the appearance of specific structures within the sample.
  • Hematoxylin is a blue dye that binds to the nuclear chromatin and eosin is an acidic pink dye that binds to the cytoplasmic structures and blood cells. Depending on the application, a separation of these dyes may be helpful.
  • the prepared sample may be imaged using a microscope device (Step S101).
  • For example, a line scanner microscope, another form of digital microscope, or an optical microscope may be used. Where an optical microscope is used, a digital image may be captured from the microscope output. In either event, a microscopic digital image of the sample may be procured. The microscopic digital image may then be saved to and later retrieved from an image database where analysis is not performed at the time of acquisition.
  • the image may be, for example, a 512 x 512 pixel image patch taken from a relatively large (for example, several-gigapixel) H&E-stained "virtual slide."
  • the slides may be sampled at, for example, 0.47 microns per pixel, which may correspond to a 40X objective scan.
  • the images may be compressed and analysis may be performed on the compressed image. While any compression standard may be used, JPEG2000 at 5, 16, 32, 64, or 128 levels of compression are some exemplary cases.
  • Exemplary embodiments of the present invention may be robust for at least any of the above levels of compression, as classification may primarily depend on nuclei location, which may not be compromised by image compression.
  • Feature extraction may include the quantification of various properties found within the image data.
  • Exemplary embodiments of the present invention may utilize one or more features of three distinct areas: morphometric, textural and topological features (which may also be referred to as network features and may include, for example, network cycle features). This is to say, exemplary embodiments of the present invention utilize at least one feature from any one of these three categories. However, according to some exemplary embodiments of the present invention, multiple features may be extracted from each of these categories. Accordingly, a wide variety of features may be extracted, for example, a set of anywhere from 20-30 features may be extracted.
  • the feature extraction step may include a set of sub-steps (Steps S103-S111), each of which is described in detail below. While FIG. 1 shows various pre-processing steps (i.e., S103, S104, S106, S107, S109, and S110) as being performed with respect to particular categories of feature extraction (i.e., S105, S108, and S111), it is to be understood that the results of any pre-processing step may be used in performing any category of feature extraction.
  • the arrangement illustrated in FIG. 1 is offered as an exemplary approach.
  • pre-processing these images may reduce the computational cost associated with feature extraction.
  • Exemplary embodiments of the present invention may utilize one or more texture features in classifying the sample image.
  • the acquired microscope image may be transformed from normal RGB color space, in which each pixel is represented as a value of red, a value of green, and a value of blue, into a CMY color space, in which each pixel is represented as a value of cyan, a value of magenta, and a value of yellow (Step S103).
  • the CMY color space may make it easier to distinguish between tissue absorbing the hematoxylin stain, which may appear substantially cyan, and the eosin stain, which may appear substantially magenta.
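The color-space change of Step S103 can be illustrated with the standard subtractive complement. The text does not give the conversion formula, so the complement below and the 8-bit value range are assumptions, and `rgb_to_cmy` is a hypothetical helper name.

```python
def rgb_to_cmy(pixel, max_value=255):
    """Convert one (R, G, B) pixel to (C, M, Y) by complementing each channel."""
    r, g, b = pixel
    return (max_value - r, max_value - g, max_value - b)


def image_to_cmy(image, max_value=255):
    """Apply the conversion to an image stored as rows of (R, G, B) tuples."""
    return [[rgb_to_cmy(px, max_value) for px in row] for row in image]


# Pure red has no cyan and full magenta and yellow.
red_in_cmy = rgb_to_cmy((255, 0, 0))  # (0, 255, 255)
```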
  • From the image transformed to the CMY color space, separation may be performed to obtain a hematoxylin component vector and an eosin component vector (Step S104).
  • One or more texture features may be calculated (Step S105).
  • Approaches for performing texture classification in accordance with exemplary embodiments of the present invention may involve a filtering step followed by clustering. This may be done to identify basic texture elements, which may be referred to herein as "textons.” The distribution of these textons may be used by exemplary embodiments of the present invention as a discriminative signature for each tumor grade present in a sample, and may be used as input for a support vector machine (SVM) classifier, for example as described in detail below.
  • an appropriate rotationally invariant filter bank may be used to extract responses at each pixel level.
  • the Maximum Response (MR) filter bank consists of a number of filters at multiple orientations, but their output consists of a record computed only at the maximum filter response.
  • the MR8 filter is an example of a filter that may be used in texton generation.
  • the MR8 filter is computed at three scales, giving a total of six responses, three for the edge filter and three for the bar filter.
  • the remaining two filters are the Gaussian Filter and the
  • a clustering technique may be performed on the filter response space to obtain basic texture elements, or textons, for each texture class.
  • a K-means clustering technique with k clusters for each class may be used.
  • a texton map may be created.
  • FIG. 2(b) is an example of a texton map that may be so created
  • FIG. 2(a) is an example of an original image from which the texton map is created.
  • the value of each pixel in the texton maps corresponds to the index of the cluster centroid that is closest to the filter response vector at that pixel.
  • the SVM classifier may use as input features a histogram of each texton map image.
  • FIG. 2(c) is an example of a texton histogram generated in accordance with exemplary embodiments of the present invention.
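The texton map and histogram described above can be sketched as follows, assuming the filter responses and the cluster centroids have already been computed (for example, by an MR8 filter bank and K-means, neither of which is reproduced here). Normalizing the histogram to sum to one is an assumption, and the function names are hypothetical.

```python
def nearest_centroid(response, centroids):
    """Index of the centroid closest (squared Euclidean distance) to a response vector."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((r - c) ** 2 for r, c in zip(response, centroids[i])),
    )


def texton_map(responses, centroids):
    """Replace each pixel's filter response vector by the index of its nearest centroid."""
    return [[nearest_centroid(px, centroids) for px in row] for row in responses]


def texton_histogram(tmap, n_textons):
    """Normalized frequency of each texton index over the whole map."""
    counts = [0] * n_textons
    for row in tmap:
        for idx in row:
            counts[idx] += 1
    total = sum(counts)
    return [c / total for c in counts]


# Toy 2x2 image with 2-dimensional responses and two centroids (assumed values).
centroids = [(0.0, 0.0), (1.0, 1.0)]
responses = [[(0.1, 0.0), (0.9, 1.1)], [(0.2, 0.1), (0.0, 0.3)]]
tmap = texton_map(responses, centroids)
hist = texton_histogram(tmap, len(centroids))
```

The histogram, rather than the map itself, is what would be passed to the classifier as a feature vector, since it has a fixed length regardless of image size.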
  • the K-means clustering technique may be used to identify clusters and obtain cluster centers.
  • the texton maps may then be computed for the image and the histograms generated. Thereafter, multiple texture features may be calculated. These features may include, for example, RGB textons.
  • RGB textons are texture elements calculated directly from the image while still in the RGB color space.
  • H&E textons are texture elements calculated from both the H & E component vectors.
  • Another texture feature that may be calculated is grayscale textons.
  • Grayscale textons are texture elements calculated from the image data transformed into grayscale, that is, intensity information without color information.
  • Another texture feature that may be calculated is red textons.
  • Red textons are texture elements calculated from the red-value data of the RGB image. In calculating red textons, the green and blue value data is not used. Another texture feature that may be calculated is H textons. H textons are texture elements calculated from only the H component vectors. In calculating H textons, the E component vector data is not used.
  • Exemplary embodiments of the present invention may be able to achieve a greater degree of computational efficiency by restricting the calculation of texture features only to the H textons as these features may be sufficient to perform accurate classification without the use of additional texture-based features.
  • Exemplary embodiments of the present invention may utilize one or more network features in classifying the sample image.
  • Exemplary embodiments of the present invention seek to recognize patterns of cancerous sample images by taking a set of points representing detected nuclei in the histopathology images and finding a structure among these points in the form of edges connecting a subset of the pairs of points.
  • the nuclei may accordingly be detected (Step S106).
  • the nuclei may be detected, for example, using ellipse radial symmetry and edges interested in surrounding structures presenting interest for the pathologists.
  • Exemplary embodiments of the present invention may use an approximation of the relative neighborhood graph, for example, the Urquhart graph, to provide fast computation and good human matching perspective of the shape of the set.
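The Urquhart graph is obtained from a Delaunay triangulation by removing the longest edge of every triangle. The sketch below assumes the triangulation is already available as index triples (for instance from `scipy.spatial.Delaunay(points).simplices`); the function name `urquhart_edges` and the toy coordinates are hypothetical.

```python
from math import dist


def urquhart_edges(points, triangles):
    """Return the Delaunay edges that survive after dropping each triangle's longest edge.

    points: list of (x, y) nuclei centers.
    triangles: list of (i, j, k) index triples from a Delaunay triangulation.
    """
    all_edges, longest = set(), set()
    for a, b, c in triangles:
        edges = [tuple(sorted(e)) for e in ((a, b), (b, c), (a, c))]
        all_edges.update(edges)
        longest.add(max(edges, key=lambda e: dist(points[e[0]], points[e[1]])))
    return all_edges - longest


# Four nuclei at the corners of a 1 x 3 rectangle, split along one diagonal.
points = [(0, 0), (1, 0), (0, 3), (1, 3)]
triangles = [(0, 1, 2), (1, 3, 2)]
edges = urquhart_edges(points, triangles)
```

Here the shared diagonal is the longest edge of both triangles, so only the four sides of the rectangle remain.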
  • Exemplary embodiments of the present invention may take into consideration the distribution of the stromal tissue by performing segmentation of the extracellular matrix. Based on this approach, stromal tissue may be separated. The separation of stromal/non-stromal tissue may be performed by applying the K-means approach, for example, using four classes (e.g.
  • Post-processing by spatial smoothing may be performed using a Random Walker technique.
  • the random walker prior probabilities may be obtained by fitting a Gaussian mixture model to the K-means clusters.
  • Exemplary embodiments of the present invention may perform separation of structures and a more representative network computation that takes into consideration different structures present in the sample.
  • network cycles may be employed in extracting network statistics designed to capture cancer specific hallmarks.
  • the weighted and unweighted lengths of different cycles, as well as various statistics, for example, a number of cycles with length greater than three, an average non-triangular cycle length, and a maximum non-triangular cycle length, may be calculated as features.
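The cycle statistics listed above can be computed directly once the network cycles are known. The sketch assumes each cycle is given as an ordered list of vertex indices (the face-tracing step that would produce them is not shown), takes the unweighted length to be the number of edges, and takes the weighted length to be the Euclidean perimeter. The dictionary keys are hypothetical names.

```python
from math import dist


def cycle_features(points, cycles):
    """Summary statistics over network cycles.

    points: list of (x, y) nuclei centers.
    cycles: list of vertex-index lists, each describing one closed cycle.
    """
    unweighted = [len(c) for c in cycles]
    weighted = [
        sum(dist(points[c[i]], points[c[(i + 1) % len(c)]]) for i in range(len(c)))
        for c in cycles
    ]
    non_tri = [n for n in unweighted if n > 3]
    return {
        "num_cycles_longer_than_3": len(non_tri),
        "mean_non_triangular_length": sum(non_tri) / len(non_tri) if non_tri else 0.0,
        "max_non_triangular_length": max(non_tri, default=0),
        "mean_weighted_length": sum(weighted) / len(weighted) if weighted else 0.0,
    }


# One unit square and one adjacent triangle (assumed toy geometry).
points = [(0, 0), (1, 0), (1, 1), (0, 1), (2, 0)]
cycles = [[0, 1, 2, 3], [1, 4, 2]]
features = cycle_features(points, cycles)
```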
  • Various network features for capturing the specific pattern of cancer cells in the malignant tissue may be used. Examples of these features include: number of vertices, number of graph components, clustering coefficients, Fiedler values computed from the vertex Laplacian and the edge Laplacian, and average shortest path length.
  • Exemplary embodiments of the present invention may calculate a class of features described as "network features.”
  • Calculating network features may include calculating network cycle features (Step S107a), calculating network distance-based features (Step S107b), and calculating network clustering based features (Step S107c).
  • the network cycle features may take advantage of cycle structure present within the cell networks, created using different networks. First, Delaunay triangulation may be computed, and Delaunay edges with Euclidean lengths above a particular threshold value may be removed. Thereafter, the face-tracing algorithm may be used to extract network cycles and mean Euclidean cycle lengths may be calculated.
  • unweighted and weighted lengths may be computed, for example, with weights based upon the Euclidean distance. Thereafter, various statistics may be computed as features. These features may include, for example, a number of cycles with length greater than three, an average non-triangular cycle length, and a maximum non-triangular cycle length. Additional features may be computed; for example, the edge Laplacian, Fiedler value, Kirchhoff index, and/or Wiener index may be used to obtain the best classification results, based on different types of networks and features.
  • Urquhart graphs, K-nearest neighbor graphs, epsilon-nearest neighbor graphs, and Delaunay triangulation may be used.
  • the average shortest path between nuclei may be an effective feature employed by exemplary embodiments of the present invention.
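Two of the network features named above, the number of connected components and the average shortest path between nuclei, can be sketched on an epsilon-nearest-neighbor graph. The choices below are assumptions made for illustration: path length is counted in hops, the average is taken over reachable ordered pairs only, and the function names and the value of `eps` are hypothetical.

```python
from collections import deque
from math import dist


def epsilon_graph(points, eps):
    """Connect every pair of nuclei whose centers are at most eps apart."""
    adj = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            if dist(points[i], points[j]) <= eps:
                adj[i].append(j)
                adj[j].append(i)
    return adj


def bfs_hops(adj, source):
    """Hop distance from source to every node reachable from it."""
    hops = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in hops:
                hops[v] = hops[u] + 1
                queue.append(v)
    return hops


def network_features(adj):
    """Return (number of connected components, average shortest path in hops)."""
    seen, components = set(), 0
    total, pairs = 0, 0
    for node in adj:
        hops = bfs_hops(adj, node)
        if node not in seen:
            components += 1
            seen.update(hops)
        total += sum(hops.values())
        pairs += len(hops) - 1
    return components, (total / pairs if pairs else 0.0)


# Three nuclei in a row plus one isolated nucleus (assumed toy layout).
adj = epsilon_graph([(0, 0), (1, 0), (2, 0), (10, 0)], eps=1.5)
components, avg_path = network_features(adj)
```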
  • Exemplary embodiments of the present invention may utilize one or more morphometric features in classifying the sample image. Morphometric features may be designed to capture the variation in nuclei size and shape; these features may accordingly be useful in characterizing the sample image. Exemplary embodiments of the present invention may utilize three categories of morphometric features: information extracted from the Hessian matrix, information provided by the Fourier Shape Descriptors, and a feature encoding the spatial arrangement of nuclei surrounding a ductal structure.
  • Extracting characteristic features from nuclei size and shape may require good nuclei detection (Step S109). Extracting a precise location of nuclei may play an important role in detecting cancer in breast histopathology. Fast Radial Symmetry techniques may be modified and adapted for detecting nuclei in histopathology slides.
  • nuclei segmentation may be performed (Step S110).
  • Nuclei segmentation may be performed, for example, using a Random Walker segmentation technique based on the previous accurate detection of nuclei.
  • the Random Walker is a segmentation technique that captures weak or missing boundaries, but is also able to cope with noise in the image, identifying multiple objects simultaneously and avoiding trivial solutions.
  • Random Walker may be initialized with a set of voxels taking one of a predefined set of labels. The probability that a random walker starting at a given unlabeled voxel will first reach a voxel of a particular label is computed, based on solving a so-called Dirichlet problem with boundary conditions at the locations of the seed points and the seed point in question fixed to unity while the others are set to zero.
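The Dirichlet problem mentioned above can be illustrated on a small weighted graph. In the solution, each unseeded node's probability equals the weighted average of its neighbors' probabilities, with seeds held at zero or one. The sketch below finds that solution by repeated local averaging (Gauss-Seidel relaxation), which is a stand-in chosen for brevity; practical Random Walker implementations typically solve a sparse linear system with edge weights derived from image intensities. Only two labels are handled, and all names are hypothetical.

```python
def random_walker_probabilities(edges, seeds, n_nodes, iters=2000):
    """Probability that a walker starting at each node first reaches a seed labeled 1.

    edges: {(u, v): weight} for an undirected graph.
    seeds: {node: 0 or 1} boundary conditions held fixed during relaxation.
    """
    neighbors = {i: [] for i in range(n_nodes)}
    for (u, v), w in edges.items():
        neighbors[u].append((v, w))
        neighbors[v].append((u, w))
    prob = [float(seeds.get(i, 0.5)) for i in range(n_nodes)]
    for _ in range(iters):
        for i in range(n_nodes):
            if i in seeds or not neighbors[i]:
                continue
            total_w = sum(w for _, w in neighbors[i])
            # Harmonic condition: weighted mean of neighboring probabilities.
            prob[i] = sum(w * prob[j] for j, w in neighbors[i]) / total_w
    return prob


# A chain of four pixels with a weak edge (an intensity boundary) in the middle.
edges = {(0, 1): 1.0, (1, 2): 0.01, (2, 3): 1.0}
prob = random_walker_probabilities(edges, seeds={0: 0, 3: 1}, n_nodes=4)
labels = [1 if p > 0.5 else 0 for p in prob]
```

The weak middle edge makes the walker unlikely to cross it, so the label boundary falls there even though no seed sits next to it.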
  • the introduced Random Walker segmentation may utilize input seeds in generating the random walks.
  • Exemplary embodiments of the present invention may utilize an approach, based on ellipse, for detecting nuclei. This approach may take into consideration the points obtained as belonging to the ellipse.
  • the seed points taken into consideration for the Random Walker segmentation may be represented by shrunken ellipses. Shrinkage may be performed, for example, by a factor of two.
  • FIG. 3(a) and (b) are images illustrating Random Walker segmentation of nuclei in accordance with exemplary embodiments of the present invention.
  • the foreground seed points are shown as the dots within the nuclei and the ellipses are shown in solid outline.
  • the Random Walker Segmentation is shown as the dotted outline of each nucleus.
  • FIG. 4 includes a set of images graphically demonstrating segmentation of nuclei using the random walker technique in accordance with exemplary embodiments of the present invention.
  • the seed points are shown as dots. These points may be set as points belonging to the detected ellipses and may be used as seed points for the random walker.
  • In FIG. 4(b), these points are dilated by a predefined value to be included as background seeds for the random walker segmentation. In this way, an accurate segmentation can be performed to preserve borders of touching or overlapping nuclei. When the random walker algorithm segments the image subject to the boundary conditions previously defined, the resulting segmentation is as displayed in FIG. 4(c).
  • the ellipses surrounding detected nuclei having a high confidence value may be highlighted using a first color while the ellipses surrounding detected nuclei having a low confidence value may be highlighted using a second color.
  • exemplary embodiments of the present invention might only take into consideration nuclei having a confidence value above a pre-set threshold value so that outliers may be removed.
  • one or more morphometric features may be calculated (Step S111). These features may fall into one of the following categories: Hessian based features, Fourier shape descriptors, and angular features.
  • Exemplary embodiments of the present invention may also, or alternatively, utilize morphometric features obtained by clustering the Fourier shape spectra to identify different classes of normal and abnormal nuclei.
  • the Hessian matrix may be computed at each pixel position. Based on the information encoded by the Hessian matrix, a precise analysis of the anatomical structure for that pixel may be performed.
  • the Eigenvalues of the Hessian matrix may be of particular interest as they encode shape information that may be used as features for classification. By analyzing the Hessian matrix at each pixel of the image, a precise analysis can be performed to investigate if the pixel belongs to a blob or to a line structure, as well as if it belongs to a high contrast or to a low contrast region. For 2D structures, the Eigenvalues of the Hessian matrix may provide enough information to distinguish between ridge-like and blob-like structures.
  • the deviation from a blob like structure may be defined as the ratio of the eigenvalues of the Hessian Matrix:
  • the structureness may be used as a measure in differentiating between foreground and background objects:
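The two measures just described can be written out as below. This is a reconstruction under stated assumptions rather than the expressions of the original filing: the name $R_B$ and the use of the Frobenius norm come from the surrounding text, while the symbol $\mathcal{S}$, the eigenvalue ordering $|\lambda_1| \le |\lambda_2|$, and the exact form (which follows the common Frangi-style convention for a symmetric 2D Hessian $H$) are assumed.

```latex
R_B = \frac{\lambda_1}{\lambda_2},
\qquad
\mathcal{S} = \lVert H \rVert_F = \sqrt{\lambda_1^{2} + \lambda_2^{2}},
\qquad
|\lambda_1| \le |\lambda_2|
```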
  • exemplary embodiments of the present invention may focus on finding blob-like structures with dark appearance. This may translate into positive, similar values for the two eigenvalues (λ1 ≈ λ2 ≫ 0) and a high value for the structureness measure.
  • a non-max suppression may be used on the image obtained from the ratio of the Eigenvalues and then the threshold condition imposed on the Frobenius norm image may be used to identify the nuclei for which these Hessian-based features are computed.
  • the feature RB may be higher for malignant nuclei and also malignant nuclei may exhibit a greater variation.
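A per-image summary of the Hessian-based measures can be sketched as follows. The code assumes the second derivatives at each detected nucleus are already available, orders the eigenvalues so that the ratio lies between -1 and 1, and uses the Frobenius norm as the structureness. That ordering and the function names are assumptions; the non-max suppression and thresholding steps are not shown.

```python
from math import sqrt
from statistics import mean, pstdev


def hessian_eigenvalues(hxx, hxy, hyy):
    """Eigenvalues of the symmetric matrix [[hxx, hxy], [hxy, hyy]], ordered |l1| <= |l2|."""
    half_trace = (hxx + hyy) / 2.0
    root = sqrt(((hxx - hyy) / 2.0) ** 2 + hxy ** 2)
    l1, l2 = half_trace - root, half_trace + root
    return (l1, l2) if abs(l1) <= abs(l2) else (l2, l1)


def blob_ratio_and_structureness(hxx, hxy, hyy):
    """Return (eigenvalue ratio, Frobenius norm) for one pixel's Hessian."""
    l1, l2 = hessian_eigenvalues(hxx, hxy, hyy)
    ratio = l1 / l2 if l2 != 0 else 0.0
    structureness = sqrt(l1 ** 2 + l2 ** 2)
    return ratio, structureness


# Hessians sampled at two detected nuclei (assumed values): one blob-like, one ridge-like.
hessians = [(2.0, 0.0, 2.0), (0.0, 0.0, 4.0)]
ratios = [blob_ratio_and_structureness(*h)[0] for h in hessians]
mean_ratio, ratio_spread = mean(ratios), pstdev(ratios)
```

The mean and spread of the ratio over all retained nuclei are the kind of per-image values that could serve as the mean-ratio-of-eigenvalues feature and a measure of its variation.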
  • the Fourier descriptors may encode size and shape information for different geometric shapes. These descriptors may be extracted and, based on the resulting spectra, features that distinguish malignant nuclei from benign ones may also be extracted.
  • the number of Fourier coefficients represents a relevant parameter for the Fourier Shape Descriptors. This number may be chosen in such a way to encode a best description of the contour. Exemplary embodiments of the present invention may utilize, for example, ten Fourier descriptors.
  • FIG. 5 is a flow chart illustrating steps performed in computing Fourier shape descriptors in accordance with exemplary embodiments of the present invention.
  • accurate nuclei detection may be performed, for example, using a generalized Fast Radial Transform method (Step S51).
  • nucleus boundary segmentation may be performed, for example, using the Random Walker Segmentation (Step S52).
  • Several morphological operations may then be employed to obtain only the boundary of the nucleus (Step S53). The center of mass may then be computed to give the nucleus center coordinates, followed by a computation of the Euclidean distance from the center to each boundary point and of the angle of each boundary point with respect to the center coordinates (Step S54).
  • the discrete Fourier Transform may be computed for the contour points extracted during Step S54 (Step S55). This may include extracting the first cosine term corresponding to the average value of the signal (e.g., the DC component thereof) and the general cosine and sine terms.
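Steps S54 and S55 can be sketched as follows, assuming the nucleus boundary is already available as a list of (x, y) points. The names `radial_signature` and `fourier_descriptors` are hypothetical, and using the magnitude of each DFT coefficient (rather than separate cosine and sine terms) is a simplification chosen for this illustration.

```python
import cmath
import math

def radial_signature(boundary):
    """Distances from the center of mass to each boundary point, ordered by angle."""
    n = len(boundary)
    cx = sum(x for x, _ in boundary) / n
    cy = sum(y for _, y in boundary) / n
    polar = [(math.atan2(y - cy, x - cx), math.hypot(x - cx, y - cy))
             for x, y in boundary]
    polar.sort()
    return [r for _, r in polar]

def fourier_descriptors(signal, count=10):
    """Magnitudes of the first `count` DFT coefficients (index 0 is the DC term)."""
    n = len(signal)
    return [abs(sum(r * cmath.exp(-2j * math.pi * k * i / n)
                    for i, r in enumerate(signal))) / n
            for k in range(count)]

n_points = 64
angles = [2 * math.pi * i / n_points for i in range(n_points)]
circle = [(3 * math.cos(t), 3 * math.sin(t)) for t in angles]
ellipse = [(2 * math.cos(t), 1 * math.sin(t)) for t in angles]
circle_fd = fourier_descriptors(radial_signature(circle))
ellipse_fd = fourier_descriptors(radial_signature(ellipse))
```

For the synthetic circle only the DC term is non-negligible, while the synthetic ellipse shows its largest non-DC magnitude at the second harmonic.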
  • FIG. 6 includes a set of images and their associated Fourier descriptors, as generated in accordance with exemplary embodiments of the present invention.
  • FIG. 6(a) provides an image of a circular nucleus and its associated Fourier descriptors.
  • FIG. 6(b) provides an image of an elliptical nucleus and its associated Fourier descriptors.
  • FIG. 6(c) provides an image of an irregular nucleus and its associated Fourier descriptors.
  • the Fourier descriptors are reflective of nucleus shape.
  • the feature extraction method may be designed to accommodate relatively small non-DC values for circular nuclei and a relatively high non-DC peak value for the elliptical nucleus, for example, as may be seen in FIG. 6(b).
  • a non-DC peak followed by a smaller non-DC peak at a higher frequency can be observed.
  • the features used in accordance with exemplary embodiments of the present invention may focus on counting the number of irregular nuclei in each image sample, by detecting the second non-DC peak. As a malignant image may have a high number of irregular nuclei, and a benign image may exhibit small irregularities at the shape of nuclei, these features may be used to classify the image as benign or malignant.
  • Novel angular features may be used in accordance with exemplary embodiments of the present invention to characterize the orientation of nuclei around ductal structures.
  • a benign sample may have nuclei displaced in a regular arrangement with small variation in size and shape.
  • a uniform pattern of nuclei may be considered to be a sign of non-cancerous tissue.
  • the white tissue representing the lumen may be surrounded by epithelium and then by a layer of nuclei, displaced in a parallel way.
  • the glands may be missing and the nuclei may have a high variation in size and shape.
  • the angular nuclei and random orientation observed in malignant images may be considered to be a sign of malignancy.
  • Exemplary embodiments of the present invention may utilize a principal component analysis (PCA) on each of the nuclei surrounding glandular structures.
  • the regular pattern around glands may be captured by computing the angle between the principal direction of the nucleus and the normal vector to the gland surface.
  • FIG. 7 is a diagram illustrating nuclei parallelism used as a basis for establishing morphometric features in accordance with exemplary embodiments of the present invention.
  • FIG. 7(a) illustrates an approach for computing the angle between the principal direction of the nucleus (71) and the normal vector to the gland surface (72).
  • a consistent parallel distribution of nuclei may result in a small standard deviation for angles surrounding the gland, but also in small mean angular values.
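The angular feature described above can be sketched as follows. This is a minimal illustration under two assumptions that are not taken from the source: each nucleus is given as a list of pixel coordinates, and the normal to the gland surface is approximated by the direction from the gland center to the nucleus center. All function names are hypothetical.

```python
import math

def principal_axis(pixels):
    """Unit vector along the first principal component, plus the pixel centroid."""
    n = len(pixels)
    mx = sum(x for x, _ in pixels) / n
    my = sum(y for _, y in pixels) / n
    sxx = sum((x - mx) ** 2 for x, _ in pixels) / n
    syy = sum((y - my) ** 2 for _, y in pixels) / n
    sxy = sum((x - mx) * (y - my) for x, y in pixels) / n
    # Orientation of the major axis of the 2x2 covariance matrix.
    theta = 0.5 * math.atan2(2 * sxy, sxx - syy)
    return (math.cos(theta), math.sin(theta)), (mx, my)

def angle_to_normal(pixels, gland_center):
    """Angle in degrees (0 to 90) between the principal axis and the assumed normal."""
    axis, (mx, my) = principal_axis(pixels)
    nx, ny = mx - gland_center[0], my - gland_center[1]
    cosine = abs(axis[0] * nx + axis[1] * ny) / math.hypot(nx, ny)
    return math.degrees(math.acos(min(1.0, cosine)))

def angular_features(nuclei, gland_center):
    """Mean and standard deviation of the angles over all nuclei around a gland."""
    angles = [angle_to_normal(p, gland_center) for p in nuclei]
    mean = sum(angles) / len(angles)
    std = math.sqrt(sum((a - mean) ** 2 for a in angles) / len(angles))
    return mean, std

radial = [(10, 0), (11, 0), (12, 0), (13, 0)]   # elongated along the normal
tangential = [(10, -1), (10, 0), (10, 1)]       # elongated across the normal
mean_angle, std_angle = angular_features([radial, tangential], (0, 0))
```

A set of nuclei with a consistent orientation relative to the gland would give a small standard deviation, whereas the mixed toy example gives a large one.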
  • Gland detection and segmentation may employ the K-means method as an unsupervised learning method.
  • Exemplary embodiments of the present invention may segment the images in four different clusters: glandular lumen, stromal tissue, epithelial-cell cytoplasm, and cell nuclei.
  • the K-means clustering method aims at assigning data into k clusters based on the nearest mean. Since the centroid identification number corresponding to the k clusters is different between images, a consistent assignment must be performed.
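The clustering step can be sketched as follows. This is a minimal illustration with made-up RGB values standing in for the four tissue clusters. The deterministic seeding and the brightness-based ordering used to make cluster labels comparable between images are assumptions for this sketch and are not the condition used in the source.

```python
def kmeans(pixels, k=4, iterations=20):
    """Plain k-means on RGB tuples with deterministic, evenly spaced seeding."""
    step = max(1, len(pixels) // k)
    centroids = [pixels[i * step] for i in range(k)]
    labels = [0] * len(pixels)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [min(range(k),
                      key=lambda j: sum((a - b) ** 2 for a, b in zip(p, centroids[j])))
                  for p in pixels]
        # Update step: mean of the members (an empty cluster keeps its centroid).
        for j in range(k):
            members = [p for p, lab in zip(pixels, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(ch) / len(members) for ch in zip(*members))
    return centroids, labels

def consistent_order(centroids):
    """Centroid indices sorted from brightest to darkest (assumed convention)."""
    return sorted(range(len(centroids)), key=lambda j: -sum(centroids[j]))

# Toy pixel values (illustrative only), grouped as nuclei, stroma, lumen, cytoplasm.
pixels = [(70, 40, 120), (65, 35, 115), (75, 45, 125),
          (235, 180, 205), (230, 175, 200), (240, 185, 210),
          (250, 250, 250), (245, 248, 250), (255, 255, 255),
          (190, 110, 160), (185, 105, 155), (195, 115, 165)]
centroids, labels = kmeans(pixels)
order = consistent_order(centroids)
lumen_index = order[0]  # brightest cluster, taken here as the lumen candidate
```

With this convention the first entry of `order` always refers to the brightest cluster, whatever index K-means happened to assign to it.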
  • Each centroid location returned by the K-means algorithm encodes the R, G, and B value corresponding to one of the k tissues. Given the matrix representing the centroid locations as a matrix of 4(tissues) x3(RGB values), the following condition may be applied in obtaining the minimum line index corresponding to the index belonging to the lumen tissue:
  • Noise removal may be performed during the post-processing steps and may be accomplished using a mathematical morphological dilation followed by an erosion (e.g., closing) operation.
  • a connected components method may then be applied to separate lumen structures for further analysis.
  • a series of conditions may be applied to ensure accurate gland detection. These conditions may include one or more of:
  • Condition 1: A size threshold may be applied based on empirical observations to remove structures present in the image that are relatively too small or too large.
  • Condition 2: After applying a median filter with a set neighborhood window, the structure should not be split into more than n structures. Since gland structures usually have a circular aspect, applying a median filter with a larger neighborhood value allows long and thin lumen structures, which separate into many components and fail to behave as typical glandular structures, to be discarded.
  • Condition 3: A gland structure is surrounded in 360° by nuclei.
  • the nuclei segmentation based on Random Walker is used in detecting nuclei surrounding the lumen structure.
  • An empirically selected radius may be used for the circular structure centered at the center of mass of the lumen area.
  • the histogram of nuclei surrounding the lumen area may be computed and used to represent the distribution of the angles.
  • the selection criteria may use a threshold value to select how many empty bins the histogram may have.
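The check that a lumen is surrounded by nuclei can be sketched as follows. This is a minimal illustration; the number of bins, the radius, and the allowed number of empty bins are hypothetical parameters, since the source only states that they are chosen empirically.

```python
import math

def empty_angle_bins(lumen_center, nuclei_centers, radius, bins=12):
    """Count angular histogram bins that contain no nucleus within `radius`."""
    counts = [0] * bins
    for x, y in nuclei_centers:
        dx, dy = x - lumen_center[0], y - lumen_center[1]
        if math.hypot(dx, dy) <= radius:
            angle = math.atan2(dy, dx) % (2 * math.pi)
            index = min(bins - 1, int(angle / (2 * math.pi) * bins))
            counts[index] += 1
    return counts.count(0)

def is_surrounded(lumen_center, nuclei_centers, radius, bins=12, max_empty=2):
    """Accept the lumen as a gland candidate if few angular bins are empty."""
    return empty_angle_bins(lumen_center, nuclei_centers, radius, bins) <= max_empty

# Twelve nuclei spaced every 30 degrees around the lumen, and only the first six.
ring = [(10 * math.cos(math.radians(15 + 30 * k)),
         10 * math.sin(math.radians(15 + 30 * k))) for k in range(12)]
half_ring = ring[:6]
```

Here `is_surrounded((0, 0), ring, 15)` accepts the full ring, while the half ring leaves six bins empty and is rejected.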
  • Nuclei shape may be used as a feature in classifying the sample image. Since exemplary embodiments of the present invention may be concerned with finding an axis indicating the principal direction of the nucleus, a series of conditions may be enforced to ensure that only relevant structures are kept for further analysis. These conditions may include one or more of:
  • Size Threshold A removal of structures smaller than an empirically determined threshold may be performed to discard small nuclei that may not have a significant impact on classification.
  • Circularity A removal of nuclei having a perfect circular shape may be performed, since the principal direction may not be an accurate measure for this type of nuclei shape. A principal component analysis may be performed on the nuclei shape to find the two
  • the introduced angle computation method may take into consideration nuclei in close relation to the glands. Exemplary embodiments of the present invention may take into account those nuclei encountered in close proximity to the gland borders. A distance transform may be used to obtain the nuclei in a circular radius around the glandular structure. Nuclei that are too far away may then be discarded as not relevant. A Delaunay triangulation may be performed to make this determination. Given the center of mass of each nucleus and the points belonging to the lumen surface, a Delaunay triangulation may be applied. A closer analysis of the obtained triangles may result in discarding nuclei present in the second row. A nucleus will be removed if it has edges connecting only to other nuclei centroids.
  • nucleus may be kept for further analysis.
  • all nuclei belonging to triangle 1 have connections to the lumen surface, so their edges will be kept and nucleus b and c will be taken into consideration for future angle computation use.
  • For triangle number 2, all the connecting edges belong to nuclei centroid positions, so nucleus a will be removed.
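The removal rule can be sketched as follows, assuming the triangles have already been produced by some Delaunay triangulation routine (not shown here). The function name and the vertex label "p" for a lumen surface point are hypothetical; the labels a, b, and c follow the example in the text.

```python
def nuclei_adjacent_to_lumen(triangles, lumen_vertices):
    """Keep nuclei that share at least one triangle edge with a lumen surface point."""
    lumen = set(lumen_vertices)
    kept = set()
    for triangle in triangles:
        if any(vertex in lumen for vertex in triangle):
            # Every pair of vertices in a triangle is joined by an edge, so each
            # nucleus in this triangle has an edge to a lumen point.
            kept.update(vertex for vertex in triangle if vertex not in lumen)
    return kept

# Triangle 1 joins lumen point "p" with nuclei b and c; triangle 2 joins only nuclei.
triangles = [("p", "b", "c"), ("a", "b", "c")]
kept = nuclei_adjacent_to_lumen(triangles, lumen_vertices=["p"])
```

Nucleus a appears only in the all-nuclei triangle, so it is not part of `kept`.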
  • the set of morphometric features may be captured from the nuclei size based on the Random Walker Segmentation, the Fourier Shape Descriptors, and from the computation of the Hessian matrix for each pixel in the image.
  • the nuclear size may be computed by finding the number of pixels belonging to each segmented nucleus.
  • the random walker segmentation may then be applied on the original image data set, pre-processed by extracting only the H channel.
  • Computing the standard deviation of the nuclear size over each of the images from the data set may capture this relevant cancer specific mark.
  • the mean value of the R b defined above may be computed over the entire image. Exemplary embodiments of the present invention may use this feature in distinguishing between malignant and benign samples.
  • Detecting irregularity in nuclei shape may be performed by analyzing the Fourier spectrum.
  • An example of the Fourier spectrum may be seen in FIG. 6.
  • Each of the non-DC values for circular nuclei is relatively small.
  • For the irregular nuclei presented in FIG. 6(c), a non-DC peak followed by a smaller non-DC peak at a higher frequency can be observed.
  • the features used in accordance with exemplary embodiments of the present invention may focus on counting the number of irregular nuclei, for example by detecting the second non-DC peak.
  • the first peak may be considered to occur at a value of 0.15 of the maximum energy value and the second peak may be considered to occur at 0.05 of the maximum energy value. Accordingly, a malignant image will have a high number of irregular nuclei while a benign image will exhibit small irregularities at the shape of nuclei. The feature taken into consideration is the number of irregular nuclei in each sample.
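One possible reading of the peak-based rule can be sketched as follows. The interpretation is an assumption: a nucleus is treated as irregular when its spectrum has a non-DC peak of at least 0.15 of the maximum value followed, at a higher frequency, by a second peak of at least 0.05 of the maximum value. The function names and the toy spectra are hypothetical.

```python
def is_irregular(spectrum, first_fraction=0.15, second_fraction=0.05):
    """Detect a first non-DC peak followed by a second peak at a higher frequency."""
    top = max(spectrum)
    non_dc = spectrum[1:]
    last = len(non_dc) - 1
    # Local maxima of the non-DC part of the spectrum.
    peaks = [i for i in range(len(non_dc))
             if (i == 0 or non_dc[i] > non_dc[i - 1])
             and (i == last or non_dc[i] >= non_dc[i + 1])]
    first = next((i for i in peaks if non_dc[i] >= first_fraction * top), None)
    if first is None:
        return False
    return any(non_dc[j] >= second_fraction * top for j in peaks if j > first)

def count_irregular(spectra):
    """Number of irregular nuclei in one image sample."""
    return sum(1 for spectrum in spectra if is_irregular(spectrum))

circular = [10, 0.1, 0.2, 0.1, 0.1, 0.1]
elliptical = [10, 0.1, 3.0, 0.1, 0.2, 0.1]
irregular = [10, 0.1, 3.0, 0.2, 0.1, 1.0, 0.1]
irregular_count = count_irregular([circular, elliptical, irregular])
```

Only the third toy spectrum has a sufficiently strong second peak, so the count for this sample is one.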
  • exemplary embodiments of the present invention may use as morphometric features, one or more of the following: mean ratio of Eigenvalues of the Hessian Matrix, standard deviation in nuclei size, mean of the angular feature, and irregular number of nuclei, for example, as measured by a Fourier shape descriptor.
  • After the full set of features has been calculated (Step S102), feature selection may be performed (Step S112).
  • the selected features may later be used to train one or more classifiers for distinguishing between benign and cancerous samples.
  • Using and storing excessive features may introduce an undesirable level of complexity.
  • Feature selection may include selecting a subset of the calculated features and using only these features in performing classification. By reducing the number of features used in classification, training complexity and computational cost may be minimized. However, by accurately selecting a set of most-discriminating features, little to no accuracy may be sacrificed. For example, a small gain in classification accuracy may be obtained because the classifier may suffer from over-fitting problems when the original large number of features is used.
  • Exemplary embodiments of the present invention may perform feature selection by minimizing redundancy in selected features while maximizing the selection of relevant features.
  • redundancy may be defined as multiple features that serve to differentiate between classes in the same way. Accordingly, under the combined approach of exemplary embodiments of the present invention, even though features are selected in accordance with their ability to differentiate between the classes (relevance), it may still be possible to reject features that are highly relevant because there may already be selected features that distinguish between classes in the same way. Selection thus extracts features that are minimally redundant among themselves and maximally relevant to the target classes.
  • I is the mutual information and |S| is the number of features in S. Since the remaining features might have a high redundancy property, another feature selection is performed after applying the maximum relevance criterion.
  • exemplary embodiments of the present invention may select features based on minimal redundancy.
  • This feature selection criterion selects mutually exclusive features and may be calculated, for example, using the equation
  • exemplary embodiments of the present invention may utilize maximum relevance/minimum redundancy criteria to find the subset of features to use in classification.
  • the subset of features so found may maximize D-R in an incremental manner, for example, by adding one feature at a time.
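The incremental selection can be sketched as follows. This is a minimal illustration that assumes discrete-valued features and the commonly used formulation in which relevance is the mutual information between a feature and the class label and redundancy is the mean mutual information between a candidate and the already selected features. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    """Mutual information (in bits) between two discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum((c / n) * math.log2(c * n / (px[x] * py[y]))
               for (x, y), c in pxy.items())

def mrmr(features, labels, count):
    """Greedily add the feature with the best relevance-minus-redundancy score."""
    selected = []
    remaining = list(features)
    while remaining and len(selected) < count:
        def score(name):
            relevance = mutual_information(features[name], labels)
            if not selected:
                return relevance
            redundancy = sum(mutual_information(features[name], features[s])
                             for s in selected) / len(selected)
            return relevance - redundancy
        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected

labels = [0, 0, 0, 0, 1, 0, 1, 1]
features = {
    "a": [0, 0, 0, 0, 1, 1, 1, 1],        # highly relevant
    "a_copy": [0, 0, 0, 0, 1, 1, 1, 1],   # equally relevant but fully redundant
    "b": [0, 0, 1, 1, 0, 0, 1, 1],        # weakly relevant, not redundant with "a"
}
selected = mrmr(features, labels, 2)
```

In the toy data the duplicate of the first feature is passed over in favor of a weaker but non-redundant feature, which is the behavior the combined criterion is meant to produce.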
  • At least one feature may be selected from each category (texture features, network features, and morphometric features).
  • Feature selection may be optionally omitted and in such a case, a predetermined set of features may be used.
  • This predetermined set of features may include, for example, at least one feature from each category, as the type of information captured from features of the various three classes may be complementary and may provide information pertaining to different manifestations in breast histopathology.
  • the predetermined set of features may include at least one feature from each group, including, among the morphometric features: mean ratio of Eigenvalues of Hessian matrix, standard deviation in nuclei size, or mean of the angular feature; one feature from among the network cycle features: mean cycle weighted Euclidean length, number of connected components, or average shortest path between nuclei; and the texture feature H textons.
  • predetermined set of features may include at least one feature listed above from at least one of the groups.
  • the predetermined set of features may include one network cycle feature (e.g., mean cycle weighted Euclidean length), two network features (average shortest path between nuclei and number of connected components in the graph), two morphometric features (mean size of nuclei and mean value for the ratio between the eigenvalues of the Hessian matrix), and twenty-two texton features computed on the H channel (e.g., where the histogram for each class contains eleven bins). In total, a twenty-seven-dimensional feature set is used.
  • exemplary embodiments of the present invention may thereafter train a support vector machine (SVM) classifier on these features (Step S113).
  • SVM is a universal learning algorithm based on the statistical learning theory.
  • Training may be performed, for example, by receiving a set of training images that have been manually classified as benign/malignant and/or according to a degree of malignancy by one or more users, for example, pathologists.
  • the SVM classifier may then be generated to differentiate between the various classifications based on the selected or predetermined subset of features.
  • Each training image may be a microscope view of a tissue sample that the expert user has manually classified.
  • any number of training images may be used; for example, thirty training images may be used, consisting of fifteen malignant images and fifteen benign images.
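Training on manually labeled feature vectors can be sketched as follows. This is a deliberately simplified stand-in: a linear SVM fitted by subgradient descent on the regularized hinge loss, with two-dimensional toy feature vectors. It is not the classifier configuration of the source, and the function names, learning rate, and regularization weight are assumptions.

```python
def train_linear_svm(samples, labels, epochs=1000, rate=0.01, reg=0.01):
    """Linear SVM fitted by subgradient descent on the regularized hinge loss."""
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            margin = y * (sum(w * v for w, v in zip(weights, x)) + bias)
            # Shrink the weights (regularization), then correct margin violations.
            weights = [w - rate * reg * w for w in weights]
            if margin < 1:
                weights = [w + rate * y * v for w, v in zip(weights, x)]
                bias += rate * y
    return weights, bias

def predict(model, x):
    """Return +1 or -1 depending on the side of the separating hyperplane."""
    weights, bias = model
    return 1 if sum(w * v for w, v in zip(weights, x)) + bias >= 0 else -1

# Toy feature vectors; -1 stands for benign and +1 for malignant.
samples = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]
labels = [-1, -1, -1, 1, 1, 1]
model = train_linear_svm(samples, labels)
predictions = [predict(model, x) for x in samples]
```

In practice the feature vectors would be the selected or predetermined features computed from each training image, and a library SVM implementation would normally replace this hand-written loop.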
  • the trained SVM classifier may then be employed by a computer aided diagnosis (CAD) system to classify one or more clinical images to diagnose the presence of malignancy and/or to grade a malignancy (Step S114).
  • exemplary embodiments of the present invention may utilize the above-described approaches for generating and selecting features, training a classifier based on these features, and programming a CAD system to detect malignancy based on the trained classifier.
  • This method may be implemented, for example, in a computer system.
  • the CAD system using the trained classifier may also be implemented, for example, in a computer system.
  • FIG. 8 shows an example of a computer system, which may implement a method and system of the present disclosure.
  • the system and method of the present disclosure may be implemented in the form of a software application running on a computer system, for example, a mainframe, personal computer (PC), handheld computer, server, etc.
  • the software application may be stored on a recording media locally accessible by the computer system and accessible via a hard wired or wireless connection to a network, for example, a local area network, or the Internet.
  • the computer system referred to generally as system 1000 may include, for example, a central processing unit (CPU) 1001, random access memory (RAM) 1004, a printer interface 1010, a display unit 1011, a local area network (LAN) data transmission controller 1005, a LAN interface 1006, a network controller 1003, an internal bus 1002, and one or more input devices 1009, for example, a keyboard, mouse etc.
  • the system 1000 may be connected to a data storage device, for example, a hard disk, 1008 via a link 1007.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Molecular Biology (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

A method for automatically classifying tissue includes obtaining training data including a plurality of microscope images that have been manually classified. A plurality of features is calculated from the training data, each of which is a texture feature, a network feature, or a morphometric feature. A subset of features is selected from the calculated plurality of features based on both maximum relevance and minimum redundancy. A classifier is trained based on the selected subset of features and the manual classifications. A diagnostic microscope image is classified in a computer-aided diagnostic system using the trained classifier.

Description

AUTOMATED MALIGNANCY DETECTION IN BREAST HISTOPATHOLOGICAL IMAGES
CROSS-REFERENCE TO RELATED APPLICATION
[0001] The present application is based on provisional application Serial No. 61/514,085, filed August 2, 2011, the entire contents of which are herein incorporated by reference.
TECHNICAL FIELD
[0002] The present disclosure relates to malignancy detection and, more specifically, to automated malignancy detection in breast histopathological images.
DISCUSSION OF THE RELATED ART
[0003] In the field of disease pathology, histopathology is the examination of tissue in the study of the manifestations of disease. Typically, a histological section of a specimen is placed onto a glass slide for study. In some cases this section may be imaged to generate a virtual slide. The analysis of virtual slides by pathologists and computer algorithms is often limited by the technologies currently available for digital pathology workstations, as described by Patterson et al., "Barriers and facilitators to adoption of soft copy interpretation from the user perspective: Lessons learned from filmless radiology for slideless pathology," J. Pathol. Inform. 2(1), 2011, E. Krupinski, "Virtual slide telepathology workstation-of-the-future: lessons learned from teleradiology," Sem Diag. Path. 26, pp. 194-205, 2009, and Johnson et al., "Using a visual discrimination model for the detection of compression artifacts in virtual pathology images," IEEE Trans. Med. Imaging 30(2), pp. 306-314, 2011.
[0004] Methods for Computer Aided Diagnosis (CAD) for histopathology based cancer detection and grading are discussed in Khurd et al., "Computer-aided Gleason grading of prostate cancer histopathological images using texton forests," in Proceedings of the 2010 IEEE International Conference on Biomedical Imaging: From Nano to Macro, ISBI '10, pp. 636-639, (Piscataway, NJ, USA), 2010 and Khurd et al., "Network cycle features: Application to computer-aided Gleason grading of prostate cancer histopathological images," in ISBI, pp. 1632-1636, 2011. Further CAD methods are described by S. Naik, S. Doyle, S. Agner, A. Madabhushi, M. D. Feldman, and J. Tomaszewski, "Automated gland and nuclei segmentation for grading of prostate and breast cancer histopathology," in ISBI, pp. 284-287, 2008, Huang and Lee, "Automatic classification for pathological prostate images based on fractal analysis," IEEE Trans. Med. Imaging 28(7), pp. 1037-1050, 2009, and Tabesh et al., "Multifeature prostate cancer diagnosis and Gleason grading of histological images," IEEE Trans. Med. Imaging 26(10), pp. 1366-1378, 2007.
[0005] The histopathological diagnosis is the foundation of modern oncology, and plays a major role in the treatment of many other types of disease. Errors in these reports can critically affect patient care and may become the subject of media concern. Detection of malignancy from histopathological images of breast cancer is a labor-intensive and error-prone process. Generally, biopsy samples are obtained by extracting tissue from a region of suspicion. This may be achieved, for example, using a needle biopsy. A trained practitioner, for example, a pathologist, may then visually inspect the extracted tissue. The pathologist may then make a determination as to whether the sample is benign or malignant based on its appearance. However, this approach may be prone to human error.
SUMMARY
[0006] A method for automatically classifying tissue includes obtaining training data including a plurality of microscope images that have been manually classified. A plurality of features is calculated from the training data, each of which is a texture feature, a network feature, or a morphometric feature. A subset of features is selected from the calculated plurality of features based on both maximum relevance and minimum redundancy. A classifier is trained based on the selected subset of features and the manual classifications. A diagnostic microscope image is classified in a computer-aided diagnostic system using the trained classifier.
[0007] The subset of features may include at least one texture feature, at least one network feature, and at least one morphometric feature. The at least one texture feature may be an H texton feature. The at least one network feature may include a mean cycle weighted Euclidean length, a number of connected components, or an average shortest path between nuclei. The at least one morphometric feature may include a mean ratio of Eigenvalues, a standard deviation in nuclei size, or a mean of angular feature. The classifier may be a support vector machine. The diagnostic microscope image may be a breast histopathological image.
[0008] The classifying of the diagnostic microscope image may include determining whether the image is benign or malignant. The classifying of the diagnostic microscope image may include determining a grade of malignancy. Calculating the plurality of features from the training data may include transforming the training images from an RGB color space to a CMY color space. Calculating the plurality of features from the training data may include obtaining H & E component vectors from the training images. Calculating the plurality of features from the training data may include detecting and segmenting nuclei from the training images. The nuclei detection may be based on fast radial symmetry. The nuclei segmentation may be based on the Random Walker approach.
[0009] A method for automatically classifying tissue includes obtaining training data including a plurality of microscope images that have been manually classified. A plurality of features, including at least one texture feature, at least one network feature, and at least one morphometric feature, is calculated. A classifier is trained based on the plurality of features and the manual classifications. A diagnostic microscope image is classified in a computer-aided diagnostic system using the trained classifier.
[0010] The at least one texture feature may be an H texton feature. The at least one network feature may include a mean cycle weighted Euclidean length, a number of connected components, or an average shortest path between nuclei. The at least one morphometric feature includes a mean ratio of Eigenvalues, a standard deviation in nuclei size, or a mean of angular feature.
[0011] A method for automatically classifying tissue includes obtaining training data including a plurality of microscope images that have been manually classified. At least one feature, each of which is a texture feature, a network feature, or a morphometric feature, is calculated. A classifier is trained based on the at least one feature and the manual classifications. A diagnostic microscope image is classified in a computer-aided diagnostic system using the trained classifier. The at least one feature includes a feature directed to one or more of the following: mean ratio of Eigenvalues of a Hessian Matrix, standard deviation in nuclei size, mean of angular feature, H texton, mean cycle weighted Euclidean length, number of connected components, or average shortest path between nuclei.
[0012] A method for automatically classifying tissue includes obtaining training data including a plurality of microscope images that have been manually classified. A plurality of features is calculated from the training data. A subset of features is selected from the calculated plurality of features. A classifier is trained based on the selected subset of features and the manual classifications. A diagnostic microscope image in a computer-aided diagnostic system is classified using the trained classifier. The method is characterized by basing the selection of the subset of features on both maximum relevance and minimum redundancy.
[0013] A computer system includes a processor and a non-transitory, tangible, program storage medium, readable by the computer system, embodying a program of instructions executable by the processor to perform method steps for automatically classifying tissue. The method includes obtaining a microscope image, calculating at least one feature, each of which is either a texture feature, a network feature, or a morphometric feature, from the obtained microscope image, and classifying the diagnostic microscope image based on the calculated at least one feature and a trained classifier. The at least one feature includes a feature directed to one or more of the following: mean ratio of Eigenvalues of a Hessian Matrix, standard deviation in nuclei size, mean of angular feature, H texton, mean cycle weighted Euclidean length, number of connected components, or average shortest path between nuclei.
BRIEF DESCRIPTION OF THE DRAWINGS
[0014] A more complete appreciation of the present disclosure and many of the attendant aspects thereof will be readily obtained as the same becomes better understood by reference to the following detailed description when considered in connection with the accompanying drawings, wherein:
[0015] FIG. 1 is a flow chart illustrating a method for automatically classifying histopathological images in accordance with exemplary embodiments of the present invention;
[0016] FIG. 2(a) is an example of an original image from which the texton map is created;
[0017] FIG. 2(b) is an example of a texton map that may be created from the original image in accordance with exemplary embodiments of the present invention;
[0018] FIG. 2(c) is an example of a texton histogram generated from the original image in accordance with exemplary embodiments of the present invention;
[0019] FIGS. 3(a) and 3(b) are images illustrating Random Walker segmentation of nuclei in accordance with exemplary embodiments of the present invention;
[0020] FIGS. 4(a)-(c) are images graphically demonstrating segmentation of nuclei using the random walker technique in accordance with exemplary embodiments of the present invention;
[0021] FIG. 5 is a flow chart illustrating steps performed in computing Fourier shape descriptors in accordance with exemplary embodiments of the present invention;
[0022] FIGS. 6(a)-(c) are a set of images and associated Fourier descriptors, as generated in accordance with exemplary embodiments of the present invention;
[0023] FIG. 7 is a diagram illustrating nuclei parallelism used as a basis for establishing morphometric features in accordance with exemplary embodiments of the present invention; and
[0024] FIG. 8 shows an example of a computer system capable of implementing the method and apparatus according to embodiments of the present disclosure.
DETAILED DESCRIPTION OF THE DRAWINGS
[0025] In describing exemplary embodiments of the present disclosure illustrated in the drawings, specific terminology is employed for sake of clarity. However, the present disclosure is not intended to be limited to the specific terminology so selected, and it is to be understood that each specific element includes all technical equivalents which operate in a similar manner.
[0026] Exemplary embodiments of the present invention seek to provide an approach for the automatic detection of malignancy in histopathological images. By accurately and automatically performing this analysis, exemplary embodiments of the present invention may be used to complement the specialized opinion of the pathologist, by using an objective judgment, making use of quantitative measures.
[0027] To streamline this process, exemplary embodiments of the present invention provide an efficient Computer Aided Diagnostic (CAD) system that can differentiate between cancerous and non-cancerous tissues that have been stained using a hematoxylin and eosin (H&E) stain. In making this determination, exemplary embodiments of the present invention may utilize a set of novel textural, topological and morphometric features that exploit special patterns of the nuclei cells in breast cancer histopathological images. Support Vector Machine classifiers may then be used on these features to diagnose malignancy. While the full set of features may be used for this purpose, to save computational complexity, feature selection may be performed to determine a subset of features with a high potential for providing an accurate classification. This feature selection may utilize a combination of maximum relevance and minimum redundancy so that high sensitivity and specificity may be achieved. Exemplary embodiments of the present invention may also allow for the use of image compression in classification without significant detriment to performance.
[0028] While exemplary embodiments of the present invention may be described herein with reference to distinguishing between malignant and benign breast cancer samples, based on specially designed textural, topological and morphometric features that can capture representative structures in cancer tissue, the invention is not limited to breast biopsies and the techniques discussed herein may be applied to automatic classification of other forms of tissue. Moreover, the invention is not limited to simply classifying between benign and malignant tissue; exemplary embodiments of the present invention may also use the techniques described herein to report grades of malignancy.
[0029] FIG. 1 is a flow chart illustrating a method for automatically classifying histopathological images in accordance with exemplary embodiments of the present invention. First, the tissue sample may be acquired and prepared (Step S100). Sample acquisition may be performed, for example, by needle biopsy or lumpectomy. Preparation may include, for example, mounting the tissue sample on a slide and applying the H&E stain.
[0030] The H&E stain may be applied to enhance the appearance of specific structures within the sample. Hematoxylin is a blue dye that binds to the nuclear chromatin and eosin is an acidic pink dye that binds to the cytoplasmic structures and blood cells. Depending on the application, a separation of these dyes may be helpful.
[0031] Thereafter, the prepared sample may be imaged using a microscope device (Step S101). For example, a line scanner microscope, another form of digital microscope, or an optical microscope may be used. Where an optical microscope is used, a digital image may be captured from the microscope output. In either event, a microscopic digital image of the sample may be procured. The microscopic digital image may then be saved to and later retrieved from an image database where analysis is not performed at the time of acquisition. The image may be, for example, a 512 x 512 pixel image patch taken from a relatively large, for example, several gigapixel H&E stained "virtual slide." The slides may be sampled at, for example, 0.47 microns per pixel, which may correspond to a 40X objective scan.
[0032] Alternatively, or additionally, the images may be compressed and analysis may be performed on the compressed image. While any compression standard may be used, JPEG2000 at 5, 16, 32, 64, or 128 levels of compression are some exemplary cases. Exemplary embodiments of the present invention may be robust for at least any of the above levels of compression, as classification may primarily depend on nuclei location, which may not be compromised by image compression.
[0033] Next, feature extraction may be performed (Step S102). Feature extraction may include the quantification of various properties found within the image data. Exemplary embodiments of the present invention may utilize one or more features of three distinct areas: morphometric, textural and topological features (which may also be referred to as network features and may include, for example, network cycle features). This is to say, exemplary embodiments of the present invention utilize at least one feature from any one of these three categories. However, according to some exemplary embodiments of the present invention, multiple features may be extracted from each of these categories. Accordingly, a wide variety of features may be extracted; for example, a set of anywhere from 20 to 30 features may be extracted.
[0034] The feature extraction step (Step S102) may include a set of sub-steps (Steps S103-S111), each of which is described in detail below. While FIG. 1 shows various pre-processing steps (i.e., S103, S104, S106, S107, S109, and S110) as being performed with respect to particular categories of feature extraction (i.e., S105, S108, and S111), it is to be understood that the results of any pre-processing step may be used in performing any category of feature extraction. The arrangement illustrated in FIG. 1 is offered as an exemplary approach.
[0035] As current imaging techniques provide very high-resolution output, pre-processing these images may reduce the computational cost associated with feature extraction.
[0036] Exemplary embodiments of the present invention may utilize one or more texture features in classifying the sample image. To this end, the acquired microscope image may be transformed from normal RGB color space, in which each pixel is represented as a value of red, a value of green, and a value of blue, into a CMY color space, in which each pixel is represented as a value of cyan, a value of magenta, and a value of yellow (Step S103). The CMY color space may make it easier to distinguish between tissue absorbing the hematoxylin stain, which may appear substantially cyan, and tissue absorbing the eosin stain, which may appear substantially magenta.
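The color-space transformation of Step S103 can be illustrated with a minimal sketch. The text does not state which RGB-to-CMY formula is used, so the simple complement rule below (C = 1 - R, M = 1 - G, Y = 1 - B on values scaled to [0, 1]) is an assumption, and the function names `rgb_to_cmy` and `image_to_cmy` are hypothetical.

```python
def rgb_to_cmy(pixel):
    """Convert one 8-bit RGB pixel to CMY values in [0, 1].

    Assumption: the plain complement rule, which the source does not confirm.
    """
    r, g, b = pixel
    return (1.0 - r / 255.0, 1.0 - g / 255.0, 1.0 - b / 255.0)


def image_to_cmy(image):
    """Apply the conversion to an image stored as rows of (R, G, B) tuples."""
    return [[rgb_to_cmy(pixel) for pixel in row] for row in image]
```

Under this rule a pixel with no red content maps to full cyan, which is consistent with the statement that hematoxylin-absorbing tissue may appear substantially cyan.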
[0037] From the image transformed to the CMY color space, separation may be performed to obtain a hematoxylin component vector and an eosin component vector (Step S104). Thereafter, one or more texture features may be calculated (Step S105).
[0038] Approaches for performing texture classification in accordance with exemplary embodiments of the present invention may involve a filtering step followed by clustering. This may be done to identify basic texture elements, which may be referred to herein as "textons." The distribution of these textons may be used by exemplary embodiments of the present invention as a discriminative signature for each tumor grade present in a sample, and may be used as input for a support vector machine (SVM) classifier, for example as described in detail below.
[0039] In building the appropriate classifier, there may be two classes, cancer and non-cancer. Alternatively, where tumor grading is desired, there may be multiple classes of cancer. For every image belonging to the cancer or non-cancer class, an appropriate rotationally invariant filter bank may be used to extract responses at each pixel.
[0040] The Maximum Response (MR) filter bank consists of a number of filters at multiple orientations, but its output records only the maximum filter response across orientations. The MR8 filter is an example of a filter that may be used in texton generation. The edge and bar filters of the MR8 filter are computed at three scales, giving a total of six responses, three for the edge filter and three for the bar filter. The remaining two filters are the Gaussian filter and the Laplacian of Gaussian.
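The reduction from many oriented filters to eight responses can be sketched as follows. The sketch assumes that the per-pixel filter responses have already been computed elsewhere, and that the maximum is taken as the plain (signed) maximum over orientations; the function name `mr8_collapse` and the input layout are hypothetical.

```python
def mr8_collapse(edge, bar, gaussian, laplacian_of_gaussian):
    """Collapse oriented filter responses at one pixel into an 8-vector.

    edge, bar: lists with one entry per scale (three scales), each entry
    being the list of responses over all orientations at that scale.
    gaussian, laplacian_of_gaussian: single isotropic responses.
    """
    edge_max = [max(per_scale) for per_scale in edge]  # 3 edge responses
    bar_max = [max(per_scale) for per_scale in bar]    # 3 bar responses
    return edge_max + bar_max + [gaussian, laplacian_of_gaussian]
```

Because only the maximum over orientations is kept, the resulting vector does not change when the underlying texture is rotated, which is the rotational invariance referred to above.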
[0041] An accurate description of the texture characteristics can be achieved using the high-dimensional feature space of filter responses, but a sparser representation preserving the information content may be required for the classifier. A clustering technique may be performed on the filter response space to obtain basic texture elements, or textons, for each texture class. A K-means clustering technique with k clusters for each class may be used. By concatenating the cluster centers obtained from the two classes (malignant and benign), a texton map may be created. FIG. 2(b) is an example of a texton map that may be so created, and FIG. 2(a) is an example of an original image from which the texton map is created. For each analyzed image, the value of each pixel in the texton map corresponds to the index of the cluster centroid that is closest to the filter response vector at that pixel. The SVM classifier may use as input features a histogram of each texton map image. FIG. 2(c) is an example of a texton histogram generated in accordance with exemplary embodiments of the present invention.
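The texton map and histogram described above can be sketched in a few lines, assuming the cluster centers have already been obtained (for example by K-means) and that closeness is measured by squared Euclidean distance. The function names are hypothetical.

```python
def nearest_texton(response, centers):
    """Index of the cluster center closest to one filter response vector."""
    best_index, best_dist = 0, float("inf")
    for index, center in enumerate(centers):
        dist = sum((r - c) ** 2 for r, c in zip(response, center))
        if dist < best_dist:
            best_index, best_dist = index, dist
    return best_index


def texton_map(responses, centers):
    """Label every pixel of a 2D grid of response vectors with a texton index."""
    return [[nearest_texton(vec, centers) for vec in row] for row in responses]


def texton_histogram(tmap, n_textons):
    """Normalized histogram of texton indices, usable as a classifier input."""
    counts = [0] * n_textons
    for row in tmap:
        for index in row:
            counts[index] += 1
    total = sum(counts)
    return [count / total for count in counts]
```

For example, with centers `[(0, 0), (10, 10)]` a response of `(9, 9)` is labeled with texton 1, and an image split evenly between the two textons yields the histogram `[0.5, 0.5]`.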
[0042] As discussed above, the K-means clustering technique may be used to identify clusters and obtain cluster centers. The texton maps may then be computed for the image and the histograms generated. Thereafter, multiple texture features may be calculated. These features may include, for example, RGB textons. RGB textons are texture elements calculated directly from the image while still in the RGB color space. Another texture feature that may be calculated is H&E textons. H&E textons are texture elements calculated from both the H and E component vectors. Another texture feature that may be calculated is grayscale textons. Grayscale textons are texture elements calculated from the image data transformed into grayscale, that is, intensity information without color information. Another texture feature that may be calculated is red textons. Red textons are texture elements calculated from the red-value data of the RGB image. In calculating red textons, the green and blue value data is not used. Another texture feature that may be calculated is H textons. H textons are texture elements calculated from only the H component vectors. In calculating H textons, the E component vector data is not used.
[0043] Exemplary embodiments of the present invention may be able to achieve a greater degree of computational efficiency by restricting the calculation of texture features only to the H textons as these features may be sufficient to perform accurate classification without the use of additional texture-based features.
[0044] Exemplary embodiments of the present invention may utilize one or more network features in classifying the sample image. Exemplary embodiments of the present invention seek to recognize patterns of cancerous sample images by taking a set of points representing detected nuclei in the histopathology images and finding a structure among these points in the form of edges connecting a subset of the pairs of points.
[0045] The nuclei may accordingly be detected (Step S106). The nuclei may be detected, for example, using ellipse radial symmetry and edges interested in surrounding structures presenting interest for the pathologists. Exemplary embodiments of the present invention may use an approximation of the relative neighborhood graph, for example, the Urquhart graph, to provide fast computation and a good match to the human perception of the shape of the point set. Exemplary embodiments of the present invention may take into consideration the distribution of the stromal tissue and may perform segmentation of the extracellular matrix. Based on this approach, stromal tissue may be separated. The separation of stromal and non-stromal tissue may be performed by applying the K-means approach, for example, using four classes (e.g., glandular lumen, stroma, epithelial-cell cytoplasm, and cell nuclei), followed by an extraction of the non-stromal tissue. Post-processing by spatial smoothing may be performed using a Random Walker technique. The random walker prior probabilities may be obtained by fitting a Gaussian mixture model to the K-means clusters. Exemplary embodiments of the present invention may perform separation of structures and a more representative network computation that takes into consideration different structures present in the sample.
[0046] By taking into consideration the extracellular matrix and using the Urquhart graphs, network cycles may be employed in extracting network statistics designed to capture cancer-specific hallmarks. The weighted and unweighted lengths of different cycles, as well as various statistics, for example, a number of cycles with length greater than three, an average non-triangular cycle length, and a maximum non-triangular cycle length, may be calculated as features. Various network features for capturing the specific pattern of cancer cells in the malignant tissue may be used. Examples of these features include: number of vertices, number of graph components, clustering coefficients, Fiedler values computed from the vertex Laplacian and the edge Laplacian, and average shortest path length.
[0047] Exemplary embodiments of the present invention may calculate a class of features described as "network features." Calculating network features may include calculating network cycle features (Step S107a), calculating network distance-based features (Step S107b), and calculating network clustering-based features (Step S107c). The network cycle features may take advantage of cycle structure present within the cell networks, created using different networks. First, a Delaunay triangulation may be computed and Delaunay edges with Euclidean lengths above a particular threshold σ may be removed. Thereafter, the face-tracing algorithm may be used to extract network cycles and mean Euclidean cycle lengths may be calculated.
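The edge pruning and the cycle statistics can be sketched as below. The sketch assumes the triangulation edges and the extracted cycles are supplied by other code (the triangulation and face-tracing steps are not reproduced here), and that a cycle is given as an ordered list of vertex indices. All function names are hypothetical.

```python
import math


def prune_long_edges(points, edges, sigma):
    """Keep only edges whose Euclidean length does not exceed sigma."""
    return [(i, j) for i, j in edges
            if math.dist(points[i], points[j]) <= sigma]


def euclidean_cycle_length(points, cycle):
    """Weighted (Euclidean) length of a cycle given as ordered vertex indices."""
    n = len(cycle)
    return sum(math.dist(points[cycle[k]], points[cycle[(k + 1) % n]])
               for k in range(n))


def cycle_statistics(cycles):
    """Unweighted statistics over cycles with more than three vertices."""
    non_triangular = [len(cycle) for cycle in cycles if len(cycle) > 3]
    mean_length = (sum(non_triangular) / len(non_triangular)
                   if non_triangular else 0.0)
    return {
        "count_gt3": len(non_triangular),
        "mean_non_triangular": mean_length,
        "max_non_triangular": max(non_triangular, default=0),
    }
```

On a unit square with one diagonal, a threshold of 1.2 removes the diagonal (length about 1.41) and leaves the four sides, so the two triangles merge into a single four-vertex cycle of Euclidean length 4.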
[0048] Given different cycles, unweighted and weighted lengths may be computed, for example, with weights based upon the Euclidean distance. Thereafter, various statistics may be computed as features. These features may include, for example, a number of cycles with length greater than three, an average non-triangular cycle length, and a maximum non-triangular cycle length. Additional features may be computed, for example, the edge Laplacian, Fiedler value, Kirchhoff index, and/or Wiener index. A network parameter variation may be used to obtain the best classification results, based on different types of networks and features. For network creation, Urquhart graphs, K-nearest neighbor graphs, epsilon-nearest neighbor graphs, and Delaunay triangulation may be used. The average shortest path between nuclei may be an effective feature employed by exemplary embodiments of the present invention.
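The average shortest path between nuclei can be sketched with a breadth-first search. The sketch assumes unweighted edges (path length counted in hops) and averages over reachable pairs only, both of which are assumptions; the text does not say how disconnected components are handled. The function name is hypothetical.

```python
from collections import deque


def average_shortest_path(n_vertices, edges):
    """Mean hop distance over all ordered pairs of mutually reachable vertices."""
    adjacency = [[] for _ in range(n_vertices)]
    for i, j in edges:
        adjacency[i].append(j)
        adjacency[j].append(i)

    total, pairs = 0, 0
    for source in range(n_vertices):
        # Breadth-first search gives hop distances from this source.
        distance = {source: 0}
        queue = deque([source])
        while queue:
            u = queue.popleft()
            for v in adjacency[u]:
                if v not in distance:
                    distance[v] = distance[u] + 1
                    queue.append(v)
        for target, hops in distance.items():
            if target != source:
                total += hops
                pairs += 1
    return total / pairs if pairs else 0.0
```

For a chain of three nuclei connected 0-1-2, the pairwise distances are 1, 2 and 1, so the feature value is 4/3.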
[0049] Exemplary embodiments of the present invention may utilize one or more morphometric features in classifying the sample image. Morphometric features may be designed to capture the variation in nuclei size and shape; these features may accordingly be useful in characterizing the sample image. Exemplary embodiments of the present invention may utilize three categories of morphometric features: information extracted from the Hessian matrix, information provided by the Fourier Shape Descriptors, and a feature encoding the spatial arrangement of nuclei surrounding a ductal structure.
[0050] Extracting characteristic features from nuclei size and shape may require good nuclei detection (Step S109). Extracting a precise location of nuclei may play an important role in detecting cancer in histopathology of breast cancer. Fast Radial Symmetry techniques may be modified and adapted for detecting nuclei in histopathology slides.
[0051] After the nuclei have been detected, nuclei segmentation may be performed (Step S110). Nuclei segmentation may be performed, for example, using a Random Walker segmentation technique based on the previous accurate detection of nuclei. The Random Walker (RW) is a segmentation technique that captures weak or missing boundaries, but is also able to cope with noise in the image, identifying multiple objects simultaneously and avoiding trivial solutions.
[0052] Random Walker may be initialized with a set of voxels taking one of a predefined set of labels. The probability that a random walker starting at a given unlabeled voxel will first reach a voxel of a particular label is computed, based on solving a so-called Dirichlet problem with boundary conditions at the locations of the seed points and the seed point in question fixed to unity while the others are set to zero.
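The Dirichlet problem mentioned above can be illustrated on a tiny graph. The sketch below is not a full image segmentation: it assumes a small weighted graph given as a dictionary of edges, fixes the seed values to one (seeds of the label in question) or zero (all other seeds), and solves for the unlabeled nodes by Gauss-Seidel relaxation instead of a direct linear solve. The function name and input layout are hypothetical.

```python
def random_walker_probabilities(weights, seeds, label, iterations=2000):
    """Probability that a walker from each node first reaches a `label` seed.

    weights: dict mapping an undirected edge (i, j) to a positive weight.
    seeds: dict mapping seeded nodes to their labels.
    """
    neighbors = {}
    for (i, j), w in weights.items():
        neighbors.setdefault(i, []).append((j, w))
        neighbors.setdefault(j, []).append((i, w))

    # Boundary conditions: 1 at seeds of `label`, 0 at all other seeds.
    prob = {node: 1.0 if seeds.get(node) == label else 0.0
            for node in neighbors}

    for _ in range(iterations):
        for node, adjacent in neighbors.items():
            if node in seeds:
                continue  # seed values stay fixed
            total_weight = sum(w for _, w in adjacent)
            # Each unlabeled node takes the weighted mean of its neighbors.
            prob[node] = sum(w * prob[n] for n, w in adjacent) / total_weight
    return prob
```

On a chain of five nodes with equal weights, a nucleus seed at one end and a background seed at the other, the probabilities fall off linearly (1, 0.75, 0.5, 0.25, 0). Lowering the weight of one edge, as an image gradient would, moves the 0.5 crossing to that edge.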
[0053] The introduced Random Walker segmentation may utilize input seeds in generating the random walks. Exemplary embodiments of the present invention may utilize an approach, based on ellipse, for detecting nuclei. This approach may take into consideration the points obtained as belonging to the ellipse. The seed points taken into consideration for the Random Walker segmentation may be represented by shrunken ellipses. Shrinkage may be performed, for example, by a factor of two. FIG. 3(a) and (b) are images illustrating Random Walker segmentation of nuclei in accordance with exemplary embodiments of the present invention. In FIG. 3(a), the foreground seed points are shown as the dots within the nuclei and the ellipses are shown in solid outline. In FIG. 3(b), the Random Walker Segmentation is shown as the dotted outline of each nucleus.
[0054] FIG. 4 includes a set of images graphically demonstrating segmentation of nuclei using the random walker technique in accordance with exemplary embodiments of the present invention. In FIG. 4(a), the seed points are shown as dots. These points may be set as points belonging to the detected ellipses and may be used as seed points for the random walker. In FIG. 4(b), these points are dilated by a predefined value to be included as background seeds for the random walker segmentation. In this way, an accurate segmentation can be performed to preserve borders of touching or overlapping nuclei. Allowing the random walker algorithm to segment the image taking into consideration the boundary conditions previously defined, the resulting segmentation is displayed in FIG. 4(c).
[0055] In FIG. 3, the ellipses surrounding detected nuclei having a high confidence value may be highlighted using a first color while the ellipses surrounding detected nuclei having a low confidence value may be highlighted using a second color. For nuclei segmentation, exemplary embodiments of the present invention might only take into consideration nuclei having a confidence value above a pre-set threshold value so that outliers may be removed.
[0056] After the nuclei have been detected and segmented, one or more morphometric features may be calculated (Step S111). These features may fall into one of the following categories: Hessian-based features, Fourier shape descriptors, and angular features. Exemplary embodiments of the present invention may also, or alternatively, utilize morphometric features obtained by clustering the Fourier shape spectra to identify different classes of normal and abnormal nuclei.
[0057] In calculating Hessian based features, the Hessian matrix may be computed at each pixel position. Based on the information encoded by the Hessian matrix, a precise analysis of the anatomical structure for that pixel may be performed. The Eigenvalues of the Hessian matrix may be of particular interest as they encode shape information that may be used as features for classification. By analyzing the Hessian matrix at each pixel of the image, a precise analysis can be performed to investigate if the pixel belongs to a blob or to a line structure, as well as if it belongs to a high contrast or to a low contrast region. For 2D structures, the Eigenvalues of the Hessian matrix may provide enough information to distinguish between ridge-like and blob-like structures.
[0058] The deviation from a blob like structure may be defined as the ratio of the eigenvalues of the Hessian Matrix:
$$R_B = \frac{\lambda_1}{\lambda_2}$$
[0059] The structureness may be used as a measure in differentiating between foreground and background objects:
$$S = \lVert H \rVert_F = \sqrt{\lambda_1^2 + \lambda_2^2}$$
[0060] Taking into consideration the appearance of nucleus structures in histopathology slides, exemplary embodiments of the present invention may focus on finding blob-like structures with dark appearance. This may translate into positive similar values for the two eigenvalues (λ1 ≈ λ2 ≫ 0) and a high value for the structureness measure. A non-max suppression may be used on the image obtained from the ratio of the Eigenvalues and then the threshold condition imposed on the Frobenius norm image may be used to identify the nuclei for which these Hessian-based features are computed. The feature R_B may be higher for malignant nuclei and also malignant nuclei may exhibit a greater variation.
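The Hessian-based measures can be sketched for a single pixel, given the three second derivatives at that pixel. The sketch assumes the eigenvalues are ordered by magnitude before taking the ratio and that the structureness is the Frobenius norm of the Hessian, which for a symmetric matrix equals the root of the sum of squared eigenvalues. These conventions are assumptions, as is the choice to return a ratio of zero when both eigenvalues vanish. The function names are hypothetical.

```python
import math


def hessian_eigenvalues(hxx, hxy, hyy):
    """Eigenvalues of the symmetric 2x2 Hessian, ordered so |l1| <= |l2|."""
    mean = 0.5 * (hxx + hyy)
    radius = math.sqrt((0.5 * (hxx - hyy)) ** 2 + hxy ** 2)
    l1, l2 = sorted((mean - radius, mean + radius), key=abs)
    return l1, l2


def blob_features(hxx, hxy, hyy):
    """Return (eigenvalue ratio, structureness) at one pixel."""
    l1, l2 = hessian_eigenvalues(hxx, hxy, hyy)
    ratio = l1 / l2 if l2 != 0 else 0.0          # near 1 for blobs, near 0 for ridges
    structureness = math.sqrt(l1 ** 2 + l2 ** 2)  # low in flat background regions
    return ratio, structureness
```

A pixel at the center of a dark blob, with equal positive curvature in both directions, gives a ratio of one, while a pixel on a ridge, with curvature in only one direction, gives a ratio of zero.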
[0061] In calculating Fourier shape descriptor features, precise nuclei detection and segmentation may be used. Since the Fourier descriptors may encode size and shape information for different geometric shapes, these descriptors may be extracted and, based on the resulting spectra, features that distinguish malignant nuclei from benign ones may also be extracted.
[0062] The number of Fourier coefficients represents a relevant parameter for the Fourier Shape Descriptors. This number may be chosen in such a way to encode a best description of the contour. Exemplary embodiments of the present invention may utilize, for example, ten Fourier descriptors.
[0063] FIG. 5 is a flow chart illustrating steps performed in computing Fourier shape descriptors in accordance with exemplary embodiments of the present invention. First, accurate nuclei detection may be performed, for example, using a generalized Fast Radial Transform method (Step S51). Thereafter, nucleus boundary segmentation may be performed, for example, using the Random Walker Segmentation (Step S52). Several morphological operations may then be employed to obtain only the boundary of the nucleus (Step S53). The center of mass may then be computed to obtain the nucleus center coordinates, followed by a computation of the Euclidean distance from the center to all the boundary points and of the angle of each boundary point with respect to the center coordinates (Step S54). In computing the angles, only unique angles associated with distances may be taken into consideration. The discrete Fourier Transform may be computed for the contour points extracted during Step S54 (Step S55). This may include extracting the first cosine term corresponding to the average value of the signal (e.g., the DC component thereof) and the general cosine and sine terms.
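Steps S54 and S55 can be sketched as follows, assuming the boundary points of one nucleus are already available as (x, y) pairs. The sketch treats the angle-sorted distances as uniformly spaced samples, which is a simplification, and rounds angles to six decimals to decide which angles count as unique. The function names and the default of ten coefficients taken from the text are otherwise hypothetical choices.

```python
import cmath
import math


def radial_signature(boundary):
    """Centroid-to-boundary distances, ordered by angle, one per unique angle."""
    cx = sum(x for x, _ in boundary) / len(boundary)
    cy = sum(y for _, y in boundary) / len(boundary)
    samples = {}
    for x, y in boundary:
        angle = round(math.atan2(y - cy, x - cx), 6)
        samples.setdefault(angle, math.hypot(x - cx, y - cy))
    return [samples[angle] for angle in sorted(samples)]


def fourier_descriptors(signal, n_coeffs=10):
    """Magnitudes of the first DFT coefficients; index 0 is the DC term."""
    n = len(signal)
    magnitudes = []
    for k in range(n_coeffs):
        coeff = sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n)) / n
        magnitudes.append(abs(coeff))
    return magnitudes
```

For a circular boundary every distance equals the radius, so only the DC term is non-zero. For an elliptical boundary the distance oscillates twice per revolution, which appears as a peak at the second coefficient.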
[0064] Detecting the irregularity in nuclei shape may be accomplished by analyzing the Fourier spectrum produced in accordance with the approach described above. FIG. 6 includes a set of images and their associated Fourier descriptors, as generated in accordance with exemplary embodiments of the present invention. FIG. 6(a) provides an image of a circular nucleus and its associated Fourier descriptors, FIG. 6(b) provides an image of an elliptical nucleus and its associated Fourier descriptors, and FIG. 6(c) provides an image of an irregular nucleus and its associated Fourier descriptors. As can be seen from these figures, the Fourier descriptors are reflective of nucleus shape.
[0065] The feature extraction method may be designed to accommodate relatively small non-DC values for circular nuclei and a relatively high non-DC peak value for the elliptical nucleus, for example, as may be seen in FIG. 6(b). For the irregular nuclei, for example, as may be seen in FIG. 6(c), a non-DC peak followed by a smaller non-DC peak at a higher frequency can be observed. The features used in accordance with exemplary embodiments of the present invention may focus on counting the number of irregular nuclei in each image sample, by detecting the second non-DC peak. As a malignant image may have a high number of irregular nuclei, and a benign image may exhibit small irregularities in the shape of nuclei, these features may be used to classify the image as benign or malignant.
[0066] Novel angular features may be used in accordance with exemplary embodiments of the present invention to characterize the orientation of nuclei around ductal structures. A benign sample may have nuclei displaced in a regular arrangement with small variation in size and shape. A uniform pattern of nuclei may be considered to be a sign of non-cancerous tissue. For a benign case, the white tissue representing the lumen may be surrounded by epithelium and then by a layer of nuclei, displaced in a parallel way. For a malignant slide, however, the glands may be missing and the nuclei may have a high variation in size and shape. The angular nuclei and random orientation observed in malignant images may be considered to be a sign of malignancy. Exemplary embodiments of the present invention may utilize a principal component analysis (PCA) on each of the nuclei surrounding glandular structures. The regular pattern around glands may be captured by computing the angle between the principal direction of the nucleus and the normal vector to the gland surface. FIG. 7 is a diagram illustrating nuclei parallelism used as a basis for establishing morphometric features in accordance with exemplary embodiments of the present invention. FIG. 7(a) illustrates an approach for computing the angle between the principal direction of the nucleus (71) and the normal vector to the gland surface (72). A consistent parallel distribution of nuclei may result in a small standard deviation for angles surrounding the gland, but also in small mean angular values.
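The angular feature can be sketched for nuclei given as sets of pixel coordinates. The sketch assumes the principal direction is the major axis of the 2x2 coordinate covariance matrix (computed in closed form), that the gland-surface normal near each nucleus is supplied by other code, and that the angle is folded into the range 0 to 90 degrees because an axis has no sign. The function names are hypothetical.

```python
import math
import statistics


def principal_direction(points):
    """Unit vector along the major axis of a 2D point set (PCA in closed form)."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    return math.cos(theta), math.sin(theta)


def angle_to_normal(points, normal):
    """Angle in degrees between the nucleus axis and the gland-surface normal."""
    dx, dy = principal_direction(points)
    nx, ny = normal
    cosine = abs(dx * nx + dy * ny) / math.hypot(nx, ny)
    return math.degrees(math.acos(min(1.0, cosine)))


def angular_features(angles):
    """Mean and standard deviation of the angles around one gland."""
    return statistics.mean(angles), statistics.pstdev(angles)
```

A nucleus elongated along the x axis makes an angle of 90 degrees with a normal pointing along y and 0 degrees with a normal pointing along x. A regular arrangement around a gland would then show a small standard deviation across nuclei.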
[0067] Since investigating the arrangement of nuclei around ducts and glands may utilize an accurate gland and nuclei segmentation, in the following, the gland segmentation and nuclei segmentation post processing steps are introduced.
[0068] Gland detection and segmentation may employ the K-means method as an unsupervised learning method. Exemplary embodiments of the present invention may segment the images into four different clusters: glandular lumen, stromal tissue, epithelial-cell cytoplasm, and cell nuclei. The K-means clustering method aims at assigning data into k clusters based on the nearest mean. Since the centroid identification number corresponding to the k clusters is different between images, a consistent assignment must be performed.
[0069] Each centroid location returned by the K-means algorithm encodes the R, G, and B value corresponding to one of the k tissues. Given the matrix representing the centroid locations as a matrix of 4 (tissues) x 3 (RGB values), the following condition may be applied in obtaining the minimum line index corresponding to the index belonging to the lumen tissue:
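$$\mathrm{lumen}_{\mathrm{index}} = \min\left\{ \frac{\mathrm{stdC}}{\mathrm{meanC}} \right\},$$
[0070] where stdC is the standard deviation for a sample x containing n elements:
$$s = \left( \frac{1}{n-1} \sum_{i=1}^{n} (x_i - \bar{x})^2 \right)^{1/2}$$
and $\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i$ is the mean value of the values belonging to the sample.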
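The lumen selection rule can be sketched as below. The sketch assumes each row of the centroid matrix holds the R, G and B values of one cluster, uses the sample standard deviation (divisor n - 1), and returns the row index with the smallest ratio of standard deviation to mean. The function name and the example centroid values are hypothetical.

```python
import statistics


def lumen_index(centroids):
    """Row index whose RGB values have the smallest std/mean ratio."""
    ratios = [statistics.stdev(row) / statistics.mean(row) for row in centroids]
    return min(range(len(ratios)), key=ratios.__getitem__)


# Hypothetical 4 x 3 centroid matrix (one row per tissue cluster).
example_centroids = [
    [120, 60, 140],   # strongly colored cluster
    [240, 238, 242],  # bright, nearly colorless cluster
    [200, 150, 190],
    [80, 40, 120],
]
```

Here `lumen_index(example_centroids)` selects row 1. A bright, nearly colorless cluster has channel values that are large and close to each other, which keeps the ratio small, in line with the white appearance of lumen.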
[0071] Noise removal may be performed during the post-processing steps and may be accomplished using a mathematical morphological dilation followed by an erosion (e.g., closing) operation. A connected components method may then be applied to separate lumen structures for further analysis. For each of the resulting structures, a series of conditions may be applied to ensure accurate gland detection. These conditions may include one or more of:
[0072] Condition 1: A size threshold may be applied based on empirical observations to reduce structures present in the image that are relatively too small or too large.
[0073] Condition 2: After applying a median filter with a set neighborhood window, the structure should not be split into more than n structures. Since gland structures usually have a circular aspect, applying a median filter with a larger neighborhood value allows long and thin lumen structures, which separate into many components and fail to behave as typical glandular structures, to be discarded.
[0074] Condition 3: A gland structure is surrounded through 360° by nuclei. The nuclei segmentation based on Random Walker is used in detecting nuclei surrounding the lumen structure. An empirically selected radius may be used for the circular structure centered at the center of mass of the lumen area. The histogram of nuclei surrounding the lumen area may be computed and used to represent the distribution of the angles. The selection criteria may use a threshold value to select how many empty bins the histogram may have.
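Condition 3 can be sketched as an angular histogram around the lumen center. The number of bins, the radius, and the allowed number of empty bins are all illustrative assumptions (the text only says they are selected empirically), and the function names are hypothetical.

```python
import math


def empty_angle_bins(center, nuclei, radius, n_bins=12):
    """Count angular bins around `center` that contain no nucleus centroid."""
    counts = [0] * n_bins
    for x, y in nuclei:
        dx, dy = x - center[0], y - center[1]
        if math.hypot(dx, dy) > radius:
            continue  # nucleus lies outside the circular search region
        angle = math.atan2(dy, dx) % (2.0 * math.pi)
        index = min(int(angle / (2.0 * math.pi) * n_bins), n_bins - 1)
        counts[index] += 1
    return counts.count(0)


def is_surrounded(center, nuclei, radius, n_bins=12, max_empty=2):
    """Accept the lumen as a gland if few enough angular bins are empty."""
    return empty_angle_bins(center, nuclei, radius, n_bins) <= max_empty
```

A lumen with nuclei spread over the full circle leaves no bin empty and is accepted, while a lumen with nuclei on one side only leaves about half of the bins empty and is rejected.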
[0075] Nuclei shape may be used as a feature in classifying the sample image. Since exemplary embodiments of the present invention may be concerned with finding an axis indicating the principal direction of the nucleus, a series of conditions may be enforced to ensure that only relevant structures are kept for further analysis. These conditions may include one or more of:
[0076] Size Threshold: A removal of structures smaller than an empirically determined threshold may be performed to discard small nuclei that may not have a significant impact on classification.
[0077] Circularity: A removal of nuclei having a perfectly circular shape may be performed, since the principal direction may not be an accurate measure for this type of nuclei shape. A principal component analysis may be performed on the nuclei shape to find the two Eigenvectors and their corresponding Eigenvalues. By computing the ratio of these two Eigenvalues, a measure of circularity may be obtained and further analyzed.
[0078] Irrelevant Nuclei: The introduced angle computation method may take into consideration nuclei in close relation to the glands. Exemplary embodiments of the present invention may take into account those nuclei encountered in close proximity to the gland borders. A distance transform may be used to obtain the nuclei in a circular radius around the glandular structure. Nuclei that are too far away may then be discarded as not relevant. A Delaunay triangulation may be performed to make this determination. Given the center of mass of each nucleus and the points belonging to the lumen surface, a Delaunay triangulation may be applied. A closer analysis of the obtained triangles may result in discarding nuclei present in the second row. A nucleus will be removed if it has edges connecting only to other nuclei centroids. If there is a direct connection to the lumen surface, the nucleus may be kept for further analysis. In FIG. 7(b), all nuclei belonging to triangle 1 have connections to the lumen surface, so their edges will be kept and nuclei b and c will be taken into consideration for future angle computation use. For triangle 2, all the connecting edges belong to nuclei centroid positions, so nucleus a will be removed.
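The triangle-based filter can be sketched as follows, assuming the Delaunay triangulation has already been computed elsewhere and is supplied as triples of vertex labels, with a separate set naming the vertices that lie on the lumen surface. The function name and the label `"L1"` for a lumen surface point are hypothetical.

```python
def nuclei_adjacent_to_lumen(triangles, lumen_points):
    """Nuclei that share at least one triangle (hence an edge) with a lumen point."""
    kept = set()
    for triangle in triangles:
        if any(vertex in lumen_points for vertex in triangle):
            kept.update(v for v in triangle if v not in lumen_points)
    return kept
```

Following the situation described for FIG. 7(b), a triangle formed by nuclei b and c and one lumen point keeps both nuclei, while a triangle formed only by nuclei a, b and c adds nothing, so a nucleus appearing only in such triangles is discarded.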
[0079] Accordingly, the set of morphometric features may be captured from the nuclei size based on the Random Walker Segmentation, the Fourier Shape Descriptors, and from the computation of the Hessian matrix for each pixel in the image.
[0080] The nuclear size may be computed by finding the number of pixels belonging to each segmented nucleus. The random walker segmentation may then be applied on the original image data set, pre-processed by extracting only the H channel. Computing the standard deviation of the nuclear size over each of the images from the data set may capture this relevant cancer-specific mark. The mean value of the R_B defined above may be computed over the entire image. Exemplary embodiments of the present invention may use this feature in distinguishing between malignant and benign samples.
[0081] Detecting irregularity in nuclei shape may be performed by analyzing the Fourier spectrum. An example of the Fourier spectrum may be seen in FIG. 6. Each of the non-DC values for circular nuclei is relatively small. In the elliptical nucleus presented in FIG. 6(b), only one non-DC peak value is included. For the irregular nuclei, presented in FIG. 6(c), a non-DC peak followed by a smaller non-DC peak at a higher frequency can be observed. The features used in accordance with exemplary embodiments of the present invention may focus on counting the number of irregular nuclei, for example by detecting the second non-DC peak. Here, the first peak may be considered to occur at a value of 0.15 of the maximum energy value and the second peak may be considered to occur at 0.05 of the maximum energy value. Accordingly, a malignant image will have a high number of irregular nuclei while a benign image will exhibit small irregularities at the shape of nuclei. The feature taken into consideration is the number of irregular nuclei in each sample.
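Counting irregular nuclei from their Fourier spectra can be sketched as below. The reading of the two thresholds is an assumption: a nucleus is treated as irregular when its non-DC spectrum has a local peak of at least 0.15 of the maximum spectrum value, followed at a higher frequency by another local peak of at least 0.05 of that maximum. The function names and the peak definition are hypothetical.

```python
def local_peaks(spectrum):
    """Indices k >= 1 whose value exceeds both neighbors (DC at index 0 ignored)."""
    peaks = []
    for k in range(1, len(spectrum)):
        left = spectrum[k - 1] if k > 1 else float("-inf")
        right = spectrum[k + 1] if k + 1 < len(spectrum) else float("-inf")
        if spectrum[k] > left and spectrum[k] > right:
            peaks.append(k)
    return peaks


def is_irregular(spectrum, first_frac=0.15, second_frac=0.05):
    """True if a strong non-DC peak is followed by a second, later peak."""
    max_energy = max(spectrum)
    peaks = local_peaks(spectrum)
    first = next((k for k in peaks
                  if spectrum[k] >= first_frac * max_energy), None)
    if first is None:
        return False
    return any(k > first and spectrum[k] >= second_frac * max_energy
               for k in peaks)


def count_irregular_nuclei(spectra):
    """Feature value for one image: the number of irregular nuclei."""
    return sum(1 for spectrum in spectra if is_irregular(spectrum))
```

With this reading, a flat non-DC spectrum (circular nucleus) and a spectrum with a single peak (elliptical nucleus) are both regular, and only a spectrum with two separated peaks is counted.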
[0082] In satisfying these objectives, exemplary embodiments of the present invention may use as morphometric features one or more of the following: mean ratio of Eigenvalues of the Hessian matrix, standard deviation in nuclei size, mean of the angular feature, and number of irregular nuclei, for example, as measured by a Fourier shape descriptor.
[0083] After the full set of features has been calculated (Step S102), feature selection may be performed (Step S112). The selected features may later be used to train one or more classifiers for distinguishing between benign and cancerous samples. Using and storing excessive features may introduce an undesirable level of complexity. Feature selection may include selecting a subset of the calculated features and using only these features in performing classification. By reducing the number of features used in classification, training complexity and computational cost may be minimized. Moreover, by accurately selecting a set of most-discriminating features, little to no accuracy may be sacrificed. In some cases, a small gain in classification accuracy may even be obtained, because the classifier may suffer from over-fitting when the original large number of features is used.
[0084] Exemplary embodiments of the present invention may perform feature selection by minimizing redundancy in selected features while maximizing the selection of relevant features. Here, redundancy may be defined as multiple features that serve to differentiate between classes in the same way. Under this combined approach, even though features are selected in accordance with their ability to differentiate between the classes (relevance), it may still be possible to reject features that are highly relevant because there may already be selected features that distinguish between classes in the same way. Accordingly, selection extracts features that are minimally redundant among themselves and maximally relevant to the target classes.
[0085] Features exhibiting maximum relevance may be selected. Exemplary embodiments of the present invention may approximate maximum relevance with the mean value of all mutual information values between individual feature x_i and class c:
$$\max_{S} D(S, c), \qquad D = \frac{1}{|S|} \sum_{x_i \in S} I(x_i; c)$$
[0086] where I is the mutual information and |S| is the number of features in S. Since the remaining features might have a high redundancy property, another feature selection is performed after applying the maximum relevance criterion.
[0087] As described above, in addition to selecting features based on maximum relevance, exemplary embodiments of the present invention may select features based on minimal redundancy. This feature selection criterion selects mutually exclusive features and may be calculated, for example, using the equation
$$\min_{S} R(S), \qquad R = \frac{1}{|S|^{2}} \sum_{x_i, x_j \in S} I(x_i; x_j)$$
[0088] Accordingly, exemplary embodiments of the present invention may utilize maximum relevance/minimum redundancy criteria to find the subset of features to use in classification. The subset of features so found may maximize D-R in an incremental manner, for example, by adding one feature at a time.
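The incremental search described in paragraph [0088] can be sketched as follows. This is a minimal illustration rather than the implementation of the disclosure: it assumes the features have already been discretized, it uses the commonly used greedy rule of adding the candidate with the largest relevance minus mean redundancy against the features already chosen, and the function names and feature names (`mutual_information`, `mrmr_select`, `nuclei_size_std`, and so on) are hypothetical.

```python
import math
from collections import Counter


def mutual_information(a, b):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(a)
    count_a, count_b, count_ab = Counter(a), Counter(b), Counter(zip(a, b))
    mi = 0.0
    for (va, vb), n_ab in count_ab.items():
        # p(a, b) * log(p(a, b) / (p(a) * p(b))), written with raw counts.
        mi += (n_ab / n) * math.log(n_ab * n / (count_a[va] * count_b[vb]))
    return mi


def mrmr_select(features, labels, k):
    """Greedily pick up to k feature names; features maps name -> discrete values."""
    relevance = {name: mutual_information(col, labels) for name, col in features.items()}
    selected = []
    while len(selected) < min(k, len(features)):
        best_name, best_score = None, -math.inf
        for name, col in features.items():
            if name in selected:
                continue
            # Mean mutual information with the features chosen so far.
            redundancy = (
                sum(mutual_information(col, features[s]) for s in selected) / len(selected)
                if selected
                else 0.0
            )
            score = relevance[name] - redundancy  # relevance minus redundancy
            if score > best_score:
                best_name, best_score = name, score
        selected.append(best_name)
    return selected


# Toy data (hypothetical): 0 = benign, 1 = malignant.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
features = {
    "nuclei_size_std": [0, 0, 0, 1, 1, 1, 1, 1],
    "nuclei_size_std_copy": [0, 0, 0, 1, 1, 1, 1, 1],  # fully redundant duplicate
    "mean_cycle_length": [0, 0, 1, 0, 1, 0, 1, 1],
}
print(mrmr_select(features, labels, 2))
# -> ['nuclei_size_std', 'mean_cycle_length']
```

In this toy run the duplicated feature is as relevant as the first pick but is rejected in the second round because it adds no new information, which is the behavior paragraph [0084] describes.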
[0089] By using these criteria in the manner described, at least one feature may be selected from each category (texture features, network features, and morphometric features).
[0090] Feature selection may be optionally omitted and in such a case, a predetermined set of features may be used. This predetermined set of features may include, for example, at least one feature from each category, as the type of information captured from features of the three categories may be complementary and may provide information pertaining to different manifestations in breast histopathology. According to one exemplary embodiment of the present invention, the predetermined set of features may include at least one feature from each group, including, among the morphometric features: mean ratio of Eigenvalues of Hessian matrix, standard deviation in nuclei size, or mean of the angular feature; one feature from among the network cycle features: mean cycle weighted Euclidean length, number of connected components, or average shortest path between nuclei; and the texture feature H textons.
[0091] According to another exemplary embodiment of the present invention, the predetermined set of features may include at least one feature listed above from at least one of the groups.
[0092] According to another exemplary embodiment of the present invention, the predetermined set of features may include one network cycle feature (e.g. mean cycle weighted Euclidean length), two network features (average shortest path between nuclei and number of connected components in the graph), two morphometric features (mean size of nuclei and mean value for the ratio between the eigenvalues of the Hessian matrix), and twenty-two texton features computed on the H channel (e.g., where the histogram for each class contains eleven bins). In total, a twenty-seven-dimensional feature set is used.
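The dimension count in paragraph [0092] (1 + 2 + 2 + 22 = 27) can be checked with a short sketch. The function name, argument names, and numeric values below are hypothetical placeholders; only the grouping and the group sizes come from the text.

```python
def assemble_feature_vector(cycle_length, network, morphometric, texton_hists):
    """Concatenate the predetermined features into one flat vector."""
    vector = [cycle_length]          # 1 network cycle feature
    vector += list(network)          # 2 network features
    vector += list(morphometric)     # 2 morphometric features
    for hist in texton_hists:        # 2 classes x 11 bins = 22 texton features
        vector += list(hist)
    return vector


# Placeholder values for a single image (not measured data).
example = assemble_feature_vector(
    cycle_length=12.5,
    network=(8.1, 3.0),              # average shortest path, connected components
    morphometric=(40.2, 0.7),        # mean nuclei size, mean Hessian eigenvalue ratio
    texton_hists=([0.0] * 11, [0.0] * 11),
)
print(len(example))  # -> 27
```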
[0093] Regardless of whether a subset of features is determined in accordance with maximum relevance-minimum redundancy (MR-MR) or a predetermined set of features is used, exemplary embodiments of the present invention may thereafter train a support vector machine (SVM) classifier on these features (Step S113).
[0094] SVM is a universal learning algorithm based on statistical learning theory. Learning is the process of selecting the best mapping function from a set of mapping models parameterized by a set of parameters. Given a finite sample data set (x_i, y_i) for i = 1, 2, ..., N, where x_i ∈ R^d is a d-dimensional input (feature) vector and y_i ∈ {-1, 1} is a class label, the objective is to estimate a mapping function f : x → y in order to classify future test samples.
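As a rough illustration of estimating such a mapping f : x → y, the sketch below trains a linear SVM by batch subgradient descent on the regularized hinge loss. The text does not specify a kernel, solver, or hyperparameters, so the linear form, the values of `lam`, `lr`, and `epochs`, the function names, and the two-dimensional toy samples are all assumptions made for illustration.

```python
def train_linear_svm(samples, labels, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * |w|^2 + mean(hinge loss) by batch subgradient descent."""
    dim, n = len(samples[0]), len(samples)
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        viol_x = [0.0] * dim  # sum of y_i * x_i over margin violators
        viol_y = 0.0          # sum of y_i over margin violators
        for x, y in zip(samples, labels):
            score = sum(wj * xj for wj, xj in zip(w, x)) + b
            if y * score < 1:  # sample lies inside the margin or is misclassified
                for j in range(dim):
                    viol_x[j] += y * x[j]
                viol_y += y
        w = [wj - lr * (lam * wj - vj / n) for wj, vj in zip(w, viol_x)]
        b += lr * viol_y / n
    return w, b


def predict(w, b, x):
    """Return the class label in {-1, 1} for feature vector x."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1


# Toy, linearly separable feature vectors (hypothetical): +1 malignant, -1 benign.
samples = [(2.0, 2.0), (3.0, 3.0), (2.0, 3.0), (-2.0, -2.0), (-3.0, -3.0), (-2.0, -3.0)]
labels = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(samples, labels)
print([predict(w, b, x) for x in samples])  # -> [1, 1, 1, -1, -1, -1]
```

In practice the input vectors would be the selected or predetermined features computed from each training image, typically rescaled to comparable ranges before training.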
[0095] Training may be performed, for example, by receiving a set of training images that have been manually classified as benign/malignant and/or according to a degree of malignancy by one or more users, for example, pathologists. The SVM classifier may then be generated to differentiate between the various classifications based on the selected or predetermined subset of features.
[0096] In training the SVM, multiple training images may be used. Each training image may be a microscope view of a tissue sample that the expert user has manually classified. For this purpose, any number of training images may be used, for example, thirty training images: fifteen malignant images and fifteen benign images.
[0097] The trained SVM classifier may then be employed by a computer aided diagnosis system (CAD) to classify one or more clinical images to diagnose the presence of malignancy and/or to grade a malignancy (Step S114).
[0098] Accordingly, exemplary embodiments of the present invention may utilize the above-described approaches for generating and selecting features, training a classifier based on these features, and programming a CAD system to detect malignancy based on the trained classifier. This method may be implemented, for example, in a computer system. The CAD system using the trained classifier may also be implemented, for example, in a computer system.
[0099] FIG. 8 shows an example of a computer system, which may implement a method and system of the present disclosure. The system and method of the present disclosure may be implemented in the form of a software application running on a computer system, for example, a mainframe, personal computer (PC), handheld computer, server, etc. The software application may be stored on a recording medium locally accessible by the computer system and accessible via a hard wired or wireless connection to a network, for example, a local area network, or the Internet.
[0100] The computer system referred to generally as system 1000 may include, for example, a central processing unit (CPU) 1001, random access memory (RAM) 1004, a printer interface 1010, a display unit 1011, a local area network (LAN) data transmission controller 1005, a LAN interface 1006, a network controller 1003, an internal bus 1002, and one or more input devices 1009, for example, a keyboard, mouse etc. As shown, the system 1000 may be connected to a data storage device, for example, a hard disk, 1008 via a link 1007.
[0101] Exemplary embodiments described herein are illustrative, and many variations can be introduced without departing from the spirit of the disclosure or from the scope of the appended claims. For example, elements and/or features of different exemplary embodiments may be combined with each other and/or substituted for each other within the scope of this disclosure and appended claims.

Claims

What is claimed is:
1. A method for automatically classifying tissue, comprising:
obtaining training data including a plurality of microscope images that have been manually classified;
calculating a plurality of features from the training data, each of which is either a texture feature, a network feature, or a morphometric feature;
selecting a subset of features from the calculated subset of features based on both maximum relevance and minimum redundancy;
training a classifier based on the selected subset of features and the manual classifications; and
classifying a diagnostic microscope image in a computer-aided diagnostic system using the trained classifier.
2. The method of claim 1, wherein the subset of features includes at least one texture feature, at least one network feature, and at least one morphometric feature.
3. The method of claim 2, wherein the at least one texture feature is an H texton feature.
4. The method of claim 2, wherein the at least one network feature includes a mean cycle weighted Euclidean length, a number of connected components, or an average shortest path between nuclei.
5. The method of claim 2, wherein the at least one morphometric feature includes a mean ratio of Eigenvalues, a standard deviation in nuclei size, or a mean of angular feature.
6. The method of claim 1, wherein the classifier is a support vector machine.
7. The method of claim 1, wherein the diagnostic microscope image is a breast histopathological image.
8. The method of claim 1, wherein the classifying of the diagnostic microscope image includes determining whether the image is benign or malignant.
9. The method of claim 1, wherein the classifying of the diagnostic microscope image includes determining a grade of malignancy.
10. The method of claim 1, wherein calculating the plurality of features from the training data includes transforming the training images from an RGB color space to a CMY color space.
11. The method of claim 1, wherein calculating the plurality of features from the training data includes obtaining H & E component vectors from the training images.
12. The method of claim 1, wherein calculating the plurality of features from the training data includes detecting and segmenting nuclei from the training images.
13. The method of claim 12, wherein nuclei detection is based on fast radial symmetry.
14. The method of claim 12, wherein nuclei segmentation is based on the Random Walker approach.
15. A method for automatically classifying tissue, comprising:
obtaining training data including a plurality of microscope images that have been manually classified;
calculating a plurality of features including at least one texture feature, at least one network feature, and at least one morphometric feature;
training a classifier based on the plurality of features and the manual classifications; and
classifying a diagnostic microscope image in a computer-aided diagnostic system using the trained classifier.
16. The method of claim 15, wherein the at least one texture feature is an H texton feature.
17. The method of claim 15, wherein the at least one network feature includes a mean cycle weighted Euclidean length, a number of connected components, or an average shortest path between nuclei.
18. The method of claim 15, wherein the at least one morphometric feature includes a mean ratio of Eigenvalues, a standard deviation in nuclei size, or a mean of angular feature.
19. A method for automatically classifying tissue, comprising:
obtaining training data including a plurality of microscope images that have been manually classified;
calculating at least one feature, each of which is either a texture feature, a network feature, or a morphometric feature;
training a classifier based on the at least one feature and the manual classifications; and
classifying a diagnostic microscope image in a computer-aided diagnostic system using the trained classifier,
wherein the at least one feature includes a feature directed to one or more of the following: mean ratio of Eigenvalues of a Hessian Matrix, standard deviation in nuclei size, mean of angular feature, H texton, mean cycle weighted Euclidean length, number of connected components, or average shortest path between nuclei.
20. A method for automatically classifying tissue, comprising:
obtaining training data including a plurality of microscope images that have been manually classified;
calculating a plurality of features from the training data;
selecting a subset of features from the calculated subset of features;
training a classifier based on the selected subset of features and the manual classifications; and
classifying a diagnostic microscope image in a computer-aided diagnostic system using the trained classifier,
wherein the method is characterized by basing the selection of the subset of features on both maximum relevance and minimum redundancy.
21. A computer system comprising: a processor; and
a non-transitory, tangible, program storage medium, readable by the computer system, embodying a program of instructions executable by the processor to perform method steps for automatically classifying tissue, the method comprising:
obtaining a microscope image;
calculating at least one feature, each of which is either a texture feature, a network feature, or a morphometric feature, from the obtained microscope image; and
classifying the diagnostic microscope image based on the calculated at least one feature and a trained classifier,
wherein the at least one feature includes a feature directed to one or more of the following: mean ratio of Eigenvalues of a Hessian Matrix, standard deviation in nuclei size, mean of angular feature, H texton, mean cycle weighted Euclidean length, number of connected components, or average shortest path between nuclei.
PCT/US2012/049155 2011-08-02 2012-08-01 Automated malignancy detection in breast histopathological images Ceased WO2013019856A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
US201161514085P 2011-08-02 2011-08-02
US61/514,085 2011-08-02

Publications (1)

Publication Number Publication Date
WO2013019856A1 true WO2013019856A1 (en) 2013-02-07

Family

ID=47629663

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/US2012/049155 Ceased WO2013019856A1 (en) 2011-08-02 2012-08-01 Automated malignancy detection in breast histopathological images

Country Status (1)

Country Link
WO (1) WO2013019856A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020165837A1 (en) * 1998-05-01 2002-11-07 Hong Zhang Computer-aided image analysis
US20050041832A1 (en) * 2003-08-20 2005-02-24 Xerox Corporation System and method for digital watermarking in a calibrated printing path
US20090297007A1 (en) * 2008-06-02 2009-12-03 Nec Laboratories America, Inc. Automated Method and System for Nuclear Analysis of Biopsy Images
US20100111396A1 (en) * 2008-11-06 2010-05-06 Los Alamos National Security Object and spatial level quantitative image analysis

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
KHURD ET AL.: "Computer-aided Gleason grading of prostate cancer histopathological images using texton forests", IEEE INTERNATIONAL SYMPOSIUM ON BIOMEDICAL IMAGING, 14 April 2010 (2010-04-14), pages 636 - 639, Retrieved from the Internet <URL:http://lmb.informatik.uni-freiburg.de/people/bahlmann/data/kh_ba_ma_ka_isbi2010.pdf http://ieeexplore.ieee.org/xpl/login.jsp?tp=&arnumber=5490096&url=http%3A%2F%2Fieeexplore.ieee.org%2Fxpls%2Fabs_all.jsp%3Farnumber%3D5490096> [retrieved on 20121001] *

Cited By (17)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10224117B2 (en) 2008-07-09 2019-03-05 Baxter International Inc. Home therapy machine allowing patient device program selection
US10061899B2 (en) 2008-07-09 2018-08-28 Baxter International Inc. Home therapy machine
US10068061B2 (en) 2008-07-09 2018-09-04 Baxter International Inc. Home therapy entry, modification, and reporting system
US10095840B2 (en) 2008-07-09 2018-10-09 Baxter International Inc. System and method for performing renal therapy at a home or dwelling of a patient
US10089443B2 (en) 2012-05-15 2018-10-02 Baxter International Inc. Home medical device systems and methods for therapy prescription and tracking, servicing and inventory
WO2016132367A1 (en) * 2015-02-19 2016-08-25 Ramot At Tel-Aviv University Ltd. Chest radiograph (cxr) image analysis
US10671896B2 (en) 2017-12-04 2020-06-02 International Business Machines Corporation Systems and user interfaces for enhancement of data utilized in machine-learning based medical image review
US10607122B2 (en) 2017-12-04 2020-03-31 International Business Machines Corporation Systems and user interfaces for enhancement of data utilized in machine-learning based medical image review
US11393587B2 (en) 2017-12-04 2022-07-19 International Business Machines Corporation Systems and user interfaces for enhancement of data utilized in machine-learning based medical image review
US11562587B2 (en) 2017-12-04 2023-01-24 Merative Us L.P. Systems and user interfaces for enhancement of data utilized in machine-learning based medical image review
ES2684373A1 (en) * 2017-12-19 2018-10-02 Universidad De León PROCEDURE AND ARTIFICIAL VISION SYSTEM FOR THE DESCRIPTION AND AUTOMATIC CLASSIFICATION OF NON-PATHOLOGICAL TISSUES OF THE HUMAN CARDIOVASCULAR SYSTEM (Machine-translation by Google Translate, not legally binding)
EP3611694A1 (en) * 2018-08-15 2020-02-19 Koninklijke Philips N.V. System for analysis of microscopic data using graphs
WO2020035508A1 (en) * 2018-08-15 2020-02-20 Koninklijke Philips N.V. System for analysis of microscopic data using graphs
US12100137B2 (en) 2018-08-15 2024-09-24 Koninklijke Philips N.V. System for analysis of microscopic data using graphs
CN110569882A (en) * 2019-08-15 2019-12-13 杨春立 Image information classification method and device
CN110569882B (en) * 2019-08-15 2023-05-09 杨春立 Method and device for classifying image information
CN110837809A (en) * 2019-11-11 2020-02-25 湖南伊鸿健康科技有限公司 Blood automatic analysis method, blood automatic analysis system, blood cell analyzer, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 12819470

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 12819470

Country of ref document: EP

Kind code of ref document: A1