%**********************************************************************************
%* *
%* Copyright (c) Leipzig 2006, 2007 G. Wollny, MPI for Evolutionary Anthropology, *
%* Madrid 2007, 2008 G. Wollny, BIT ETSIT, UPM, Madrid, Spain *
%* Permission is granted to copy, distribute and/or modify this document *
%* under the terms of the GNU Free Documentation License, Version 1.1 *
%* or any later version published by the Free Software Foundation; *
%* with no Invariant Sections , with no Front-Cover Texts and with no *
%* Back-Cover Texts. *
%* A copy of the license is available at http://www.gnu.org/copyleft/fdl.html *
%* *
%**********************************************************************************
\documentclass[twoside,a4paper,11pt]{InsightArticle}
\usepackage{times}
\usepackage[english]{babel}
\usepackage[pdftex]{graphicx}
\usepackage{amsmath}
\usepackage{color}
\usepackage{amssymb}
\usepackage{subfigure}
\usepackage{algorithm}
%\makeatletter
%\renewcommand{\@makecaption}[2]{%
% \vskip\abovecaptionskip
% \sbox\@tempboxa{\small #1: #2}%
% \ifdim \wd\@tempboxa >\hsize
% {\small #1: #2\par}
% \else
% {\small\centering #1: #2}
% \fi
% \vskip\belowcaptionskip}
%\usepackage{amsfont}
\newcommand{\Runo}{\rm I\hspace{-0.55ex}R}
\newcommand{\mct}{$\mu$CT }
\newcommand{\snoise}{\ensuremath{\sigma_{\text{noise}}}}
\newcommand{\sig}[1]{\ensuremath{\sigma_{\text{#1}}}}
\newcommand{\A}[2]{\ensuremath{A_{\text{#1}}^{#2} }}
\newcommand{\len}[1]{\ensuremath{L_{\text{#1}} }}
\newcommand{\Elen}{\ensuremath{\epsilon_{\text{length}} }}
\newcommand{\E}[1]{\ensuremath{\epsilon_{#1} }}
\newcommand{\Eenamel}{\ensuremath{\epsilon_{\text{enamel}} }}
\definecolor{mygray}{gray}{0.4}
\newcommand{\avgd}{\ensuremath{\bar{d}} }
\newcommand{\maxd}{\ensuremath{d_{\text{max}} }}
\newcommand{\Domain}{\ensuremath{\Omega} }
\newcommand{\Range}{\ensuremath{\mathbb{V} }}
\newcommand{\Imax}{\ensuremath{v_{\text{max}}} }
\newcommand{\N}{\ensuremath{\mathbb{N}} }
\newcommand{\width}{\text{width} }
\newcommand{\height}{\text{height} }
\newcommand{\Grad}{\bigtriangledown}
\newcommand{\x}{\ensuremath{\mathbf{x}} }
%\input{psfig}
%\psdraft
%\renewcommand{\topfraction}{0.85}
%\renewcommand{\textfraction}{0.1}
%\renewcommand{\floatpagefraction}{0.75}
% INFORMATION TO FILL IN CAMERA READY VERSION
%------------------------------------------------------------------
%\newcommand{\vol}{0} % Number of Volume (default 0)
%\newcommand{\num}{0} % Number (default 0)
%\setcounter{page}{1} % First page (default 1)
%\newcommand{\begpag}{\thepage}
%\newcommand{\finpag}{7} % Last Page
%\newcommand{\yearp}{2000} % Year of the issue (default 2000)
%------------------------------------------------------------------
%\renewcommand{\thefootnote}{\fnsymbol{footnote}}
\newcommand{\U}{\mathbf{U}}
\title{Software for out-of-core image processing:
Application to the segmentation of teeth in \mct images.}
\author{Gert Wollny$^1$, Matthew M. Skinner$^2$, Anthony J. Olejniczak$^2$, Tanya M. Smith$^2$, \\
Stefan Reh$^2$, and Jean-Jacques Hublin$^2$}
\authoraddress{$^1$ Biomedical Imaging Technologies, ETSI Telecomunicaci\'on, UPM, Madrid, Spain \\
$^2$MPI for Evolutionary Anthropology, Human Evolution, Deutscher Platz 6, 04103 Leipzig, Germany}
\begin{document}
\maketitle
%\pagestyle{myheadings}
% INFORMATION TO FILL IN CAMERA READY VERSION
%-------------------------------------------------------------------
% fill in (authors) the authors of article in following format:
% * 1,2 authors: J. Jones and J. Jones
% * More than 2 authors: J. Jones et al.
%\markboth{\centerline{\small \it Name1 et al. /
% Electronic Letters on Computer Vision and Image Analysis
% \vol(\num):\begpag-\finpag, \yearp}}
% {\centerline{\small \it Name1 et al. /
% Electronic Letters on Computer Vision and Image Analysis
% \vol(\num):\begpag-\finpag, \yearp}}
%-------------------------------------------------------------------
%\hrulefill
\begin{abstract}
Paleoanthropological research increasingly employs non-destructive imaging
technologies, such as \emph{micro computed tomography}, to capture and analyse the entire three-dimensional
morphology of objects such as teeth.
For these studies to be successful, a segmentation protocol is required that segments the tissues of interest
properly into homogeneous voxel classes.
Challenges to an automatic segmentation are posed by the imaged object itself (cracks in the tissue,
intensity inhomogeneities induced by fossilisation) and by artifacts introduced by
the imaging process (noise, beam hardening, rings).
Additionally, the acquisition of images at very high resolutions results in data sets that
cannot be processed as a whole on current workstation-class computers.
In this paper, we propose an out-of-core segmentation approach based on noise reduction
by non-linear filtering, fuzzy c-means classification, and intensity based region growing.
We validate the segmentation approach by analysing its accuracy in the segmentation of synthetic
images corrupted by noise and intensity inhomogeneities and compare its results
to a manually acquired segmentation.
We conclude that the proposed method can be used to obtain a good segmentation,
although a visual inspection of the results is necessary and manual post-processing
might be required, depending on the analysis pursued.
\vspace{0.5cm}
\noindent
{\em Key Words}: image segmentation, image enhancement, \mct, out-of-core processing
\end{abstract}
\hrulefill
\section{Introduction}
\begin{figure}[h]
\centering
\resizebox{\columnwidth}{!}{\includegraphics{original-scans.png}}
\caption{\label{fig:original}
Example transverse cross-sections of a fossil primate tooth scan.
Areas of high intensity indicate enamel, low intensities indicate dentin.
The intensity inhomogeneities in the dentin are a result of fossilisation.
}
\end{figure}
Paleoanthropological research increasingly employs
non-destructive imaging technologies, such as \emph{micro computed tomography} (\mct).
Non-destructive imaging techniques introduce several advantages
into the research process, most importantly the preservation of valuable
fossil and extant biological tissues (in lieu of physical and chemical alteration of these
specimens to examine their internal properties).
Researchers need physical access to a specimen only for a very short time,
reducing the possibility of damage to specimens; image data are easily shared, enhancing
collaborative opportunities in the paleoanthropological community;
and analyses based on \mct images may be automated so as to include large samples.
One research area where many samples are available is the analysis of teeth.
Teeth, and especially dental enamel, are mostly resistant to diagenetic alteration and
other degenerative taphonomic processes, and therefore they are often the only
available fossil remains.
Historically, much paleoanthropological emphasis has been placed on the thickness
of dental enamel (see e.g., Smith et al. \cite{smith05enamel} and references therein)
as it relates to paleodiet and systematics of fossil primates and humans.
These studies have been destructive in nature, however, producing physical sections
of teeth in order to distinguish the enamel cap from the underlying dentin.
Moreover, these previous studies have suffered from dimensional loss, having measured
only single section planes from a complex three-dimensional object (i.e., a tooth).
Only recently, studies have begun to make use of non-destructive techniques capturing
the entire three-dimensional morphology of each tooth (e.g., Kono \cite{kono04molar},
Olejniczak and Grine \cite{olejniczak06asses} and references therein).
The comparative analysis of tooth shape also has applications for understanding the function
of teeth as well as reconstructing the taxonomy and phylogeny of living and extinct
mammalian species.
High-resolution computed tomography has made possible accurate 3D digital reconstruction
of both external tooth shape and internal tooth structure.
However, digital reconstructions of tooth shape and structure need to be accurate and reliable
with acceptable levels of inter-observer error.
Given the increasing use of non-destructive measurement protocols and the importance
of such dental studies to the paleoanthropological community,
a segmentation protocol is highly desirable which segments the tissues of interest
(enamel and dentin) properly into homogeneous voxel classes.
Of special interest is the shape of the enamel-dentin junction.
Currently, segmentation is usually done in a semi-automatic way, utilising filtering and labeling
tools provided by, e.g., AMIRA \cite{FIXME}.
This approach is tedious and time-consuming.
\begin{figure}[h]
\centering
\resizebox{0.3\columnwidth}{!}{\includegraphics{artifact0080.png}}
\resizebox{0.3\columnwidth}{!}{\includegraphics{artifact0100.png}}
\resizebox{0.3\columnwidth}{!}{\includegraphics{artifact0299.png}}
\caption{\label{fig:beamcrack}
Apart from noise, several artifacts pose challenges for an automatic analysis of the
image data: e.g. \emph{ringing} (left),
intensity gradients induced by \emph{beam hardening} (see middle and right),
\emph{cracks} in the dental tissue (indicated middle), and \emph{matrix} (indicated right).
}
\end{figure}
To obtain a segmentation into the three classes \emph{background}, \emph{dentin},
and \emph{enamel}, various obstacles must be overcome:
images acquired using \mct are corrupted mostly by noise resulting from a
fluctuation in x-ray detection, which is an intrinsic property of x-ray imaging
that currently cannot be prevented (cf. e.g. \cite{riederer78:noise}).
Additional noise may originate from the scintillator structures and/or the detector electronics.
Reconstruction of the cross sections from projection images introduces \emph{ring artifacts},
and finally, the quantification of the intensity values adds another level of noise.
The use of non-monochromatic x-rays may introduce radial intensity gradients in the images,
known as \emph{beam hardening}, which result in an overlap of tissue intensities
(see Fig. \ref{fig:beamcrack}, middle and right).
However, the effect of beam hardening can be reduced by applying proper algorithms during
reconstruction of the 3D data from the projections and will, therefore, not be discussed further.
In addition to problems intrinsic to the image acquisition, the research objects (teeth)
themselves pose certain challenges: intensity inhomogeneities in the tissues are induced
by, e.g., diagenetic remineralization and other taphonomic processes
(see e.g. Figure \ref{fig:original} second and third image).
Cracks in the tissues, which form during fossilisation, degeneration,
and/or drying of dental tissues, make the automatic analysis of surface areas and shape
difficult if not impossible.
Matrix (i.e., the material the fossil was embedded in when found) is often attached to the dental tissue.
In the comparative analysis of tooth shape, teeth of FIXME that are not yet fully developed may also be imaged.
Here the enamel may not be fully mineralised, exhibiting grey-scale values that would otherwise
correspond to dentine (FIXME:Figure).
Finally, since the image data is acquired at a high resolution (50 pixels per mm and more), the
resulting data sets are very large (usually 6 GB, but possibly up to 20 GB per tooth).
Processing this data on workstation-class computers requires the use of algorithms that can work out-of-core.
In essence, three types of tooth data need to be considered: images of ...
\begin{enumerate}
\item fully developed teeth that are clean of matrix and that exhibit no or only
small intensity inhomogeneities.
\item fossil teeth that are embedded in matrix and/or exhibit strong intensity
inhomogeneities, mostly in the dentine.
\item modern teeth that exhibit strong intensity inhomogeneities, mostly in the enamel.
\end{enumerate}
In the first case, we expect to obtain a full segmentation that requires only a short
inspection of the results.
In the other two cases, segmentation is an ill-posed problem that cannot be
solved fully automatically.
Here we merely aim to provide tools that enhance the image quality and thereby improve
the semi-automatic segmentation significantly.
In the highly competitive field of paleoanthropological research, obtaining these results
quickly is of great interest to the researcher.
Therefore, we strive for an image enhancement and segmentation tool that allows
a large number of data sets to be processed on a workstation-class computer within a relatively short time.
For that reason, we ruled out a number of image processing techniques, either because
their implementation does not seem to offer a significant advantage, or because the processing time
incurred by using them would increase considerably.
Specifically, we will not discuss techniques to remove beam hardening effects, since these are better
dealt with during the reconstruction of the 3D data from the projections; texture analysis techniques to
distinguish tissues and matrix; frequency domain filtering; and iterative
filtering methods, which are too time-consuming in an out-of-core setting.
The remainder of this paper introduces and discusses the filtering and segmentation protocol we settled on.
This includes the discussion of edge-preserving smoothing filters and their usability for noise reduction, and of
fuzzy c-means classification as a means to obtain a tissue classification.
In order to account for the intensity inhomogeneities, we introduce a region growing algorithm that is
seeded based on the fuzzy c-means classification.
Then we dedicate a section to discuss the specifics of the out-of-core processing.
A validation will substantiate the application of our segmentation approach to real image processing
problems and give quantitative error margins for the segmentation results.
Finally, we will discuss the suitability of this tool chain for enhancement and segmentation
in the three cases given above.
\section{Segmentation}
In the following, an image is defined as mapping $I:\Domain \rightarrow \Range$
from its domain \Domain to the intensity range $\Range$.
In our application, the image domain \Domain is a finite grid
$\Domain = \{0, 1, \ldots, \width - 1\} \times \{0, 1, \ldots, \height - 1\} \subset \N^2$ and
the intensity range is given as $\Range = \{0, 1, \ldots, \Imax\}$ with $\Imax=2^n-1$ being the maximum
possible intensity for an image with a grey-scale depth of $n$ bits per pixel.
The 3D data is given as a stack of 2D images $I_z$.
$I(\x)$ signifies the pixel intensity at location $\x \in \Domain$.
$H(I)$ defines the histogram of the image $I$, $C(\x)$ defines the classification of a pixel, and
$P(c, I(\x))$ describes the probability that the pixel of image $I$ at location $\x$ is of class $c$.
\subsection{Noise reduction}
Given the high resolution of the 3D data considered, out-of-core processing is necessary.
In order to keep processing times as short as possible, only smoothing and edge-preserving filters
that can be applied to the 3D data in a single pass over the volume are of interest to us.
This rules out frequency domain filters as well as edge-preserving filters
that are based on partial differential equations, which need to be solved iteratively
(like anisotropic diffusion \cite{black98:aniso,perona90:aniso}).
Smoothing filters that can be applied in a single pass are linear filters like the
\emph{box} and the \emph{Gaussian} filter and non-linear filters like
the \emph{median} filter \cite{gonzales02:dip}.
Well-known edge preserving filters are e.g. methods that exploit local statistics,
like Lee \cite{lee86:speck} and Durand \cite{durand87:sar},
sub-window based approaches like the \emph{Kuwahara} filter \cite{kuwahara76:proces},
or, by extending the latter, \emph{value and criterion}
filter structures \cite{schulze94morphologybased} that design filters based on mathematical morphology.
Schulze and Wu showed that for images corrupted by speckle noise, filtering with a morphological
value-and-criterion filter ---
specifically the Mean of Least Variance (MLV) filter --- results in less prominent noise and sharper edges
than the application of local statistics based filters \cite{schulze95:noise}.
The MLV filter is defined as follows:
given a structuring element $K$, the new intensity value at a pixel $x \in \Omega$ is obtained by
considering all translates $K_i \subset \Omega$ of the structuring element that contain $x$,
evaluating the intensity variance $v_i := \operatorname{Var}_{x_k \in K_i} I(x_k)$ of each such subset,
picking the index $i_{\text{min}}$ of the subset with the lowest variance, $v_{i_{\text{min}}} = \min_i v_i$,
and setting the intensity at $x$ to the mean over that subset,
$I(x) := \frac{1}{|K_{i_{\text{min}}}|} \sum_{x_k \in K_{i_{\text{min}}}} I(x_k)$.
The MLV filter also provides an excellent corner response for images from a variety of sources \cite{schulze94:biomed}.
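To make the construction concrete, the following C++ sketch evaluates the MLV response of a single pixel
of a 2D slice, using the square support employed in this paper; the image structure and function names are
illustrative and do not reflect the interface of the software described below.
\begin{verbatim}
#include <vector>
#include <limits>
#include <cstddef>

// Minimal 2D slice of float intensities in row-major order (illustrative only).
struct Image2D {
    std::size_t width, height;
    std::vector<float> data;
    float at(std::size_t x, std::size_t y) const { return data[y * width + x]; }
};

// MLV response at pixel (x, y): among all square windows of half-width w that
// contain (x, y), find the one with least intensity variance and return its mean.
float mlv_at(const Image2D& img, std::size_t x, std::size_t y, int w)
{
    double best_mean = img.at(x, y);
    double best_var  = std::numeric_limits<double>::max();

    // (cx, cy) runs over the centres of all candidate windows containing (x, y).
    for (int cy = int(y) - w; cy <= int(y) + w; ++cy)
        for (int cx = int(x) - w; cx <= int(x) + w; ++cx) {
            double sum = 0.0, sum2 = 0.0;
            std::size_t n = 0;
            for (int iy = cy - w; iy <= cy + w; ++iy)
                for (int ix = cx - w; ix <= cx + w; ++ix) {
                    if (ix < 0 || iy < 0 ||
                        ix >= int(img.width) || iy >= int(img.height))
                        continue;             // clip the window at the image border
                    const double v = img.at(std::size_t(ix), std::size_t(iy));
                    sum += v;  sum2 += v * v;  ++n;
                }
            if (n == 0)
                continue;
            const double mean = sum / n;
            const double var  = sum2 / n - mean * mean;
            if (var < best_var) {             // keep the window with least variance
                best_var  = var;
                best_mean = mean;
            }
        }
    return float(best_mean);
}
\end{verbatim}
Applying this operation to every pixel yields the filtered slice.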
As an example, a comparison with the edge-preserving Kuwahara filter is given below:
the Kuwahara filter can be considered an MLV filter with limited support, which provides
good edge preservation and sharpening.
However, for the large filter widths needed to reduce the noise level considerably, filtering artifacts are introduced at sharp features
(see Fig. \ref{fig:kuwaextcmp}, middle).
An MLV filter with full support and the same filter width avoids these artifacts (Fig. \ref{fig:kuwaextcmp}, right).
\begin{figure}[h]
\centering
\resizebox{0.3\columnwidth}{!}{\includegraphics{noisepeak.png}}
\resizebox{0.3\columnwidth}{!}{\includegraphics{nkuwahara4.png}}
\resizebox{0.3\columnwidth}{!}{\includegraphics{nextkuwa4.png}}
\caption{\label{fig:kuwaextcmp}
From left to right: original image; filtered with a Kuwahara filter of width 4; filtered with an MLV
filter of width 4 and square support.
Note the filtering artifact induced by the Kuwahara filter in the middle image and the peak
that is preserved by the MLV filter.
}
\end{figure}
Hence, we will restrict our analysis to combinations of the \emph{median} and \emph{Gaussian} smoothing filters and the
edge-enhancing MLV filter with square support.
\subsection{Classification}
In CT images, intensities are mapped according to the Hounsfield scale \cite{kak01princ},
resulting in high intensities for tissues with a high mineralisation, while low intensities correspond to tissue with
a low mineralisation.
A classification of tissues is, therefore, directly related to the pixel intensities.
In a tooth, enamel exhibits high mineralisation and dentin is (normally) of a low mineralisation.
In the ideal case, simple thresholding could provide the separation of the different tissues.
However, given the intensity inhomogeneities, a more sophisticated tissue classification approach is required.
Approaches known from MRI segmentation that correct intensity inhomogeneities
(there caused by the bias field, e.g. \cite{ahmed02:cmeans,pham99:fuzzy})
necessitate segmenting the entire 3D image at once, making them impractical
for the image data at hand.
Instead, we run a fuzzy c-means classification based on the intensity histogram of the whole image stack.
The objective function of a fuzzy c-means (FCM) based partitioning of the data $\{x_k\}_{k=1}^N$ into $c$ clusters with
class centres $\{v_i\}_{i=1}^c$ can be given by
\begin{equation}
\label{eq:basic}
J := \sum_{i=1}^c \sum_{k=1}^N u_{ik}^p d(x_k, v_i).
\end{equation}
\noindent
with
\begin{equation}
\label{eq:sumprob}
\sum_{i=1}^{c} u_{ik} = 1 \quad \forall k
\end{equation}
\noindent
Here, $d(x,v)$ is a distance function, $u_{ik} \in [0,1]$ are the probabilities that pixel $k$ belongs to class $i$,
and $p$ is a weighting parameter that determines the amount of fuzziness.
For simplicity, $p$ is set to 2 for the remainder of this paper.
\begin{figure}[ht]
\centering
\resizebox{0.49\columnwidth}{!}{\includegraphics{original.pdf}}
\resizebox{0.49\columnwidth}{!}{\includegraphics{exp_prob.pdf}}
\caption{\label{fig:prob}Probability distribution, left according to the standard FCM formulation,
right according to our formulation.}
\end{figure}
\noindent
Minimising $J$ in (\ref{eq:basic}) under the condition (\ref{eq:sumprob}) results in:
\begin{equation}
\label{eq:derive}
u_{ik} := \frac{d(x_k, v_i)^{-1}}{\sum_{j=1}^c d(x_k, v_j)^{-1}}.
\end{equation}
\noindent
In the standard FCM formulation $d(x,v) = (x-v)^2$ \cite{bezdek91:fuzzy}.
Hence, (\ref{eq:derive}) yields
\begin{equation}
\label{eq:probfcm}
u_{ik} := \frac{(x_k - v_i)^{-2}}{\sum_{j=1}^c(x_k - v_j)^{-2}}.
\end{equation}
\noindent
An example probability distribution for three clusters with centres at 0, 0.75, 1.0
is given in Figure \ref{fig:prob} (left).
\noindent
As can be seen in this figure, the probabilities exhibit several local maxima, resulting in
unexpected classification results, as can be seen in Figure \ref{figProbComp} (middle).
\begin{figure}[ht]
\centering
\resizebox{0.3\columnwidth}{!}{\includegraphics{original.png}}
\resizebox{0.3\columnwidth}{!}{\includegraphics{standard-white.png}}
\resizebox{0.3\columnwidth}{!}{\includegraphics{exp-white.png}}
\caption{\label{figProbComp}
From left to right: Original image, probability for class 3 if classified into 3 classes with the standard FCM formulation and
with the exponential formulation.
Note the undesired high probability of the boundary to the background in the middle image.
}
\end{figure}
\noindent
To avoid this undesired classification, we propose to use another distance function,
$d(x,v) = e^{\frac{(x-v)^2}{2\sigma^2}}$, resulting in the probability function
\begin{equation}
\label{eq:probfcm2}
u_{ik} = \frac{ e^{-\frac{(x_k-v_i)^2}{2\sigma^2}}}
{\sum_{j=1}^c e^{-\frac{(x_k-v_j)^2}{2\sigma^2}}}
\end{equation}
\noindent
with $\sigma$ describing the ``sharpness'' of the class distinction.
Its probability distribution with $\sigma=\frac{1}{8}$ is given in Figure \ref{fig:prob} (right), and it can be
seen that the probability distribution exhibits only one maximum (which is global) for a
given class centre, resulting in a better segmentation at class interfaces.
Given the above results, we have now obtained a fuzzy c-means classification of the input images that
assigns probabilities for enamel, dentin, and background to each pixel intensity value.
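Since the classification depends only on the pixel intensity, the class probabilities of (\ref{eq:probfcm2})
can be pre-computed for every possible intensity value and stored in a lookup table that is then applied to
the whole image stack slice by slice. A minimal C++ sketch of this idea is given below; the function and
variable names are hypothetical, and intensities are assumed to be normalised to $[0,1]$.
\begin{verbatim}
#include <vector>
#include <cmath>
#include <cstddef>

// Class probabilities u_i for a single normalised intensity x according to the
// exponential FCM formulation, with class centres v and sharpness sigma.
std::vector<double> class_probabilities(double x, const std::vector<double>& v,
                                        double sigma)
{
    std::vector<double> u(v.size());
    double norm = 0.0;
    for (std::size_t i = 0; i < v.size(); ++i) {
        const double d = x - v[i];
        u[i] = std::exp(-d * d / (2.0 * sigma * sigma));
        norm += u[i];
    }
    for (double& ui : u)
        ui /= norm;                       // enforce sum_i u_ik = 1
    return u;
}

// Since the classification depends only on the intensity, a lookup table over
// all possible intensities 0 .. v_max suffices for the whole image stack.
std::vector<std::vector<double>>
probability_lut(unsigned v_max, const std::vector<double>& centres, double sigma)
{
    std::vector<std::vector<double>> lut(v_max + 1);
    for (unsigned i = 0; i <= v_max; ++i)
        lut[i] = class_probabilities(double(i) / v_max, centres, sigma);
    return lut;
}
\end{verbatim}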
\subsection{Tissue Labelling}
The last step in the segmentation procedure is the labelling of the two tissue types and the background.
Since different mineralisation levels result in intensity inhomogeneities, a na\"\i ve labelling that is
based on the highest class probability assigned to a certain pixel intensity would often result in mislabelling
(see Figure \ref{FigLabel} (c) and (d)).
Because the junction between tissues often exhibits a low intensity gradient (Figure \ref{FigLabel}(b)), gradient-based segmentation methods,
like the watershed algorithm, also do not yield good segmentation results.
\begin{figure}[ht]
\centering
\subfigure[original]{\resizebox{0.24\columnwidth}{!}{\includegraphics{original1312.png}}}
\subfigure[detail]{\resizebox{0.24\columnwidth}{!}{\includegraphics{smooth1312.png}}}
\subfigure[na\"\i ve labelling]
{\resizebox{0.24\columnwidth}{!}{\includegraphics{naive1312.png}}}
\subfigure[region growing]{\resizebox{0.24\columnwidth}{!}{\includegraphics{regiongrow1312.png}}}
\caption{\label{FigLabel}
Filtered image and segmentation results: (b) shows the smooth transition between tissues in detail, (c) and (d)
visualise segmentation results for the dentin.
}
\end{figure}
\noindent
To overcome these problems, intensity-based region growing is employed.
To segment, e.g., the enamel, the segmentation is seeded by pixels with a very high probability of being enamel (e.g. $0.99$).
Then region growing is applied by adding all connected pixels whose class probability is above a second, lower threshold (e.g. $0.65$).
This procedure is applied to the whole image stack by passing over the slices and keeping a certain number
of slices in memory for growing ``backwards''.
Hence, the region growing is applied in a true 3D manner.
A result of the region growing is given in Figure \ref{FigLabel} (d).
In a similar fashion, the other two tissues can be segmented.
However, depending on the image source, dentine and/or enamel may exhibit strong intensity inhomogeneities.
In these cases, it is best to segment only the background and the tissue with the least intensity inhomogeneities
by using region growing, and to obtain the other tissue as the remainder of the image
that is not yet segmented.
As the result of the segmentation process, three binary masks are obtained, corresponding to enamel, dentine, and background.
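The following C++ sketch illustrates the seeded region growing with the two probability thresholds on an
in-memory probability volume; the structure and function names are hypothetical, and the actual out-of-core
implementation keeps only a sliding window of slices in memory, as described above.
\begin{verbatim}
#include <vector>
#include <queue>
#include <cstddef>

// Probability volume for one tissue class, row-major in x, then y, then z.
struct ProbVolume {
    std::size_t nx, ny, nz;
    std::vector<float> p;                     // class probability per voxel
    std::size_t idx(std::size_t x, std::size_t y, std::size_t z) const {
        return (z * ny + y) * nx + x;
    }
};

// Seeded region growing: voxels with probability >= seed_thr start the region,
// which is then grown over 6-connected neighbours with probability >= low_thr.
std::vector<unsigned char> grow_region(const ProbVolume& v,
                                       float seed_thr, float low_thr)
{
    std::vector<unsigned char> mask(v.p.size(), 0);
    std::queue<std::size_t> todo;

    for (std::size_t i = 0; i < v.p.size(); ++i)
        if (v.p[i] >= seed_thr) { mask[i] = 1; todo.push(i); }

    const long dx[6] = {1,-1,0,0,0,0}, dy[6] = {0,0,1,-1,0,0}, dz[6] = {0,0,0,0,1,-1};
    while (!todo.empty()) {
        const std::size_t i = todo.front(); todo.pop();
        const long z = long(i / (v.nx * v.ny));
        const long y = long((i / v.nx) % v.ny);
        const long x = long(i % v.nx);
        for (int k = 0; k < 6; ++k) {
            const long xx = x + dx[k], yy = y + dy[k], zz = z + dz[k];
            if (xx < 0 || yy < 0 || zz < 0 ||
                xx >= long(v.nx) || yy >= long(v.ny) || zz >= long(v.nz))
                continue;
            const std::size_t j = v.idx(xx, yy, zz);
            if (!mask[j] && v.p[j] >= low_thr) { mask[j] = 1; todo.push(j); }
        }
    }
    return mask;
}
\end{verbatim}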
\subsection{Post-Processing}
When matrix is attached to the tooth, it is usually segmented as dentine.
If the two do not share a junction, the matrix can be removed from the segmentation by
labelling connected components and picking the appropriate ones.
Out-of-core labelling is a two-pass procedure:
in the first pass, a variable keeps track of the absolute number of labels, and a join-map keeps track of joined labels.
The first slice of the binary mask is loaded and connected components are labelled in a 2D manner,
the label image is stored to auxiliary memory, and the label counter is updated.
Then, the next mask slice is loaded, and each mask pixel takes the label from the previous slice, if it is not zero.
The label areas are then grown by adding connected pixels.
If a neighbouring pixel already carries another label, a join-map entry is generated that keeps track of
the joining of the labels.
Finally, all unlabelled pixels of the slice are labelled in a 2D fashion, adding to the absolute number of used labels,
and the label image is stored to auxiliary memory.
This process is repeated for all slices.
After the first pass is finished, the join-map is re-ordered to map all joined labels to one number.
In the second pass, all label images are re-encoded by replacing the label numbers based on the re-ordered join-map.
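One way to realise the join-map is a small union-find structure; the C++ sketch below (with hypothetical
names) records label equivalences during the first pass and resolves them to one representative label per
connected component before the second pass.
\begin{verbatim}
#include <vector>
#include <numeric>
#include <algorithm>

// Join-map for out-of-core labelling, realised as a union-find forest.
// Labels are 1-based; index 0 is reserved for the background.
struct JoinMap {
    std::vector<unsigned> parent;

    explicit JoinMap(unsigned n_labels) : parent(n_labels + 1) {
        std::iota(parent.begin(), parent.end(), 0u);
    }
    // representative label of l (with path compression)
    unsigned find(unsigned l) {
        while (parent[l] != l) { parent[l] = parent[parent[l]]; l = parent[l]; }
        return l;
    }
    // record that labels a and b belong to the same connected component
    void join(unsigned a, unsigned b) {
        a = find(a); b = find(b);
        if (a != b) parent[std::max(a, b)] = std::min(a, b);
    }
    // after the first pass: map every label to its representative
    std::vector<unsigned> resolve() {
        std::vector<unsigned> remap(parent.size());
        for (unsigned l = 0; l < parent.size(); ++l) remap[l] = find(l);
        return remap;
    }
};
\end{verbatim}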
The largest connected component will usually correspond to the dentine, whereas the other components correspond to matrix.
To pick the largest component automatically, a three-pass operation ``histogram, remap, threshold'' is needed:
first, the histogram of the label images is evaluated.
Then, the histogram is sorted so that the lowest label numbers correspond to the highest pixel counts, and
a re-labelling map is created.
In the second pass, the label images are re-labelled accordingly.
In the third pass, simple thresholding of the resulting images at a maximum label value of 1 creates the
requested mask.
If cracks in the image split the dentine into two or more parts, in most cases these parts
still form the bodies with the highest pixel counts in the label image, and
their segmentation can be obtained by increasing the threshold to the number of parts.
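A possible sketch of the re-labelling step of this ``histogram, remap, threshold'' operation is given below,
assuming the per-label voxel counts have already been accumulated over all label slices; all names are
illustrative.
\begin{verbatim}
#include <vector>
#include <algorithm>
#include <cstdint>

// Given the voxel count per label (index 0 = background), create a re-labelling
// map so that label 1 becomes the largest component, label 2 the second largest,
// and so on.  Thresholding the re-labelled images at <= n then keeps the n
// largest components.
std::vector<uint32_t> relabel_by_size(const std::vector<uint64_t>& voxels_per_label)
{
    std::vector<uint32_t> order;
    for (uint32_t l = 1; l < voxels_per_label.size(); ++l)
        order.push_back(l);
    std::sort(order.begin(), order.end(),
              [&voxels_per_label](uint32_t a, uint32_t b) {
                  return voxels_per_label[a] > voxels_per_label[b];
              });

    std::vector<uint32_t> remap(voxels_per_label.size(), 0);  // background stays 0
    for (uint32_t rank = 0; rank < order.size(); ++rank)
        remap[order[rank]] = rank + 1;            // new label = rank by size
    return remap;
}
\end{verbatim}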
\section{Software Implementation}
\subsection{Architecture}
The software is implemented in C++ and is built on a plug-in based architecture.
\subsection{Out-of-core processing}
There are two types of out-of-core processing that are needed for the processing discussed below:
accumulation and filtering.
Accumulation can be implemented in a straightforward manner, since it only requires
loading each image slice of a set and accumulating the requested quantity, like, e.g., a histogram.
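As an illustration of this accumulation pattern, the following C++ sketch computes the intensity histogram
of a stack one slice at a time; the slice loader is passed in as a callable and stands in for the actual
out-of-core reader (all names are hypothetical).
\begin{verbatim}
#include <vector>
#include <cstdint>
#include <cstddef>
#include <functional>

// Accumulate the intensity histogram of a stack one slice at a time, so that
// only a single slice resides in memory at any moment.
std::vector<uint64_t> accumulate_histogram(
    std::size_t n_slices, unsigned v_max,
    const std::function<std::vector<uint16_t>(std::size_t)>& load_slice)
{
    std::vector<uint64_t> histogram(v_max + 1, 0);
    for (std::size_t z = 0; z < n_slices; ++z) {
        const std::vector<uint16_t> slice = load_slice(z);   // one slice in memory
        for (uint16_t v : slice)
            ++histogram[v];
    }
    return histogram;
}
\end{verbatim}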
Filtering, on the other hand, is implemented based on a \emph{first in first out} buffer that
we call a \emph{first in first out filter} (FIFOF).
These filters can be chained together, so that different filters can be run one after another, without
writing intermediate results to the hard disk.
The last FIFOF in the chain usually implements the disk write operation.
Apart from the chaining operation, a FIFOF exposes two operations to the user:
one pushes an image slice into the FIFOF chain, and the other
finalizes the filtering.
Each FIFOF itself contains a buffer to accumulate and process slice data according
to the implemented filter.
\begin{algorithm}
\caption{Driving a FIFOF chain}
\texttt{while (exist input-slice): push(chain, input-slice)} \\
\texttt{finalize(chain)}
\end{algorithm}
During the operation of a FIFOF, three stages can be distinguished:
\begin{itemize}
\item First, slices are pushed into the FIFOF until the buffer is filled.
This push operation might already prepare the input data in some way.
For example, in the (separable) Gaussian filter, the buffer stores slices that
are already filtered in the two in-plane dimensions.
\item When the buffer is filled, it is processed to obtain an output slice that is then
pushed further down the chain.
In addition, the first slice is removed from the buffer, making room for the next input.
\item When no more input is available, the finalize operation of the FIFOF is executed.
It filters the remaining buffer content and pushes the results
down the filter chain.
\end{itemize}
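The following C++ sketch outlines one possible interface for such a FIFO filter and the three stages listed
above; class and method names are illustrative and do not necessarily reflect the actual plug-in interface of
the software.
\begin{verbatim}
#include <memory>
#include <deque>
#include <vector>
#include <cstdint>

using Slice = std::vector<uint16_t>;   // one 2D slice of the stack (illustrative)

// Abstract FIFO filter: slices are pushed in one at a time, results are handed
// to the next filter in the chain, and finalize() flushes the remaining buffer.
class FifoFilter {
public:
    virtual ~FifoFilter() = default;
    void set_next(std::shared_ptr<FifoFilter> next) { m_next = std::move(next); }
    virtual void push(const Slice& s) = 0;
    virtual void finalize() = 0;
protected:
    void emit(const Slice& s) { if (m_next) m_next->push(s); }  // pass result on
    void finish_next()        { if (m_next) m_next->finalize(); }
private:
    std::shared_ptr<FifoFilter> m_next;
};

// Sketch of a buffering filter: keeps the last 'depth' slices and produces one
// output slice per input slice once the buffer is full.
class BufferedFilter : public FifoFilter {
public:
    explicit BufferedFilter(std::size_t depth) : m_depth(depth) {}
    void push(const Slice& s) override {
        m_buffer.push_back(s);                   // stage 1: fill the buffer
        if (m_buffer.size() == m_depth) {
            emit(process(m_buffer));             // stage 2: process and emit
            m_buffer.pop_front();                // make room for the next slice
        }
    }
    void finalize() override {
        while (!m_buffer.empty()) {              // stage 3: flush the remainder
            emit(process(m_buffer));
            m_buffer.pop_front();
        }
        finish_next();
    }
private:
    virtual Slice process(const std::deque<Slice>& buf) { return buf.front(); }
    std::size_t m_depth;
    std::deque<Slice> m_buffer;
};
\end{verbatim}
Concrete filters (e.g. a slice-buffered Gaussian or MLV filter) would override the processing step and choose
the buffer depth according to their filter width.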
With this filter structure, the working-memory requirement of a filter chain depends
on the filters provided and on the size of an image slice of the stack to be processed.
Given an image slice of size $w \times h$, and since the filter width is usually small compared to the image size, the
memory requirement of a filter chain can be expressed as $O(wh)$ and is, therefore,
independent of the number of slices to be processed.
\section{Validation and Error Estimation}
In the following, we will focus on the validation of the segmentation algorithm given above.
We applied a three-fold validation procedure to the segmentation approach described above:
\begin{enumerate}
\item A synthetic image was corrupted by additive Gaussian noise, segmented, and
the results were then compared to Ground Truth, i.e. the segmentation of the original image.
\item A synthetic image was distorted by using a frequency band dampening filter, corrupted by
additive Gaussian noise, segmented, and the results were then compared to Ground Truth.
\item An automatic segmentation of a real \mct tooth data set was compared to a manually acquired segmentation.
\end{enumerate}
\noindent
The first and second validations assess the performance in the ideal case, when all parameters influencing
the image quality can be quantified and Ground Truth is known.
Measuring the segmentation accuracy for tissue class $c$ is done by evaluating the ratio of
mis-labelled pixels to the (known) number of pixels of the respective class, resulting
in a percentage segmentation error \E{c} defined as
\begin{equation}
\E{c} := \frac{n\left( (\A{ref}{c} \cup \A{seg}{c}) \setminus (\A{ref}{c} \cap \A{seg}{c})\right)}{n(\A{ref}{c})} \cdot 100\%
\end{equation}
where \A{ref}{c} and \A{seg}{c} are, respectively, the set of voxels of tissue class $c$ in the reference image
(which is Ground Truth) and the segmented distorted image; $n(\A{.}{c})$ is the number of voxels in class $c$.
Since the enamel-dentin junction area and shape are also of interest in 3D measurements, an assessment of the error of the
enamel-dentin junction line in the 2D image was conducted as well.
Here the junction line is obtained by dilating both the segmentation of the enamel and
the segmentation of the dentin with a 4n mask and evaluating the overlap of the two segmentations:
\begin{equation}
I_{\text{border}} := dilate(I_{\text{enamel}}, 4n) \cap dilate(I_{\text{dentin}}, 4n)
\end{equation}
\noindent
The following measurements provide information about the accuracy of the segmentation of this junction line:
first, the number of pixels of the border image $I_{\text{border}}$ is counted;
it is proportional to the length of the junction.
The percentage length error \Elen is then evaluated according to
\begin{equation}
\Elen := \frac{|n(\len{ref}) - n(\len{seg})|}{n(\len{ref})} \cdot 100\%
\end{equation}
Additionally, the average distance \avgd of the pixels of the segmented line
$x \in \len{seg} $ from the reference line \len{ref} is evaluated
\begin{equation}
\avgd := \frac{\sum_{x \in \len{seg}} dist(x, \len{ref})}{n(\len{seg})}
\end{equation}
with $dist(x,y)$ being the Euclidean distance between points $x$ and $y$.
Finally, the maximum distance \maxd of all pixels $x \in \len{seg} $ from the reference line \len{ref} is evaluated
\begin{equation}
\maxd := \max_{x \in \len{seg}} dist(x, \len{ref}).
\end{equation}
\noindent
These measures \Elen, \avgd, and \maxd provide insight into the shape preservation of the enamel-dentin junction.
\subsection{Synthetic Images corrupted by Noise}
\begin{figure}[ht]
\centering
\resizebox{0.24\columnwidth}{!}{\includegraphics{tooth.png}}
\resizebox{0.24\columnwidth}{!}{\includegraphics{noise04000.png}}
\resizebox{0.24\columnwidth}{!}{\includegraphics{noise08000.png}}
\resizebox{0.24\columnwidth}{!}{\includegraphics{noise16000.png}}
\caption{\label{FigSynth}
Original image, and images corrupted by additive noise with $\sigma =$ 6\% (= $0.06 \Imax$), 12\%, and 24\%.
}
\end{figure}
A hand drawn tooth-like grey-scale image of size $512\times512$ was created.
Enamel was set to the uniform value $\frac{\Imax}{2}$, and dentin was modelled with the value $\frac{9}{50} \Imax $.
These values approximate the output of the \mct reconstruction software.
The image was then filtered with a Gaussian of width 3 to simulate the partial volume effect
and corrupted by Gaussian noise with a standard deviation up to $\frac{\Imax}{2} = 50\%$ (see Figure \ref{FigSynth}).
For a statistical assessment of the segmentation error, 20 images were created and segmented at each noise level.
The optimal size of the median, Gaussian and MLV filter depends on the noise level.
We applied combinations of those filters with widths of $\{2,3,\ldots,10\}$.
For the region growing, the seed threshold was set to 0.99 and the low threshold was set to 0.65.
For each noise level, the best segmentation results with respect to the quantities \avgd, \maxd,
\Elen, and $\Eenamel$ were recorded.
\begin{figure}[h]
\centering
\subfigure[ \Eenamel (\%)]{\resizebox{0.45\columnwidth}{!}{\includegraphics{area.pdf}}}
\subfigure[\Elen(\%)]{\resizebox{0.45\columnwidth}{!}{\includegraphics{len.pdf}}}
\subfigure[\avgd (Pixel)]{\resizebox{0.45\columnwidth}{!}{\includegraphics{dist.pdf}}}
\subfigure[\maxd (Pixel)]{\resizebox{0.45\columnwidth}{!}{\includegraphics{mdist.pdf}}}
\caption{\label{FigErrorsAL}
The enamel segmentation error and the length error $\Elen$ of the enamel-dentin junction increase
more or less linearly with the noise level of the images.
The length measurement is more sensitive to noise resulting in a higher fluctuation of the error.
The distance of the segmented tissue junction from Ground Truth with reference to the noise level in pixels.
For noise with a $\sigma$ below 20\% the segmentation is nearly perfect.
}
\end{figure}
Both the segmentation error $\Eenamel$ and the length error $\Elen$ increase
more or less linearly with the noise level (Figure \ref{FigErrorsAL}(a,b)).
However, as can be seen from the high variation in its error, the junction length is much more sensitive to the noise.
For a noise level of $\sigma \le 20\%\:\Imax$ the segmentation of the junction line is nearly perfect.
On average, the maximum distance of the segmented junction line is below 1 pixel (Figure \ref{FigErrorsAL}(c)).
For higher noise levels more outliers are present, which is represented by the high average maximum distances (Figure \ref{FigErrorsAL}(d)).
The error in the boundary recovery (expressed by \maxd and \Elen) is more sensitive to changes in the filter
combination than the tissue class area.
Therefore, if both enamel segmentation and the enamel-dentin junction are of interest it is
advisable to pick a filter combination that results in a better edge preservation (and enhancement).
For all noise levels, the best results were obtained by first applying a smoothing filter
(Gaussian or median) and then the non-linear MLV filter.
For images with a high noise level, pre-filtering with a median filter is always better than
employing Gaussian smoothing.
For images with a low noise level, pre-filtering with a Gaussian filter does not yield considerably better results than
pre-filtering with a median filter.
Therefore, in all remaining validation experiments, only combinations of a median filter followed by an MLV filter were applied.
\subsection{Synthetic Images corrupted by Noise and Intensity Inhomogeneities}
In order to account for intensity inhomogeneities as well, a second set of experiments on synthetic data was run.
Here, the pixels of the original image $I(\x)$ were first distorted by a Gaussian band
dampening filter $H(\x)$ (see Figure \ref{FigSynth2}).
\begin{equation}
\label{eq:banddamp}
H(\x):= 1.0 - s \cdot e^{-\frac{1}{2}\left[\frac{I(\x)^2 - f^2}{I(\x)^2 \cdot w}\right]}
\end{equation}
with $f$ the band centre frequency, $w$ the band width, and $s \in [0,1]$ the dampening factor.
$s=0.0$ corresponds to no dampening, and with $s=1.0$, (\ref{eq:banddamp}) becomes a Gaussian band-reject filter.
The experiments were run for noise levels $\sigma \in \{6\%, 12\%, 18\%, 27\%, 35\%\}$, with $f=9$,
$w=3$, and $s \in \{ 0.0, 0.3, 0.5, 0.7 \}$.
\begin{figure}[h]
\centering
\resizebox{0.24\columnwidth}{!}{\includegraphics{tooth.png}}
\resizebox{0.24\columnwidth}{!}{\includegraphics{noise_inhomo03.png}}
\resizebox{0.24\columnwidth}{!}{\includegraphics{noise_inhomo05.png}}
\resizebox{0.24\columnwidth}{!}{\includegraphics{noise_inhomo07.png}}
\caption{\label{FigSynth2}
Original image (left), and images distorted by a band-dampening filter with $s \in \{ 0.3, 0.5, 0.7 \}$ and
corrupted by additive Gaussian noise with $\sigma = 12\%$.
}
\end{figure}
The segmentation is unaffected by inhomogeneities induced by a band dampening of up to $s = 0.3$ in (\ref{eq:banddamp}).
For all the noise levels analysed, the area segmentation error \Eenamel and the length error \Elen are
below the acceptable level of 5\% (Figure \ref{FigErrorsAL2} (a,b)).
With a stronger manifestation of the intensity inhomogeneities, both errors increase above
acceptable values.
\begin{figure}[ht]
\centering
\subfigure[\Eenamel (\%)]{\resizebox{0.45\columnwidth}{!}{\includegraphics{ih_area.pdf}}}
\subfigure[\Elen (\%)]{\resizebox{0.45\columnwidth}{!}{\includegraphics{ih_len.pdf}}}
\subfigure[\avgd (Pixel)]{\resizebox{0.45\columnwidth}{!}{\includegraphics{ih_dist.pdf}}}
\subfigure[\maxd (Pixel)]{\resizebox{0.45\columnwidth}{!}{\includegraphics{ih_mdist.pdf}}}
\caption{\label{FigErrorsAL2}
Average error of best segmentation results and the corresponding variation.
}
\end{figure}
\section{Application to a real tooth}
In order to measure the performance of the segmentation algorithm in the application context of \mct tooth segmentation,
a comparison between a manually obtained segmentation and an automatic segmentation of one tooth data set was performed.
From the tooth data set (Figure \ref{fig:beamcrack}), consisting of a volume of $384 \times 400 \times 649$ voxels,
the enamel cap was hand-segmented using Photoshop.
By using the polygonal lasso tool, a rough segmentation was acquired, which was then refined manually.
The segmentation of the boundary between enamel and dentin proved to be quite easy, whereas the definition
of the boundary between enamel and the background posed a larger challenge because of the smooth change of intensities.
\begin{figure}[ht]
\centering
\resizebox{0.3\columnwidth}{!}{\includegraphics{o0110.png}}
\resizebox{0.3\columnwidth}{!}{\includegraphics{f0110.png}}
\resizebox{0.3\columnwidth}{!}{\includegraphics{c0110.png}}
\vspace{1mm}
\resizebox{0.3\columnwidth}{!}{\includegraphics{o0299.png}}
\resizebox{0.3\columnwidth}{!}{\includegraphics{f0299.png}}
\resizebox{0.3\columnwidth}{!}{\includegraphics{c0299.png}}
\caption{\label{FigSegSlice}
Two slices of a tooth: left, original; middle, filtered with a median and an MLV filter, both of width 3;
right, fuzzy c-means classification probabilities encoded in colours (red corresponds to background, green to dentin,
and blue to enamel).
Note the crack in the enamel that is enlarged by the filtering (upper row) and is classified with a
high dentin probability because, considering the full 3D stack of images, its intensity corresponds
to dentin in images without enamel (lower row).
Also note the high enamel probability in the dentin-only image in the lower row, which is a result of beam hardening.}
\end{figure}
In order to extract the dentin-enamel boundary surface, the following semi-automatic approach was used:
First, the images were smoothed by using a median filter of width 3 followed by an MLV filter of width 3.
Then the whole tooth was segmented from the background by thresholding.
In the enamel mask obtained by hand-segmentation, cracks were closed by applying a morphological closing operation
with a spherical mask of radius 1.
Then, the inverted manually acquired enamel mask was applied to the tooth segmentation.
By picking the largest connected component in the resulting binary image, the dentin mask was obtained.
Finally, the enamel-dentin boundary surface was obtained by eroding the closed, hand segmented enamel
mask with a 6n mask and evaluating the overlap with the dentin mask.
With this procedure, the segmentation of the enamel-dentin boundary corresponds mostly to the hand segmentation
and is only slightly altered by the morphological closing operation.
The automatic segmentation of the tooth was executed by filtering first with a median and then an MLV filter, both
of width 3.
The fuzzy c-means classification was applied as described above.
The resulting probability map indicates that the beam hardening results in enamel being classified as dentin with a
high probability (see Figure \ref{FigSegSlice}).
The best lower threshold for the region growing was acquired by trial and error.
Because the beam hardening results in a strong radial intensity gradient,
the best results were obtained for a value of 0.3.
With these values, we obtained an enamel segmentation error \Eenamel of 6.5\%, and the length error of the enamel-dentine junction
\Elen was 8.5\%.
The average distance \avgd was 2.5 pixels and the maximum distance \maxd was 44 pixels.
\begin{figure}[ht]
\centering
\resizebox{0.8\columnwidth}{!}{\includegraphics{hand2auto33-03.png}}
\caption{\label{FigCompHand2Auto}
A tooth was hand segmented as described in the text, and the dentin surface is visualised by the transparent green object.
An automatic segmentation is visualised as a white shape (parameters: filter: median=3 + MLV=3,
seed threshold = 0.99, low threshold = 0.3).
Note the ``appendix'' of the dentin and the altering of the ``peaks''.
}
\end{figure}
In Figure \ref{FigCompHand2Auto}, the automatically obtained segmentation of the dentin surface is displayed in conjunction
with the semi-manually obtained segmentation.
It can be seen that the crack in the enamel as well as the low intensity of the enamel due to beam hardening
resulted in the segmentation of a part of the enamel as an appendix to the dentin that accounts for the large
maximum distance \maxd of the segmented dentin-enamel junction from the manually obtained one.
This appendix also accounts for some of the enamel segmentation error.
Additionally, the shape of the dentin horns was altered.
As a result, manual post processing is necessary to obtain a clean segmentation of this tooth.
\section{Discussion}
Combining the \emph{median} and \emph{mean of least variance} (MLV) filters resulted in good noise reduction and edge enhancement
of the input data.
Finding the best filter width for a good balance between smoothing and shape preservation requires trial and error.
Nevertheless, image sets acquired with the same imaging protocol can usually be filtered by
using the very same filter combination.
Therefore, the trial-and-error run can be applied to a subset of one representative data set and the results can then be used
for a whole set of scans.
Synthetic experiments showed that input images which are only corrupted by noise can be segmented with high accuracy.
Intensity inhomogeneities, however, pose a real challenge to the proposed segmentation algorithm because they usually result
in an intensity overlap between tissues.
Additionally, cracks in the input objects are hard to overcome, especially if surface shape and area are considered.
In these cases a manual pre- or post-processing in addition to the automatic segmentation is usually inevitable.
In summary, the segmentation method described in this paper is useful to obtain a basic segmentation of a tooth
scanned using \mct.
Often, however, manual post-processing will be required in order to obtain a good segmentation.
Nevertheless, because the analysis of the data usually requires its visualisation, obvious errors are
easily seen.
Other obstacles to segmentation that future work may address are:
\begin{itemize}
\item A successful fuzzy c-means classification of the histogram depends strongly on the start parameters
(i.e. initial class centres and $\sigma$).
\item Choosing the proper low threshold in the region growing algorithm sometimes also requires trial and error.
Considering intensity gradient information can help to overcome this limitation.
\item With the ever increasing computational power available at low cost,
running more sophisticated algorithms like texture analysis might become feasible.
\end{itemize}
\noindent
Finally, an automatic quantification of the level of noise and intensity inhomogeneity would provide the means to give
error margins for measurements based on the segmentation.
\bibliography{segment}
\bibliographystyle{plain}
\end{document}