Reflectance information depends only of the materials spectral responses in the scene. When the mixture
between materials is macroscopic, the linear mixing model of spectra is generally admitted. In this case, the
image typically looks like this:
Figure 22.2: Zone which verify the LM model.
We notice 22.2 the presence of pure pixels, and pixel-blending. The LMM acknowledges that reflectance
spectrum associated with each pixel is a linear combination of pure materials in the recovery area,
commonly known as “”endmembers. This is illustrated in 22.3
Figure 22.3: Decomposition of a hyperspectral cube according to the LMM.
The “ left” term represents the different spectral bands of data cube. The “ right” term represents a “
product” between the reflectance spectra of endmembers and their respective abundances. Abundance band
of endmembers is image grayscale between 0 and 1. The pixel i of the abundance band of endmember j is
sji. This value is the abundance of endmember j in the pixel i. Under certain conditions , this value can
be interpreted as the ratio surface of the material in the overlap zone (22.2). In practice, one can reasonably
a limit number of pure materials compose the scene.
the scene contains pure pixels if the spatial resolution is sufficient and do not necessarily
contains them otherwise.
Many techniques of unmixing in hyperspectral image analysis are based on geometric approach where each
pixel is seen as a spectral vector of L (number of spectral bands). The spectral bands can then be written as
Figure 22.4: “Vectorization” of hyperspectral cube. The spectral pixels are stored in the columns of the matrixR and, in equivalently, the spectral bands are assigned to the lines of R.
By deduction of 22.3 et de 22.4, the LMM needs to decompose R as:
R = A.S+N = X +N
J is the number of endmembers and the number of spectral pixels I:
J columns of the matrix A contain the spectra of endmembers.
J rows of the matrix S contain the abundance maps vectorized, we call the columns vectors of
abundances of matrix S.
The matrix N, of dimensions LXI is a matrix of additive noise.
The unmixing problem is to estimate matrices A and S from R or possibly of Ẍ, an estimate of the denoised
Several physical constraints can be taken into account to solve the unmixing problem:
C1: reflectance spectra are positives (non negative matrix A).
C2: positivity abundances are positive (non-negative matrix S).
C3: additivity abundance (the sum of the coefficients of each column of the matrix S is 1).
Independence between the “algebraic spectral vectors” of endmembers associated with
linearity and mixtures of spectra, so that the simplex property described in paragraph below.
Recent unmixing algorithms based on the “property of simplex.” In a vector space of dimension J -1, we
can associate to J vectors algebrically independent, J points which define the vertices of a J-simplex
Figure 22.5: Illustration of a 2-simplex, a 3-simplex and a 4-simplex.
Hyperspectral cube of L bands, based on J endmembers, may be contained in a affine subspace of
dimension J -1. A relevant subspace in the sense of signal-to-noise ratio is generally obtained by Principal
Component Analysis (PCA) 14.7. In practice, the J -1 eigenvectors associated with highest values are the
columns of a projection matrix V of dimension (Lx(J -1)). Reduced data Z, of dimensions ((J -1)xI) are
obtained by the operation: Z = VT()
where each column of where the average spectrum is subtracted, generally estimated under
maximum likelihood. In the subspace carrying the column-vectors Z, endmembers spectra oare
associated to the top of the simplex. If the noise is negligible, the simplex circumscribed reduced
This property shows that the endmembers research are the vertices of a simplex that circumscribes the data.
However, an infinity of different simplices can identify the same data set. In fact, the problem of unmixing
generally does not have a unique solution. This degeneration can also be demonstrated by the formalism of
the non-negative matrices factorization .
It is therefore necessary to choose the most physically relevant solution. All unmixing techniques based on
this simplex property admit that the best solution is defined by the allowable minimum volume
simplex, or the notion of volume is extended to all finite dimensions (possibly different from
A first family of unmixing algorithms are based on research of the endmembers “among” data. This means
that a minimum of one pure pixel must be associated with each endmembers. If this hypothesis
is not verified, it will produce an estimation error of the endmembers spectra. The historical
advantage of these algorithms are their low algorithmic complexity. The three best known are
A second family is composed of algorithms which are looking for the simplex of minimum volume
circumscribing the data. Phase initialization consists in determining an initial simplex any circumscribing
the data. Then, a numerical optimization scheme minimizes a functional, increasing function of the volume
generalized, itself dependent of estimated endmembersin the current iteration. The optimization scheme is
constrained by the fact that the data have remained on the simplex and possibly by constraints C1, C2 and
All spectral pixels included in the simplex estimate (approximatively) by VCA does not
impact the constraint of data to belong to the researched simplex, they are simply delete the
data used in the minimization of the simplex to reduce the algorithmic complexity.
The highly developed optimization technique uses sequential quadratic programming
(Sequential Quadratic Programming - SQP) and more specifically of the category
Non negative matrix factorization algorithms (NMF for Non-negative Matrix Factorization). The purpose of
this branch of applied mathematics is to factor a non-negative matrix, X in our case, into a product of non
negative matrices: AS by minimizing a distance between X and AS and with an adapted regularization to
lift the degeneration in an appropriate manner adapted to the physical problems associated with
Minimizing the norm of standard Frobenius with a spectral regularization and a regularization
“Space”(the matrix of abundances).
Minimum spectral Dispersion: the spectral regularization encourages the variance of
coefficients of a spectrum of endmembers to be low.
Maximum spatial dispersion: the spatial regularization encourages the vector of
abundances to occupy all the admissible parts (more information in ). it presents a
certain analogy with the minimum volume constraint.
Algorithms of families 1 and 2 estimate “ only” the spectra of endmembers. The estimated abundance maps
held a posteri and requires the application of an algorithm like Fully Constrained Least Square (FCLS)
otb::FCLSUnmixingImageFilter. VSS includes an specific algorithm for the estimation of the
abundances. The unmixing is a general non-convex problem, which explains the importance of the
initialization of algorithms.
Overview of algorithms and related physical constraints:
We start by defining the types for the images and the reader and the writer. We choose to work with a
otb::VectorImage , since we will produce a multi-channel image (the principal components) from a
multi-channel input image.
By definition, an anomaly in a scene is an element that does not expect to find. The unusual element is
likely different from its environment and its presence is in the minority scene. Typically, a vehicle in natural
environment, a rock in a field, a wooden hut in a forest are anomalies that can be desired to detect using a
hyperspectral imaging. This implies that the spectral response of the anomaly can be discriminated in the
spectral response of “background clutter”. This also implies that the pixels associated to anomalies, the
anomalous pixels are sufficiently rare and/or punctual to be considered as such. These properties can be
viewed as spectral and spatial hypotheses on which the techniques of detection anomalies in hyperspectral
images rely on.
Literature on hyperspectral imaging generally distinguishes target detection and detection anomalies:
We speak of detection of targets when the spectral response element of the research is used
as input to the detection algorithm. This is an a priori information that can, in theory, allow
to construct algorithms with very high detection score, such as for example the Adaptive
Matched Filter (AMF) or Adaptive Cosine/Coherence Estimator (ACE). Nevertheless, thin
enough knowledge of the researched spectrum is a difficult information to hold in practice,
leading the use of anomaly detection algorithms.
We speak of detection of anomalies (or outliers) when the spectrum of the unusual element
is not required by the algorithm. For this reason, we often associate the term “Unsupervised”
detection. Nevertheless, these algorithms depend generally of “structural” parameters. For
example, in the case of detection on a sliding window, selecting the right dimensions is based
on an a priori knowledge of the anomalies size. We focus here on algorithms for anomalydetection.
Figure 22.7: Notions on detection: ground truth mask, map detection, face detection.
In 22.7, we introduce some notions that will be useful later. Anomaly detection algorithms have an image as
input and consider a map serving as a detection tool for making a decision. A adaptive thresholding
provides a estimated mask detection that we hope is the most similar possible as the ground truth mask,
unknown in practice.
Two approaches dominate the state of the art in anomaly detection in hyperspectral images. Methods which
use Pursuit Projection (PP) and methods based on a probabilistic modelization of the background and
possibly of the target class with statistical hypothesis tests. The PP consists in projecting linearly spectral
pixels on vectors wi which optimizes a criterion sensitive to the presence of anomalies (like
Kurtosis). This gives a series of maps of projections where anomalies contrast very strongly with
the background. But the automatic estimation of map detection have also major difficulties,
How many projectors should I consider?
What (s) projector (s) to choose?
How to manage an inhomogeneous background?
Detection performances varies with the spatial dimensions of the image (number of samples)
These algorithms usually do not have a structure allowing parallelization
These algorithms do not generally have a “constant false alarm rate”.
Algorithms described here are RX (presented in the first version in  and GMRF .
They are based on probability models, statistics and hypothesis tests and sliding window. These
approaches consist in answering the following question : “The pixel (or set of neighboring pixels)
tests looks like background pixels?”, by a process of test statistics. More fully, this approach
A model for the background
The choice of a statistic test
Eventually, a model for the class “anomaly”
Estimators for the parameters a priori unknown
Hypothesis of homogeneity for a satisfactory compromise between:
The number of samples compared to the number of estimate parameters
The homogeneity, including the background
Algorithmic complexity (although the algorithms considered are highly parallelizable)
Figure 22.8: Basic workflow of anomaly detection in hyperspectral images.
An optional first step is to reduce the size of spectral data while maintaining information related to
anomalies. Then, the spectral pixels are tested one by oneturn (parallelizable task). The pixel test works on
a sliding window (see 22.9). This window consists of two sub windows, centered on the pixel test of
dimensions L and LL, with L < LL.
Pixels belonging to the annulus formed the class “local background”. The statistical
parameters of the background, required by statistical tests, are estimated from these spectral
pixels. One of the challenges is to find a compromise on the thickness of the annulus (and
therefore the number statistical sampling) where:
If the annulus is too thick, the local background is no longer homogeneous
If the number of samples is too low, the precision of the statistical estimation of the
model parameters for the background is too low
Pixels of the central part of the sliding window belong to the class “unknown”. If the test pixel
(central pixel) is an anomaly, it is possible that its neighbors pixels are also anomalies, which imply to
not to choose a too small value for L if it is known that anomalies can be distribute over several
Once all the parameters are estimated, a statistical test is performed and assigns a value Λ(i) to the tested
pixel i. A map of detection is thus formed.
Figure 22.9: Principle of the sliding window and definitions of parameters L and LL of the sliding window. RXalgorithms and GMRF, taken herein in the following sections.