This chapter introduces the tools available in OTB for the estimation of geometric disparities between images.
The problem addressed here is the automatic estimation of disparity maps between images acquired
with different sensors. By different sensors, we mean sensors which produce images with different
radiometric properties, that is, sensors which measure different physical quantities: optical sensors
operating in different spectral bands, radar and optical sensors, etc.
For this kind of image pair, the classical approach of fine correlation [81, 43] cannot always
provide the required accuracy, since its similarity measure (the correlation coefficient) can only measure
similarities up to an affine transformation of the radiometries.
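The affine limitation can be checked numerically. The following is a minimal sketch in standard C++ (it is not OTB code, and the function name CorrelationCoefficient is chosen here for illustration only): a window and an affine copy of it correlate perfectly, whereas a non-affine radiometric change, as can occur between different sensors, lowers the coefficient.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Correlation coefficient between two equally sized windows stored as flat arrays.
// Illustrative helper, not part of OTB.
double CorrelationCoefficient(const std::vector<double>& a, const std::vector<double>& b)
{
  assert(a.size() == b.size() && !a.empty());
  const double n = static_cast<double>(a.size());

  double meanA = 0.0, meanB = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    meanA += a[i];
    meanB += b[i];
  }
  meanA /= n;
  meanB /= n;

  double cov = 0.0, varA = 0.0, varB = 0.0;
  for (std::size_t i = 0; i < a.size(); ++i)
  {
    const double da = a[i] - meanA;
    const double db = b[i] - meanB;
    cov += da * db;
    varA += da * da;
    varB += db * db;
  }

  // A flat window has no variance: the coefficient is undefined, 0 is returned by convention.
  if (varA == 0.0 || varB == 0.0)
  {
    return 0.0;
  }
  return cov / std::sqrt(varA * varB);
}
```

With a positive gain the coefficient is 1 and with a negative gain it is -1, whatever the offset. Replacing the affine relation by, for instance, a squared radiometry gives a value strictly below 1 even though the two windows are perfectly registered.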
There are two main questions which can be asked about what we want to do:
We can answer by saying that the images of the same object obtained by different sensors are two different
representations of the same reality. For the same spatial location, we have two different measurements. Both
measurements come from the same source and thus share a lot of information. This
relationship may not be perfect, but it can be evaluated in a relative way: different geometrical
distortions are compared and the one leading to the strongest link between the two measures is
When working with images acquired with the same (type of) sensor one can use a very effective approach.
Since the correlation coefficient is robust and fast for similar images, one can afford to compute it at
every pixel of one image in order to search for the corresponding homologous point (HP) in the other image. One can thus build
a deformation grid (a sampling of the deformation map). If the sampling step of this grid is short enough,
the interpolation using an analytical model is not needed and high frequency deformations can be
estimated. The obtained grid can be used as a re-sampling grid and thus obtain the registered
This approach, combined with image interpolation techniques (in order to estimate sub-pixel
deformations) and multi-resolution strategies, gives the best performance in terms of
deformation estimation, and hence of automatic image registration.
Unfortunately, in the multi-sensor case, the correlation coefficient cannot be used. We will thus try to find
similarity measures which can be applied in the multi-sensor case with the same approach as the correlation
We start by giving several definitions which allow for the formalization of the image registration problem. First of all, we define the master image and the slave image:
Definition 1 Master image: image to which other images will be registered; its geometry is considered as the reference.
Definition 2 Slave image: image to be geometrically transformed in order to be registered to the master image.
Two key concepts are the similarity measure and the geometric transformation:
Sc has an absolute maximum when the two images I and J are identical in the sense of the criterion
Finally we introduce a definition for the image registration problem:
The geometric transformation of definition 4 is used to correct the existing deformation between the two images to be registered. This deformation contains information linked to the observed scene and to the acquisition conditions. It can be divided into 3 classes depending on the physical source:
These deformations are characterized by their spatial frequencies and intensities which are summarized in
|Deformation source||Intensity||Spatial frequency|
|Stereo||Medium||High and Medium|
|Attitude evolution||Low||Low to Medium|
Depending on the type of deformation to be corrected, its model will be different. For example, if the only
deformation to be corrected is the one introduced by the mean attitude, a physical model for the acquisition
geometry (independent of the image contents) will be enough. If the sensor is not well known,
this deformation can be approximated by a simple analytical model. When the deformations
to be modeled are high frequency, analytical (parametric) models are not suitable for a fine
registration. In this case, one has to use a fine sampling of the deformation, that is,
deformation grids. These grids give, for a set of pixels of the master image, their location in the slave
The following points summarize the problem of the deformation modeling:
The last point implies that the sampling period of the grid must be short enough to
account for high frequency deformations (Shannon sampling theorem). Of course, if the deformations
are non-stationary (as is usually the case for topographic deformations), the sampling can be
As a conclusion, we can say that definition 5 poses the registration problem as an optimization problem.
This optimization can be either global or local with a similarity measure which can also be either local or
global. All this is summarized in table 11.2.
|Geometric model||Similarity measure||Optimization of the|
|with a priori HP|
|without a priori HP|
The ideal approach would consist in a registration which is locally optimized, both in similarity and
deformation, in order to have the best registration quality. This is the case when deformation grids with
dense sampling are used. Unfortunately, this case is the most computationally heavy and one often uses
either a low sampling rate of the grid, or the evaluation of the similarity in a small set of pixels for the
estimation of an analytical model. Both of these choices lead to local registration errors which, depending
on the topography, can amount to several pixels.
Even if this registration accuracy is sufficient for many applications (ortho-registration, import into a
GIS, etc.), it is not acceptable in the case of data fusion, multi-channel segmentation or change
detection. This is why we will focus on the problem of deformation estimation using dense
The fine modeling of the geometric deformation we are looking for requires estimating the
coordinates, inside the slave image, of nearly every pixel of the master image. In the classical mono-sensor
case, where we use the correlation coefficient, we proceed as follows.
The geometric deformation is modeled by local rigid displacements. One wants to estimate the coordinates
of each pixel of the master image inside the slave image. This can be represented by a displacement vector
associated with every pixel of the master image. Each of the two components (lines and columns) of this
vector field will be called a deformation grid.
We use a small window taken in the master image and we test the similarity for every possible shift within
an exploration area inside the slave image (figure 11.1).
[Figure 11.1. Labels: Reference image, Secondary image, Candidate points, Estimation window, Search window, Similarity estimation, Similarity optimization, Optimum (Δx, Δy)]
That means that for each position we compute the correlation coefficient. The result is a correlation surface whose maximum gives the most likely local shift between both images:
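Assuming the usual normalized form of the correlation coefficient, written with the symbols N, mI, mJ, σI and σJ described next, the expression for a candidate shift (Δx, Δy) reads:

```latex
\rho_{I,J}(\Delta x, \Delta y) =
  \frac{1}{N} \,
  \frac{\sum_{x,y} \bigl( I(x,y) - m_I \bigr) \bigl( J(x + \Delta x, y + \Delta y) - m_J \bigr)}
       {\sigma_I \, \sigma_J}
```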
In this expression, N is the number of pixels of the analysis window, mI and mJ are the estimated mean
values inside the analysis window of respectively image I and image J and σI and σJ are their standard
Quality criteria can be applied to the estimated maximum in order to give a confidence factor to the
estimated shift: width of the peak, maximum value, etc. Sub-pixel shifts can be measured by applying
fractional shifts to the sliding window. This can be done by image interpolation.
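The search described above can be sketched in a few lines of standard C++. This is an illustrative sketch only, not the OTB implementation: the types Image and Shift and the functions WindowCorrelation and FindBestShift are hypothetical names, the shifts are restricted to integer values (no sub-pixel interpolation), and the caller is assumed to keep the estimation and search windows inside both images.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Minimal row-major image container (hypothetical, for illustration).
struct Image
{
  int width;
  int height;
  std::vector<double> data;
  double at(int x, int y) const { return data[y * width + x]; }
};

struct Shift
{
  int dx;
  int dy;
  double score;
};

// Correlation between the master window centred on (cx, cy) and the slave
// window centred on (cx + dx, cy + dy). The window side is 2 * radius + 1.
double WindowCorrelation(const Image& master, const Image& slave,
                         int cx, int cy, int dx, int dy, int radius)
{
  double n = 0.0, sI = 0.0, sJ = 0.0, sII = 0.0, sJJ = 0.0, sIJ = 0.0;
  for (int v = -radius; v <= radius; ++v)
  {
    for (int u = -radius; u <= radius; ++u)
    {
      const double i = master.at(cx + u, cy + v);
      const double j = slave.at(cx + dx + u, cy + dy + v);
      n += 1.0;
      sI += i;
      sJ += j;
      sII += i * i;
      sJJ += j * j;
      sIJ += i * j;
    }
  }
  const double cov = sIJ / n - (sI / n) * (sJ / n);
  const double varI = sII / n - (sI / n) * (sI / n);
  const double varJ = sJJ / n - (sJ / n) * (sJ / n);
  if (varI <= 0.0 || varJ <= 0.0)
  {
    return 0.0; // flat window: coefficient undefined, 0 by convention
  }
  return cov / std::sqrt(varI * varJ);
}

// Exhaustive exploration of all integer shifts in [-search, search]^2;
// the maximum of the correlation surface gives the estimated local shift.
Shift FindBestShift(const Image& master, const Image& slave,
                    int cx, int cy, int radius, int search)
{
  assert(master.width == slave.width && master.height == slave.height);
  assert(cx - radius - search >= 0 && cx + radius + search < slave.width);
  assert(cy - radius - search >= 0 && cy + radius + search < slave.height);

  Shift best{0, 0, -2.0}; // lower than any possible correlation value
  for (int dy = -search; dy <= search; ++dy)
  {
    for (int dx = -search; dx <= search; ++dx)
    {
      const double score = WindowCorrelation(master, slave, cx, cy, dx, dy, radius);
      if (score > best.score)
      {
        best = Shift{dx, dy, score};
      }
    }
  }
  return best;
}
```

The returned score can serve as one of the quality criteria mentioned above (value of the maximum). Repeating the search at every node of a grid over the master image yields the two deformation grids.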
The interesting parameters of the procedure are:
The correlation coefficient cannot be used with original grey-level images in the multi-sensor case. It could
be used on extracted features (edges, etc.), but the feature extraction can introduce localization errors.
Also, when the images come from sensors using very different modalities, it can be difficult
to find similar features in both images. In this case, one can try to find the similarity at the
pixel level, but with other similarity measures and apply the same approach as we have just
The concept of similarity measure has been presented in definition 3. The difficulty of the procedure
lies in finding the function f which properly represents the criterion c. We also need f
to be easily and robustly estimated with small windows. We extend here what we proposed in
We recall here the computation of the correlation coefficient between two image windows I and J. The coordinates of the pixels inside the windows are represented by (x,y):
In order to qualitatively characterize the different similarity measures we propose the following
experiment. We take two images which are perfectly registered and we extract a small window of
size N × M from each of the images (this size is set to 101 × 101 for this experiment). For the
master image, the window will be centered on coordinates (x0, y0) (the center of the image)
and for the slave image, it will be centered on coordinates (x0 + Δx, y0). With different values
of Δx (from -10 pixels to 10 pixels in our experiments), we obtain an estimate of ρ(I,J) as a
function of Δx, which we write as ρ(Δx) for short. The obtained curve should have a maximum
for Δx = 0, since the images are perfectly registered. We would also like to have an absolute
maximum with a high value and with a sharp peak, in order to have a good precision for the shift
The source code for this example can be found in the file
This example demonstrates the use of the otb::FineRegistrationImageFilter. This filter estimates the deformation in the classical way, by looking up the extremum of an image-to-image metric within a search window.
The first step toward the use of these filters is to include the proper header files.
Several types of otb::Image are required to represent the input image, the metric field, and the deformation field.
To make the metric estimation more robust, the first required step is to blur the input images. This is done using the itk::RecursiveGaussianImageFilter:
Now, we declare and instantiate the otb::FineRegistrationImageFilter which is going to perform the registration:
Some parameters need to be specified to the filter:
We need to set the sub-pixel accuracy we want to obtain:
The default matching metric used by the otb::FineRegistrationImageFilter is standard correlation. However, we may also use any other image-to-image metric provided by ITK. For instance, here is how we would use the itk::MutualInformationImageToImageMetric (do not forget to include the proper header).
The itk::MutualInformationImageToImageMetric produces low values for poor matches; therefore, the filter has to maximize the metric:
The execution of the otb::FineRegistrationImageFilter will be triggered by the Update() call on the writer at the end of the pipeline. Make sure to use an otb::ImageFileWriter if you want to benefit from the streaming features.
Figure 11.2 shows the result of applying the otb::FineRegistrationImageFilter.
Taking figure 11.1 as a starting point, we can generalize the approach by letting the user choose:
In order to do this, we will use the ITK registration framework locally on a set of nodes. Once the disparity is estimated on a set of nodes, we will use it to generate a deformation field: the dense, regular vector field which gives the translation to be applied to a pixel of the secondary image to be positioned on its homologous point of the master image.
The source code for this example can be found in the file
This example demonstrates the use of the otb::DisparityMapEstimationMethod, along with the otb::NearestPointDisplacementFieldGenerator. The first filter performs deformation estimation according to a given transform, using the embedded ITK registration framework. It takes as input a possibly irregular point set and produces a point set with associated point data representing the deformation.
The second filter generates a deformation field by using nearest neighbor interpolation on the deformation values from the point set. More advanced methods for deformation field interpolation are also available.
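To make the nearest neighbor interpolation concrete, here is a minimal sketch in standard C++. It is not the code of otb::NearestPointDisplacementFieldGenerator: the types DisparityPoint and Displacement and the function NearestPointField are hypothetical, the output grid is assumed to have unit spacing and its origin at (0, 0), and a brute-force search over all points is used for clarity rather than speed.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

// A point of the (possibly irregular) point set with its estimated deformation.
struct DisparityPoint
{
  double x;
  double y;
  double dx;
  double dy;
};

struct Displacement
{
  double dx;
  double dy;
};

// Dense row-major field: each pixel takes the deformation of its closest point.
std::vector<Displacement> NearestPointField(const std::vector<DisparityPoint>& points,
                                            int width, int height)
{
  assert(!points.empty() && width > 0 && height > 0);
  std::vector<Displacement> field(static_cast<std::size_t>(width) * height);

  for (int y = 0; y < height; ++y)
  {
    for (int x = 0; x < width; ++x)
    {
      double bestDistance = std::numeric_limits<double>::max();
      for (const DisparityPoint& p : points)
      {
        const double distance = (p.x - x) * (p.x - x) + (p.y - y) * (p.y - y);
        if (distance < bestDistance)
        {
          bestDistance = distance;
          field[static_cast<std::size_t>(y) * width + x] = Displacement{p.dx, p.dy};
        }
      }
    }
  }
  return field;
}
```

The resulting field is piecewise constant, which is why smoother interpolation schemes are usually preferred when the point set is sparse.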
The first step toward the use of these filters is to include the proper header files.
Then we must decide what pixel type to use for the image. We choose to do all the computation in floating point precision and rescale the results between 0 and 255 in order to export PNG images.
The images are defined using the pixel type and the dimension. Please note that the otb::NearestPointDisplacementFieldGenerator generates an otb::VectorImage to represent the deformation field in both image directions.
The next step is to define the transform we have chosen to model the deformation. In this example the deformation is modeled as an itk::TranslationTransform.
Then we define the metric we will use to evaluate the local registration between the fixed and the moving image. In this example we chose the itk::NormalizedCorrelationImageToImageMetric.
Disparity map estimation requires evaluating the moving image at non-grid positions. Therefore, an interpolator is needed. In this example we chose the itk::WindowedSincInterpolateImageFunction.
To perform local registration, an optimizer is needed. In this example we chose the itk::GradientDescentOptimizer.
Now we define the point set containing the points at which the local disparity will be computed.
Now we define the disparity map estimation filter.
The input image reader also has to be defined.
Two readers are instantiated: one for the fixed image, and one for the moving image.
We then create a regular point set at which to compute the local disparity.
We build the transform, interpolator, metric and optimizer for the disparity map estimation filter.
We then set up the disparity map estimation filter. This filter will perform a local registration at each point of the given point set using the ITK registration framework. It will produce a point set whose point data reflects the disparity locally around the associated point.
The point data will contain the following data:
Please note that in the case of an itk::TranslationTransform, the deformation values and the transform parameters are the same.
The initial transform parameters can be set via the SetInitialTransformParameters() method. In our case, we simply fill the parameter array with null values.
Now we can set the inputs of the deformation field estimation filter. The fixed image is set with the SetFixedImage() method, the moving image with the SetMovingImage() method, and the input point set with the SetPointSet() method.
Once the estimation has been performed by the otb::DisparityMapEstimationMethod, one can generate the associated deformation field (that is, the translation in the first and second image directions). It will be represented as an otb::VectorImage.
For the deformation field estimation, we will use the otb::BSplinesInterpolateDisplacementFieldGenerator. This filter will perform a B-spline interpolation of the deformation values in the point set data.
The disparity map estimation filter is instantiated.
We must then specify the input point set using the SetPointSet() method.
One must also specify the origin, size and spacing of the output deformation field.
The local registration process can lead to wrong deformation values and transform parameters. To select only the points of the point set for which the registration process was successful, one can set a threshold on the final metric value: points for which the absolute final metric value is below this threshold will be discarded. This threshold can be set with the SetMetricThreshold() method.
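The effect of this threshold can be sketched as follows in standard C++. This only illustrates the selection rule stated above, not the OTB implementation; the type RegisteredPoint and the function KeepReliablePoints are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// A registered point: position, estimated deformation and final metric value.
struct RegisteredPoint
{
  double x;
  double y;
  double dx;
  double dy;
  double metric;
};

// Keeps the points whose absolute final metric value reaches the threshold;
// points strictly below the threshold are discarded.
std::vector<RegisteredPoint> KeepReliablePoints(const std::vector<RegisteredPoint>& points,
                                                double metricThreshold)
{
  assert(metricThreshold >= 0.0);
  std::vector<RegisteredPoint> kept;
  for (const RegisteredPoint& p : points)
  {
    if (std::abs(p.metric) >= metricThreshold)
    {
      kept.push_back(p);
    }
  }
  return kept;
}
```

Using the absolute value makes the rule independent of the sign convention of the metric, so it applies equally to metrics whose best matches are strongly negative.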
The following classes provide similar functionality:
Now we can warp our fixed image according to the estimated deformation field. This will be performed by the itk::WarpImageFilter . First, we define this filter.
Then we instantiate it.
We set the input image to warp using the SetInput() method, and the deformation field using the SetDisplacementField() method.
In order to write the result to a PNG file, we will rescale it on a proper range.
We can now write the image to a file. The filters are executed by invoking the Update() method.
We also want to write the deformation field along the first direction to a file. To achieve this we will use the otb::MultiToMonoChannelExtractROI filter.
Figure 11.3 shows the result of applying disparity map estimation on a stereo pair using a regular point set, followed by deformation field estimation using splines and fixed image resampling.
The source code for this example can be found in the file
This example demonstrates the use of the stereo reconstruction chain from an image pair. The images are assumed to come from the same sensor but with different positions. The approach presented here has the following steps:
It is important to note that this method requires the sensor models with a pose estimate for each image.
This example demonstrates the use of the following filters:
The image pair is assumed to be in sensor geometry. From two images covering nearly the same area, one can estimate a common epipolar geometry. In this geometry, an altitude variation corresponds to a horizontal shift between the two images. The filter otb::StereorectificationDisplacementFieldSource computes the deformation grids for each image.
These grids are sampled in epipolar geometry. They have two bands, containing the position offset (in physical space units) between the current epipolar point and the corresponding sensor point in horizontal and vertical direction. They can be computed at a lower resolution than sensor resolution. The application StereoRectificationGridGenerator also provides a simple tool to generate the epipolar grids for your image pair.
Then, the sensor images can be resampled in epipolar geometry, using the otb::StreamingWarpImageFilter . The application GridBasedImageResampling also gives an easy access to this filter. The user can choose the epipolar region to resample, as well as the resampling step and the interpolator.
Note that the epipolar image size can be retrieved from the stereo rectification grid filter.
The deformation grids are cast into deformation fields, then the left and right sensor images are resampled.
Since the resampling produces black regions around the image, there is no point in estimating disparities on these no-data regions. We use an otb::BandMathImageFilter to produce a mask on the left and right epipolar images.
Once the two sensor images have been resampled in epipolar geometry, the disparity map can be computed. The approach presented here is a 2D matching based on a pixel-wise metric optimization. This approach doesn’t give the best results compared to global optimization methods, but it is suitable for streaming and threading on large images.
The major filter used for this step is otb::PixelWiseBlockMatchingImageFilter . The metric is computed on a window centered around the tested epipolar position. It performs a pixel-to-pixel matching between the two epipolar images. The output disparities are given as index offset from left to right position. The following features are available in this filter:
Some other filters have been added to enhance these pixel-to-pixel disparities. The filter otb::SubPixelDisparityImageFilter can estimate the disparities with sub-pixel precision. Several interpolation methods can be used: parabolic fit, triangular fit, and dichotomy search.
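As an illustration of the parabolic fit, the sketch below (standard C++, not the code of otb::SubPixelDisparityImageFilter; the function name ParabolicOffset is hypothetical) fits a parabola through the metric values at the best integer disparity and at its two neighbors, and returns the position of the vertex relative to the integer disparity.

```cpp
#include <cassert>
#include <cmath>

// Offset of the vertex of the parabola passing through (-1, before),
// (0, centre) and (+1, after). The refined disparity is the integer
// disparity plus this offset.
double ParabolicOffset(double before, double centre, double after)
{
  assert(std::isfinite(before) && std::isfinite(centre) && std::isfinite(after));
  const double curvature = before - 2.0 * centre + after;
  if (curvature == 0.0)
  {
    return 0.0; // the three samples are aligned: no unique extremum
  }
  return 0.5 * (before - after) / curvature;
}
```

When the centre sample is the extremum of the three, the offset lies in [-0.5, 0.5]. The same formula applies whether the metric is minimized or maximized, since the sign of the curvature cancels out.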
The filter otb::DisparityMapMedianFilter can be used to remove outliers. It has two parameters:
The application PixelWiseBlockMatching contains all these filters and provides a single interface to compute your disparity maps.
The disparity map obtained with the previous step usually gives a good idea of the altitude profile. However, it is more useful to study altitude with a DEM (Digital Elevation Model) representation.
The filter otb::DisparityMapToDEMFilter performs this last step. The behavior of this filter is to:
The rule of keeping the highest elevation makes sense for buildings seen from the side because the roof edges elevation has to be kept. However this rule is not suited for noisy disparities.
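The rule of keeping the highest elevation can be sketched as follows in standard C++. This is not the code of otb::DisparityMapToDEMFilter: the type Point3D, the function RasterizeHighest and the regular grid defined by an origin and a square cell size are assumptions made for illustration, and the 3D points are assumed to be already expressed in the DEM coordinate system.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Point3D
{
  double x;
  double y;
  double z; // elevation
};

// Projects the points on a regular grid of cols x rows cells (row-major)
// and keeps, in each cell, the highest elevation that falls into it.
// Cells that receive no point keep the noData value.
std::vector<double> RasterizeHighest(const std::vector<Point3D>& points,
                                     double originX, double originY, double step,
                                     int cols, int rows, double noData)
{
  assert(step > 0.0 && cols > 0 && rows > 0);
  const std::size_t count = static_cast<std::size_t>(cols) * rows;
  std::vector<double> dem(count, noData);
  std::vector<bool> filled(count, false);

  for (const Point3D& p : points)
  {
    const int c = static_cast<int>(std::floor((p.x - originX) / step));
    const int r = static_cast<int>(std::floor((p.y - originY) / step));
    if (c < 0 || c >= cols || r < 0 || r >= rows)
    {
      continue; // point outside the DEM extent
    }
    const std::size_t index = static_cast<std::size_t>(r) * cols + c;
    if (!filled[index] || p.z > dem[index])
    {
      dem[index] = p.z;
      filled[index] = true;
    }
  }
  return dem;
}
```

Because a single outlier with a high elevation wins its cell, this sketch also shows why the rule is sensitive to noisy disparities: filtering the disparity map beforehand (for instance with a median filter) limits the problem.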
The application DisparityMapToElevationMap also gives an example of use.
Figure 11.4 shows the result of terrain reconstruction based on pixel-wise block matching, sub-pixel interpolation and DEM estimation, using a pair of Pleiades images over the Stadium in Toulouse, France.