SampleSelection - Sample Selection

Selects samples from a training vector data set.

Detailed description

The application selects a set of samples from geometries intended for training (they should have a field giving the associated class).

First, the geometries must be analyzed by the PolygonClassStatistics application, which computes statistics about them and summarizes the results in an XML file. This XML file must then be given as input to this application (parameter instats).
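The two-step workflow described above can be sketched as a pair of command lines assembled in Python. This is only a minimal sketch: the helper name build_commands is hypothetical, the file names are those of the Example section, and the PolygonClassStatistics keys (in, vec, field, out) are assumed to mirror the ones used here, so check that application's own parameters table. The commands are built but not executed.

```python
def build_commands(image, vectors, field, stats_xml, samples_out):
    """Return the two command lines of the workflow as argument lists."""
    # Step 1: PolygonClassStatistics summarizes the geometries in an XML file.
    # Assumption: its keys are 'in', 'vec', 'field' and 'out'.
    stats_cmd = [
        "otbcli_PolygonClassStatistics",
        "-in", image,
        "-vec", vectors,
        "-field", field,
        "-out", stats_xml,
    ]
    # Step 2: SampleSelection reads that XML file through 'instats'.
    select_cmd = [
        "otbcli_SampleSelection",
        "-in", image,
        "-vec", vectors,
        "-field", field,
        "-instats", stats_xml,
        "-out", samples_out,
    ]
    return stats_cmd, select_cmd


stats_cmd, select_cmd = build_commands(
    "support_image.tif",
    "variousVectors.sqlite",
    "label",
    "apTvClPolygonClassStatisticsOut.xml",
    "resampledVectors.sqlite",
)
print(" ".join(stats_cmd))
print(" ".join(select_cmd))
```

When OTB is installed, each list could be passed to subprocess.run(cmd, check=True), in that order.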

The input support image and the input training vectors shall be given in parameters ‘in’ and ‘vec’ respectively. Only the sampling grid (origin, size, spacing) will be read from the input image. There are several strategies to select samples (parameter strategy) :

  • smallest (default) : select the same number of samples in each class so that the smallest one is fully sampled.
  • constant : select the same number of samples N in each class (with N below or equal to the size of the smallest class).
  • byclass : set the required number for each class manually, with an input CSV file (first column is class name, second one is the required samples number).
  • percent: set a target global percentage of samples to use. Class proportions will be respected.
  • total: set a target total number of samples to use. Class proportions will be respected.
  • all: take all samples.
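The effect of each strategy on the per-class sample counts can be illustrated with a small Python sketch. It is not the application's implementation: the function name allocate, the class sizes, and the rounding rules are assumptions made for illustration. In particular, the sketch takes a fraction between 0 and 1 for the percent strategy, whereas the unit expected by strategy.percent.p is not stated here.

```python
def allocate(class_sizes, strategy, nb=None, fraction=None, total=None, byclass=None):
    """Return {class_name: requested_sample_count} for one strategy."""
    smallest = min(class_sizes.values())
    if strategy == "smallest":
        # Every class gets as many samples as the smallest class holds.
        return {name: smallest for name in class_sizes}
    if strategy == "constant":
        # N must be below or equal to the size of the smallest class.
        if nb > smallest:
            raise ValueError("nb must not exceed the smallest class size")
        return {name: nb for name in class_sizes}
    if strategy == "byclass":
        # Counts are given explicitly, one per class.
        return {name: byclass[name] for name in class_sizes}
    if strategy == "percent":
        # Assumption: 'fraction' is in [0, 1]; rounding to nearest.
        return {name: round(size * fraction) for name, size in class_sizes.items()}
    if strategy == "total":
        # Class proportions are respected; assumption: rounding down.
        available = sum(class_sizes.values())
        return {name: total * size // available for name, size in class_sizes.items()}
    if strategy == "all":
        # Listed in the parameters table: take every available sample.
        return dict(class_sizes)
    raise ValueError("unknown strategy: " + strategy)


# Made-up class sizes, as they could be reported in the statistics file.
sizes = {"water": 200, "forest": 1000, "urban": 800}
for name in ("smallest", "all"):
    print(name, allocate(sizes, name))
print("constant", allocate(sizes, "constant", nb=150))
print("percent", allocate(sizes, "percent", fraction=0.5))
print("total", allocate(sizes, "total", total=500))
```

With the percent and total strategies, the ratio between classes stays the same as in the input, which is what "class proportions will be respected" means.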

There is also a choice on the sampling type to perform :

  • periodic : select samples uniformly distributed
  • random : select samples randomly distributed
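The difference between the two sampling types can be shown on a plain list of candidate positions. The sketch below is an illustration only, not the application's algorithm: the function names are hypothetical, and candidates are represented by their index in a list.

```python
import random


def periodic_indices(n_candidates, n_wanted):
    """Pick n_wanted indices evenly spread over range(n_candidates)."""
    if not 0 < n_wanted <= n_candidates:
        raise ValueError("n_wanted must be in 1..n_candidates")
    step = n_candidates / n_wanted
    return [int(i * step) for i in range(n_wanted)]


def random_indices(n_candidates, n_wanted, seed=None):
    """Pick n_wanted distinct indices at random; a seed makes it repeatable."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_candidates), n_wanted))


print(periodic_indices(10, 5))        # evenly spaced picks
print(random_indices(10, 5, seed=1))  # distinct picks, repeatable for a given seed
```

Passing the same seed twice gives the same random selection, which is the role the rand parameter plays for reproducible runs.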

Once the strategy and type are selected, the application outputs the sample positions (parameter out).

The other parameters to look at are :

  • layer : index specifying from which layer to pick geometries.
  • field : set the field name containing the class.
  • mask : an optional raster mask can be used to discard samples.
  • outrates : allows outputting a CSV file that summarizes the sampling rates for each class.

As with the PolygonClassStatistics application, different types of geometry are supported : polygons, lines, points. The behavior of this application is different for each type of geometry :

  • polygons : select pixels whose center is inside the polygon
  • lines : select pixels intersecting the line
  • points : select the closest pixel to the provided point
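The polygon case can be sketched with a ray-casting test applied to the centers of the sampling grid. This is a simplified illustration rather than the application's code: the function names are hypothetical, the polygon is a simple ring without holes, and the grid origin is assumed to be the center of the first pixel.

```python
def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the horizontal ray starting at (x, y) cross this edge?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def pixels_in_polygon(origin, spacing, size, polygon):
    """Return (col, row) of grid pixels whose center lies inside the polygon."""
    ox, oy = origin
    sx, sy = spacing
    cols, rows = size
    selected = []
    for row in range(rows):
        for col in range(cols):
            # Assumption: the origin is the center of pixel (0, 0).
            cx = ox + col * sx
            cy = oy + row * sy
            if point_in_polygon(cx, cy, polygon):
                selected.append((col, row))
    return selected


square = [(0.5, 0.5), (2.5, 0.5), (2.5, 2.5), (0.5, 2.5)]
print(pixels_in_polygon((0.0, 0.0), (1.0, 1.0), (4, 4), square))
```

Only the grid definition (origin, spacing, size) is used, which matches the fact that no pixel values are read from the support image.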

Parameters

This section describes in detail the parameters available for this application. Table [1] presents a summary of these parameters and the parameter keys to be used in command-line and programming languages. Application key is SampleSelection.

[1] Table: Parameters table for Sample Selection.
Parameter Key | Parameter Name | Parameter Type
in | InputImage | Input image
mask | InputMask | Input image
vec | Input vectors | Input File name
out | Output vectors | Output File name
instats | Input Statistics | Input File name
outrates | Output rates | Output File name
sampler | Sampler type | Choices
sampler periodic | Periodic sampler | Choice
sampler random | Random sampler | Choice
sampler.periodic.jitter | Jitter amplitude | Int
strategy | Sampling strategy | Choices
strategy byclass | Set samples count for each class | Choice
strategy constant | Set the same samples counts for all classes | Choice
strategy percent | Use a percentage of the samples available for each class | Choice
strategy total | Set the total number of samples to generate, and use class proportions. | Choice
strategy smallest | Set same number of samples for all classes, with the smallest class fully sampled | Choice
strategy all | Take all samples | Choice
strategy.byclass.in | Number of samples by class | Input File name
strategy.constant.nb | Number of samples for all classes | Int
strategy.percent.p | The percentage to use | Float
strategy.total.v | The number of samples to generate | Int
field | Field Name | List
layer | Layer Index | Int
elev | Elevation management | Group
elev.dem | DEM directory | Directory
elev.geoid | Geoid File | Input File name
elev.default | Default elevation | Float
ram | Available RAM (Mb) | Int
rand | set user defined seed | Int
inxml | Load otb application from xml file | XML input parameters file
outxml | Save otb application to xml file | XML output parameters file

InputImage: Support image that will be classified.

InputMask: Validity mask (only pixels corresponding to a mask value greater than 0 will be used for statistics).

Input vectors: Input geometries to analyse.

Output vectors: Output resampled geometries.

Input Statistics: Input file storing statistics (XML format).

Output rates: Output rates (CSV formatted).

Sampler type: Type of sampling (periodic or random). Available choices are:

  • Periodic sampler: Takes samples regularly spaced.
  • Jitter amplitude: Jitter amplitude added during sample selection (0 = no jitter).
  • Random sampler: The positions to select are randomly shuffled.

Sampling strategy: Available choices are:

  • Set samples count for each class: Set samples count for each class.
  • Number of samples by class: Number of samples by class (CSV format with the class name in the 1st column and the required number of samples in the 2nd).
  • Set the same samples counts for all classes: Set the same samples counts for all classes.
  • Number of samples for all classes: Number of samples for all classes.
  • Use a percentage of the samples available for each class: Use a percentage of the samples available for each class.
  • The percentage to use: The percentage to use.
  • Set the total number of samples to generate, and use class proportions.: Set the total number of samples to generate, and use class proportions.
  • The number of samples to generate: The number of samples to generate.
  • Set same number of samples for all classes, with the smallest class fully sampled: Set same number of samples for all classes, with the smallest class fully sampled.
  • Take all samples: Take all samples.
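The two-column file expected by strategy.byclass.in could be produced with Python's csv module. This is a sketch under assumptions: the comma delimiter and the absence of a header row are not stated in this document, so the delimiter is left configurable. The helper name byclass_csv is hypothetical and the class names and counts are made up.

```python
import csv
import io


def byclass_csv(required, delimiter=","):
    """Return CSV text: class name in column 1, required samples in column 2."""
    buffer = io.StringIO()
    # Assumption: no header row, one class per line.
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for class_name, count in required.items():
        writer.writerow([class_name, count])
    return buffer.getvalue()


text = byclass_csv({"water": 100, "forest": 250})
print(text, end="")
```

The returned text can be written to a file whose path is then given to strategy.byclass.in together with the byclass strategy.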

Field Name: Name of the field carrying the class name in the input vectors.

Layer Index: Layer index to read in the input vector file.

[Elevation management]: This group of parameters allows managing elevation values. Supported formats are SRTM, DTED or any GeoTIFF. The DownloadSRTMTiles application can be a useful tool to list or download the tiles related to a product.

  • DEM directory: This parameter allows selecting a directory containing Digital Elevation Model files. Note that this directory should contain only DEM files. Unexpected behaviour might occur if other images are found in this directory.
  • Geoid File: Use a geoid grid to get the height above the ellipsoid in case there is no DEM available, no coverage for some points or pixels with no_data in the DEM tiles. A version of the geoid can be found on the OTB website (https://gitlab.orfeo-toolbox.org/orfeotoolbox/otb-data/blob/master/Input/DEM/egm96.grd).
  • Default elevation: This parameter allows setting the default height above ellipsoid when there is no DEM available, no coverage for some points or pixels with no_data in the DEM tiles, and no geoid file has been set. This is also used by some applications as an average elevation value.

Available RAM (Mb): Available memory for processing (in MB).

set user defined seed: Set a specific seed (integer value).

Load otb application from xml file: Load otb application from xml file.

Save otb application to xml file: Save otb application to xml file.

Example

To run this example in command-line, use the following:

otbcli_SampleSelection -in support_image.tif -vec variousVectors.sqlite -field label -instats apTvClPolygonClassStatisticsOut.xml -out resampledVectors.sqlite

To run this example from Python, use the following code snippet:

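#!/usr/bin/python

# Import the otb applications package
import otbApplication

# The following line creates an instance of the SampleSelection application
SampleSelection = otbApplication.Registry.CreateApplication("SampleSelection")

# The following lines set all the application parameters:
SampleSelection.SetParameterString("in", "support_image.tif")

SampleSelection.SetParameterString("vec", "variousVectors.sqlite")

# Refresh the application so that the field choices are read from the vector file
SampleSelection.UpdateParameters()

SampleSelection.SetParameterStringList("field", ["label"])

SampleSelection.SetParameterString("instats", "apTvClPolygonClassStatisticsOut.xml")

SampleSelection.SetParameterString("out", "resampledVectors.sqlite")

# The following line executes the application
SampleSelection.ExecuteAndWriteOutput()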
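
The same Python interface can also be used to choose a strategy and a sampler before execution. The sketch below is hedged: configure_sampling and RecordingApp are hypothetical names, and RecordingApp is only a stand-in that records calls so that the snippet runs without OTB installed. The parameter keys come from the parameters table above.

```python
def configure_sampling(app, strategy="smallest", sampler="periodic",
                       nb=None, seed=None):
    """Set the strategy, sampler and optional seed on an application object."""
    app.SetParameterString("strategy", strategy)
    if strategy == "constant":
        # 'strategy.constant.nb' is the number of samples for all classes.
        app.SetParameterInt("strategy.constant.nb", nb)
    app.SetParameterString("sampler", sampler)
    if seed is not None:
        # 'rand' sets a user defined seed.
        app.SetParameterInt("rand", seed)


class RecordingApp:
    """Stand-in for an OTB application: it only records the calls."""

    def __init__(self):
        self.params = {}

    def SetParameterString(self, key, value):
        self.params[key] = value

    def SetParameterInt(self, key, value):
        self.params[key] = value


app = RecordingApp()
configure_sampling(app, strategy="constant", nb=100, sampler="random", seed=42)
print(app.params)
```

With OTB available, the object returned by otbApplication.Registry.CreateApplication("SampleSelection") would be passed instead of RecordingApp, before calling ExecuteAndWriteOutput().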

Limitations

None

Authors

This application has been written by OTB-Team.