SampleSelection

Selects samples from a training vector data set.

Description

The application selects a set of samples from geometries intended for training (they should have a field giving the associated class).

First of all, the geometries must be analyzed by the PolygonClassStatistics application to compute statistics about the geometries, which are summarized in an XML file. Then, this XML file must be given as an input to this application (parameter instats).

The input support image and the input training vectors shall be given in parameters ‘in’ and ‘vec’ respectively. Only the sampling grid (origin, size, spacing) will be read from the input image. There are several strategies to select samples (parameter strategy):

  • smallest (default) : select the same number of samples in each class so that the smallest one is fully sampled.
  • constant : select the same number of samples N in each class (with N less than or equal to the size of the smallest class).
  • byclass : set the required number for each class manually, with an input CSV file (the first column is the class name, the second one is the required number of samples).
  • percent : set a target global percentage of samples to use. Class proportions will be respected.
  • total : set a target total number of samples to use. Class proportions will be respected.
  • all : use all the available samples.
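
As an illustration only, the sketch below computes how many samples each strategy choice would request per class, given the class sizes reported in the statistics file. It is not the OTB implementation: the function name samples_per_class, its arguments and the rounding (values are rounded down) are assumptions made for this example.

```python
def samples_per_class(class_sizes, strategy="smallest", nb=None,
                      required=None, p=0.5, total=1000):
    """Return {class name: number of samples to select}.

    class_sizes maps each class name to its number of available samples.
    Hypothetical helper: rounding down is an assumption, not OTB behaviour.
    """
    smallest = min(class_sizes.values())
    if strategy == "smallest":
        # Same count everywhere; the smallest class is fully sampled.
        return {c: smallest for c in class_sizes}
    if strategy == "constant":
        # Same count N everywhere, with N <= size of the smallest class.
        if nb is None or nb > smallest:
            raise ValueError("nb must be set and not exceed the smallest class")
        return {c: nb for c in class_sizes}
    if strategy == "byclass":
        # Counts given manually (read from the CSV file in the application).
        return {c: required[c] for c in class_sizes}
    if strategy == "percent":
        # Same fraction p of every class, so class proportions are kept.
        return {c: int(n * p) for c, n in class_sizes.items()}
    if strategy == "total":
        # Split a global target according to class proportions.
        available = sum(class_sizes.values())
        return {c: total * n // available for c, n in class_sizes.items()}
    if strategy == "all":
        return dict(class_sizes)
    raise ValueError("unknown strategy: " + strategy)


sizes = {"water": 200, "forest": 500, "urban": 300}
print(samples_per_class(sizes, "smallest"))
print(samples_per_class(sizes, "total", total=100))
```

With the percent and total choices the larger classes still receive more samples, whereas smallest and constant produce a balanced training set.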

There is also a choice of the sampling type to perform:

  • periodic : select samples uniformly distributed
  • random : select samples randomly distributed
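
The difference between the two sampling types can be pictured with a minimal sketch that picks a number of positions among the candidates of one class. This is a simplified illustration, not the OTB samplers: the function names are hypothetical and the jitter option of the periodic sampler is not modelled.

```python
import random


def periodic_indices(n_available, n_needed):
    """Pick n_needed indices regularly spaced over range(n_available)."""
    if not 0 <= n_needed <= n_available:
        raise ValueError("n_needed must be between 0 and n_available")
    if n_needed == 0:
        return []
    step = n_available / n_needed  # constant spacing between selected items
    return [int(i * step) for i in range(n_needed)]


def random_indices(n_available, n_needed, seed=None):
    """Pick n_needed distinct indices at random; a fixed seed makes it repeatable."""
    if not 0 <= n_needed <= n_available:
        raise ValueError("n_needed must be between 0 and n_available")
    rng = random.Random(seed)
    return sorted(rng.sample(range(n_available), n_needed))


print(periodic_indices(10, 5))        # evenly spread: [0, 2, 4, 6, 8]
print(random_indices(10, 5, seed=0))  # same seed, same selection
```

In the application, the seed role is played by the rand parameter.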

Once the strategy and type are selected, the application outputs the sample positions (parameter out).

The other parameters to consider are:

  • layer : index specifying from which layer to pick geometries.
  • field : set the field name containing the class.
  • mask : an optional raster mask can be used to discard samples.
  • outrates : allows outputting a CSV file that summarizes the sampling rates for each class.

As with the PolygonClassStatistics application, different types of geometry are supported: polygons, lines, points. The behavior of this application is different for each type of geometry:

  • polygons : select points whose center is inside the polygon
  • lines : select points intersecting the line
  • points : select the closest point to the provided point
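
For the polygon case, the rule can be sketched as follows: every position of the sampling grid (origin, size, spacing) is converted to physical coordinates and kept when it falls inside the polygon. This is a plain ray-casting illustration, not the OTB code. The helper names are hypothetical, and the sketch assumes that the origin is the center of the first pixel.

```python
def pixel_center(col, row, origin, spacing):
    """Physical coordinates of grid position (col, row).

    Assumption: origin is the center of pixel (0, 0).
    """
    return (origin[0] + col * spacing[0], origin[1] + row * spacing[1])


def point_in_polygon(x, y, polygon):
    """Ray-casting test; polygon is a list of (x, y) vertices."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does the edge cross the horizontal line going through y?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def pixels_in_polygon(polygon, origin, spacing, size):
    """List the (col, row) grid positions whose center lies inside the polygon."""
    selected = []
    for row in range(size[1]):
        for col in range(size[0]):
            x, y = pixel_center(col, row, origin, spacing)
            if point_in_polygon(x, y, polygon):
                selected.append((col, row))
    return selected


square = [(0.5, 0.5), (2.5, 0.5), (2.5, 2.5), (0.5, 2.5)]
print(pixels_in_polygon(square, origin=(0.0, 0.0), spacing=(1.0, 1.0), size=(4, 4)))
```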

Parameters

InputImage -in image Mandatory
Support image that will be classified

InputMask -mask image
Validity mask (only pixels corresponding to a mask value greater than 0 will be considered for sample selection)

Input vectors -vec filename [dtype] Mandatory
Input geometries to analyse

Output vectors -out filename [dtype] Mandatory
Output resampled geometries

Input Statistics -instats filename [dtype] Mandatory
Input file storing statistics (XML format)

Output rates -outrates filename [dtype]
Output rates (CSV formatted)

Sampler type -sampler [periodic|random] Default value: periodic
Type of sampling (periodic, random)

  • Periodic sampler
    Takes samples regularly spaced
  • Random sampler
    The positions to select are randomly shuffled.

Periodic sampler options

Jitter amplitude -sampler.periodic.jitter int Default value: 0
Jitter amplitude added during sample selection (0 = no jitter)


Sampling strategy -strategy [byclass|constant|percent|total|smallest|all] Default value: smallest

  • Set samples count for each class
  • Set the same samples counts for all classes
  • Use a percentage of the samples available for each class
  • Set the total number of samples to generate, and use class proportions.
  • Set the same number of samples for all classes, with the smallest class fully sampled
  • Use all samples

Set samples count for each class options

Number of samples by class -strategy.byclass.in filename [dtype] Mandatory
Number of samples by class (CSV format with the class name in the 1st column and the required number of samples in the 2nd).
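
A file with this two-column layout could be produced with a few lines of Python, as sketched below. The helper name write_required_samples, the file name and the class names are hypothetical, and the comma separator is an assumption: check which separators your OTB version accepts before relying on it.

```python
import csv


def write_required_samples(path, required, delimiter=","):
    """Write one 'class name, required samples' row per class.

    The delimiter is an assumption; adapt it to what the application expects.
    """
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=delimiter)
        for class_name, count in required.items():
            writer.writerow([class_name, count])


# Hypothetical class names and counts
write_required_samples("required_samples.csv", {"1": 150, "2": 300, "3": 75})
```

The resulting file would then be passed through the strategy.byclass.in parameter, with strategy set to byclass.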

Set the same samples counts for all classes options

Number of samples for all classes -strategy.constant.nb int Mandatory
Number of samples for all classes

Use a percentage of the samples available for each class options

The percentage to use -strategy.percent.p float Default value: 0.5
The percentage of samples to use. The default value of 0.5 suggests that it is expressed as a fraction of 1 (0.5 = 50%).

Set the total number of samples to generate, and use class proportions. options

The number of samples to generate -strategy.total.v int Default value: 1000
The number of samples to generate


Field Name -field string
Name of the field carrying the class name in the input vectors.

Layer Index -layer int Default value: 0
Layer index to read in the input vector file.

Elevation management

This group of parameters allows managing elevation values. Supported formats are SRTM, DTED or any GeoTIFF. The DownloadSRTMTiles application can be a useful tool to list or download the tiles related to a product.

DEM directory -elev.dem directory
This parameter allows selecting a directory containing Digital Elevation Model files. Note that this directory should contain only DEM files. Unexpected behaviour might occur if other images are found in this directory.

Geoid File -elev.geoid filename [dtype]
Use a geoid grid to get the height above the ellipsoid in case there is no DEM available, no coverage for some points or pixels with no_data in the DEM tiles. A version of the geoid can be found on the OTB website (https://gitlab.orfeo-toolbox.org/orfeotoolbox/otb-data/blob/master/Input/DEM/egm96.grd).

Default elevation -elev.default float Default value: 0
This parameter allows setting the default height above ellipsoid when there is no DEM available, no coverage for some points or pixels with no_data in the DEM tiles, and no geoid file has been set. This is also used by some applications as an average elevation value.


Random seed -rand int
Set a specific random seed with integer value.

Available RAM (MB) -ram int Default value: 256
Available memory for processing (in MB).

Examples

From the command-line:

otbcli_SampleSelection -in support_image.tif -vec variousVectors.sqlite -field label -instats apTvClPolygonClassStatisticsOut.xml -out resampledVectors.sqlite

From Python:

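import otbApplication

# Create the SampleSelection application from the OTB registry
app = otbApplication.Registry.CreateApplication("SampleSelection")

# Support image, training vectors and the field holding the class
app.SetParameterString("in", "support_image.tif")
app.SetParameterString("vec", "variousVectors.sqlite")
app.SetParameterString("field", "label")
# Statistics file computed beforehand by PolygonClassStatistics
app.SetParameterString("instats", "apTvClPolygonClassStatisticsOut.xml")
# Output file receiving the selected sample positions
app.SetParameterString("out", "resampledVectors.sqlite")

# Run the application and write the output file
app.ExecuteAndWriteOutput()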
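
The example above assumes that the statistics file already exists. As a hedged sketch, the two steps described in the Description (PolygonClassStatistics first, then SampleSelection) could be chained as follows. The helper names apply_parameters and run_sample_selection, the file names passed to them and the chosen values (constant strategy with 100 samples, random sampler, seed 42) are illustrative assumptions. The OTB import is kept inside the function so that the helper can be read and tested without an OTB installation.

```python
def apply_parameters(app, params):
    """Call the setter matching the Python type of each parameter value."""
    for key, value in params.items():
        if isinstance(value, bool):
            raise TypeError("boolean values are not handled in this sketch")
        if isinstance(value, str):
            app.SetParameterString(key, value)
        elif isinstance(value, int):
            app.SetParameterInt(key, value)
        elif isinstance(value, float):
            app.SetParameterFloat(key, value)
        else:
            raise TypeError("unsupported value for " + key)


def run_sample_selection(image, vectors, field, stats_xml, selected_out):
    """Compute class statistics, then select samples (requires OTB)."""
    import otbApplication

    # Step 1: summarize the training geometries in an XML file
    stats = otbApplication.Registry.CreateApplication("PolygonClassStatistics")
    apply_parameters(stats, {"in": image, "vec": vectors,
                             "field": field, "out": stats_xml})
    stats.ExecuteAndWriteOutput()

    # Step 2: select samples using these statistics
    select = otbApplication.Registry.CreateApplication("SampleSelection")
    apply_parameters(select, {
        "in": image, "vec": vectors, "field": field,
        "instats": stats_xml, "out": selected_out,
        "strategy": "constant", "strategy.constant.nb": 100,  # illustrative values
        "sampler": "random", "rand": 42,
    })
    select.ExecuteAndWriteOutput()
```

A call such as run_sample_selection("support_image.tif", "variousVectors.sqlite", "label", "stats.xml", "resampledVectors.sqlite") would then run both steps in sequence.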