4.8.11 Sample Selection

Selects samples from a training vector data set.

Detailed description

The application selects a set of samples from geometries intended for training (they should have a field giving the associated class).

First, the geometries must be analyzed by the PolygonClassStatistics application, which computes statistics about them and summarizes these statistics in an XML file.
This XML file must then be given as input to this application (parameter instats).
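
The two-step workflow described above can be sketched as a pair of command lines assembled in Python. This is only an illustration: the helper names build_pipeline and run_pipeline and the statistics file name classStats.xml are hypothetical, and the sketch assumes that PolygonClassStatistics accepts the keys in, vec, field and out and that the otbcli_* launchers are available on the PATH.

```python
import subprocess


def build_pipeline(image, vectors, field, stats_xml, samples_out):
    """Return the two command lines, in the order they must run."""
    # Step 1: compute per-class statistics on the training geometries.
    stats_cmd = [
        "otbcli_PolygonClassStatistics",
        "-in", image,
        "-vec", vectors,
        "-field", field,
        "-out", stats_xml,
    ]
    # Step 2: select samples, passing the statistics file through 'instats'.
    select_cmd = [
        "otbcli_SampleSelection",
        "-in", image,
        "-vec", vectors,
        "-field", field,
        "-instats", stats_xml,
        "-out", samples_out,
    ]
    return [stats_cmd, select_cmd]


def run_pipeline(commands):
    """Run each command in turn, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


# Building the commands does not require OTB; only run_pipeline() does.
commands = build_pipeline(
    "support_image.tif", "variousVectors.sqlite", "label",
    "classStats.xml", "resampledVectors.sqlite",
)
```

Calling run_pipeline(commands) would execute both steps; the statistics file written by the first command is the one read by the second.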

The input support image and the input training vectors shall be given in parameters 'in' and 'vec' respectively. Only the sampling grid (origin, size, spacing) will be read from the input image.
There are several strategies to select samples (parameter strategy):
- smallest (default): select the same number of samples in each class,
so that the smallest one is fully sampled.
- constant: select the same number of samples N in each class
(with N less than or equal to the size of the smallest class).
- byclass: set the required number for each class manually, with an input CSV file
(first column is the class name, second one is the required number of samples).
- all: take all samples.
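
The effect of each strategy on the per-class sample counts can be illustrated with a short sketch. It is not the application's implementation: the function samples_per_class and the class sizes are hypothetical, and the sizes are written by hand here, whereas the application reads its statistics from the XML file.

```python
def samples_per_class(class_sizes, strategy="smallest", nb=None, required=None):
    """Return {class_name: number_of_samples} for the chosen strategy.

    class_sizes: {class_name: number of available samples}
    nb:          N for the 'constant' strategy
    required:    {class_name: count} for the 'byclass' strategy
    """
    smallest = min(class_sizes.values())
    if strategy == "smallest":
        # Every class gets as many samples as the smallest class holds.
        return {name: smallest for name in class_sizes}
    if strategy == "constant":
        if nb is None or nb > smallest:
            raise ValueError("nb must be set and not exceed the smallest class")
        return {name: nb for name in class_sizes}
    if strategy == "byclass":
        if required is None:
            raise ValueError("'byclass' needs a per-class count mapping")
        return {name: required[name] for name in class_sizes}
    if strategy == "all":
        return dict(class_sizes)
    raise ValueError("unknown strategy: " + strategy)


sizes = {"water": 120, "forest": 300, "urban": 80}
by_smallest = samples_per_class(sizes)
by_constant = samples_per_class(sizes, "constant", nb=50)
```

With these sizes, the smallest strategy requests 80 samples in every class, since "urban" is the smallest class, while constant with N = 50 requests 50 in every class.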
There is also a choice of sampling type (parameter sampler):
- periodic: select samples uniformly distributed
- random: select samples randomly distributed
Once the strategy and type are selected, the application outputs the sample positions (parameter out).
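
The difference between the two sampling types can be illustrated on a plain list of candidate positions. This is a minimal sketch under stated assumptions, not the application's code: select_periodic and select_random are hypothetical helpers, and the seed argument only mirrors the idea of the rand parameter.

```python
import random


def select_periodic(candidates, count):
    """Pick 'count' items at regular intervals along the candidate list."""
    count = min(count, len(candidates))
    if count <= 0:
        return []
    step = len(candidates) / count
    return [candidates[int(i * step)] for i in range(count)]


def select_random(candidates, count, seed=None):
    """Pick 'count' distinct items at random; a seed makes the draw repeatable."""
    rng = random.Random(seed)
    return rng.sample(candidates, min(count, len(candidates)))


positions = list(range(10))
periodic = select_periodic(positions, 5)
shuffled = select_random(positions, 5, seed=42)
```

Here the periodic selection keeps every second position, while the random selection returns five distinct positions that are identical from one run to the next as long as the seed is unchanged.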

The other parameters to look at are:
- layer: index specifying from which layer to pick geometries.
- field: name of the field containing the class.
- mask: an optional raster mask can be used to discard samples.
- outrates: outputs a CSV file that summarizes the sampling rates for each class.

As with the PolygonClassStatistics application, different types of geometry are supported: polygons, lines, points.
The behavior of this application is different for each type of geometry:
- polygons: select points whose center is inside the polygon
- lines: select points intersecting the line
- points: select the closest point to the provided point
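
The polygon case can be sketched with the sampling grid alone (origin, size, spacing), which is all the application reads from the support image. The code below is an illustration, not the application's implementation: the helper names are hypothetical, the inside test is a standard ray-casting test, and the sketch assumes that the origin is the center of the first pixel.

```python
def pixel_center(origin, spacing, col, row):
    """Physical coordinates of a pixel, assuming 'origin' is the center of pixel (0, 0)."""
    return (origin[0] + col * spacing[0], origin[1] + row * spacing[1])


def point_in_polygon(point, polygon):
    """Ray-casting test: True when 'point' lies inside 'polygon' (list of (x, y) vertices)."""
    x, y = point
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        # Does this edge cross the horizontal ray going right from the point?
        if (y1 > y) != (y2 > y):
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def candidates_in_polygon(origin, spacing, size, polygon):
    """List the (col, row) indices of pixels whose center falls inside the polygon."""
    cols, rows = size
    return [
        (col, row)
        for row in range(rows)
        for col in range(cols)
        if point_in_polygon(pixel_center(origin, spacing, col, row), polygon)
    ]


# A 4 x 4 grid with unit spacing and a square covering the first two rows and columns.
square = [(-0.5, -0.5), (1.5, -0.5), (1.5, 1.5), (-0.5, 1.5)]
selected = candidates_in_polygon((0.0, 0.0), (1.0, 1.0), (4, 4), square)
```

Only the four pixels whose centers lie inside the square are kept as candidate sample positions.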

Parameters

This section describes in detail the parameters available for this application. Table 4.134, page 740 presents a summary of these parameters and the parameter keys to be used in command-line and programming languages. The application key is SampleSelection.





Parameter key            Parameter type              Parameter description
-----------------------  --------------------------  ---------------------
in                       Input image                 InputImage
mask                     Input image                 InputMask
vec                      Input File name             Input vectors
out                      Output File name            Output vectors
instats                  Input File name             Input Statistics
outrates                 Output File name            Output rates
sampler                  Choices                     Sampler type
sampler periodic         Choice                      Periodic sampler
sampler random           Choice                      Random sampler
sampler.periodic.jitter  Int                         Jitter amplitude
strategy                 Choices                     Sampling strategy
strategy byclass         Choice                      Set samples count for each class
strategy constant        Choice                      Set the same samples counts for all classes
strategy smallest        Choice                      Set same number of samples for all classes, with the smallest class fully sampled
strategy all             Choice                      Take all samples
strategy.byclass.in      Input File name             Number of samples by class
strategy.constant.nb     Int                         Number of samples for all classes
field                    String                      Field Name
layer                    Int                         Layer Index
ram                      Int                         Available RAM (Mb)
rand                     Int                         set user defined seed
inxml                    XML input parameters file   Load otb application from xml file
outxml                   XML output parameters file  Save otb application to xml file

Table 4.134: Parameters table for Sample Selection.

InputImage: Support image that will be classified.

InputMask: Validity mask (only pixels corresponding to a mask value greater than 0 will be used for statistics).

Input vectors: Input geometries to analyse.

Output vectors: Output resampled geometries.

Input Statistics: Input file storing statistics (XML format).

Output rates: Output rates (CSV formatted).

Sampler type: Type of sampling (periodic, pattern based, random). Available choices are:
- Periodic sampler (sub-parameter: Jitter amplitude)
- Random sampler

Sampling strategy: Available choices are:
- Set samples count for each class (sub-parameter: Number of samples by class)
- Set the same samples counts for all classes (sub-parameter: Number of samples for all classes)
- Set same number of samples for all classes, with the smallest class fully sampled
- Take all samples

Field Name: Name of the field carrying the class name in the input vectors.

Layer Index: Layer index to read in the input vector file.

Available RAM (Mb): Available memory for processing (in MB).

set user defined seed: Set a specific seed with an integer value.

Load otb application from xml file: Load the OTB application from an XML file.

Save otb application to xml file: Save the OTB application to an XML file.

Example

To run this example from the command line, use the following:

otbcli_SampleSelection -in support_image.tif -vec variousVectors.sqlite -field label -instats apTvClPolygonClassStatisticsOut.xml -out resampledVectors.sqlite

To run this example from Python, use the following code snippet:

#!/usr/bin/python

# Import the otb applications package
import otbApplication

# The following line creates an instance of the SampleSelection application
SampleSelection = otbApplication.Registry.CreateApplication("SampleSelection")

# The following lines set all the application parameters:
SampleSelection.SetParameterString("in", "support_image.tif")
SampleSelection.SetParameterString("vec", "variousVectors.sqlite")
SampleSelection.SetParameterString("field", "label")
SampleSelection.SetParameterString("instats", "apTvClPolygonClassStatisticsOut.xml")
SampleSelection.SetParameterString("out", "resampledVectors.sqlite")

# The following line executes the application
SampleSelection.ExecuteAndWriteOutput()
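
As a variation on the example above, the parameters can be collected in a dictionary and applied in one pass, which makes it easier to switch strategy or sampler. This is a sketch under stated assumptions: the helper configure and the chosen values (constant strategy, 100 samples per class, seed 42, rates.csv) are hypothetical, and integer parameters are assumed to be set through SetParameterInt.

```python
def configure(app, params):
    """Apply a {key: value} mapping to an application object.

    Integers go through SetParameterInt, everything else through
    SetParameterString.
    """
    for key, value in params.items():
        if isinstance(value, int) and not isinstance(value, bool):
            app.SetParameterInt(key, value)
        else:
            app.SetParameterString(key, str(value))
    return app


# Hypothetical settings: 'constant' strategy with 100 samples per class,
# random sampler, fixed seed, and a sampling rates report.
params = {
    "in": "support_image.tif",
    "vec": "variousVectors.sqlite",
    "field": "label",
    "instats": "apTvClPolygonClassStatisticsOut.xml",
    "out": "resampledVectors.sqlite",
    "strategy": "constant",
    "strategy.constant.nb": 100,
    "sampler": "random",
    "rand": 42,
    "outrates": "rates.csv",
}
```

With OTB installed, the application object created by otbApplication.Registry.CreateApplication("SampleSelection") would be passed to configure together with params, followed by a call to ExecuteAndWriteOutput().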

Limitations

None

Authors

This application has been written by OTB-Team.