classify [ parameter=value ] [ inputfile outputfile ]
classify [ parameter=value ] [ inputfile ... directory ]
Parameters are: variables, class_var_name, input_mask_set, input_mask_var, resolution, mask_classes, channel_minimum, channel_maximum, target_classes, target_labels, target_valuesn, target_deltasn, list_file, merge_tolerance, single_link, save_clusters, list_clusters, num_clusters, percent_signif, percent_max_siz.
classify identifies and labels data clusters in a scene/image using a multichannel binning and merging classification algorithm, followed by comparison and labeling of clusters with known classes. Clusters whose centers (mean values) lie within a threshold distance of a user-specified "target feature" mean value are grouped and labeled, so that all image elements associated with targeted clusters (classes) are labeled accordingly and masks highlighting target features can be generated. The output product consists of masks highlighting the targeted image features/classes.
Classification of all image features is first performed by a simple sort of image pixels into data bins. Data bins are the cells of a grid superimposed onto the multichannel spectral range being utilized. The grid cells should be significantly smaller than the expected dimensions (range of values) of the smallest data clusters sought for classification and labeling. The grid mesh dimensions (cell-counts) for each channel can be specified with the resolution parameter.
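The binning step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the classify implementation: the function name bin_pixels, the cell_width argument (cell size given in data units per channel), and the sample pixel values are all hypothetical.

```python
import math
from collections import defaultdict

def bin_pixels(pixels, cell_width):
    """Group multichannel pixels by the grid cell they fall into.

    pixels: iterable of equal-length tuples, one value per channel.
    cell_width: per-channel cell size in data units (assumed interpretation).
    Returns a dict mapping a cell index tuple to the list of pixels in it.
    """
    cells = defaultdict(list)
    for p in pixels:
        # The cell index along each channel is the floor of value / width.
        key = tuple(math.floor(v / w) for v, w in zip(p, cell_width))
        cells[key].append(p)
    return dict(cells)

# Hypothetical three-channel pixels; only populated cells become initial clusters.
pixels = [(7.6, -32.4, 19.5), (7.7, -32.3, 19.6), (19.4, -39.2, -12.2)]
cells = bin_pixels(pixels, cell_width=(0.5, 0.5, 0.5))
```

Here the first two pixels share a grid cell and the third occupies its own, so the sketch yields two initial clusters. Empty cells never appear in the dictionary, which loosely corresponds to counting only populated grid-cells.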
classify creates an output dataset with the output variable dataclasses of type byte. dataclasses contains a non-negative integer value when the corresponding image pixel belongs to a target class; otherwise the value is set to badvalue (GP_BAD_BYTE). A valid integer value in dataclasses is the integer specified with the target_labels parameter, for the classes specified on the class_var_name parameter field. Target classes are specified by mean and delta values for the data variables utilized, thereby defining the neighborhoods within which target classes are to be found.
After the image pixels are initially sorted into their corresponding "bins", which provides an initial set of data clusters, these clusters are statistically processed and merged with adjacent clusters on the basis of covariance information accumulated for each cluster, which allows the similarity between adjacent clusters to be assessed. The most similar cluster pairs are merged. Two algorithms are provided for this purpose: a single-linkage method and an average-linkage method.
The single-linkage algorithm identifies every cluster's most appropriate (closest, most similar) neighboring cluster, thereby specifying a series of cluster pairs which should be merged. Merging of all identified cluster pairs is performed in one pass, essentially simultaneously consolidating large chains of neighboring clusters. With single-linkage, one pass may even be sufficient to consolidate all clusters into a single data cluster, depending upon the modality of the data distribution. This algorithm's performance is faster than N**2, N being the initial count of populated grid-cells. Single linkage tends to consolidate clusters quite uniformly, but also may merge outlier data with larger, well-defined clusters.
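A one-pass merge of this kind can be sketched in Python as follows. This is only an illustration under assumptions: the function name single_link_pass is hypothetical, plain Euclidean distance between cluster means stands in for the similarity test, and the brute-force neighbor search used here costs N**2, unlike the faster search described above.

```python
import math

def single_link_pass(means):
    """Merge every cluster with its nearest neighbor in one pass.

    means: list of cluster mean vectors (tuples).
    Returns a list of groups, each a sorted list of original cluster indexes.
    """
    n = len(means)
    if n < 2:
        return [[i] for i in range(n)]
    parent = list(range(n))

    def find(i):
        # Follow parent links to the group representative (with path halving).
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        # Nearest other cluster to cluster i.
        j = min((k for k in range(n) if k != i),
                key=lambda k: math.dist(means[i], means[k]))
        # Record the pair; chains of neighbors join transitively.
        parent[find(i)] = find(j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

# Hypothetical cluster means forming two well-separated chains.
groups = single_link_pass([(0, 0), (1, 0), (2, 0), (10, 0), (11, 0)])
```

In this sample the first three clusters are chained into one group and the last two into another, all within a single pass.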
The average-linkage algorithm differs from single-linkage in that only the closest cluster pair is merged before all affected remaining cluster pairs are reevaluated for closeness and similarity with the cluster resulting from the merge. This requires additional computational overhead for each merge, so that this algorithm's performance ranges between N**2 and N**3, N being the initial count of populated grid-cells. Average-linkage tends to consolidate large, classifiable data features most rapidly, but ignores outliers.
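The merge-one-pair-then-reevaluate loop can be sketched in Python as follows. The sketch makes several assumptions that are not taken from classify itself: the function name merge_closest_pairs, the simple target_count stopping rule, and the use of Euclidean distance between size-weighted means in place of the similarity test.

```python
import math
from itertools import combinations

def merge_closest_pairs(clusters, target_count):
    """Repeatedly merge the single closest cluster pair.

    clusters: list of (size, mean) where mean is a tuple of channel means.
    target_count: stop once this many clusters remain (assumed stop rule).
    """
    clusters = list(clusters)
    while len(clusters) > max(target_count, 1):
        # All pairs are re-evaluated after every merge; this is the extra cost.
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: math.dist(clusters[ij[0]][1],
                                            clusters[ij[1]][1]))
        (ni, mi), (nj, mj) = clusters[i], clusters[j]
        n = ni + nj
        # Size-weighted mean of the merged cluster.
        merged_mean = tuple((ni * a + nj * b) / n for a, b in zip(mi, mj))
        # combinations() guarantees i < j, so deleting j keeps index i valid.
        del clusters[j]
        clusters[i] = (n, merged_mean)
    return clusters

# Hypothetical clusters: two close neighbors and one distant outlier.
result = merge_closest_pairs(
    [(10, (0.0, 0.0)), (30, (1.0, 0.0)), (5, (10.0, 0.0))], target_count=2)
```

With these sample values the two neighboring clusters are merged first and the small distant cluster is left untouched.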
A cluster pair is determined to be mergeable (closest, most similar) by means of principal component analysis. A cluster pair's similarity is tested on the basis of data overlap (adjacency and contiguity), which can be measured by comparing the clusters' standard deviations with the separation of their mean values, as assessed in terms of the principal component coordinates of each cluster. The separation of a cluster pair's mean values is expressed in terms of the average of the standard deviations of the pair under consideration. Covariance matrices permit principal component transformations to estimate directional standard deviations, providing a measure of data overlap for all relative cluster orientations. While it generally suffices to identify merely the closest neighbor for any given cluster, a difference threshold parameter, merge_tolerance, is provided to optionally ensure that significantly unrelated clusters are not inadvertently consolidated. Additional control parameters are num_clusters, which specifies the final number of clusters sought to contain percent_signif percent of the data distribution, thereby specifying a termination criterion, and percent_max_siz, which limits the largest cluster size as a percentage of the data distribution to prevent premature consolidation of large data clusters.
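One way to picture the overlap measure is the simplified Python sketch below. It is an assumption-laden illustration, not the classify algorithm: it evaluates each cluster's standard deviation only along the line joining the two means (using the identity that the variance along a unit vector u is u' C u) instead of working in each cluster's principal component coordinates, and the function names directional_std and separation are hypothetical.

```python
import math

def directional_std(cov, u):
    """Standard deviation along unit vector u for covariance matrix cov."""
    size = len(u)
    var = sum(u[r] * cov[r][c] * u[c] for r in range(size) for c in range(size))
    return math.sqrt(var)

def separation(mean_a, cov_a, mean_b, cov_b):
    """Distance between means in units of the pair's average directional std."""
    diff = [b - a for a, b in zip(mean_a, mean_b)]
    dist = math.sqrt(sum(d * d for d in diff))
    if dist == 0.0:
        return 0.0
    u = [d / dist for d in diff]  # unit vector from cluster a to cluster b
    avg_std = 0.5 * (directional_std(cov_a, u) + directional_std(cov_b, u))
    return math.inf if avg_std == 0.0 else dist / avg_std

# Hypothetical two-channel clusters with diagonal covariance matrices.
s = separation((0.0, 0.0), [[4.0, 0.0], [0.0, 1.0]],
               (6.0, 0.0), [[1.0, 0.0], [0.0, 9.0]])
```

A small value indicates strongly overlapping clusters; how such a value would be compared against merge_tolerance is not specified here and is left open in this sketch.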
target_classes specifies a sequence of target class labels and, implicitly, the corresponding sequence indexes for these target labels. target_valuesn specifies the expected values for the classes corresponding to each target category. A neighborhood around these target_valuesn must also be defined with the target_deltasn parameter in order to provide a range of values within which feature classes are expected to cluster. Any classes falling within these ranges of values are identified and labeled as belonging to the target class:
target_values - target_deltas <= target class values <= target_values + target_deltas
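The range test above can be sketched in Python as follows. The function name label_cluster, the choice of the first matching class when neighborhoods overlap, and the sample cluster means are assumptions made for illustration; the target values, deltas, and labels are those used in the example session of this page.

```python
def label_cluster(mean, target_values, target_deltas, target_labels):
    """Return the label of the first target class whose neighborhood contains mean.

    target_values[n][k] and target_deltas[n][k] hold the value and delta for
    variable n and target class k, mirroring target_valuesn / target_deltasn.
    Returns None when the mean lies outside every target neighborhood.
    """
    for k, label in enumerate(target_labels):
        inside = all(
            target_values[n][k] - target_deltas[n][k]
            <= mean[n]
            <= target_values[n][k] + target_deltas[n][k]
            for n in range(len(mean))
        )
        if inside:
            return label
    return None

# One row per variable, one column per target class.
values = [[20, 40, 55], [-5, 5, 5], [-5, -15, -25]]
deltas = [[10, 10, 5], [30, 30, 30], [5, 5, 5]]
labels = [1, 2, 3]

hit = label_cluster((42.0, 10.0, -17.0), values, deltas, labels)   # hypothetical mean
miss = label_cluster((0.0, 0.0, 0.0), values, deltas, labels)      # hypothetical mean
```

The first sample mean lies inside the neighborhood of the second class on all three variables and receives label 2; the second lies outside every neighborhood and stays unlabeled.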
Not specifying target classes and target labels suppresses generation of class data; the output dataset will then only contain data pertinent to the resultant data clusters.
Attributes pertaining to the specified classes are written to the output dataset. (dataclasses)_names (default) lists the class names, and (dataclasses)_labels (default) lists the class labels, as specified with the target_classes and target_labels parameters. These allow the classification data to be reused in further processing and classification, such as when class data is later used as an input mask specified with the input_mask_var parameter.
This example labels image features for three class categories. Note that class merging proceeds iteratively until no classes remain to be merged.
% classify
in/out files : char(255) ? g8.97140.1700.reg g8.97140.1700.class
variables : char(255) ? gvar_ch1 gvar_ch2 gvar_ch5
single_link : char( 3) ? [no] y
input_mask_set : char(255) ? []
input_mask_var : char(255) ? []
save_clusters : char( 3) ? [no] y
list_clusters : char( 3) ? [no] y
list_file : char(255) ? [clusters.list]
target_classes : char(255) ? class1 class2 class3
target_labels : int( 3) ? [0 1 2] 1 2 3
targets for variable gvar_ch1:
target_values1 : real( 3) ? 20 40 55
target_deltas1 : real( 3) ? 10 10 5
targets for variable gvar_ch2:
target_values2 : real( 3) ? -5 5 5
target_deltas2 : real( 3) ? 30 30 30
targets for variable gvar_ch5:
target_values3 : real( 3) ? -5 -15 -25
target_deltas3 : real( 3) ? 5 5 5
resolution : real( 3) ? [0.5 0.5 0.5] 0.3 0.6 0.4
num_clusters : int ? [10] 20
percent_signif : int ? [95] 95
percent_max_size : int ? [100]
merge_tolerance: real ? [10.] 10.
Initializing image data
Initializing output data
Initializing output variable
Initializing clusters
Initializing grid
Looping over image elements: ******
Marking empty grid-cells
1562 clusters before merging.
loop to merge clusters:
**************** +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
loop to merge clusters:
**** +++++++++++++++++++++++++++++++++++++++++++++
loop to merge clusters:
* ++++++++++++++++++++++++++
Total clusters counted before sorting = 48
Total clusters counted after sorting = 48 ********************************
Cluster statistics:
Number Datasize       Mean        Min        Max  Covariance | inverse Covariance
------ --------       ----       ----       ----  -------------------------------
     0    79909    7.61293    1.89000   13.40000     4.51721    0.05035   -0.07509
                 -32.45867  -39.29000  -26.98000    -0.02759    7.53153   -0.06313
                  19.47752    7.93000   33.90000     3.52624    4.94731   16.48094
     1    36874   19.43652    3.66000   37.59000    43.85708   -0.02970    0.01643
                 -39.15960  -49.58000  -28.87000     4.78384    7.76362   -0.04223
                 -12.23012  -37.73000    2.94000   -10.94069    5.25922   29.57635
     2    36789    3.91014    1.11000    7.25000     2.22222    0.01481    0.11266
                 -29.65788  -33.37000  -23.76000    -0.81703    4.96106   -0.14394
                  12.62985   -3.53000   19.38000    -1.45396    3.18554    6.41070
: : : : : : : :
: : : : : : : :
: : : : : : : :
: : : : : : : :
: : : : : : : :
Total clusters counted while labeling = 48 ********************************
g9.97241.1700.class: classification completed.
Last Update: $Date: 2000/12/07 19:55:11 $