sortstats - Computes statistics across records with the same sort key.

SYNOPSIS

sortstats  [ parameter=value ... ]  [ inputfile outputfile ]
sortstats  [ parameter=value ... ]  [ inputfile ... directory ]

Parameters are: sort_vars, stat_vars, stat_functions, output_names, min_good.

DESCRIPTION

The sortstats function creates datasets containing 1-D or 2-D variables whose names are specified by output_names. The output data are calculated as follows:

1. The sort_vars are used, in the user-specified order, as a multiple key to sort the stat_vars.

2. The records of each sorted variable in stat_vars are divided into groups that share the same value of the multiple sort key.

3. For each of these groups, the user-specified statistic is computed and stored in the output dataset under the corresponding name from output_names.

4. All unique multiple keys are also stored in the output dataset under the same sort_vars names.
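
The four steps above can be sketched in Python as follows. This is only an illustration of the sort-and-group procedure, not the sortstats implementation; the function name group_stats and the use of plain Python lists are assumptions made for the example. The sample values are the first two platform_id 106 rows and the three platform_id 323 rows of the findcases listing in EXAMPLES, deliberately shuffled.

```python
from itertools import groupby

def group_stats(sort_cols, stat_col, func):
    """Sort stat_col by a multiple key and apply func to each key group.

    sort_cols: list of equal-length lists (the multiple key, in key order)
    stat_col:  list parallel to the sort columns
    Returns (unique_keys, stats), both in sorted key order.
    """
    # Step 1: pair every record with its multiple key and sort by that key.
    records = sorted(zip(zip(*sort_cols), stat_col), key=lambda r: r[0])
    unique_keys, stats = [], []
    # Steps 2-4: split into groups of equal key, compute one statistic
    # per group, and keep the unique key alongside it.
    for key, group in groupby(records, key=lambda r: r[0]):
        values = [value for _, value in group]
        unique_keys.append(key)
        stats.append(func(values))
    return unique_keys, stats

platform_id = [323, 106, 323, 106, 323]
dcs_count = [0, 0, 0, 0, 0]
dcs_doppler = [266218, 265795, 208680, 255314, 186928]

keys, doppler_min = group_stats([platform_id, dcs_count], dcs_doppler, min)
print(keys)         # [(106, 0), (323, 0)]
print(doppler_min)  # [255314, 186928]
```

The unique keys and the statistics come back as parallel lists, mirroring how sort_vars and output_names stay parallel in the output dataset.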

The following facts and requirements can be derived from the description above:

a) For parallel sorting, all sort_vars must have the same dimension as the leading dimension of the stat_vars.

b) Since sort variables are stored in the output dataset, sort_vars should be different from output_names, although the same variable can appear in both sort_vars and stat_vars.

c) On output, sort_vars and output_names remain parallel, i.e. they share the same leading dimension, whose size equals the number of unique keys and can therefore be smaller than on input.
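
Requirements a) and b) could be checked before a run with something like the sketch below. The helper check_requirements is hypothetical and not part of sortstats; it also approximates "same dimension" by comparing lengths, since plain Python lists carry no dimension names.

```python
def check_requirements(sort_vars, stat_vars, output_names):
    """Return a list of problems found; an empty list means none.

    sort_vars, stat_vars: dicts mapping a variable name to its data.
    A 1-D variable is a list of values and a 2-D variable is a list of
    rows, so len() gives the leading dimension in both cases.
    """
    problems = []
    # Requirement a): one common leading dimension for all variables.
    lengths = {len(data) for data in sort_vars.values()}
    lengths |= {len(data) for data in stat_vars.values()}
    if len(lengths) > 1:
        problems.append("sort_vars and stat_vars differ in leading dimension")
    # Requirement b): no output name may collide with a sort variable.
    clash = set(sort_vars) & set(output_names)
    if clash:
        problems.append("output_names reuse sort_vars names: %s" % sorted(clash))
    return problems

ids = [106, 110, 323]
ok = check_requirements({"platform_id": ids},
                        {"dcs_doppler": [265795, 360154, 266218]},
                        ["doppler_max"])
bad = check_requirements({"platform_id": ids},
                         {"platform_id": ids},
                         ["platform_id"])
print(ok)   # []
print(bad)  # ["output_names reuse sort_vars names: ['platform_id']"]
```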

PARAMETERS

sort_vars

Variables from input datafile(s) to be used for sorting.

Valid responses are any existing 1-D variables with the same dimension. There is no default.

stat_vars

Variables from input datafile(s) for which statistics are desired.

Valid responses are any existing 1 or 2-D variables with the same (leading) dimension. There is no default.

stat_functions

Functions to be used to compute desired statistics.

Valid response(s) can be: first, last, min, max, mean, var, or stdev. There is no default.
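
As a rough illustration of what each function name means for one group of values, the names can be mapped onto Python's standard library. Two points here are assumptions rather than documented behavior: var and stdev are shown as sample (n - 1) statistics, and first and last are taken in the order in which the group's values are listed. The values are the dcs_doppler entries for platform_id 323 from the listing in EXAMPLES.

```python
import statistics

# Assumption: var and stdev use the sample (n - 1) definition; the manual
# page does not say whether sortstats divides by n or by n - 1.
STAT_FUNCTIONS = {
    "first": lambda v: v[0],
    "last": lambda v: v[-1],
    "min": min,
    "max": max,
    "mean": statistics.mean,
    "var": statistics.variance,
    "stdev": statistics.stdev,
}

doppler_323 = [266218, 208680, 186928]
results = {name: func(doppler_323) for name, func in STAT_FUNCTIONS.items()}
print(results["first"], results["last"], results["min"], results["max"])
# 266218 186928 186928 266218
```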

output_names

Names of stat_vars on output.

The default is the same as stat_vars.

min_good

Minimum number of values needed to compute a statistic (one entry per stat_var).

Valid response is any integer value. The default is a row of 1s [1...1].
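
The effect of min_good on a single group can be pictured with the sketch below. It rests on two assumptions that this page does not confirm: bad (missing) values are represented by None, and a group with too few good values yields None instead of a statistic. The helper name stat_with_min_good is hypothetical.

```python
def stat_with_min_good(values, func, min_good=1):
    """Apply func to the good values if there are at least min_good of them.

    Assumptions for this sketch: None marks a bad (missing) value, and
    None is returned when too few good values remain.
    """
    good = [v for v in values if v is not None]
    if len(good) < min_good:
        return None
    return func(good)

print(stat_with_min_good([266218, None, 186928], max, min_good=2))  # 266218
print(stat_with_min_good([266218, None, None], max, min_good=2))    # None
```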

EXAMPLES

The following example uses findcases to build a small dataset on which to run sortstats. A small dataset makes it easy to check the sortstats results by hand.

lapaz% findcases
in/out files   : char(255) ? n11.tssort n11.fcases
include_vars   : char(255) ? []
sort_var       : char(255) ? [] platform_id
case_values    : int (2048) ? 0^350
keep_cases     : char(  3) ? [yes]

Printvar  n11.fcases  Page 1

                date   dcs_count dcs_doppler dcs_quality platform_id            time
       1      930728           0      265795           0         106       113047.34
       2      930728           0      255314           0         106       113118.45
       3      930728           0      244253           0         106       113149.57
       4      930728           0      233416           0         106       113220.67
       5      930728           0      222370           0         106       113251.78
       6      930728           0      211928           0         106        113322.9
       7      930728           0      201662           0         106       113354.02
       8      930728           0      192047           0         106       113425.14
       9      930728           0      183227           0         106       113456.27
      10      930728           0      167705           0         106       113558.49
      11      930728           0      150335           0         106       113731.86
      12      930728           0      360154           0         110       113048.23
      13      930728           0      357574           0         110       113118.02
      14      930728           0      349999           0         110        113217.6
      15      930728           0      337340           0         110       113317.17
      16      930728           0      327999           0         110       113346.94
      17      930728           0      315987           0         110       113416.73
      18      930728           0      300614           0         110       113446.52
      19      930728           0      281788           0         110       113516.31
      20      930728           0      260058           0         110        113546.1
      21      930728           0      236377           0         110       113615.89
      22      930728           0      212674           0         110       113645.66
      23      930728           0      191083           0         110       113715.45
      24      930728           0      172379           0         110       113745.24
      25      930728           0      156948           0         110       113815.03
      26      930728           0      145077           0         110       113844.82
      27      930728           0      145077           0         110       113844.82
      28      930728           0      135708           0         110       113914.61
      29      930728           0      128718           0         110        113944.4
      30      930728           0      123206           0         110       114014.17
      31      930728           0      118986           0         110       114043.96
      32      930728           0      115635           0         110       114113.75
      33      930728           0      113244           0         110       114143.54
      34      930728           0      109784           0         110       114243.09
      35      930728           0      266218           0         323       113126.68
      36      930728           0      208680           0         323       113416.81
      37      930728           0      186928           0         323       113541.86

Now we run several statistical operations. Keep in mind that a sort variable cannot also be used as an output name: sort_vars are used only for sorting and are written to the output under their own names, while stat_vars supply the values for the statistics. Because output_names defaults to stat_vars, a variable listed in both sort_vars and stat_vars needs a different output name.

stat_vars, output_names, and stat_functions should all have the same number of entries.

lapaz% sortstats
in/out files   : char(255) ? n11.fcases n11.sortstats
sort_vars      : char(255) ? [] platform_id dcs_count
stat_vars      : char(255) ? [] dcs_doppler dcs_doppler date date
output_names   : char(255) ? [] doppler_min doppler_max
                                date_min date_max
stat_functions : char(127) ? min max min max
min_good       : int (  4) ? [1 1 1 1] 2 2 2 2
lapaz%

lapaz% printvar n11.sortstats
include_vars   : char(255) ? []
line_per_elem  : char(  3) ? [yes]
list_dims      : char(  3) ? [yes]
printout       : char(  3) ? [no]
Printvar  n11.sortstats  Page 1

      date_max    date_min   dcs_count doppler_max doppler_min platform_id
 1      930728      930728           0      265795      150335         106
 2      930728      930728           0      360154      109784         110
 3      930728      930728           0      266218      186928         323
lapaz%

This example computes the mean of the dcs_doppler and time variables.

lapaz% sortstats
in/out files   : char(255) ? n11.fcases n11.mean
sort_vars      : char(255) ? [] platform_id
stat_vars      : char(255) ? [] dcs_doppler time
output_names   : char(255) ? [] doppler_mean time_mean
stat_functions : char( 63) ? mean mean
min_good       : int (  2) ? [1 1] 2 2
lapaz%

lapaz% printvar n11.mean
include_vars   : char(255) ? [] -dcs_data
line_per_elem  : char(  3) ? [yes]
list_dims      : char(  3) ? [yes]
printout       : char(  3) ? [no]
Printvar  n11.mean  Page 1

         doppler_mean platform_id       time_mean
       1       211641         106       113334.23
       2       217235         110       113659.91
       3       220608         323       113348.45
lapaz%
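
The doppler_mean column above can be checked by hand or with a short script such as the one below, which regroups the dcs_doppler values from the n11.fcases listing by platform_id. The use of integer division is an inference from the printed values (for example, 661826 / 3 = 220608.67 is shown as 220608), not documented sortstats behavior.

```python
from collections import defaultdict

# (platform_id, dcs_doppler) pairs copied from the n11.fcases listing.
rows = (
    [(106, d) for d in (265795, 255314, 244253, 233416, 222370, 211928,
                        201662, 192047, 183227, 167705, 150335)]
    + [(110, d) for d in (360154, 357574, 349999, 337340, 327999, 315987,
                          300614, 281788, 260058, 236377, 212674, 191083,
                          172379, 156948, 145077, 145077, 135708, 128718,
                          123206, 118986, 115635, 113244, 109784)]
    + [(323, d) for d in (266218, 208680, 186928)]
)

# Group the doppler values by platform_id.
groups = defaultdict(list)
for platform, doppler in rows:
    groups[platform].append(doppler)

# Integer division truncates the mean, matching the printed integer column.
doppler_mean = {p: sum(v) // len(v) for p, v in sorted(groups.items())}
print(doppler_mean)  # {106: 211641, 110: 217235, 323: 220608}
```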

SEE ALSO

tssort, findcases, datasets

NOTES

To run more than one statistical function on a variable, list that variable in stat_vars once for each function to be applied to it.

stat_vars, output_names, and stat_functions should all have the same number of elements. Example: if a variable is listed four times in stat_vars, there should be four output names and four stat functions.
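
The pairing rule can be pictured as three parallel lists read element by element. The sketch below uses the values from the first sortstats example in EXAMPLES; the helper pair_requests is hypothetical and only illustrates the rule, not how sortstats parses its parameters.

```python
def pair_requests(stat_vars, output_names, stat_functions):
    """Return one (stat_var, output_name, function) triple per statistic."""
    if not (len(stat_vars) == len(output_names) == len(stat_functions)):
        raise ValueError(
            "stat_vars, output_names and stat_functions must have "
            "the same number of elements"
        )
    return list(zip(stat_vars, output_names, stat_functions))

# dcs_doppler and date are each listed twice: once for min, once for max.
requests = pair_requests(
    ["dcs_doppler", "dcs_doppler", "date", "date"],
    ["doppler_min", "doppler_max", "date_min", "date_max"],
    ["min", "max", "min", "max"],
)
print(requests[1])  # ('dcs_doppler', 'doppler_max', 'max')
```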


Last Update: $Date: 1999/05/10 20:46:23 $