sortstats [ parameter=value ... ] [ inputfile outputfile ]
sortstats [ parameter=value ... ] [ inputfile ... directory ]
Parameters are: sort_vars, stat_vars, stat_functions, output_names, min_good.
The function sortstats creates datasets with 1-D or 2-D variables whose names are specified by output_names. The output data are calculated as follows:
1. sort_vars are used in the user-specified order as a multiple key to sort stat_vars.
2. The records of each sorted variable from stat_vars are divided into groups that share the same value of the multiple sort key.
3. For each of these groups, the user-specified statistic is computed, placed in the corresponding output variable named in output_names, and stored in the output dataset.
4. All unique multiple keys are also stored in the output dataset under the same sort_vars names.
The following facts and requirements can be derived from the description above:
a) For parallel sorting, all sort_vars must have the same dimension as the leading dimension of stat_vars.
b) Since sort variables are stored in the output dataset, sort_vars names should be different from output_names, although the same variable can appear in both sort_vars and stat_vars.
c) sort_vars and output_names will remain parallel in the output, i.e. they will have the same leading dimension (equal to the number of unique keys), although its size can be smaller than in the input.
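Steps 1 to 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the sortstats implementation: the function name sortstats_sketch and its arguments are hypothetical, and it handles only one 1-D stat variable and one function at a time.

```python
from itertools import groupby


def sortstats_sketch(sort_cols, stat_col, func):
    """Group stat_col by the multiple key built from sort_cols.

    sort_cols: list of equal-length lists (the sort_vars, in key order)
    stat_col:  list of the same length (one stat_var)
    func:      callable applied to the values of each group
    Returns (unique_keys, stats), two parallel lists.
    """
    n = len(stat_col)
    if any(len(col) != n for col in sort_cols):
        raise ValueError("sort_vars must match the leading dimension of stat_vars")

    # Step 1: build one multiple key per record and sort record indices by it.
    keys = list(zip(*sort_cols))
    order = sorted(range(n), key=lambda i: keys[i])  # stable sort

    unique_keys, stats = [], []
    # Steps 2-3: split the sorted records into runs of equal keys, reduce each run.
    for key, run in groupby(order, key=lambda i: keys[i]):
        values = [stat_col[i] for i in run]
        unique_keys.append(key)  # Step 4: keep each unique key once
        stats.append(func(values))
    return unique_keys, stats


# A few illustrative records (values taken from the example listing in this page).
platform_id = [110, 106, 110, 323, 106]
dcs_count = [0, 0, 0, 0, 0]
dcs_doppler = [360154, 265795, 109784, 266218, 150335]

keys, doppler_min = sortstats_sketch([platform_id, dcs_count], dcs_doppler, min)
print(keys)         # [(106, 0), (110, 0), (323, 0)]
print(doppler_min)  # [150335, 109784, 266218]
```

The two returned lists are parallel and have one entry per unique key, which is the property described in c) above.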
sort_vars
Variables from input datafile(s) to be used for sorting.
Valid responses are any existing 1-D variables with the same dimension. There is no default.
stat_vars
Variables from input datafile(s) for which statistics are desired.
Valid responses are any existing 1-D or 2-D variables with the same (leading) dimension. There is no default.
stat_functions
Functions to be used to compute the desired statistics.
Valid response(s) can be: first, last, min, max, mean, var, or stdev. There is no default.
output_names
Names of stat_vars on output.
The default is the same as stat_vars.
min_good
Least number of values needed to compute a statistic (one per stat_var).
Valid response is any integer value. The default is a row of 1s [1...1].
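The interaction between a statistic and its min_good threshold can be sketched in Python as shown here. The names STAT_FUNCTIONS and group_stat are hypothetical. Two points are assumptions, because this page does not specify them: var and stdev are taken to be sample statistics (n - 1 denominator), and None stands in for whatever missing value sortstats stores when a group has too few values.

```python
import statistics

# Hypothetical mapping from stat_functions names to Python callables.
# Assumption: var and stdev use the sample (n - 1) definition.
STAT_FUNCTIONS = {
    "first": lambda v: v[0],
    "last": lambda v: v[-1],
    "min": min,
    "max": max,
    "mean": statistics.fmean,
    "var": statistics.variance,
    "stdev": statistics.stdev,
}


def group_stat(values, function, min_good=1):
    """Return the statistic for one group, or None if it has fewer than min_good values.

    Assumption: None represents the missing value written for such a group.
    """
    if len(values) < min_good:
        return None
    return STAT_FUNCTIONS[function](values)


print(group_stat([266218, 208680, 186928], "max", min_good=2))  # 266218
print(group_stat([266218], "max", min_good=2))                  # None
```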
The following example uses findcases to make a small dataset on which sortstats is then run. Running sortstats on a small dataset is the easiest way to check its accuracy by hand.
lapaz% findcases
in/out files : char(255) ? n11.tssort n11.fcases
include_vars : char(255) ? []
sort_var : char(255) ? [] platform_id
case_values : int (2048) ? 0^350
keep_cases : char( 3) ? [yes]
Printvar n11.fcases Page 1
date dcs_count dcs_doppler dcs_quality platform_id time
1 930728 0 265795 0 106 113047.34
2 930728 0 255314 0 106 113118.45
3 930728 0 244253 0 106 113149.57
4 930728 0 233416 0 106 113220.67
5 930728 0 222370 0 106 113251.78
6 930728 0 211928 0 106 113322.9
7 930728 0 201662 0 106 113354.02
8 930728 0 192047 0 106 113425.14
9 930728 0 183227 0 106 113456.27
10 930728 0 167705 0 106 113558.49
11 930728 0 150335 0 106 113731.86
12 930728 0 360154 0 110 113048.23
13 930728 0 357574 0 110 113118.02
14 930728 0 349999 0 110 113217.6
15 930728 0 337340 0 110 113317.17
16 930728 0 327999 0 110 113346.94
17 930728 0 315987 0 110 113416.73
18 930728 0 300614 0 110 113446.52
19 930728 0 281788 0 110 113516.31
20 930728 0 260058 0 110 113546.1
21 930728 0 236377 0 110 113615.89
22 930728 0 212674 0 110 113645.66
23 930728 0 191083 0 110 113715.45
24 930728 0 172379 0 110 113745.24
25 930728 0 156948 0 110 113815.03
26 930728 0 145077 0 110 113844.82
27 930728 0 145077 0 110 113844.82
28 930728 0 135708 0 110 113914.61
29 930728 0 128718 0 110 113944.4
30 930728 0 123206 0 110 114014.17
31 930728 0 118986 0 110 114043.96
32 930728 0 115635 0 110 114113.75
33 930728 0 113244 0 110 114143.54
34 930728 0 109784 0 110 114243.09
35 930728 0 266218 0 323 113126.68
36 930728 0 208680 0 323 113416.81
37 930728 0 186928 0 323 113541.86
Now we will perform several statistical operations. Keep in mind that sort_vars are used ONLY for the sorting part of this function and stat_vars ONLY for the statistics part. Because the sort variables are written to the output under their own names, a name listed in sort_vars cannot also be used as an output name.
stat_vars, output_names, and stat_functions should all have the same number of names/functions.
lapaz% sortstats
in/out files : char(255) ? n11.fcases n11.sortstats
sort_vars : char(255) ? [] platform_id dcs_count
stat_vars : char(255) ? [] dcs_doppler dcs_doppler date date
output_names : char(255) ? [] doppler_min doppler_max date_min date_max
stat_functions : char(127) ? min max min max
min_good : int ( 4) ? [1 1 1 1] 2 2 2 2
lapaz%
lapaz% printvar n11.sortstats
include_vars : char(255) ? []
line_per_elem : char( 3) ? [yes]
list_dims : char( 3) ? [yes]
printout : char( 3) ? [no]
Printvar n11.sortstats Page 1
date_max date_min dcs_count doppler_max doppler_min platform_id
1 930728 930728 0 265795 150335 106
2 930728 930728 0 360154 109784 110
3 930728 930728 0 266218 186928 323
lapaz%
This example computes the mean of the dcs_doppler and time variables.
lapaz% sortstats
in/out files : char(255) ? n11.fcases n11.mean
sort_vars : char(255) ? [] platform_id
stat_vars : char(255) ? [] dcs_doppler time
output_names : char(255) ? [] doppler_mean time_mean
stat_functions : char( 63) ? mean mean
min_good : int ( 2) ? [1 1] 2 2
lapaz%
lapaz% printvar n11.mean
include_vars : char(255) ? [] -dcs_data
line_per_elem : char( 3) ? [yes]
list_dims : char( 3) ? [yes]
printout : char( 3) ? [no]
Printvar n11.mean Page 1
doppler_mean platform_id time_mean
1 211641 106 113334.23
2 217235 110 113659.91
3 220608 323 113348.45
lapaz%
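The doppler_mean column can be checked by hand against the n11.fcases listing. The short Python check below does this for platform_id 323, which has only three records. The comparison with the integer part is an observation about this printout, not a documented rule: the printed value matches the mean with its fractional part dropped.

```python
# dcs_doppler values for platform_id 323, copied from the n11.fcases listing.
doppler_323 = [266218, 208680, 186928]

mean_323 = sum(doppler_323) / len(doppler_323)
print(round(mean_323, 2))  # 220608.67
print(int(mean_323))       # 220608, the doppler_mean printed for platform 323
```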
To run more than one statistical function on a variable, list that variable in stat_vars once for each function to be run on it.
stat_vars, output_names, and stat_functions should all have the same number of elements. For example, if a variable is listed four times in stat_vars, there should be four output names and four stat functions.
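The position-by-position pairing of these three lists can be sketched in Python as follows. The helper pair_requests is hypothetical and only illustrates the counting rule; it is not part of sortstats.

```python
def pair_requests(stat_vars, stat_functions, output_names=None):
    """Pair each stat_var with its function and output name, position by position."""
    if output_names is None:
        output_names = stat_vars  # documented default: same names as stat_vars
    if not (len(stat_vars) == len(stat_functions) == len(output_names)):
        raise ValueError(
            "stat_vars, stat_functions, and output_names need the same number of entries"
        )
    return list(zip(stat_vars, stat_functions, output_names))


# The responses from the first sortstats example on this page.
requests = pair_requests(
    ["dcs_doppler", "dcs_doppler", "date", "date"],
    ["min", "max", "min", "max"],
    ["doppler_min", "doppler_max", "date_min", "date_max"],
)
for var, func, name in requests:
    print(f"{name} = {func}({var})")
```

Each variable appears once per function, so dcs_doppler and date are each listed twice to obtain both a minimum and a maximum.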
Last Update: $Date: 1999/05/10 20:46:23 $