eof - Computes empirical orthogonal functions

SYNOPSIS

eof  [ parameter=value ... ]  [ inputfile outputfile ]
eof  [ parameter=value ... ]  [ inputfile ... directory ]

Parameters are: include_vars, across_vars, new_var_name, eof_dim, modes, burst_modes, max_iter.

DESCRIPTION

eof computes empirical orthogonal functions (EOFs) of multi-dimensional data, where different instances of the variable to be decomposed occur along the dimension eof_dim (see laminate). The EOFs are computed directly via Singular Value Decomposition. This method is useful in cases where there are more points in a given instance of the data field than there are instances of the data field, since it does not require the calculation of the covariance matrix. modes specifies the number of EOFs to retain in the output dataset. eof reports the eigenvalues and percent of variance accounted for by each of the retained EOFs.

For each input variable "X", eof writes the EOF vectors, expansion coefficients, and eigenvalues as output variables named "X_vec", "X_amp", and "X_var", respectively. In addition, the percentage of variance accounted for by each of the eigenvalues is written to a variable named "X_perc". The dimensions of "X_vec" are the same as "X", except that the eof_dim is replaced by a dimension named "mode" having size modes. Additionally, eof writes the total data variance, total variance captured by all the modes, variance in the retained modes, and the percent of variance in the retained modes in attributes named "X_d_var", "X_m_var", "X_r_var", and "X_p_var", respectively.
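The relationship between the eigenvalues and the variance attributes can be sketched in a few lines of Python. This is an illustration only, not eof's source code: the function name variance_summary and its arguments are hypothetical, and the choice of the all-modes variance as the denominator for the percentages is an assumption inferred from the example output in EXAMPLES.

```python
def variance_summary(eigenvalues, modes, data_variance):
    """Summarize variance from a full list of eigenvalues (largest first).

    Hypothetical helper. Percentages are taken relative to the variance in
    all modes, an assumption inferred from the example output.
    """
    m_var = sum(eigenvalues)          # variance captured by all the modes
    retained = eigenvalues[:modes]    # the modes kept in the output
    r_var = sum(retained)             # variance in the retained modes
    return {
        "d_var": data_variance,                  # total data variance
        "m_var": m_var,
        "r_var": r_var,
        "p_var": 100.0 * r_var / m_var,          # percent in retained modes
        "percent": [100.0 * ev / m_var for ev in retained],
    }


# Made-up eigenvalues, chosen so the arithmetic is easy to follow.
summary = variance_summary([4.0, 3.0, 2.0, 1.0], modes=2, data_variance=10.0)
print(summary["r_var"], summary["p_var"])
```

Applied to the sw attributes shown in EXAMPLES (41.9703 retained out of 61.4413 in all modes), the same ratio gives approximately 68.31 percent.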

The vectors are normalized to have variance equal to the associated eigenvalue and thus have the same units as the input data. The expansion coefficient series are unit normalized. The output variables are all written to the output dataset as type GP_FLOAT (see include/gp.h). Eigenvalues that are smaller than 1.0E-5 times the largest eigenvalue are set equal to zero. If bad values occur in the input variable, they are set equal to zero before the computation of the eigenstructure. In this case, it is assumed that the "mean" of the data has been removed (see anomaly); thus the bad values take on the "mean" value.
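The computation and normalization described above can be approximated with NumPy as follows. This is a minimal sketch under stated assumptions, not the routine eof actually uses: the function name eof_svd, the 2-D layout (instances by points), and the scaling of the eigenvalues by the total number of data values are all assumptions chosen so that the vectors have mean square equal to the eigenvalue and the expansion coefficients have unit mean square.

```python
import numpy as np


def eof_svd(data, modes, bad_value=None, rel_tol=1.0e-5):
    """EOFs of a 2-D anomaly array shaped (instances, points).

    Hypothetical helper; the scaling conventions are assumptions.
    """
    x = np.asarray(data, dtype=float)
    if bad_value is not None:
        # Bad values take on the (already removed) mean, i.e. zero.
        x = np.where(x == bad_value, 0.0, x)
    n, p = x.shape

    # Thin SVD of the data matrix; no covariance matrix is formed.
    u, s, vt = np.linalg.svd(x, full_matrices=False)

    # Scale so that the eigenvalues sum to the mean square of the data.
    eigvals = s**2 / (n * p)
    # Zero eigenvalues that are negligible relative to the largest one.
    eigvals = np.where(eigvals < rel_tol * eigvals[0], 0.0, eigvals)

    amp = u[:, :modes] * np.sqrt(n)                     # unit mean square
    vec = s[:modes, None] * vt[:modes, :] / np.sqrt(n)  # mean square = eigenvalue
    return vec, amp, eigvals[:modes]


# Synthetic anomalies: 6 instances of a 20-point field.
rng = np.random.default_rng(0)
field = rng.standard_normal((6, 20))
field -= field.mean(axis=0)  # remove the mean at each point
vec, amp, eigvals = eof_svd(field, modes=3)
print(vec.shape, amp.shape)  # (3, 20) (6, 3)
```

With this scaling, the product of the expansion coefficients and the vectors reconstructs the part of the data explained by the retained modes, and the vectors carry the units of the input data.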

If the user selects across_vars=yes, the input variables are effectively laminated together to create a single new input variable. For this to work, all the input variables must have the same dimensions. After this lamination, the original input variables are ignored, and EOFs are computed only for the new input variable. This new input variable has one additional dimension, whose length is equal to the number of original input variables. The name of this added dimension is set via the eof_dim parameter.
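The effect of across_vars=yes on the shape of the data can be pictured with a short NumPy sketch. The helper laminate_vars and the variable names ch4 and ch5 are hypothetical, and placing the added dimension first is an assumption made only for illustration.

```python
import numpy as np


def laminate_vars(variables, names):
    """Stack same-shaped variables along a new leading axis (hypothetical helper)."""
    shapes = {np.shape(variables[name]) for name in names}
    if len(shapes) != 1:
        raise ValueError("all input variables must have the same dimensions")
    # The new axis has one entry per original input variable.
    return np.stack([np.asarray(variables[name]) for name in names], axis=0)


# Two made-up sensor channels on the same 13 x 20 grid.
channels = {"ch4": np.zeros((13, 20)), "ch5": np.ones((13, 20))}
combined = laminate_vars(channels, ["ch4", "ch5"])
print(combined.shape)  # (2, 13, 20)
```

The added axis plays the role of eof_dim, so each original variable becomes one instance in the decomposition.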

The across_vars=yes option is useful when computing EOFs across different sets of sensor channels.

For each input variable X, the actual modes (eigenvectors) are stored in an output variable X_vec. If there are M modes, and if the dimensions of a given input variable X are N1, N2, ..., Nj, ..., Nm, where Nj is the eof_dim, then the modes (eigenvectors) are stored in an output variable with dimensions M, N1, N2, ..., Nj-1, Nj+1, ..., Nm.

If the user selects burst_modes=yes, then for each input variable X, the individual modes (eigenvectors) are also stored in separate variables. The names of these separate eigenvector variables are X_vec_##, where ## is the number of the mode. Using the dimensions from the previous paragraph's example, each eigenvector variable has dimensions N1, N2, ..., Nj-1, Nj+1, ..., Nm.
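The dimension bookkeeping in the two paragraphs above can be expressed as a small Python sketch. Both helpers are hypothetical. In particular, the exact form of the ## suffix is not specified here, so numbering the modes from 1 with two zero-padded digits is an assumption.

```python
def vec_dims(input_dims, eof_dim, modes):
    """Dimensions of X_vec, given X's ordered (name, size) pairs."""
    names = [name for name, _ in input_dims]
    if eof_dim not in names:
        raise ValueError(f"variable has no dimension named {eof_dim!r}")
    # Drop eof_dim and put the "mode" dimension first.
    rest = [(name, size) for name, size in input_dims if name != eof_dim]
    return [("mode", modes)] + rest


def burst_names(var, modes):
    """Names of the per-mode variables written when burst_modes=yes.

    Assumption: modes are numbered from 1 and zero-padded to two digits.
    """
    return [f"{var}_vec_{k:02d}" for k in range(1, modes + 1)]


sst_dims = [("month", 78), ("line", 13), ("sample", 20)]
print(vec_dims(sst_dims, "month", 10))
# [('mode', 10), ('line', 13), ('sample', 20)]
print(burst_names("sst", 3))
# ['sst_vec_01', 'sst_vec_02', 'sst_vec_03']
```

Each burst variable has the dimensions of X_vec without the leading mode dimension.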

PARAMETERS

include_vars

Specifies which variables to compute EOFs for. Each variable must have the eof_dim specified below.

The default is all variables in the input dataset(s).

across_vars

Specifies whether or not all the input variables are to be effectively laminated together into a single new input variable. The default is no, meaning that input variables are not combined.

new_var_name

If across_vars=yes, this parameter sets the name of the resulting single new input variable. There is no default.

eof_dim

Specifies the dimension about which to compute the EOFs. The different instances of the variable are assumed to occur along this dimension.

There is no default.

modes

Specifies the number of EOFs to retain and write to the output dataset.

Valid responses are [> 0 and <= size of eof_dim]. The default is 10.

max_iter

[OPTIONAL] Specifies the maximum number of iterations to be performed by the SVD computation routine. If convergence is not achieved within the specified number of iterations, the program aborts with an error message.

Valid range is [ > 0 ]. The default is 30.

burst_modes

Specifies whether or not variables corresponding to individual eigenvectors are to be created in the output datasets. The default is yes.

EXAMPLES

The dataset sstsw.tdf contains two three-dimensional variables, sea surface temperature (sst) and surface shortwave (sw). The variables have dimensions month (=78), line (=13), and sample (=20), which represent time, latitude, and longitude, respectively. The mean annual cycle of the data has been removed using anomaly. The following example shows how to compute the ten principal EOFs and store them in the dataset named sstsweof.tdf. Note: since the two variables contain "bad" values, e.g. no sst over land, the bad value statistics are reported.

% eof sstsw.tdf sstsweof.tdf
include_vars   : char(255) ? [] sst sw
across_vars    : char(  3) ? [no]
eof_dim        : char( 31) ? month
modes          : int       ? [10]
burst_modes    : char(  3) ? [yes] no

Processing sstsw.tdf

sst    mean:    0.0935385    variance:      0.30618

Number of bad values found are:          468
Number per sample             :            6
Fraction of total data        :    0.0230769
These bad values set to zero.

Eigen Number      Eigen Value      Percent Variance      Summed Variance
     1                0.1387            45.288                45.288
     2                0.0368            12.019                57.307
     3                0.0200             6.517                63.824
     4                0.0114             3.716                67.540
     5                0.0104             3.395                70.935
     6                0.0079             2.596                73.530
     7                0.0064             2.091                75.622
     8                0.0063             2.046                77.668
     9                0.0059             1.920                79.588
    10                0.0051             1.666                81.255



sw    mean:  -1.2678e-07    variance:      61.4444

Number of bad values found are:         1404
Number per sample             :           18
Fraction of total data        :    0.0692308
These bad values set to zero.

Eigen Number      Eigen Value      Percent Variance      Summed Variance
     1               11.8119            19.225                19.225
     2                5.9475             9.680                28.905
     3                4.5616             7.424                36.329
     4                4.1016             6.676                43.005
     5                3.6781             5.986                48.991
     6                3.0665             4.991                53.982
     7                2.5657             4.176                58.158
     8                2.4878             4.049                62.207
     9                2.0233             3.293                65.500
    10                1.7262             2.809                68.310


The contents of the output dataset are:

% contents sstsweof.tdf
printout       : char(  3) ? [no]
Contents of File: sstsweof.tdf    Page 1

Dimension       Size            Coord           Scale      Offset
 mode              10            ?                  1           0
 month             78            time               1           0
 line              13            y                  1           0
 sample            20            x                  1           0

Attribute       Type            Units           Value
 sst_d_var       double          (Celsius)^2     0.30618
 sst_m_var       double          (Celsius)^2     0.306198
 sst_r_var       double          (Celsius)^2     0.2488
 sst_p_var       double          percent         81.2546
 sw_d_var        double          (W/m2)^2        61.4444
 sw_m_var        double          (W/m2)^2        61.4413
 sw_r_var        double          (W/m2)^2        41.9703
 sw_p_var        double          percent         68.3095
 history         byte

Variable        Type            Units
 sst_amp         float
 sst_vec         float           Celsius
 sst_var         float           (Celsius)^2
 sst_perc        float           percent
 sw_amp          float
 sw_vec          float           W/m2
 sw_var          float           (W/m2)^2
 sw_perc         float           percent

Variable        Dimension       Size
 sst_amp         mode              10
 sst_amp         month             78
 sst_vec         mode              10
 sst_vec         line              13
 sst_vec         sample            20
 sst_var         mode              10
 sst_perc        mode              10
 sw_amp          mode              10
 sw_amp          month             78
 sw_vec          mode              10
 sw_vec          line              13
 sw_vec          sample            20
 sw_var          mode              10
 sw_perc         mode              10

Variable            BadValue    ValidMin    ValidMax       Scale      Offset
 sst_amp         -3.4028e+38 -3.4028e+38  3.4028e+38           1           0
 sst_vec         -3.4028e+38 -3.4028e+38  3.4028e+38           1           0
 sst_var         -3.4028e+38 -3.4028e+38  3.4028e+38           1           0
 sst_perc        -3.4028e+38 -3.4028e+38  3.4028e+38           1           0
 sw_amp          -3.4028e+38 -3.4028e+38  3.4028e+38           1           0
 sw_vec          -3.4028e+38 -3.4028e+38  3.4028e+38           1           0
 sw_var          -3.4028e+38 -3.4028e+38  3.4028e+38           1           0
 sw_perc         -3.4028e+38 -3.4028e+38  3.4028e+38           1           0

SEE ALSO

datasets, eof_overview, spectral, eoffilt, eofproj, svd, cca, linfit, emath, laminate, dimavg, magnify, xcorrel, anomaly.

NOTES

Memory allocation errors generally mean that the variable's dimension sizes (those not associated with eof_dim) produce a matrix decomposition that is too large to compute. In this case the data space has to be reduced (see subset or magnify).
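A rough way to judge the size of the decomposition before running eof is sketched below. The helper decomposition_size is hypothetical, and the figure of 8 bytes per value is an assumption, since the internal working precision of eof is not stated here. The estimate covers only the data matrix itself, not any workspace.

```python
import math


def decomposition_size(input_dims, eof_dim, bytes_per_value=8):
    """Return (instances, points, bytes) for the data matrix.

    Hypothetical helper; bytes_per_value=8 assumes double precision.
    """
    sizes = dict(input_dims)
    instances = sizes[eof_dim]  # length of eof_dim
    # Points per instance: product of all dimensions other than eof_dim.
    points = math.prod(size for name, size in input_dims if name != eof_dim)
    return instances, points, instances * points * bytes_per_value


print(decomposition_size([("month", 78), ("line", 13), ("sample", 20)], "month"))
# (78, 260, 162240)
```

Halving both spatial dimensions reduces the number of points, and therefore the matrix size, by a factor of about four.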


Last Update: $Date: 2002/05/07 23:51:20 $