cca - Computes modes of covariability between two variables

SYNOPSIS

cca [ parameter=value ... ] [ inputfile outputfile ]

ccamod [ parameter=value ... ] [ inputfile outputfile ]

Parameters for cca are: predictor, predictand, cca_dim, modes, covar, model, cca_file, instantiate, max_iter.

Parameters for ccamod are: cca_file, predictor, predictand, cca_dim, modes.

DESCRIPTION

cca (Canonical Correlation Analysis) computes empirical modes of covariability between the EOF eigenstructures (see eof) of two multi-dimensional variables (predictor and predictand). The modes of covariability are computed from the covariance matrix of the EOF expansion coefficients via Singular Value Decomposition. This produces predictor and predictand eigenvectors, eigenvalues and expansion coefficients. The distinction between predictor and predictand is only important when using ccamod. cca reports: 1) the eigenvalue for each mode (which represents the correlation between the predictor and predictand expansion coefficients for that mode), 2) [optionally] the squared covariance accounted for by each mode, and 3) the amount of predictor and predictand variance accounted for individually by each mode. Essentially, cca constructs linear combinations of EOFs whose expansion coefficients have maximum correlation, as opposed to svd, which maximizes the squared covariance explained by each mode.
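As a rough illustration of item 1 above, the value reported for a mode can be read as the Pearson correlation between that mode's predictor and predictand expansion coefficients. The sketch below is not cca's implementation; the function name mode_correlation and the sample series are hypothetical and serve only to show the quantity being reported.

```python
import math


def mode_correlation(x_amp, y_amp):
    """Pearson correlation between two equal-length expansion-coefficient series."""
    if len(x_amp) != len(y_amp) or len(x_amp) < 2:
        raise ValueError("series must have the same length (at least 2)")
    n = len(x_amp)
    mean_x = sum(x_amp) / n
    mean_y = sum(y_amp) / n
    # Sums of products of deviations from the means.
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x_amp, y_amp))
    sxx = sum((a - mean_x) ** 2 for a in x_amp)
    syy = sum((b - mean_y) ** 2 for b in y_amp)
    if sxx == 0.0 or syy == 0.0:
        raise ValueError("a constant series has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical amplitudes for one mode over five time steps.
x_amp = [1.0, 2.0, 3.0, 4.0, 5.0]
y_amp = [2.1, 3.9, 6.2, 7.8, 10.1]
r = mode_correlation(x_amp, y_amp)
```

A value of r near 1 means the two series rise and fall together, which is what a leading cca mode is constructed to achieve.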

For each pair of predictor and predictand variables "X" and "Y", cca expects to find corresponding EOF eigenstructures in the input dataset (see eof). Upon performing the covariability analysis, cca writes the linear weights of the EOFs, eigenvectors, expansion coefficients, variances, eigenvalues (correlations), and optionally the squared covariances of the modes as output variables named "X_wght" & "Y_wght", "X_vec" & "Y_vec", "X_amp" & "Y_amp", "X_var" & "Y_var", "correlation", and "covariance^2", respectively. The first mode represents the linear combination of EOFs whose expansion coefficients have the greatest correlation.

The dimensions of "X_vec" and "Y_vec" in the output file (cca eigenvectors) are the same as those of "X_vec" and "Y_vec" in the input file (EOF eigenvectors), respectively, except that the mode dimension of each has size modes. Apart from the mode dimension, "X_vec" and "Y_vec" need not share the same dimensions, or even the same number of dimensions.

The output variables are all written to the output dataset as type GP_FLOAT (see include/gp.h). Eigenvalues that are smaller than 1.0E-5 times the largest eigenvalue are set equal to zero.
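The eigenvalue cutoff described above can be sketched as follows. This is only an illustration of the stated rule (values smaller than 1.0E-5 times the largest are zeroed); the function name and the sample values are hypothetical, and non-negative eigenvalues are assumed.

```python
def zero_small_eigenvalues(eigenvalues, rel_tol=1.0e-5):
    """Return a copy with values below rel_tol * max(eigenvalues) set to zero."""
    if not eigenvalues:
        return []
    cutoff = rel_tol * max(eigenvalues)
    # Strictly smaller than the cutoff, matching the wording of the rule.
    return [0.0 if ev < cutoff else ev for ev in eigenvalues]


# Hypothetical eigenvalues; the last one falls below 1.0E-5 * 0.94.
cleaned = zero_small_eigenvalues([0.94, 0.84, 0.37, 0.000001])
```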

cca also writes several attributes to the output dataset. These include the total data variance, variance in the retained modes, and the percent of variance in the retained modes in attributes named "X_d_var" & "Y_d_var", "X_r_var" & "Y_r_var", and "X_p_var" & "Y_p_var", respectively. Optionally, the squared covariance in the retained modes is written to an attribute named "r_covar".

Optionally, cca can output a cca_file dataset. This file is used as a model to transform instances of a predictor variable to a "best-fit" predictand using ccamod (see EXAMPLES).

PARAMETERS

predictor

Specifies the predictor variable. In the case of cca, it is the prefix to the input and output predictor eigenstructure, e.g. "X", in the description above. In the case of ccamod, it is the name of the input predictor variable.

There is no default.

predictand

Specifies the predictand variable. In the case of cca, it is the prefix to the input and output predictand eigenstructures, e.g. "Y", in the description above. In the case of ccamod, it is both the name of the output predictand variable and the prefix to the predictand eigenstructure in the cca_file.

There is no default.

cca_dim

Specifies the dimension along which to extract the separate instances of the predictor and predictand variables. For cca, this would be the second dimension of the input eigenstructure expansion coefficients.

There is no default.

modes

In the case of cca, specifies the number of modes to use from each of the predictor and predictand eigenstructures, respectively. The number of modes in the output eigenstructures will be the smaller of the two numbers. In the case of ccamod, specifies the number of modes to use in computing the predictand variable.

Valid range is [ > 0 and <= size of "mode" dimension]. There is no default.
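The rule for modes can be sketched as a small check. The function and argument names are hypothetical; the sketch only restates the range and "smaller of the two numbers" rules given above and is not taken from the cca source.

```python
def output_mode_count(modes, mode_dim_sizes):
    """Validate a (predictor, predictand) modes pair and return the output mode count.

    modes          -- requested number of modes for (predictor, predictand)
    mode_dim_sizes -- size of the "mode" dimension of each input eigenstructure
    """
    for requested, available in zip(modes, mode_dim_sizes):
        if not 0 < requested <= available:
            raise ValueError(
                "modes must be > 0 and <= size of the mode dimension "
                f"(got {requested}, available {available})"
            )
    # The output eigenstructures carry the smaller of the two requests.
    return min(modes)


n_out = output_mode_count((10, 6), (10, 10))
```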

covar

Specifies whether or not to compute the squared covariance of each mode.

Valid responses are [yes or no]. The default is no.

model

Specifies whether or not to create a cca model output file. This is only necessary if modeling predictand data from "new" predictor data is desired using ccamod.

Valid responses are [yes or no]. The default is no.

cca_file

In the case of cca and when model=yes, specifies the name of the model output dataset to use as the cca_file input to ccamod. In the case of ccamod, specifies the name of the input model dataset used to compute the output predictand variable based on the input predictor variable.

There is no default.

instantiate

In the case of cca and when model=yes, specifies whether or not to instantiate the variables in the cca_file output file. Since some of the variables in the cca_file are copies of those in the cca input and output files, they are by default written only as linked variables.

Valid responses are [yes or no]. The default is no.

max_iter

[OPTIONAL] Specifies the maximum number of iterations to be performed by the svd computation routine. If convergence is not achieved with the number of iterations specified, the program aborts with an error message.

Valid range is [ > 0 ]. The default is 30.

EXAMPLES

This example illustrates the use of cca and ccamod for variables consisting of monthly sea surface temperature (sst) and surface shortwave (sw) for the tropical Pacific ocean. The eigenstructures for these data were already computed in the EXAMPLES section of eof and written to sstsweof.tdf. The purpose of this procedure is to determine the empirical field-to-field relationships between sst and sw. Since one might presume that monthly patterns of sst are related to monthly patterns of sw, especially at El Nino time scales, the analysis may help in assessing the feedbacks between sst and sw. We assume that most of the relationship will be captured in the first few modes, so we will save the first ten.

% cca sstsweof.tdf sstswcca.tdf
predictor      : char( 31) ? sst
predictand     : char( 31) ? sw
cca_dim        : char( 31) ? month
modes          : int (  2) ? 10 10
covar          : char(  3) ? [no] y
model          : char(  3) ? [no] y
cca_file       : char(255) ? ccamod.tdf
instantiate    : char(  3) ? [no] y


Mode   Correlation  Covariance
  1      0.94          82164.8
  2      0.84           1228.1
  3      0.77          2615.35
  4      0.67          2608.44
  5      0.57          1110.37
  6      0.37          166.588
  7      0.23          143.565
  8      0.11           22.218
  9      0.06           14.603
 10      0.00        0.0185595

Mode    P'or-var    %-Var    %-Sum     P'and-var   %-Var    %-Sum
  1     0.125023    40.831   40.831      10.9629   17.843   17.843
  2   0.00845389     2.761   43.592       3.0575    4.976   22.819
  3    0.0168736     5.511   49.102      3.86987    6.298   29.118
  4    0.0190065     6.207   55.310      4.59006    7.471   36.588
  5    0.0116065     3.791   59.100       4.3873    7.141   43.729
  6   0.00733578     2.396   61.496      2.51116    4.087   47.816
  7   0.00909014     2.969   64.464      4.37355    7.118   54.934
  8   0.00989101     3.230   67.695       2.6109    4.249   59.184
  9     0.019424     6.344   74.038      3.08003    5.013   64.197
 10     0.022096     7.216   81.255        2.527    4.113   68.310
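The %-Sum columns in the table above are running totals of the corresponding %-Var columns. A minimal check for the predictor (sst) side, using the printed values (so agreement is only to within rounding of the last digit):

```python
from itertools import accumulate

# %-Var values for the predictor (sst), copied from the table above.
pct_var = [40.831, 2.761, 5.511, 6.207, 3.791,
           2.396, 2.969, 3.230, 6.344, 7.216]

# Running totals; these match the printed %-Sum column up to rounding.
pct_sum = list(accumulate(pct_var))

for mode, (var, total) in enumerate(zip(pct_var, pct_sum), start=1):
    print(f"{mode:3d} {var:8.3f} {total:8.3f}")
```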

The output indicates that a large fraction of the covariability is captured by the first mode, and that the predictor and predictand expansion coefficients of this mode have a correlation of 0.94. The next few modes still account for some non-trivial squared covariance and have fairly high correlations. The first mode accounts for about 40% of the sst variability and 17% of the sw variability. The contents of the output dataset are:

% contents sstswcca.tdf
printout       : char(  3) ? [no]
Contents of File: sstswcca.tdf    Page 1

Dimension       Size            Coord           Scale      Offset
 mode              10            ?                  1           0
 sstmode           10            ?                  1           0
 swmode            10            ?                  1           0
 line              13            y                  1           0
 sample            20            x                  1           0
 month             78            time               1           0

Attribute       Type            Units           Value
 sst_m_var       double          (Celsius)^2     0.306198
 sst_d_var       double          (Celsius)^2     0.30618
 sw_m_var        double          (W/m2)^2        61.4413
 sw_d_var        double          (W/m2)^2        61.4444
 r_covar         double          (Celsius*W/m2)^2 90074.1
 sst_r_var       double          (Celsius)^2     0.2488
 sst_p_var       double          (Celsius)^2     81.2546
 sw_r_var        double          (W/m2)^2        41.9703
 sw_p_var        double          (W/m2)^2        68.3095
 history         byte

Variable        Type            Units
 sst_wght        float
 sw_wght         float
 sst_var         double          (Celsius)^2
 sw_var          double          (W/m2)^2
 sst_vec         float           Celsius
 sw_vec          float           W/m2
 sst_amp         float
 sw_amp          float
 correlation     float           correlation
 covariance^2    float           (Celsius*W/m2)^2

Variable        Dimension       Size
 sst_wght        mode              10
 sst_wght        sstmode           10
 sw_wght         mode              10
 sw_wght         swmode            10
 sst_var         mode              10
 sw_var          mode              10
 sst_vec         mode              10
 sst_vec         line              13
 sst_vec         sample            20
 sw_vec          mode              10
 sw_vec          line              13
 sw_vec          sample            20
 sst_amp         mode              10
 sst_amp         month             78
 sw_amp          mode              10
 sw_amp          month             78
 correlation     mode              10
 covariance^2    mode              10

Variable            BadValue    ValidMin    ValidMax       Scale      Offset
 sst_wght        -3.4028e+38 -3.4028e+38  3.4028e+38           1           0
 sw_wght         -3.4028e+38 -3.4028e+38  3.4028e+38           1           0
 sst_var         -3.4028e+38 -1.7977e+308 1.7977e+308           1           0
 sw_var          -3.4028e+38 -1.7977e+308 1.7977e+308           1           0
 sst_vec         -3.4028e+38 -3.4028e+38  3.4028e+38           1           0
 sw_vec          -3.4028e+38 -3.4028e+38  3.4028e+38           1           0
 sst_amp         -3.4028e+38 -3.4028e+38  3.4028e+38           1           0
 sw_amp          -3.4028e+38 -3.4028e+38  3.4028e+38           1           0
 correlation     -3.4028e+38 -3.4028e+38  3.4028e+38           1           0
 covariance^2    -3.4028e+38 -3.4028e+38  3.4028e+38           1           0

Now suppose we have another dataset that contains the sst variable for a different time period, but we don't have the corresponding sw variable for that period. We can use ccamod and the cca_file dataset derived above to "predict" the shortwave based on the sst. In this example, the new sst is in the dataset called testnew.tdf and we are writing the predicted shortwave to the dataset called testmod.tdf. In this case, we will use all 10 of the modes extracted above to construct the shortwave data.

% ccamod testnew.tdf testmod.tdf
cca_file       : char( 31) ? ccamod.tdf
predictor      : char( 31) ? sst
predictand     : char( 31) ? sw
cca_dim        : char( 31) ? month
modes          : int       ? [10]

SEE ALSO

datasets, spectral, eof, eoffilt, eofproj, svd, subset, linfit, emath, nhood, dimavg, laminate, magnify, xcorrel, anomaly.

NOTES

Memory allocation errors generally mean that the sizes of the two variables (the sizes not associated with cca_dim) produce a covariance matrix that is too large to compute. In this case the data space has to be reduced (see subset or magnify).

References: Bretherton et al., 1992: An intercomparison of methods for finding coupled patterns in climate data. J. Climate, 5, 541-560.


Last Update: $Date: 2000/12/07 19:54:54 $