cca [ parameter=value ... ] [ inputfile outputfile ]
ccamod [ parameter=value ... ] [ inputfile outputfile ]
Parameters for cca are: predictor, predictand, cca_dim, modes, covar, model, cca_file, instantiate, max_iter.
Parameters for ccamod are: cca_file, predictor, predictand, cca_dim, modes.
cca (Canonical Correlation Analysis) computes empirical modes of covariability between the EOF eigenstructures (see eof) of two multi-dimensional variables (predictor and predictand). The modes of covariability are computed from the covariance matrix of the EOF expansion coefficients via Singular Value Decomposition. This produces predictor and predictand eigenvectors, eigenvalues, and expansion coefficients. The distinction between predictor and predictand matters only when the results are used with ccamod. cca reports: 1) the eigenvalue(s) for each mode (the correlation between the predictor and predictand expansion coefficients for that mode), 2) optionally, the squared covariance accounted for by each mode, and 3) the amount of predictor and predictand variance accounted for individually by each mode. In essence, cca constructs linear combinations of EOFs whose expansion coefficients have maximum correlation, whereas svd maximizes the squared covariance explained by each mode.
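The core computation can be sketched in Python with NumPy. This is a minimal sketch, not cca's implementation. It assumes that each set of EOF expansion coefficients is scaled to unit variance before the SVD, which is the standard way to turn an SVD of coefficient covariances into canonical correlations. The helper names eof_coefficients and cca_from_eofs are hypothetical.

```python
import numpy as np


def eof_coefficients(field, n_modes):
    """EOF expansion coefficients (samples x modes) of a samples x space field."""
    anomalies = field - field.mean(axis=0)
    # Rows of vt are the EOF eigenvectors; u * s are the expansion coefficients.
    u, s, _ = np.linalg.svd(anomalies, full_matrices=False)
    return u[:, :n_modes] * s[:n_modes]


def cca_from_eofs(x_amp, y_amp):
    """Canonical correlation analysis of two sets of EOF expansion coefficients.

    x_amp: samples x p, y_amp: samples x q, with mutually uncorrelated columns
    (as an EOF analysis produces). Returns predictor weights (p x m), predictand
    weights (q x m), canonical correlations (m), and the canonical expansion
    coefficients of each side (samples x m), where m = min(p, q).
    """
    n = x_amp.shape[0]
    # EOF coefficients are already uncorrelated, so scaling each column to unit
    # variance whitens it; CCA then reduces to an SVD of the cross-covariance.
    xs = (x_amp - x_amp.mean(axis=0)) / x_amp.std(axis=0, ddof=1)
    ys = (y_amp - y_amp.mean(axis=0)) / y_amp.std(axis=0, ddof=1)
    cross = xs.T @ ys / (n - 1)
    u, corr, vt = np.linalg.svd(cross, full_matrices=False)
    x_wght, y_wght = u, vt.T
    return x_wght, y_wght, corr, xs @ x_wght, ys @ y_wght
```

Because the singular values of the whitened cross-covariance are the canonical correlations, they come out sorted from largest to smallest. Running the same SVD on the unscaled coefficients would instead maximize squared covariance within the retained EOFs, as svd does.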
For each pair of predictor and predictand variables "X" and "Y", cca expects to find the corresponding EOF eigenstructures in the input dataset (see eof). After performing the covariability analysis, cca writes the linear weights of the EOFs, the eigenvectors, the expansion coefficients, the variances, the eigenvalues (correlations), and optionally the squared covariances of the modes as output variables named "X_wght" & "Y_wght", "X_vec" & "Y_vec", "X_amp" & "Y_amp", "X_var" & "Y_var", "correlation", and "covariance^2", respectively. The first mode is the linear combination of EOFs whose expansion coefficients have the greatest correlation.
The dimensions of "X_vec" and "Y_vec" in the output file (cca eigenvectors) are the same as those of "X_vec" and "Y_vec" in the input file (EOF eigenvectors), respectively. The one exception is the mode dimension, whose size equals the number of output modes (see modes). Apart from the mode dimension, "X_vec" and "Y_vec" need not have the same dimensions, or even the same number of dimensions.
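The shape rule can be illustrated by forming cca eigenvectors as weighted sums of EOF eigenvectors. This NumPy sketch makes three assumptions:
- "X_wght" is laid out as (cca mode, EOF mode), as the sst_wght entry (mode, sstmode) in the example listing suggests.
- Each cca eigenvector is a linear combination of the EOF eigenvectors.
- The scaling is illustrative; cca's exact scaling may differ.

combine_eofs is a hypothetical name.

```python
import numpy as np


def combine_eofs(weights, eof_vec):
    """Form cca eigenvectors as weighted sums of EOF eigenvectors.

    weights: (cca_mode, eof_mode), assumed to match the layout of "X_wght".
    eof_vec: (eof_mode, *spatial_dims), the EOF "X_vec" from the input file.
    Returns an array of shape (cca_mode, *spatial_dims).
    """
    # Contract the EOF-mode axis; every spatial dimension passes through unchanged,
    # so predictor and predictand may have different spatial layouts.
    return np.tensordot(weights, eof_vec, axes=([1], [0]))
```

Only the mode dimension changes size. A (10, 13, 20) set of EOF eigenvectors with 10 x 10 weights gives (10, 13, 20), and a predictand stored on a different grid keeps its own spatial dimensions.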
The output variables are all written to the output dataset as type GP_FLOAT (see include/gp.h). Eigenvalues that are smaller than 1.0E-5 times the largest eigenvalue are set equal to zero.
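The eigenvalue cutoff amounts to a relative threshold. The following is a minimal NumPy sketch with two assumptions:
- The comparison is made against the largest eigenvalue of the same analysis.
- float32 is used to mirror the GP_FLOAT output type.

zero_small_eigenvalues is a hypothetical name.

```python
import numpy as np


def zero_small_eigenvalues(eigenvalues, rel_tol=1.0e-5):
    """Set eigenvalues smaller than rel_tol times the largest one to zero."""
    values = np.asarray(eigenvalues, dtype=np.float32)  # mirror GP_FLOAT output
    cutoff = rel_tol * values.max()
    return np.where(values < cutoff, np.float32(0.0), values)
```

For example, with a largest eigenvalue of 0.94 the cutoff is 9.4e-6, so a value of 1.0e-7 is replaced by zero while 0.5 is kept.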
cca also writes several attributes to the output dataset. These include the total data variance, variance in the retained modes, and the percent of variance in the retained modes in attributes named "X_d_var" & "Y_d_var", "X_r_var" & "Y_r_var", and "X_p_var" & "Y_p_var", respectively. Optionally, the squared covariance in the retained modes is written to an attribute named "r_covar".
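The variance bookkeeping behind these attributes and the printed per-mode table can be sketched in plain Python. This assumes percentages are taken relative to the total data variance ("X_d_var"); variance_summary is a hypothetical helper.

```python
from itertools import accumulate


def variance_summary(mode_var, total_var):
    """Per-mode percent, running percent, retained variance and its percent.

    mode_var: variance accounted for by each mode (the "X_var" values).
    total_var: total data variance (assumed to be the "X_d_var" attribute).
    """
    pct = [100.0 * v / total_var for v in mode_var]  # %-Var column
    cum_pct = list(accumulate(pct))                  # %-Sum column
    retained = sum(mode_var)                         # analogue of "X_r_var"
    return pct, cum_pct, retained, 100.0 * retained / total_var  # last: "X_p_var"
```

The example's sst values (mode variances 0.125023, 0.00845389, ... and sst_d_var = 0.30618) produce a retained variance of 0.2488, matching sst_r_var. They also reproduce the printed 40.831% for mode 1 and 81.255% for all ten modes to within about 0.005 percentage points.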
Optionally, cca can output a cca_file dataset. This file is used as a model to transform instances of a predictor variable into a "best-fit" predictand using ccamod (see EXAMPLES).
predictor
Specifies the predictor variable. In the case of cca, it is the prefix to the input and output predictor eigenstructures, e.g. "X" in the description above. In the case of ccamod, it is the name of the input predictor variable.
There is no default.
predictand
Specifies the predictand variable. In the case of cca, it is the prefix to the input and output predictand eigenstructures, e.g. "Y" in the description above. In the case of ccamod, it is both the name of the output predictand variable and the prefix to the predictand eigenstructure in the cca_file.
There is no default.
cca_dim
Specifies the dimension along which the separate instances of the predictor and predictand variables are extracted. For cca, this is the second dimension of the input eigenstructure expansion coefficients.
There is no default.
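In array terms, cca_dim identifies the axis that indexes instances. This minimal NumPy sketch assumes dimension names are available as a list in storage order; samples_first is a hypothetical helper. It moves that axis to the front so each row is one instance. With sst_amp stored as (mode, month), as in the example listing, choosing month yields a 78 x 10 array of samples by modes.

```python
import numpy as np


def samples_first(data, dim_names, cca_dim):
    """Return a 2-D array whose rows are the instances along cca_dim.

    data: array whose axes are named, in order, by dim_names.
    All remaining dimensions are flattened into the columns.
    """
    axis = dim_names.index(cca_dim)  # raises ValueError if cca_dim is absent
    moved = np.moveaxis(data, axis, 0)
    return moved.reshape(moved.shape[0], -1)
```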
modes
In the case of cca, specifies the number of modes to use from the predictor and predictand eigenstructures, respectively. The number of modes in the output eigenstructures is the smaller of the two numbers. In the case of ccamod, specifies the number of modes to use in computing the predictand variable.
Valid range is [ > 0 and <= size of the "mode" dimension ]. There is no default.
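The rule for the number of output modes, together with the valid range, can be written as a small check. This is a sketch; resolve_modes is a hypothetical helper, not part of cca.

```python
def resolve_modes(requested, available):
    """Validate modes=(n_x, n_y) and return the number of output cca modes.

    requested: (n_x, n_y) as given to the modes parameter of cca.
    available: sizes of the "mode" dimension of the X and Y eigenstructures.
    """
    for n, size in zip(requested, available):
        if not 0 < n <= size:
            raise ValueError(f"modes value {n} outside valid range 1..{size}")
    # The output eigenstructures keep the smaller of the two requested counts.
    return min(requested)
```

For the example, resolve_modes((10, 10), (10, 10)) returns 10.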
covar
Specifies whether to compute the squared covariance of each mode.
Valid responses are [yes or no]. The default is no.
model
Specifies whether to create a cca model output file. This is necessary only if ccamod will be used to model predictand data from "new" predictor data.
Valid responses are [yes or no]. The default is no.
cca_file
In the case of cca and when model=yes, specifies the name of the model output dataset to use as the cca_file input to ccamod. In the case of ccamod, specifies the name of the input model dataset used to compute the output predictand variable from the input predictor variable.
There is no default.
instantiate
In the case of cca and when model=yes, specifies whether to instantiate the variables in the cca_file output file. Because some of the variables in the cca_file are copies of those in the cca input and output files, by default they are written only as linked variables.
Valid responses are [yes or no]. The default is no.
max_iter
[OPTIONAL] Specifies the maximum number of iterations performed by the svd computation routine. If convergence is not achieved within the specified number of iterations, the program aborts with an error message.
Valid range is [ > 0 ]. The default is 30.
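The effect of max_iter can be illustrated with an iterative method that gives up after a fixed number of iterations. The routine below is a power iteration for the leading singular value, chosen only because it is short. It is not the SVD algorithm cca uses, and leading_singular_value is a hypothetical name.

```python
import numpy as np


def leading_singular_value(matrix, max_iter=30, tol=1.0e-10):
    """Leading singular triplet (u, sigma, v) by power iteration.

    Raises RuntimeError if the right singular vector has not converged
    within max_iter iterations, analogous to cca aborting.
    """
    a = np.asarray(matrix, dtype=float)
    # A fixed starting vector keeps the sketch deterministic; it fails only
    # if it happens to be orthogonal to the leading right singular vector.
    v = np.ones(a.shape[1]) / np.sqrt(a.shape[1])
    for _ in range(max_iter):
        w = a.T @ (a @ v)
        w /= np.linalg.norm(w)
        if np.linalg.norm(w - v) < tol:
            u = a @ w
            sigma = np.linalg.norm(u)
            return u / sigma, sigma, w
        v = w
    raise RuntimeError(f"no convergence in {max_iter} iterations")
```

For diag(3, 1, 0.5) this converges in about a dozen iterations to sigma = 3, while max_iter=1 raises the error.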
EXAMPLES
This example illustrates the use of cca and ccamod with monthly sea surface temperature (sst) and surface shortwave radiation (sw) for the tropical Pacific Ocean. The eigenstructures for these data were computed in the EXAMPLES section of eof and written to sstsweof.tdf. The goal is to determine the empirical field-to-field relationships between sst and sw. Monthly patterns of sst are plausibly related to monthly patterns of sw, especially at El Nino time scales, so these relationships may help in assessing the feedbacks between sst and sw. We assume that most of the relationship is captured in the first few modes, so we save the first ten.
    % cca sstsweof.tdf sstswcca.tdf
    predictor   : char( 31) ? sst
    predictand  : char( 31) ? sw
    cca_dim     : char( 31) ? month
    modes       : int ( 2)  ? 10 10
    covar       : char( 3)  ? [no] y
    model       : char( 3)  ? [no] y
    cca_file    : char(255) ? ccamod.tdf
    instantiate : char( 3)  ? [no] y

    Mode  Correlation  Covariance
       1     0.94      82164.8
       2     0.84       1228.1
       3     0.77       2615.35
       4     0.67       2608.44
       5     0.57       1110.37
       6     0.37        166.588
       7     0.23        143.565
       8     0.11         22.218
       9     0.06         14.603
      10     0.00          0.0185595

    Mode  P'or-var    %-Var   %-Sum   P'and-var  %-Var   %-Sum
       1  0.125023    40.831  40.831  10.9629    17.843  17.843
       2  0.00845389   2.761  43.592   3.0575     4.976  22.819
       3  0.0168736    5.511  49.102   3.86987    6.298  29.118
       4  0.0190065    6.207  55.310   4.59006    7.471  36.588
       5  0.0116065    3.791  59.100   4.3873     7.141  43.729
       6  0.00733578   2.396  61.496   2.51116    4.087  47.816
       7  0.00909014   2.969  64.464   4.37355    7.118  54.934
       8  0.00989101   3.230  67.695   2.6109     4.249  59.184
       9  0.019424     6.344  74.038   3.08003    5.013  64.197
      10  0.022096     7.216  81.255   2.527      4.113  68.310
The output indicates that the first mode captures most of the covariability: 82164.8 of the 90074.1 total squared covariance, or about 91%. The predictor and predictand expansion coefficients of this mode have a correlation of 0.94. The next few modes still account for non-trivial squared covariance and have fairly high correlations. The first mode accounts for about 41% of the sst variance and about 18% of the sw variance. The contents of the output dataset are:
    % contents sstswcca.tdf
    printout : char( 3) ? [no]

    Contents of File: sstswcca.tdf                        Page 1

    Dimension  Size  Coord  Scale  Offset
    mode         10  ?      1      0
    sstmode      10  ?      1      0
    swmode       10  ?      1      0
    line         13  y      1      0
    sample       20  x      1      0
    month        78  time   1      0

    Attribute  Type    Units             Value
    sst_m_var  double  (Celsius)^2       0.306198
    sst_d_var  double  (Celsius)^2       0.30618
    sw_m_var   double  (W/m2)^2          61.4413
    sw_d_var   double  (W/m2)^2          61.4444
    r_covar    double  (Celsius*W/m2)^2  90074.1
    sst_r_var  double  (Celsius)^2       0.2488
    sst_p_var  double  (Celsius)^2       81.2546
    sw_r_var   double  (W/m2)^2          41.9703
    sw_p_var   double  (W/m2)^2          68.3095
    history    byte

    Variable      Type    Units
    sst_wght      float
    sw_wght       float
    sst_var       double  (Celsius)^2
    sw_var        double  (W/m2)^2
    sst_vec       float   Celsius
    sw_vec        float   W/m2
    sst_amp       float
    sw_amp        float
    correlation   float   correlation
    covariance^2  float   (Celsius*W/m2)^2

    Variable      Dimension  Size
    sst_wght      mode       10
    sst_wght      sstmode    10
    sw_wght       mode       10
    sw_wght       swmode     10
    sst_var       mode       10
    sw_var        mode       10
    sst_vec       mode       10
    sst_vec       line       13
    sst_vec       sample     20
    sw_vec        mode       10
    sw_vec        line       13
    sw_vec        sample     20
    sst_amp       mode       10
    sst_amp       month      78
    sw_amp        mode       10
    sw_amp        month      78
    correlation   mode       10
    covariance^2  mode       10

    Variable      BadValue     ValidMin      ValidMax     Scale  Offset
    sst_wght      -3.4028e+38  -3.4028e+38   3.4028e+38   1      0
    sw_wght       -3.4028e+38  -3.4028e+38   3.4028e+38   1      0
    sst_var       -3.4028e+38  -1.7977e+308  1.7977e+308  1      0
    sw_var        -3.4028e+38  -1.7977e+308  1.7977e+308  1      0
    sst_vec       -3.4028e+38  -3.4028e+38   3.4028e+38   1      0
    sw_vec        -3.4028e+38  -3.4028e+38   3.4028e+38   1      0
    sst_amp       -3.4028e+38  -3.4028e+38   3.4028e+38   1      0
    sw_amp        -3.4028e+38  -3.4028e+38   3.4028e+38   1      0
    correlation   -3.4028e+38  -3.4028e+38   3.4028e+38   1      0
    covariance^2  -3.4028e+38  -3.4028e+38   3.4028e+38   1      0
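The per-mode values in the Covariance column printed by cca add up, within rounding, to the r_covar attribute in this listing (90074.1). A short plain-Python check gives the fraction of squared covariance in each mode and the running total; covariance_fractions is a hypothetical helper.

```python
from itertools import accumulate


def covariance_fractions(sq_cov):
    """Total squared covariance, per-mode fractions, and running fractions."""
    total = sum(sq_cov)
    frac = [c / total for c in sq_cov]
    return total, frac, list(accumulate(frac))
```

With the ten values from the example, the total is about 90074.05 and the first mode's share is about 0.912.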
Now suppose we have another dataset that contains the sst variable for a different time period, but not the corresponding sw variable. We can use ccamod and the cca_file dataset derived above to "predict" the shortwave from the sst. In this example, the new sst is in the dataset testnew.tdf, and the predicted shortwave is written to the dataset testmod.tdf. We use all 10 of the modes extracted above to construct the shortwave data.
    % ccamod testnew.tdf testmod.tdf
    cca_file   : char( 31) ? ccamod.tdf
    predictor  : char( 31) ? sst
    predictand : char( 31) ? sw
    cca_dim    : char( 31) ? month
    modes      : int ? [10]
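This page does not spell out the arithmetic behind ccamod. A common way to use a CCA model for prediction has four steps:
1. Project the new predictor onto its EOFs.
2. Convert the result to canonical variates.
3. Scale each variate by its correlation.
4. Map back through the predictand weights and EOFs.

The sketch below assumes that scheme. The functions fit_cca_model and predict_predictand, and the dictionary standing in for the cca_file, are hypothetical.

```python
import numpy as np


def fit_cca_model(x_field, y_field, n_x, n_y):
    """Hypothetical stand-in for cca with model=yes (samples x space inputs)."""
    model = {}
    for key, field, n in (("x", x_field, n_x), ("y", y_field, n_y)):
        mean = field.mean(axis=0)
        u, s, vt = np.linalg.svd(field - mean, full_matrices=False)
        amp = u[:, :n] * s[:n]               # EOF expansion coefficients
        std = amp.std(axis=0, ddof=1)
        model[key] = {"mean": mean, "eof": vt[:n], "std": std, "amp_s": amp / std}
    n_samples = x_field.shape[0]
    cross = model["x"]["amp_s"].T @ model["y"]["amp_s"] / (n_samples - 1)
    u, corr, vt = np.linalg.svd(cross, full_matrices=False)
    model["x_wght"], model["y_wght"], model["corr"] = u, vt.T, corr
    return model


def predict_predictand(model, x_new, n_modes=None):
    """Estimate the predictand field from new predictor data (samples x space)."""
    k = len(model["corr"]) if n_modes is None else n_modes
    mx, my = model["x"], model["y"]
    amp_s = ((x_new - mx["mean"]) @ mx["eof"].T) / mx["std"]  # standardized EOF coefficients
    canon = amp_s @ model["x_wght"][:, :k]                     # predictor canonical variates
    canon_y = canon * model["corr"][:k]                        # expected predictand variates
    amp_y = (canon_y @ model["y_wght"][:, :k].T) * my["std"]    # predictand EOF coefficients
    return amp_y @ my["eof"] + my["mean"]                      # back to the predictand field
```

With all modes retained, this prediction reduces to a least-squares regression of the predictand EOF coefficients on the predictor EOF coefficients. Passing a smaller n_modes keeps only the most highly correlated patterns.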
SEE ALSO
datasets, spectral, eof, eoffilt, eofproj, svd, subset, linfit, emath, nhood, dimavg, laminate, magnify, xcorrel, anomaly.
Memory allocation errors generally mean that the sizes of the two variables (the dimensions not associated with cca_dim) produce a covariance matrix that is too large to compute. In this case the dataspace has to be reduced (see subset or magnify).
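To gauge whether a run is likely to hit this limit, the size of a dense covariance matrix can be estimated from the product of the non-cca_dim dimension sizes. This is a rough sketch that assumes one 4-byte (single-precision) value per pair of points; the actual working storage in cca may differ. covariance_bytes is a hypothetical helper.

```python
from math import prod


def covariance_bytes(x_shape, y_shape, bytes_per_value=4):
    """Bytes for a dense covariance matrix between two fields.

    x_shape, y_shape: sizes of the dimensions other than cca_dim.
    bytes_per_value: 4 assumes single-precision (GP_FLOAT-like) storage.
    """
    return prod(x_shape) * prod(y_shape) * bytes_per_value
```

For the 13 x 20 grids in the example this is only 270,400 bytes. Two global 1-degree fields (180 x 360 points each) would need 16,796,160,000 bytes, roughly 17 GB.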
References
Bretherton et al., 1992: An intercomparison of methods for finding coupled patterns in climate data. J. Climate, 5, 541-560.
Last Update: $Date: 2000/12/07 19:54:54 $