svd [ parameter=value ... ] [ inputfile outputfile ]
svdmod [ parameter=value ... ] [ inputfile outputfile ]
Parameters for svd are: predictor, predictand, svd_dim, modes, max_iter.
Parameters for svdmod are: svd_file, predictor, predictand, svd_dim, modes.
svd (Singular Value Decomposition) computes empirical modes of covariability between sequences of two multi-dimensional variables (predictor and predictand), where the different instances of the variables are assumed to lie along the dimension svd_dim (see laminate). The modes of covariability are computed directly from the cross-covariance matrix of the two variables via Singular Value Decomposition. This produces predictor and predictand eigenvectors, eigenvalues, and expansion coefficients. The distinction between predictor and predictand matters only when the results are used by svdmod. svd reports: 1) the eigenvalue of each mode (the amount of squared covariance accounted for by that mode), 2) the correlation between the expansion coefficients of each mode, and 3) the amount of predictor and predictand variance accounted for individually by each mode.
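The core computation can be sketched in a few lines of Python. This is a minimal illustration, not svd's implementation: it assumes the inputs are anomalies already flattened to n instances by points, assumes division by n when forming the cross-covariance matrix, and, in place of a full SVD routine, uses power iteration to find only the leading mode. The function names are hypothetical, and the max_iter argument only mirrors the idea of svd's parameter (its default of 500 is arbitrary). The sketch also shows why the "Total squared covariance" and "Squared covariance accounted for by all eigenvalues" lines in the example below agree: the sum of the squared entries of the cross-covariance matrix equals the sum of its squared singular values.

```python
import math


def cross_covariance(X, Y):
    """Cross-covariance matrix C[i][j] between predictor point i and predictand point j.

    X is n x p and Y is n x q, with the n instances taken along svd_dim.
    Both are assumed to be anomalies (means removed); dividing by n is an assumption.
    """
    n = len(X)
    p, q = len(X[0]), len(Y[0])
    return [[sum(X[t][i] * Y[t][j] for t in range(n)) / n for j in range(q)]
            for i in range(p)]


def total_squared_covariance(C):
    """Sum of squared entries of C, which equals the sum of all eigenvalues s_k**2."""
    return sum(c * c for row in C for c in row)


def leading_mode(C, max_iter=500, tol=1e-12):
    """Leading singular value s and unit vectors u (predictor) and v (predictand).

    Power iteration, used here only for illustration; the eigenvalue
    (squared covariance) of the mode is s**2.
    """
    p, q = len(C), len(C[0])
    v = [1.0 / math.sqrt(q)] * q
    s_old = 0.0
    for _ in range(max_iter):
        u = [sum(C[i][j] * v[j] for j in range(q)) for i in range(p)]  # u = C v
        norm_u = math.sqrt(sum(x * x for x in u))
        u = [x / norm_u for x in u]
        v = [sum(C[i][j] * u[i] for i in range(p)) for j in range(q)]  # v = C^T u
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        if abs(s - s_old) <= tol * s:
            return s, u, v
        s_old = s
    raise RuntimeError("no convergence within max_iter iterations")


# Hypothetical anomaly data: 4 instances, 2 predictor points, 3 predictand points.
X = [[1.0, 2.0], [-1.0, -2.0], [0.5, 0.0], [-0.5, 0.0]]
Y = [[2.0, 1.0, 0.0], [-2.0, -1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, -1.0]]
C = cross_covariance(X, Y)
s, u, v = leading_mode(C)
print(f"leading eigenvalue {s * s:.4f} of total {total_squared_covariance(C):.4f}")
```

As in the example below, one mode dominates here, so the leading eigenvalue is nearly the whole total squared covariance.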
For each pair of predictor and predictand variables "X" and "Y", svd writes the eigenvectors, expansion coefficients, variances, eigenvalues, and correlations as output variables named "X_vec" & "Y_vec", "X_amp" & "Y_amp", "X_var" & "Y_var", "covariance^2", and "correlation", respectively. The eigenvalues represent the squared covariance accounted for by each mode; the first mode is the pair of patterns accounting for the most squared covariance in the dataset. The correlation is computed between the predictor and predictand expansion coefficients of each mode, i.e. "X_amp" and "Y_amp". Note: svd maximizes the squared covariance accounted for by each mode, while cca maximizes the correlation between the expansion coefficients of each mode.
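The "correlation" output is an ordinary Pearson correlation between the two expansion-coefficient series. The sketch below assumes each coefficient is the projection of one instance onto the mode pattern; the data, patterns, and helper names are hypothetical. Because the Pearson correlation is unchanged when either series is rescaled, the unit normalization of "X_amp" and "Y_amp" does not affect it.

```python
import math


def expansion_coefficients(data, vec):
    """Project each instance (row of data) onto one mode pattern: amp[t] = data[t] . vec."""
    return [sum(d * e for d, e in zip(row, vec)) for row in data]


def correlation(a, b):
    """Pearson correlation between two expansion-coefficient series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


# Hypothetical anomaly data (3 instances) and single-mode patterns.
X = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
Y = [[2.0], [1.0], [-3.0]]
x_amp = expansion_coefficients(X, [1.0, 0.0])
y_amp = expansion_coefficients(Y, [1.0])
print(correlation(x_amp, y_amp))
```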
The dimensions of "X_vec" and "Y_vec" are the same as those of "X" and "Y", respectively, except that the svd_dim dimension of each (which must be present in both) is replaced by a dimension named "mode" of size modes. Apart from svd_dim, "X" and "Y" need not share the same dimensions, or even the same number of dimensions. Each eigenvector pair is normalized to have squared covariance equal to the associated eigenvalue; thus the eigenvectors individually have the same units as the input data. The expansion coefficient series are unit normalized. The predictor and predictand variances of each mode are multiplied by the number of points in the eigenvector.
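The dimension bookkeeping can be made concrete with a small sketch. The helper names are hypothetical, a variable's shape is represented as a list of (name, size) pairs, and placing "mode" first in the expansion-coefficient variable follows the contents listing of testsvd.tdf in the example below. It also applies the valid range given for the modes parameter.

```python
def vec_dims(var_dims, svd_dim, modes):
    """Dimensions of an eigenvector variable ("X_vec"): svd_dim -> ("mode", modes).

    var_dims is a list of (name, size) pairs describing the input variable.
    """
    sizes = dict(var_dims)
    if svd_dim not in sizes:
        raise ValueError(f"{svd_dim!r} must be a dimension of the input variable")
    if not 0 < modes <= sizes[svd_dim]:
        raise ValueError("modes must be > 0 and <= the size of svd_dim")
    return [("mode", modes) if name == svd_dim else (name, size)
            for name, size in var_dims]


def amp_dims(var_dims, svd_dim, modes):
    """Dimensions of an expansion-coefficient variable ("X_amp"): (mode, svd_dim)."""
    return [("mode", modes), (svd_dim, dict(var_dims)[svd_dim])]


cldcov = [("time", 12), ("line", 52)]   # assumed order: time first, then latitude
print(vec_dims(cldcov, "time", 6))      # [('mode', 6), ('line', 52)]
print(amp_dims(cldcov, "time", 6))      # [('mode', 6), ('time', 12)]
```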
All output variables are written to the output dataset as type GP_FLOAT (see include/gp.h). Eigenvalues smaller than 1.0E-5 times the largest eigenvalue are set to zero. If bad values occur in the input variables, they are set to zero before the eigenstructure is computed. This assumes that the "mean" of the data has already been removed (see anomaly), so the bad values take on the "mean" value.
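The two rules above, and the bad-value statistics svd prints in the example below, can be sketched as follows. This is an illustration under stated assumptions: BAD is a hypothetical flag value and the data are nested lists of instances by points. With 12 samples, 52 latitudes, and 14 bad values it reproduces the predictor figures in the example log (14 / 12 ≈ 1.16667 per sample, 14 / 624 ≈ 0.0224359 of the data).

```python
BAD = -3.4028e+38  # hypothetical bad-value flag; the real flag comes from the input variable


def zero_bad_values(data, bad=BAD):
    """Set bad values to zero in place (the anomaly "mean") and report statistics.

    Returns (count, count per sample, fraction of total data), the three
    numbers svd prints in its log.
    """
    count = 0
    for row in data:
        for j, x in enumerate(row):
            if x == bad:
                row[j] = 0.0
                count += 1
    n_samples = len(data)
    return count, count / n_samples, count / (n_samples * len(data[0]))


def truncate_eigenvalues(eigs, rel=1.0e-5):
    """Set eigenvalues smaller than rel times the largest eigenvalue to zero."""
    largest = max(eigs)
    return [e if e >= rel * largest else 0.0 for e in eigs]


# 12 samples along svd_dim x 52 latitudes with 14 bad values, as in the example log.
data = [[0.1] * 52 for _ in range(12)]
for k in range(14):
    data[k % 12][k] = BAD
print(zero_bad_values(data))
print(truncate_eigenvalues([317423.0, 1538.43, 1.0]))
```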
svd also writes several attributes to the output dataset. The total data variance, the variance in the retained modes, and the percent of variance in the retained modes are written to attributes named "X_d_var" & "Y_d_var", "X_r_var" & "Y_r_var", and "X_p_var" & "Y_p_var", respectively. Additionally, the total data squared covariance, the total squared covariance captured by all the modes, the squared covariance in the retained modes, and the percent of squared covariance in the retained modes are written to attributes named "d_covar", "m_covar", "r_covar", and "p_covar", respectively.
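The retained-variance and retained-covariance attributes are simple sums and ratios of per-mode quantities. Assuming, as the example numbers suggest, that r_var is the sum of the per-mode variances and p_var = 100 x r_var / d_var, the hypothetical helper below recovers the attributes in the testsvd.tdf listing from the per-mode tables in the svd log, to the precision of the printed inputs; r_covar and p_covar follow the same pattern with the eigenvalues and d_covar.

```python
def retained(total, per_mode):
    """Return (retained amount, percent of total) for a list of per-mode values."""
    kept = sum(per_mode)
    return kept, 100.0 * kept / total


# Per-mode values copied from the example output for testsvd.tdf.
cldcov_mode_var = [0.0865204, 0.026957, 0.0296356, 0.0121664, 0.00629871, 0.00434016]
shortwave_mode_var = [1720.51, 45.23, 14.1394, 12.622, 7.19636, 4.165]
eigenvalues = [317423, 1538.43, 591.539, 259.565, 91.3149, 32.5776]

cldcov_r_var, cldcov_p_var = retained(0.177207, cldcov_mode_var)      # d_var = 0.177207
shortwave_r_var, shortwave_p_var = retained(1857.19, shortwave_mode_var)
r_covar, p_covar = retained(319954, eigenvalues)                      # d_covar = 319954
print(round(cldcov_p_var, 2), round(shortwave_p_var, 2), round(p_covar, 2))
```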
svdmod uses the results stored in an svd_file (an output dataset from svd) as a model to transform instances of a predictor variable in the inputfile into a "best-fit" predictand. Except for the mode and svd_dim dimensions, the predictor variable in the inputfile and the predictor modes in the svd_file must have the same dimensions. The output predictand variable has the same dimensions as the predictand modes in the svd_file, except that the dimension mode is replaced by svd_dim. The predictand variable is written to the outputfile.
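This page does not spell out how svdmod forms its "best-fit" predictand. The sketch below is therefore an assumption, not svdmod's code: it shows one plausible SVD-based estimate, in which each predictor instance is projected onto the predictor patterns, each resulting coefficient is scaled by that mode's expansion-coefficient correlation (the regression slope between unit-variance series), and the predictand patterns are summed with those weights. All names and numbers are hypothetical; the modes argument plays the role of svdmod's modes parameter.

```python
def project(x, vec):
    """Least-squares coefficient of one predictor instance x on a single pattern vec."""
    return sum(a * b for a, b in zip(x, vec)) / sum(b * b for b in vec)


def predict(x, x_vecs, y_vecs, corrs, modes):
    """Hypothetical best-fit predictand for one predictor instance x.

    For each of the first `modes` modes: project x onto the predictor pattern,
    scale the coefficient by that mode's correlation, and add that multiple of
    the predictand pattern. Assumes the predictor patterns are mutually orthogonal.
    """
    y_hat = [0.0] * len(y_vecs[0])
    for k in range(modes):
        a_k = project(x, x_vecs[k])
        for j, y in enumerate(y_vecs[k]):
            y_hat[j] += corrs[k] * a_k * y
    return y_hat


x_vecs = [[1.0, 0.0], [0.0, 1.0]]            # predictor patterns (2 modes, 2 points)
y_vecs = [[2.0, 0.0, 0.0], [0.0, 0.0, 3.0]]  # predictand patterns (2 modes, 3 points)
corrs = [0.5, 1.0]                           # expansion-coefficient correlations
print(predict([2.0, 4.0], x_vecs, y_vecs, corrs, modes=2))  # [2.0, 0.0, 12.0]
```

Using fewer modes simply truncates the sum, which is the trade-off the modes parameter controls.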
predictor
Specifies the predictor variable. This is an input variable read from the input dataset by both svd and svdmod.
There is no default.
predictand
Specifies the predictand variable. For svd, it is an input variable read from the input dataset. For svdmod, it is both the name of the output variable and the prefix of the predictand eigenstructure in the svd_file, e.g. "Y" in the description above.
There is no default.
svd_dim
Specifies the dimension along which to extract the separate instances of the predictor and predictand variables.
There is no default.
modes
For svd, specifies the number of modes to retain and write to the output dataset. For svdmod, specifies the number of modes to use in computing the predictand variable.
Valid range is [ > 0 and <= size of svd_dim]. The default is 10.
max_iter
[OPTIONAL] Specifies the maximum number of iterations performed by the svd computation routine. If convergence is not achieved within the specified number of iterations, the program aborts with an error message.
Valid range is [ > 0 ]. The default is 30.
svd_file
Specifies the name of the svd model dataset used to compute the predictand output dataset from the input predictor dataset. This must be an output dataset from svd.
There is no default.
This example illustrates the use of svd and svdmod for variables consisting of zonal mean cloud cover given in oktas and surface short-wave given in W/m2. The data have already had their means removed using anomaly. Both variables have the common dimension time, and the rest of their dimensions are spatial (not necessarily shared); in this case each has the same second dimension, which represents latitude. Each of these variables has bad/missing values. This is not optimal for such a procedure, and bad values should be removed or interpolated beforehand (see smear). The purpose of running this procedure is to determine the empirical field-to-field relationships between cloud cover and short-wave. We assume that most of the relationship will be captured in the first few modes, so we save the first six.
% svd test.tdf testsvd.tdf
predictor : char( 31) ? cldcov_anom
predictand : char( 31) ? shortwave_anom
svd_dim : char( 31) ? time
modes : int ? [10] 6
Processing test.tdf
Number of bad values found are:   Predictor 14         Predictand 16
Number per sample :               Predictor 1.16667    Predictand 1.33333
Fraction of total data :          Predictor 0.0224359  Predictand 0.025641
These bad values set to zero.
Mean and variance of Predictor: 3.87056e-10 0.177207
Mean and variance of Predictand: -7.44521e-08 1857.19
... computing covariance matrix ...
Total squared covariance: 319954
... computing svd ...
Squared covariance accounted for by all eigenvalues: 319954
Mode  Eigen-Value  %-Covariance  %-Sum   Correlation
1     317423       99.209        99.209  0.88
2     1538.43      0.481         99.690  0.68
3     591.539      0.185         99.875  0.71
4     259.565      0.081         99.956  0.79
5     91.3149      0.029         99.984  0.86
6     32.5776      0.010         99.994  0.80
Mode  P'or-var    %-Var   %-Sum   P'and-var  %-Var   %-Sum
1     0.0865204   48.824  48.824  1720.51    92.641  92.641
2     0.026957    15.212  64.036  45.23      2.435   95.076
3     0.0296356   16.724  80.760  14.1394    0.761   95.837
4     0.0121664   6.866   87.626  12.622     0.680   96.517
5     0.00629871  3.554   91.180  7.19636    0.387   96.905
6     0.00434016  2.449   93.629  4.165      0.224   97.129
The output indicates that essentially all of the squared covariance (about 99.2%) is captured by the first mode. This mode accounts for about 49% of the cloud cover variance and about 93% of the short-wave variance. The temporal expansion coefficients of the first mode for cloud cover and short-wave have a linear correlation of about 0.88. The contents of the output dataset are:
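The percentages quoted here come straight from the tables in the svd log. Consistent with the printed numbers, %-Covariance is each eigenvalue divided by the total squared covariance, %-Var is each mode's variance divided by the total variance of that field, and %-Sum is the running total. The short sketch below, with a hypothetical helper name and values copied from the log, recomputes the figures this paragraph cites.

```python
from itertools import accumulate


def percent_columns(values, total):
    """Per-mode percent of a total and the running (cumulative) percent, as in svd's tables."""
    pct = [100.0 * v / total for v in values]
    return pct, list(accumulate(pct))


eigenvalues = [317423, 1538.43, 591.539, 259.565, 91.3149, 32.5776]
pct_cov, sum_cov = percent_columns(eigenvalues, 319954)   # total squared covariance
pct_cld, _ = percent_columns([0.0865204], 0.177207)       # mode 1, cloud cover variance
pct_sw, _ = percent_columns([1720.51], 1857.19)           # mode 1, short-wave variance
print(f"{pct_cov[0]:.3f} {sum_cov[1]:.3f} {pct_cld[0]:.1f} {pct_sw[0]:.1f}")
```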
% contents testsvd.tdf
printout : char( 3) ? [no]
Contents of File: testsvd.tdf    Page 1
Dimension  Size  Coord  Scale  Offset
mode       6     ?      1      0
line       52    y      1      0
time       12    time   1      0
Attribute        Type    Units          Value
cldcov_d_var     double  (okta)^2       0.177207
shortwave_d_var  double  (W/m2)^2       1857.19
cldcov_r_var     double  (okta)^2       0.165918
shortwave_r_var  double  (W/m2)^2       1803.86
cldcov_p_var     double  percent        93.6294
shortwave_p_var  double  percent        97.1289
d_covar          double  (okta*W/m2)^2  319954
m_covar          double  (okta*W/m2)^2  319954
r_covar          double  (okta*W/m2)^2  319936
p_covar          double  percent        99.9944
history          byte
Variable       Type   Units
cldcov_vec     float  okta
shortwave_vec  float  W/m2
cldcov_amp     float
shortwave_amp  float
cldcov_var     float  (okta)^2
shortwave_var  float  (W/m2)^2
covariance^2   float  (okta*W/m2)^2
correlation    float
Variable       Dimension  Size
cldcov_vec     mode       6
cldcov_vec     line       52
shortwave_vec  mode       6
shortwave_vec  line       52
cldcov_amp     mode       6
cldcov_amp     time       12
shortwave_amp  mode       6
shortwave_amp  time       12
cldcov_var     mode       6
shortwave_var  mode       6
covariance^2   mode       6
correlation    mode       6
Variable       BadValue     ValidMin     ValidMax    Scale  Offset
cldcov_vec     -3.4028e+38  -3.4028e+38  3.4028e+38  1      0
shortwave_vec  -3.4028e+38  -3.4028e+38  3.4028e+38  1      0
cldcov_amp     -3.4028e+38  -3.4028e+38  3.4028e+38  1      0
shortwave_amp  -3.4028e+38  -3.4028e+38  3.4028e+38  1      0
cldcov_var     -3.4028e+38  -3.4028e+38  3.4028e+38  1      0
shortwave_var  -3.4028e+38  -3.4028e+38  3.4028e+38  1      0
covariance^2   -3.4028e+38  -3.4028e+38  3.4028e+38  1      0
correlation    -3.4028e+38  -3.4028e+38  3.4028e+38  1      0
Now suppose we have another dataset containing the cloud cover variable for a different time period, but without the corresponding short-wave variable for that period. We can use svdmod and the dataset derived above to "predict" the short-wave from the cloud cover. In this example, the new cloud cover is in the dataset testnew.tdf, and the predicted short-wave is written to the dataset testmod.tdf. We use all 6 of the modes extracted above to construct the short-wave data.
% svdmod testnew.tdf testmod.tdf
svd_file : char( 31) ? testsvd.tdf
predictor : char( 31) ? cldcov_anom
predictand : char( 31) ? shortwave_anom
svd_dim : char( 31) ? time
modes : int ? [10] 6
See also: datasets, spectral, eof, eoffilt, eofproj, cca, subset, linfit, emath, nhood, dimavg, laminate, magnify, xcorrel, anomaly.
Memory allocation errors generally mean that the sizes of the two variables (excluding svd_dim) produce a covariance matrix too large to allocate. In this case the dataspace has to be reduced (see subset or magnify).
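A back-of-the-envelope estimate shows how quickly the covariance matrix grows: its number of entries is the product of the predictor's non-svd_dim sizes times the product of the predictand's. The helper below is hypothetical and assumes 8 bytes per value; the 73 x 144 global grid is an illustrative assumption, not part of the example. The example's 52 x 52 matrix is tiny, but two such global fields already need roughly 884 MB for the matrix alone, which is why reducing the dataspace helps.

```python
from math import prod


def covariance_matrix_bytes(predictor_sizes, predictand_sizes, bytes_per_value=8):
    """Estimated storage for the predictor-by-predictand covariance matrix.

    The *_sizes arguments are the sizes of all dimensions other than svd_dim.
    bytes_per_value=8 assumes double-precision storage, which is an assumption;
    any SVD workspace would come on top of this.
    """
    return prod(predictor_sizes) * prod(predictand_sizes) * bytes_per_value


print(covariance_matrix_bytes([52], [52]))              # 21632 (the example: 52 x 52)
print(covariance_matrix_bytes([73, 144], [73, 144]))    # 884017152 (hypothetical global grids)
```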
References: Bretherton et al., 1992: An intercomparison of methods for finding coupled patterns in climate data. J. Climate, 5, 541-560.
Last Update: $Date: 1999/05/10 20:46:32 $