% Use matrix and vector algebra formulation to solve a least % squares fitting problem % Make a simple data set that is a function of one dimension, t more off echo on % The independent coordinate, t, will have N elements N = input('Enter the number of elements in the time series (N): '); % N = 1000; t = (0:N)/N*2*pi; % it goes from 0 to 2*pi % squash it inot a columm vector t = t(:); % make up some data a = 0.2; b = 0.5; c = 0.3; y = a + b*cos(t) + c*randn(size(t)); % The data is a simple harmonic function of period 2*pi. Some % normally distributed random noise has been added. % Plot the data plot(t,y,'rd') figure(gcf) disp('Paused...');pause % Now build the design matrix for a model function to be fitted that % includes a constant, linear an quadratic terms in t and, and sin(t) % and cos(t) terms (as if we expected a period 2*pi variability but % the phase is unknown) % % ie. the model is: % % yfit = a1 + a2*t *a3*t^2 + a4*cos(t) + a5*sin(t) E = [ones(size(t)) t t.^2 cos(t) sin(t)]; % The solution via the normal equations is: % the vector x that minimizes the misfit n between model and data: % Ex + n = y; min J = n'n w.r.t. x x = inv(E'*E)*E'*y % We have recovered values close to those we used to generate the % data % Plot the fit yfit = x(1) + x(2)*t + x(3)*t.^2 + x(4)*cos(t) + x(5)*sin(t); plot(t,y,'rd',t,yfit) figure(gcf) disp('Paused...');pause % Now let's look at the residuals n = yfit-y; % The variance of the residuals should be close to the value 'c' % used to add random noise sigma = std(n) % Plot a histogrtam of the residuals hist(n,-1.05:.1:1.05) figure(gcf) disp('Paused...');pause % They do indeed look approximately normally distributed with % standard deviation of 'c', but one would need to do a formal test % to be sure. % The uncertainty in the fitted parameters is: P = sigma^2*inv(E'*E) % The off diagonal terms are small, indicating that there is little % the fit to the coefficients is largely independent. We would % expect this since the temrs in the model are quite different % and it would be unlikely for variability to project onto more than % one term. If some of the terms being fitted in the model were % similar, for example harmonic terms with close frequencies (e.g. % diurnal 12 hours and lunar semidiurnal 12.42 hours tide % constituents). % The error ars in the individual coefficients is p = sqrt(diag(P)); % i.e. the coefficients are +/- these values disp('The middle column is the input parameters [a 0 0 b 0] and we expect ') disp('them to be bracketed by the first and last columns most of the time') param = [a 0 0 b 0]'; bounds = [x param x] + p*[-2 0 2] disp('Paused...');pause % Instead of solving the normal equations, we can use singular value decomposition % i.e. we will seek a 'solution' x to Ex=y that satisfies the % overdetermined system (M > N) in a least squares sense. % The SVD factorization of E is E = U*S*V' and is obtained simply with: [U,S,V] = svd(E,0); % As an exercise, you can verify the orthonormality property of U and V. % We need the inverse of the diagonal matrix S, but don't need to do inv(S) % because we know we can obtain the inverse of a diagonal matrix simply by % taking the reciprocal of the diagonal elements Si = 1./S; % But this leaves us with Inf on the off diagonals. Find these values and % replace them with zeros. Si(find(~isfinite(Si))) = 0; % Now the solution is xs = V*Si*U'*y % Compare: the columns should be the same: [x xs] % Now obtain the same solution simply by the Matlab matrix left divide % operation. For an overdetermined system (M > N), this obtains the % solution that satifies the system equations with the least mean squared % error xmld = E\y; % Compare: the columns should be the same [x xs xmld]