% Use matrix and vector algebra formulation to solve a least % squares fitting problem % Make a simple data set that is a function of one dimension, t more off echo on % The independent coordinate, t, will have N elements N = input('Enter the number of elements in the time series (N): '); % N = 1000; t = (0:N)/N*2*pi; % it goes from 0 to 2*pi % squash it into a columm vector t = t(:); % make up some data a = 0.2; b = 0.5; c = 0.1; d = 0.05; y = a + b*cos(t) + c*randn(size(t)); % The data is a simple harmonic function of period 2*pi. Some % normally distributed random noise has been added. % Plot the data plot(t,y,'rd') figure(gcf) disp('Paused...');pause % Now build the design matrix for a model function to be fitted that % includes a constant, linear an quadratic terms in t and, and sin(t) % and cos(t) terms (as if we expected a period 2*pi variability but % the phase is unknown) % % ie. the model is: % % yfit = a1 + a2*t *a3*t^2 + a4*cos(t) + a5*sin(t) disp(' ') disp('E = [ones(size(t)) t t.^2 cos(t) sin(t)];') disp(' ') E = [ones(size(t)) t t.^2 cos(t) sin(t)]; % The solution via the normal equations is: % the vector x that minimizes the misfit n between model and data: % Ex + n = y; min J = n'n w.r.t. x x = inv(E'*E)*E'*y A = inv(E'*E)*E'; % (Note that if we change the data we can repeat the fit with the same % normal equations provided we don't change the model). % e.g. % y = a + 0.6*t.^2 + b*cos(t) + c*randn(size(t)); % x = A*y % We should have recovered values close to those we used to generate the % data: disp(' ') disp([' x True']) disp(num2str([x [a 0 0 b 0]'])) % Plot the fit yfit = x(1) + x(2)*t + x(3)*t.^2 + x(4)*cos(t) + x(5)*sin(t); plot(t,y,'rd',t,yfit) figure(gcf) disp('Paused...');pause % Now let's look at the residuals n = yfit-y; % The variance of the residuals should be close to the value 'c' % used to add random noise sigma = std(n); disp(['Variance of residuals is std(n) = ' num2str(sigma,2)]) disp(['Random noise added to data was ' num2str(c,2)]) disp('Paused...');pause % Plot a histogram of the residuals bins = c*[-4:0.2:4]; hist(n,bins) figure(gcf) disp('Paused...');pause % They do indeed look approximately normally distributed with % standard deviation of 'c', but one would need to do a formal test % to be sure. % But we can plot the pdf for a normal distribution with std dev c over the % histogram to see. The magnitude has to be scaled up by the area under the % histogram (the area under the normal curve is 1). o = hist(n,bins); area = sum(o)*diff(bins([1 2])); f = [-1:0.02:1]*0.5*diff(get(gca,'xlim')); normal = 1/(c*sqrt(2*pi))*exp(-0.5*(f/c).^2); hold on plot(f,area*normal,'r') hold off disp('Paused...');pause % The uncertainty in the fitted parameters is: P = sigma^2*inv(E'*E) % The off diagonal terms are small, indicating that the coefficients are % fit to the data largely independently of each other. We % expect this since the terms in the model are quite different % and it would be unlikely for variability to project onto more than % one term. If some of the terms being fitted in the model were % similar, for example harmonic terms with close frequencies (e.g. % diurnal 12 hours and lunar semidiurnal 12.42 hours tide % constituents) there might be greater covariance in parameter values. % The error bars in the individual coefficients are p = sqrt(diag(P)); % i.e. the coefficients are +/- these values disp('The middle column is the input parameters [a 0 0 b 0] and we expect ') disp('them to be bracketed by the first and last columns most of the time') param = [a 0 0 b 0]'; bounds = [x param x] + p*[-2 0 2]