matlab - What does selecting the largest eigenvalues and eigenvectors in the covariance matrix mean in data analysis?
Suppose there is a matrix b of size 500*1000 double (here, 500 represents the number of observations and 1000 represents the number of features). sigma is the covariance matrix of b, and d is a diagonal matrix whose diagonal elements are the eigenvalues of sigma. Assume a holds the eigenvectors of the covariance matrix sigma.
I have the following questions:

1. I need to select the first k = 800 eigenvectors whose corresponding eigenvalues have the largest magnitude in order to rank the selected features. The final matrix is named aq. How can I do this in MATLAB?

2. What is the meaning of these selected eigenvectors?

3. It seems the size of the final matrix aq is 1000*800 double. Once I calculate aq, the time point / observation information of 500 seems to have disappeared. In the final matrix aq, what does the value 1000 represent now? Also, what does the value 800 represent now?
I'm assuming you determined the eigenvectors with the eig function. What I would recommend for the future is to use the eigs function. Not only does it compute the eigenvalues and eigenvectors for you, it can compute the k largest eigenvalues with their associated eigenvectors. This may save computational overhead, since you don't have to compute all of the eigenvalues and associated eigenvectors of your matrix when you only want a subset. You simply supply the covariance matrix of your data to eigs, and it returns the k largest eigenvalues and eigenvectors for you.
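As a quick illustration, a minimal sketch of that call might look like this (assuming b holds your data and you want k components; the variable names are mine):

sigma = cov(b);              %// covariance matrix of the data
k = 800;                     %// number of components to keep
[aq, dq] = eigs(sigma, k);   %// k largest-magnitude eigenvalues / eigenvectors
%// Note: the output ordering is still not guaranteed to be sorted, so sort
%// diag(dq) yourself if the ordering matters (see Question #1 below).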
Now, moving on to your problem, what you are describing is principal component analysis. The mechanics behind it are to compute the covariance matrix of your data and find the eigenvalues and eigenvectors of that result. It has been known that doing it this way is not recommended due to numerical instability when computing the eigenvalues and eigenvectors of large matrices. The canonical way to do this nowadays is via the singular value decomposition. Concretely, the columns of the V matrix give you the eigenvectors of the covariance matrix, or the principal components, and the associated eigenvalues can be recovered from the singular values in the diagonal of the matrix S: for a mean-centred data matrix, each eigenvalue is the corresponding squared singular value divided by the number of observations minus one.
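As a minimal sketch of that route (my own illustration, not code from the question; it assumes b is the 500*1000 data matrix):

bm = bsxfun(@minus, b, mean(b, 1));     %// mean-centre each feature
[~, s, v] = svd(bm, 'econ');            %// economy-size SVD of the centred data
pcs  = v;                               %// columns of v are the principal components
vals = diag(s).^2 / (size(b, 1) - 1);   %// eigenvalues of the covariance matrix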
See this informative post on Cross Validated as to why this is preferred:

https://stats.stackexchange.com/questions/79043/why-pca-of-data-by-means-of-svd-of-the-data

I'll also throw in another link that talks about the theory behind why the singular value decomposition is used in principal component analysis.
Now let's answer your questions one at a time.

Question #1
MATLAB generates the eigenvalues and the corresponding ordering of the eigenvectors in such a way that they are unsorted. If you wish to select out the largest k eigenvalues and associated eigenvectors given the output of eig (800 in your example), you'll need to sort the eigenvalues in descending order, then rearrange the columns of the eigenvector matrix produced from eig, and then select out the first k values.
I should note that using eigs will not guarantee sorted order either, so you will have to explicitly sort these too when it comes down to it.

In MATLAB, doing what was described above would look something like this:
sigma = cov(b);                          %// covariance matrix of b
[a, d] = eig(sigma);                     %// eigenvectors (columns of a) and eigenvalues (diagonal of d)
vals = diag(d);                          %// pull the eigenvalues out of the diagonal matrix
[~, ind] = sort(abs(vals), 'descend');   %// sort by magnitude, largest first
asort = a(:, ind);                       %// rearrange the eigenvectors to match
It's a good thing to note that we sort on the absolute value of the eigenvalues, since eigenvalues are not guaranteed to be positive in general. This means that if we had a component whose eigenvalue was, say, -10000, that would be a very good indication that the component has significant meaning to your data, yet if we sorted purely on the numbers themselves, it would get placed near the lower ranks.
The first line of code finds the covariance matrix of b, and even though you said it's already stored in sigma, let's make this reproducible. Next, we find the eigenvalues of the covariance matrix and the associated eigenvectors. Take note that each column of the eigenvector matrix a represents one eigenvector. Specifically, the ith column / eigenvector of a corresponds to the ith eigenvalue seen in d.
However, the eigenvalues are in a diagonal matrix, so we extract out the diagonals with the diag command, sort them and figure out their ordering, then rearrange a to respect this ordering. We use the second output of sort because it tells you the position of where each value in the unsorted result would appear in the sorted result. This is the ordering we need to rearrange the columns of the eigenvector matrix a. It's imperative that you choose the 'descend' flag so that the largest eigenvalue and associated eigenvector appear first, as we talked about before.
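If the second output of sort is unfamiliar, here is a tiny illustration of my own:

[sorted, ind] = sort([2 9 5], 'descend')
%// sorted = [9 5 2], ind = [2 3 1]: the 2nd input element comes first,
%// then the 3rd, then the 1st - exactly the ordering used to rearrange a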
You can then pluck out the first k largest vectors and values via:
k = 800;              %// number of components to keep
aq = asort(:, 1:k);   %// first k eigenvectors, i.e. those with the largest eigenvalues
Question #2

It's a well-known fact that the eigenvectors of the covariance matrix are equal to the principal components. Concretely, the first principal component (i.e. the largest eigenvector with its associated largest eigenvalue) gives you the direction of maximum variability in your data. Each principal component after that gives you variability of a decreasing nature. It's also good to note that each principal component is orthogonal to the others.
Here's a great example from Wikipedia for two dimensional data:

I pulled the above image from the Wikipedia article on Principal Component Analysis. It is a scatter plot of samples that are distributed according to a bivariate Gaussian distribution centred at (1,3) with a standard deviation of 3 in roughly the (0.878, 0.478) direction and of 1 in the orthogonal direction. The component with a standard deviation of 3 is the first principal component while the one that is orthogonal is the second component. The vectors shown are the eigenvectors of the covariance matrix scaled by the square root of the corresponding eigenvalue, and shifted so their tails are at the mean.
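If you'd like to recreate something in the spirit of that figure yourself, here is a rough sketch of my own (not part of the original example; the variable names are arbitrary and only base MATLAB is assumed):

mu   = [1 3];                                     %// centre of the distribution
dir1 = [0.878 0.478] / norm([0.878 0.478]);       %// first principal direction
dir2 = [-dir1(2) dir1(1)];                        %// orthogonal direction
x = bsxfun(@plus, randn(1000, 2) * diag([3 1]) * [dir1; dir2], mu);   %// 1000 samples

[v, d]      = eig(cov(x));                        %// eigenvectors / eigenvalues of the sample covariance
[vals, ind] = sort(diag(d), 'descend');
v = v(:, ind);

scatter(x(:,1), x(:,2), 5, 'filled'); hold on; axis equal;
quiver(mu(1), mu(2), sqrt(vals(1))*v(1,1), sqrt(vals(1))*v(2,1), 0, 'r', 'LineWidth', 2);
quiver(mu(1), mu(2), sqrt(vals(2))*v(1,2), sqrt(vals(2))*v(2,2), 0, 'r', 'LineWidth', 2);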
Now let's get back to your question. The reason why we take a look at the k largest eigenvalues is that this is a way of performing dimensionality reduction. Essentially, you would be performing data compression where you take your higher dimensional data and project it onto a lower dimensional space. The more principal components you include in the projection, the more it will resemble the original data. It begins to taper off at a certain point, but the first few principal components allow you to faithfully reconstruct your data for the most part.
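One quick way to see where that taper-off happens is to look at the cumulative fraction of variance explained by the first k components. This is a sketch of my own, reusing vals and ind from the earlier snippet:

svals = abs(vals(ind));                  %// eigenvalues sorted by magnitude
explained = cumsum(svals) / sum(svals);  %// cumulative fraction of variance explained
plot(explained);
xlabel('number of components k');
ylabel('fraction of variance explained');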
A great visual example of performing PCA (or SVD rather) and data reconstruction is found in a great Quora post I stumbled upon in the past.
Question #3

You would use this matrix to reproject your higher dimensional data onto a lower dimensional space. The number of rows being 1000 is still there, which means that there were originally 1000 features in your dataset. The 800 is what the reduced dimensionality of your data would be. Consider this matrix as a transformation from the original dimensionality of a feature (1000) down to its reduced dimensionality (800).
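A quick dimension check makes this concrete. This is only a sketch, assuming b is 500*1000 and aq is 1000*800 as in your question:

bm = bsxfun(@minus, b, mean(b, 1));   %// 500 x 1000: the 500 observations are still here
z  = bm * aq;                         %// 500 x 800: same 500 observations, now 800 features each
size(z)                               %// prints [500 800]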
You would then use this matrix in conjunction with reconstructing what the original data was. Concretely, this would give you an approximation of what the original data looked like with the least amount of error. In this case, you don't need to use all of the principal components (i.e. just the k largest vectors) and you can create an approximation of your data with less information than you had before.

How you reconstruct your data is very simple. Let's talk about the forward and reverse operations first with the full data. The forward operation is to take your original data and reproject it, but instead of the lower dimensionality, we will use all of the components. You first need to have your original data with the mean subtracted:
bm = bsxfun(@minus, b, mean(b, 1));   %// subtract each feature's mean from every observation
bm will produce a matrix where each feature of every sample has its mean subtracted. bsxfun allows the subtraction of two matrices of unequal dimensions provided that you can broadcast the dimensions so that they both match up. Specifically, what happens in this case is that the mean of each column / feature of b is computed and a temporary replicated matrix is produced that is as large as b. When you subtract your original data with this replicated matrix, the effect is to subtract every data point by its respective feature mean, thus decentralizing your data so that the mean of each feature is 0.
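To make that broadcasting explicit, here is the equivalent written with an explicit replicated matrix (a sketch of my own; in practice the bsxfun call above is all you need):

mu  = mean(b, 1);                        %// 1 x 1000 row vector of feature means
bm2 = b - repmat(mu, size(b, 1), 1);     %// same result as bsxfun(@minus, b, mu)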
Once you do this, the operation to project is simply:
bproject = bm * asort;   %// project the centred data onto all of the principal components
The above operation is quite simple. What you are doing is expressing each feature of each sample as a linear combination of the principal components. For example, given the first sample, or the first row of the decentralized data, the first sample's first feature in the projected domain is the dot product of the row vector that pertains to the entire sample and the first principal component column vector. The first sample's second feature in the projected domain is a weighted sum of the entire sample and the second component. You would repeat this for all samples and all principal components. In effect, you are reprojecting the data with respect to the principal components, which are orthogonal basis vectors that transform your data from one representation to another.
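You can verify that dot-product interpretation directly; a small sketch of my own, using the first sample and the first component just for illustration:

val1 = bproject(1, 1);                 %// first sample's first feature in the projected domain
val2 = dot(bm(1, :), asort(:, 1));     %// dot product of the first sample with the first component
%// val1 and val2 agree up to floating point error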
A better description of what I just talked about can be found here. Look at Amro's answer:

Matlab Principal Component Analysis (eigenvalues order)
Now to go backwards, the inverse operation relies on a special property of the eigenvector matrix: if you transpose it, you get its inverse. To get the original data back, you undo the operation above and add the means back to the problem:
out = bsxfun(@plus, bproject*asort.', mean(b, 1));   %// undo the projection, then add the feature means back
Since you want the original data back, you're solving for bm with respect to the previous operation we did. However, the inverse of asort is simply its transpose here. What's happening after you perform this operation is that you are getting the original data back, but the data is still decentralized. To get the original data back, you must add the means of each feature back onto the data matrix to get the final result. That's why we're using another bsxfun call here, so that you can do this for each sample's feature values.
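As a sanity check (my own sketch), reconstructing from the full projection should recover the original data up to floating point error:

recon = bsxfun(@plus, bproject * asort.', mean(b, 1));
max(abs(recon(:) - b(:)))   %// should be close to machine precision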
You should be able to go back and forth between the original domain and the projected domain with the above two lines of code. Now, where the dimensionality reduction (or the approximation of the original data) comes into play is the reverse operation. What you need to do first is project the data onto the bases of the principal components (i.e. the forward operation), but to go back to the original domain where we are trying to reconstruct the data with a reduced number of principal components, you simply replace asort in the above code with aq and also reduce the number of features you're using in bproject. Concretely:
out = bsxfun(@plus, bproject(:,1:k)*aq.', mean(b, 1));   %// reconstruct using only the k largest components
Doing bproject(:,1:k) selects out the k features in the projected domain of your data, corresponding to the k largest eigenvectors. Interestingly, if you just want a representation of the data with regards to the reduced dimensionality, using bproject(:,1:k) on its own will be enough. However, if you want to go forward and compute an approximation of the original data, we need to compute the reverse step. The above code is simply what we had before with the full dimensionality of your data, but we use aq as well as selecting out the k features in bproject. This will give you the original data represented by the k largest eigenvectors / eigenvalues in your matrix.
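If you want to quantify how good that k-component approximation is, one simple measure (my own sketch, not from the original post) is the relative Frobenius-norm error of the reconstruction:

approx = bsxfun(@plus, bproject(:, 1:k) * aq.', mean(b, 1));
relerr = norm(b - approx, 'fro') / norm(b, 'fro');   %// smaller means a more faithful reconstruction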
If you'd like to see an awesome example, I'll mimic the Quora post I linked to, but using an image. Consider doing this with a grayscale image, where each row is a "sample" and each column is a feature. Let's take the cameraman image that's part of the Image Processing Toolbox:
im = imread('cameraman.tif');   %// using the Image Processing Toolbox
imshow(im);
We get this image:
This is a 256 x 256 image, which means that we have 256 data points and each point has 256 features. I'm going to convert the image to double precision for computing the covariance matrix. I'm also going to repeat the above code, but incrementally increasing k at each go from 3, 11, 15, 25, 45, 65, 125 and 155. Therefore, for each k, we are introducing more principal components and we should slowly start to get a reconstruction of our data.

Here's some runnable code that illustrates my point:
%%%%%%%// Pre-processing stage
clear all;
close all;

%// Read in image - make sure we cast to double
b = double(imread('cameraman.tif'));

%// Calculate covariance matrix
sigma = cov(b);

%// Find eigenvalues and eigenvectors of the covariance matrix
[a,d] = eig(sigma);
vals = diag(d);

%// Sort the eigenvalues
[~,ind] = sort(abs(vals), 'descend');

%// Rearrange eigenvectors
asort = a(:,ind);

%// Find mean subtracted data
bm = bsxfun(@minus, b, mean(b,1));

%// Reproject data onto principal components
bproject = bm*asort;

%%%%%%%// Begin reconstruction logic
figure;
counter = 1;

for k = [3 11 15 25 45 65 125 155]
    %// Extract out highest k eigenvectors
    aq = asort(:,1:k);

    %// Project onto the original domain
    out = bsxfun(@plus, bproject(:,1:k)*aq.', mean(b, 1));

    %// Place projection onto right slot and show the image
    subplot(4, 2, counter);
    counter = counter + 1;
    imshow(out,[]);
    title(['k = ' num2str(k)]);
end
As you can see, the majority of the code is the same as what we have already seen. What's different is that we loop over all of the values of k, project back onto the original space (i.e. computing the approximation) with the k highest eigenvectors, then show the image.

We get this nice figure:
As you can see, starting with k=3 doesn't really do us any favours... we can see some general structure, but it wouldn't hurt to add more components in. As we start increasing the number of components, we start to get a clearer picture of what the original data looks like. At k=25, we can actually see what the cameraman looks like almost perfectly, and we don't need the components from 26 and beyond to see what's happening. This is what I was talking about with regards to data compression: you don't need to work on all of the principal components to get a clear picture of what's going on.

I'd like to end this note by referring you to Chris Taylor's wonderful exposition on the topic of Principal Components Analysis, with code, graphs and a great explanation to boot! This is what got me started on PCA, but the Quora post is what solidified my knowledge.

Matlab - PCA analysis and reconstruction of multi dimensional data