Information Geometric Dimensionality Reduction (IGDR) Toolbox

Kevin M. Carter, Raviv Raich, Alfred O. Hero III

The IGDR toolbox is a suite of Matlab code that implements the techniques and algorithms developed in:

Kevin M. Carter, “Dimensionality reduction on statistical manifolds,” Ph.D. thesis, University of Michigan, January 2009.

Matlab Code

Download (.zip)

Copyright notice

Matlab scripts for an information-geometric approach to dimensionality reduction. The details of the algorithms can be found in:

* K. M. Carter,
 "Dimensionality reduction on statistical manifolds,"
 Ph.D. thesis, University of Michigan, January 2009.
* K. M. Carter, R. Raich, W. G. Finn and A. O. Hero.
 "Information preserving component analysis: data projections for flow cytometry analysis,"
 //IEEE Journal of Selected Topics in Signal Processing: Special Issue on Digital Image Processing Techniques for Oncology//, vol. 3, no. 1, Feb. 2009.
* K. M. Carter, R. Raich, and A. O. Hero.
 "An information geometric approach to supervised dimensionality reduction,"
 to appear in //IEEE Inter. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)//, April, 2009.
* K. M. Carter, R. Raich, W. G. Finn and A. O. Hero.
 "FINE: Fisher information nonparametric embedding,"
 in review for //IEEE Transactions on Pattern Analysis and Machine Intelligence//.
* K. M. Carter, R. Raich, and A. O. Hero.
 "FINE: Information embedding for document classification,"
 in //Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing//, pages 1861-1864, April 2008.

Published reports of research using the code provided here (or a modified version) should cite the applicable articles referenced above.

Comments and questions are welcome. We would also appreciate hearing about how you used this code, improvements made to it, etc. You are free to modify the code, as long as you reference the original contributors.

Usage

This code provides information-geometric methods of dimensionality reduction that exploit the properties of statistical manifolds. The setup is the same for all methods: several multi-dimensional data sets, each with a large sample size, that are related in some fashion. The only requirements are that every set has the same dimensionality and that each variable has the same meaning in every set (i.e., variable 1 in set i is the same as variable 1 in set j). Each set is stored as one element of the structure Y (a cell array). For example:

Y=cell(1,50);          % one data set per cell
for i=1:50
  Y{i}=randn(5,100);   % synthetic data standing in for a real set
end
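
Before calling either routine, it can help to confirm that every set really has the same dimensionality. The following is a minimal sketch, not part of the toolbox. It assumes that each cell holds one data set with variables along the rows and samples along the columns, which is suggested by the example above but should be checked against the comments in fine.m and ipca.m. The names dims and is_consistent are illustrative only.

```matlab
% Build an example collection: 50 sets, each assumed to be
% 5 variables (rows) by 100 samples (columns).
Y = cell(1,50);
for i = 1:50
  Y{i} = randn(5,100);
end

% Number of rows (assumed here to be the variables) in every set.
dims = cellfun(@(y) size(y,1), Y);

% All sets must share the same dimensionality.
is_consistent = all(dims == dims(1));
if ~is_consistent
  error('Data sets do not share the same dimensionality.');
end
```

If the sets are stored with samples along the rows instead, replace size(y,1) with size(y,2).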

From this structure, one may use FINE to embed each pdf (estimated from the set) into a single low-dimensional space with:

X=fine(Y,options);

or use IPCA to project each data set individually into the same common space with:

[A,J]=ipca(Y,options);

Details for inputs and outputs are available in the files ipca.m and fine.m.

List of Matlab Files

fine.m – Fisher Information Nonparametric Embedding code
ipca.m – Information Preserving Component Analysis code
fine_demo.m – Script demonstrating usage of FINE
ipca_demo.m – Script demonstrating usage of IPCA
load_data.m – Loads data (from directory structure) into a format for usage with IPCA and FINE
calc_weights.m – Function calculating weights for weighted IPCA, based on a heat kernel on information distances.
cgrscho.m – Classical Gram-Schmidt algorithm
div_calc.m – Approximates information divergence between 2 data sets (estimates PDFs internally)
div_mat.m – Calculates divergence matrix between a collection of data sets (calls div_calc.m)
div_grad.m – Calculates the gradient of the information divergence matrix with respect to a projection matrix
ksizeMSP.m – Maximal smoothing principle calculation of kernel bandwidths
lda.m – Linear Discriminant Analysis
makeadj.m – Creates adjacency matrix for use with calc_weights.m and makegeo.m
makegeo.m – Function to compute the geodesic distance approximation from Euclidean distances.
dijkstra – Mex file used in makegeo.m. I have included the .mexglx, .mexw64 and .dll files, as well as the .cpp source file (modified from Tenenbaum and the ISOMAP code) in case you need to mex it yourself.
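
The geodesic distance approximation mentioned for makegeo.m follows the general ISOMAP idea: keep only short "local" distances as graph edges, then take shortest paths through that graph. The sketch below illustrates that idea on toy data. It is not the toolbox code: it uses a plain-Matlab Floyd-Warshall loop in place of the dijkstra mex file, and the k-nearest-neighbor rule, the value of k, and all variable names are assumptions made for illustration.

```matlab
% Toy points on a half circle; D holds pairwise Euclidean distances.
n = 20;
t = linspace(0, pi, n);
P = [cos(t); sin(t)];                 % 2 x n points
D = zeros(n);
for i = 1:n
  for j = 1:n
    D(i,j) = norm(P(:,i) - P(:,j));
  end
end

% Keep only edges to the k nearest neighbors (symmetrized);
% all other pairs start at Inf, meaning "no direct edge".
k = 2;
G = Inf(n);
for i = 1:n
  [sorted, idx] = sort(D(i,:));
  nbrs = idx(2:k+1);                  % idx(1) is the point itself
  G(i,nbrs) = D(i,nbrs);
  G(nbrs,i) = D(nbrs,i);
end
G(1:n+1:end) = 0;                     % zero distance from a point to itself

% Floyd-Warshall: allow paths that pass through intermediate node m.
for m = 1:n
  G = min(G, repmat(G(:,m),1,n) + repmat(G(m,:),n,1));
end
```

After the loop, G(i,j) is the length of the shortest path between points i and j along the neighborhood graph. For the two end points of the half circle, this is longer than the straight-line distance D(1,n) and no longer than the arc length pi.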

Tips

The load_data.m script will load all files from a single directory into a single structure. If you have multiple classes that you wish to analyze (flow cytometry data, for example), we suggest running the script once, storing the output structure Y under a separate name (say Y1), then running the script again on the directory containing the data sets of the next class (storing that output as Y2). Then join the classes together as Y=[Y1 Y2], being sure to keep track of which sets belong to which class.
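
One simple way to keep track of class membership is a label vector built at the moment the classes are joined. This is a minimal sketch under stated assumptions: Y1 and Y2 are filled with synthetic data in place of real load_data.m output, they are assumed to be row cell arrays (as the concatenation Y=[Y1 Y2] implies), and the names labels and class2_idx are illustrative.

```matlab
% Stand-ins for the outputs of two load_data.m runs (hypothetical sizes).
Y1 = cell(1,3);
for i = 1:3
  Y1{i} = randn(5,100);
end
Y2 = cell(1,4);
for i = 1:4
  Y2{i} = randn(5,100) + 1;
end

% Join the classes and record which class each set came from.
Y = [Y1 Y2];
labels = [ones(1,numel(Y1)), 2*ones(1,numel(Y2))];

% Indices of the sets that belong to class 2.
class2_idx = find(labels == 2);
```

Because labels has one entry per cell of Y, labels(i) always gives the class of Y{i}, even after further classes are appended in the same way.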

Comments and Remarks

This code was tested on Windows XP and Linux systems, using Matlab 7 R2006a and Matlab 7 R2008a.