Kevin M. Carter, Raviv Raich, Alfred O. Hero III
The IGDR toolbox is a suite of MATLAB code designed to implement the techniques and algorithms developed in:

Kevin M. Carter, “Dimensionality reduction on statistical manifolds,” Ph.D. thesis, University of Michigan, January 2009.
MATLAB Code
Download (.zip)
Copyright notice
MATLAB scripts for an information-geometric approach to dimensionality reduction. The details of the algorithms can be found in:

K. M. Carter, "Dimensionality reduction on statistical manifolds," Ph.D. thesis, University of Michigan, January 2009.

K. M. Carter, R. Raich, W. G. Finn and A. O. Hero, "Information preserving component analysis: data projections for flow cytometry analysis," IEEE Journal of Selected Topics in Signal Processing: Special Issue on Digital Image Processing Techniques for Oncology, vol. 3, no. 1, Feb. 2009.

K. M. Carter, R. Raich, and A. O. Hero, "An information geometric approach to supervised dimensionality reduction," to appear in Proc. IEEE Intl. Conf. on Acoustics, Speech, and Signal Processing (ICASSP), April 2009.

K. M. Carter, R. Raich, W. G. Finn and A. O. Hero, "FINE: Fisher information nonparametric embedding," in review for IEEE Transactions on Pattern Analysis and Machine Intelligence.

K. M. Carter, R. Raich, and A. O. Hero, "FINE: Information embedding for document classification," in Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing (ICASSP), pages 1861-1864, April 2008.
Published reports of research using the code provided here (or a modified version) should cite the applicable articles referenced above.
Comments and questions are welcome. We would also appreciate hearing about how you used this code, improvements made to it, etc. You are free to modify the code, as long as you reference the original contributors.
Usage
The purpose of this code is to provide information-geometric methods of dimensionality reduction, using the properties of statistical manifolds. The setup is the same for all methods: several multidimensional, large-sample-size data sets that are related in some fashion. The only requirements are that the dimensionality is the same for each set and that each variable means the same thing across sets (i.e., variable 1 of set i measures the same quantity as variable 1 of set j). Each data set is stored as an element of the cell array Y. For example:
for i = 1:50
    Y{i} = randn(5,100);
end
From this cell array, one may use FINE to embed each PDF (estimated from the corresponding set) into a single low-dimensional space with:
X=fine(Y,options);
or use IPCA to project each data set individually into the same common low-dimensional space with:
[A,J]=ipca(Y,options);
Details for inputs and outputs are available in the files ipca.m and fine.m.
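Putting the steps above together, a minimal two-class sketch might look as follows. The fields of the options struct are intentionally left unset here (they are documented in fine.m), and the orientation of the returned embedding X is assumed to be one point per data set; consult fine.m for the actual conventions.

```matlab
% Build a collection of 40 data sets: 20 from each of two classes.
Y = cell(1,40);
labels = [ones(1,20) 2*ones(1,20)];   % keep track of class membership
for i = 1:20
    Y{i}    = randn(5,100);           % class 1: standard normal samples
    Y{20+i} = randn(5,100) + 1;       % class 2: shifted mean
end

options = struct();                   % populate per the documentation in fine.m

% Embed each estimated PDF into a single low-dimensional space.
X = fine(Y, options);

% Visualize the embedding, colored by class (assumes one column per set;
% check fine.m for the actual output orientation).
scatter(X(1,:), X(2,:), 30, labels, 'filled');
```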
List of Matlab Files

fine.m – Fisher Information Nonparametric Embedding code

ipca.m – Information Preserving Component Analysis code

fine_demo.m – Script demonstrating usage of FINE

ipca_demo.m – Script demonstrating usage of IPCA

load_data.m – Loads data (from directory structure) into a format for usage with IPCA and FINE

calc_weights.m – Function calculating weights for weighted IPCA, based on a heat kernel on information distances.

cgrscho.m – Classical Gram-Schmidt algorithm

div_calc.m – Approximates the information divergence between two data sets (estimates PDFs internally)

div_mat.m – Calculates divergence matrix between a collection of data sets (calls div_calc.m)

div_grad.m – Calculates the gradient of the information divergence matrix with respect to a projection matrix

ksizeMSP.m – Maximal smoothing principle calculation of kernel bandwidths

lda.m – Linear Discriminant Analysis

makeadj.m – Creates adjacency matrix for use with calc_weights.m and makegeo.m

makegeo.m – Function to compute the geodesic distance approximation from Euclidean distances.

dijkstra – MEX file used in makegeo.m. I have included the .mexglx, .mexw64 and .dll files, as well as the .cpp file (modified from Tenenbaum's ISOMAP code) if you need to mex it yourself.
Tips
The load_data.m script loads all files from a single directory into a single cell array. If you have multiple classes that you wish to analyze (flow cytometry, for example), we suggest running the script once, storing the output Y under a separate name (say Y1), then running the script again on the directory containing the other class's data sets (storing Y as Y2). Then join the classes together as Y=[Y1 Y2], being sure to keep track of which sets belong to which class.
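The two-class workflow just described can be sketched as below. The directory names are placeholders, and load_data is assumed to read every file in the current directory into the cell array Y, as described above.

```matlab
% Load each class from its own directory, then join the collections.
cd class1_dir;      % placeholder: directory holding the class 1 data files
load_data;          % produces cell array Y
Y1 = Y;

cd ../class2_dir;   % placeholder: directory holding the class 2 data files
load_data;
Y2 = Y;

Y = [Y1 Y2];        % joined collection, ready for fine.m or ipca.m

% Record which sets belong to which class before the ordering is lost.
labels = [ones(1,numel(Y1)) 2*ones(1,numel(Y2))];
```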