
Information Geometric Dimensionality Reduction (IGDR) Toolbox

Kevin M. Carter, Raviv Raich, Alfred O. Hero III

The IGDR toolbox is a suite of Matlab code that implements the techniques and algorithms developed in the publications listed below.

Matlab Code

Download (.zip)

Matlab scripts for an information-geometric approach to dimensionality reduction. The details of the algorithms can be found in:

* K. M. Carter.
 "Dimensionality reduction on statistical manifolds,"
 Ph.D. Thesis, University of Michigan, January, 2009.
* K. M. Carter, R. Raich, W. G. Finn and A. O. Hero.
 "Information preserving component analysis: data projections for flow cytometry analysis,"
 //IEEE Journal of Selected Topics in Signal Processing: Special Issue on Digital Image Processing Techniques for Oncology//, vol. 3, no. 1, Feb. 2009.
* K. M. Carter, R. Raich, and A. O. Hero.
 "An information geometric approach to supervised dimensionality reduction,"
 to appear in //IEEE Inter. Conf. on Acoustics, Speech, and Signal Processing (ICASSP)//, April, 2009.
* K. M. Carter, R. Raich, W. G. Finn and A. O. Hero.
 "FINE: Fisher information nonparametric embedding,"
 in review for //IEEE Transactions on Pattern Analysis and Machine Intelligence//.
* K. M. Carter, R. Raich, and A. O. Hero.
 "FINE: Information embedding for document classification,"
 in //Proc. IEEE Intl. Conf. on Acoustics, Speech and Signal Processing//, pages 1861-1864, April 2008.

Published reports of research using the code provided here (or a modified version) should cite the applicable articles referenced above.

Comments and questions are welcome. We would also appreciate hearing about how you used this code, improvements made to it, etc. You are free to modify the code, as long as you reference the original contributors.

Usage

The purpose of this code is to provide information-geometric methods of dimensionality reduction that use the properties of statistical manifolds. The setup is the same for all methods: several multi-dimensional data sets with large sample sizes that are related in some fashion. The only requirement is that every set has the same dimensionality and that each variable has the same meaning in every set (i.e., variable 1 for set i is the same as variable 1 for set j). Each set is stored in the cell array Y. For example:

for i=1:50
  Y{i}=randn(5,100);
end
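Before calling either method, it can help to verify the requirement stated above. The following is a minimal sketch, assuming each Y{i} holds variables in rows and samples in columns (inferred from the randn(5,100) example; fine.m and ipca.m remain the authority on the expected orientation). The names dims and same_dim are illustrative and not part of the toolbox.

```matlab
% Build the example collection: 50 data sets, each 5 variables x 100 samples.
% (Rows-as-variables is an assumption inferred from the example above.)
Y = cell(1, 50);
for i = 1:50
  Y{i} = randn(5, 100);
end

% Number of variables (rows) in every set; all sets must agree.
dims = cellfun(@(y) size(y, 1), Y);
same_dim = all(dims == dims(1));
if ~same_dim
  error('All data sets in Y must share the same dimensionality.');
end
```

The number of samples per set is deliberately not checked, since matching dimensionality is the only stated requirement.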

From this structure, one may use FINE to embed each pdf (estimated from the set) into a single low-dimensional space with:

X=fine(Y,options);

or use IPCA to project each data set down individually into the same common space with:

[A,J]=ipca(Y,options);

Details for inputs and outputs are available in the files ipca.m and fine.m.

List of Matlab Files

  • fine.m – Fisher Information Nonparametric Embedding code
  • ipca.m – Information Preserving Component Analysis code
  • fine_demo.m – Script demonstrating usage of FINE
  • ipca_demo.m – Script demonstrating usage of IPCA
  • load_data.m – Loads data (from directory structure) into a format for usage with IPCA and FINE
  • calc_weights.m – Function calculating weights for weighted IPCA, based on a heat kernel on information distances.
  • cgrscho.m – Classical Gram-Schmidt algorithm
  • div_calc.m – Approximates information divergence between 2 data sets (estimates PDFs internally)
  • div_mat.m – Calculates divergence matrix between a collection of data sets (calls div_calc.m)
  • div_grad.m – Calculates the gradient of the information divergence matrix with respect to a projection matrix
  • ksizeMSP.m – Maximal smoothing principle calculation of kernel bandwidths
  • lda.m – Linear Discriminant Analysis
  • makeadj.m – Creates adjacency matrix for use with calc_weights.m and makegeo.m
  • makegeo.m – Function to compute the geodesic distance approximation from Euclidean distances.
  • dijkstra – Mex file used in makegeo.m. I have included the .mexglx, .mexw64 and .dll files, as well as the .cpp (modified from Tenenbaum and the ISOMAP code) file if you need to mex it yourself.
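If none of the bundled binaries matches your platform, the source can be compiled from the Matlab prompt. This is a rough sketch, assuming the source file is named dijkstra.cpp, sits in the current folder, and that a C++ compiler has already been configured for mex; the variable have_mex is illustrative only.

```matlab
% exist(..., 'file') returns 3 when a MEX file with that name is on the path.
have_mex = (exist('dijkstra', 'file') == 3);

% Assumption: the C++ source is called dijkstra.cpp and is in the current folder.
% exist(..., 'file') returns 2 for an ordinary file.
if ~have_mex && exist('dijkstra.cpp', 'file') == 2
  mex dijkstra.cpp
end
```

After compiling, makegeo.m should pick up the new binary as long as it is on the Matlab path.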

Tips

The load_data.m script loads all files from a single directory into a single structure. If you have multiple classes to analyze (flow cytometry, for example), we suggest running the script once, saving the output Y under a separate name (say Y1), then running the script again on the directory containing the next class's data sets (saving that Y as Y2). Then join the classes as Y=[Y1 Y2], keeping track of which sets belong to which class.
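One simple way to keep track of class membership is a label vector built alongside the joined collection. The sketch below uses random stand-in data in place of real load_data.m output and assumes Y1 and Y2 are 1-by-N cell arrays, as the concatenation Y=[Y1 Y2] implies; the variable name labels is hypothetical.

```matlab
% Stand-ins for the outputs of two load_data.m runs (hypothetical data;
% in practice Y1 and Y2 come from the two class directories).
Y1 = {randn(5, 100), randn(5, 120)};               % class 1: two data sets
Y2 = {randn(5, 90), randn(5, 110), randn(5, 80)};  % class 2: three data sets

% Join the classes and record which class each set belongs to.
Y = [Y1 Y2];
labels = [ones(1, numel(Y1)), 2 * ones(1, numel(Y2))];
```

Here labels(k) gives the class of Y{k}, so the two stay aligned as long as they are reordered together.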

Comments and Remarks

This code was tested on Windows XP and Linux systems, using Matlab 7 R2006a and Matlab 7 R2008a.