Dimensionality reduction on statistical manifolds
Kevin M. Carter, Alfred O. Hero III
This thesis concerns the problem of dimensionality reduction through information geometric methods on statistical manifolds. Although considerable recent work has addressed dimensionality reduction for learning tasks such as classification, clustering, and visualization, these methods have focused primarily on Riemannian sub-manifolds of Euclidean space. That setting is sufficient for many applications, but many high-dimensional signals have no straightforward and meaningful Euclidean representation. In these cases, signals may be more appropriately represented as realizations of some distribution lying on a statistical manifold, that is, a manifold of probability density functions (PDFs). These manifolds are often intrinsically lower dimensional than the domain of the data realization.
We begin by discussing local intrinsic dimension estimation and its applications. Much work has been done on estimating the global dimension of a data set, typically for the purposes of dimensionality reduction. We show that by estimating dimension locally, we can extend dimension estimation to statistical manifolds and to many applications that are not possible with global dimension estimation. We illustrate the independent benefit of dimension estimation on complex problems such as anomaly detection, clustering, and image segmentation.
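To make the idea of a per-point dimension estimate concrete, here is a minimal sketch in Python. It uses a k-nearest-neighbor maximum-likelihood estimator in the Levina-Bickel form, which is an assumption chosen for illustration and not necessarily the estimator developed in the thesis. The function name `local_dimension_mle`, the neighborhood size `k`, and the toy data are all hypothetical, and the sketch assumes the data contain no duplicate points.

```python
import numpy as np

def local_dimension_mle(X, k=10):
    """Per-point kNN maximum-likelihood dimension estimate.

    Illustrative only: assumes no duplicate points in X (a zero
    neighbor distance would make the log ratio infinite).
    """
    # Pairwise Euclidean distances; the diagonal is exactly zero.
    diff = X[:, None, :] - X[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Sort each row and drop column 0, which is the point itself.
    knn = np.sort(dist, axis=1)[:, 1:k + 1]
    # log(T_k / T_j) for j = 1..k-1, where T_j is the j-th neighbor distance.
    log_ratio = np.log(knn[:, -1][:, None] / knn[:, :-1])
    return (k - 1) / log_ratio.sum(axis=1)

# Toy data: a line segment and a planar patch, both embedded in 3-D
# and placed far apart so that neighborhoods do not mix.
rng = np.random.default_rng(0)
t = rng.uniform(size=300)
line = np.column_stack([t, np.zeros(300), np.full(300, 5.0)])
uv = rng.uniform(size=(300, 2))
plane = np.column_stack([uv, np.zeros(300)])

est = local_dimension_mle(np.vstack([line, plane]), k=20)
line_est, plane_est = est[:300], est[300:]
```

A single global estimate for this toy data set would have to compromise between the two pieces, whereas the per-point estimates in `line_est` and `plane_est` separate them, which is the property that applications such as clustering or segmentation by dimension rely on.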
We then discuss two methods of dimensionality reduction on statistical manifolds. First, we propose a method for statistical manifold reconstruction that uses the principles of information geometry and Euclidean manifold learning to embed PDFs into a low-dimensional Euclidean space. This embedding enables comparative analysis of multiple high-dimensional data sets using standard Euclidean methods. Our second algorithm is a linear projection method that creates a dimension-reduced subspace preserving the high-dimensional relationships between multiple signals. Defining this information preserving projection contributes to both feature extraction and visualization of high-dimensional data.
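The first of these two methods, embedding PDFs into a Euclidean space, can be illustrated with a short Python sketch. It rests on several simplifying assumptions that are not taken from the thesis: each data set is summarized by a fitted multivariate Gaussian, the dissimilarity between two data sets is the closed-form Hellinger distance between those Gaussians, and the embedding is computed with classical multidimensional scaling. The function names and the toy data are hypothetical.

```python
import numpy as np

def gaussian_hellinger(m1, S1, m2, S2):
    """Closed-form Hellinger distance between two multivariate Gaussians."""
    S = 0.5 * (S1 + S2)
    diff = m1 - m2
    # Log-determinants keep the Bhattacharyya coefficient numerically stable.
    _, ld1 = np.linalg.slogdet(S1)
    _, ld2 = np.linalg.slogdet(S2)
    _, ld = np.linalg.slogdet(S)
    log_bc = (0.25 * ld1 + 0.25 * ld2 - 0.5 * ld
              - 0.125 * diff @ np.linalg.solve(S, diff))
    return np.sqrt(max(0.0, 1.0 - np.exp(log_bc)))

def classical_mds(D, dim=2):
    """Classical MDS: embed a distance matrix D into `dim` dimensions."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n       # centering matrix
    B = -0.5 * J @ (D ** 2) @ J               # double-centered Gram matrix
    vals, vecs = np.linalg.eigh(B)
    idx = np.argsort(vals)[::-1][:dim]        # largest eigenvalues first
    return vecs[:, idx] * np.sqrt(np.maximum(vals[idx], 0.0))

def embed_datasets(datasets, dim=2):
    """Map each data set (one array of samples) to one low-dimensional point."""
    # Assumption: a Gaussian fit stands in for the density estimate.
    params = [(X.mean(axis=0), np.cov(X, rowvar=False)) for X in datasets]
    n = len(params)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = gaussian_hellinger(*params[i], *params[j])
    return classical_mds(D, dim)

# Toy example: three data sets from one distribution, three from a shifted one.
rng = np.random.default_rng(0)
shift = np.array([2.0, 0.0, 0.0, 0.0])
group_a = [rng.normal(size=(400, 4)) for _ in range(3)]
group_b = [rng.normal(size=(400, 4)) + shift for _ in range(3)]
Y = embed_datasets(group_a + group_b, dim=2)
```

Each row of `Y` represents an entire data set, so standard Euclidean tools such as clustering or scatter plots can then be applied to compare data sets rather than individual samples.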
Finally, we apply these techniques to their original motivating problem, clinical flow cytometric analysis. These methods of dimensionality reduction approach the problems of diagnosis, visualization, and verification of flow cytometric data in a manner that has not been given significant consideration in the past. The tools we propose are illustrated in several case studies on actual patient data sets.
- Kevin M. Carter, “Dimensionality reduction on statistical manifolds,” Ph.D. thesis, University of Michigan, January 2009. (.pdf)
The following papers have resulted from this work:
- Kevin M. Carter, Raviv Raich, William G. Finn, and Alfred O. Hero, “Information Preserving Component Analysis: Data projections for flow cytometry analysis,” to appear in IEEE Journal on Selected Topics in Signal Processing: Special Issue on Digital Image Processing Techniques for Oncology, vol. 3, no. 1, Feb. 2009. (.pdf)
- William G. Finn, Kevin M. Carter, Raviv Raich, Lloyd M. Stoolman, and Alfred O. Hero, “Analysis of clinical flow cytometric immunophenotyping data by clustering on statistical manifolds: treating flow cytometry data as high-dimensional objects,” Cytometry Part B, vol. 76B, no. 1, Jan. 2009, pp. 1-7. (.pdf)
- Kevin M. Carter, Raviv Raich, and Alfred O. Hero III, “An information geometric approach to supervised dimensionality reduction,” to appear in Proc. of IEEE Int. Conf. on Acoustics, Speech and Signal Processing, April 2009. (.pdf)
- Kevin M. Carter, Christine Kim, Raviv Raich, and Alfred O. Hero III, “Information preserving embeddings for discrimination,” to appear in Proc. of IEEE Signal Processing Society DSP Workshop, Jan. 2009. (.pdf)
- Kevin M. Carter, Raviv Raich, William G. Finn, and Alfred O. Hero, “Dimensionality Reduction of Flow Cytometric Data Through Information Preservation,” Proc. of IEEE Machine Learning for Signal Processing Workshop, October 2008. (.pdf)
- Kevin M. Carter and Alfred O. Hero, “Variance reduction with neighborhood smoothing for local intrinsic dimension estimation,” IEEE Intl Conf. on Acoustics, Speech and Signal Processing, April 2008, pp. 3917-3920. (.pdf)
- Kevin M. Carter, Raviv Raich and Alfred O. Hero, “FINE: Information embedding for document classification,” IEEE Intl Conf. on Acoustics, Speech and Signal Processing, April 2008, pp. 1861-1864. (.pdf)
- Kevin M. Carter, Raviv Raich and Alfred O. Hero, “Learning on statistical manifolds for clustering and visualization,” Proc. of 45th Annual Allerton Conference on Communication, Control, and Computing, Sept. 2007. (.pdf)
- Kevin M. Carter, Raviv Raich and Alfred O. Hero, “De-biasing for local dimension estimation,” IEEE Workshop on Stat. Sig. Processing (SSP), 2007, pp. 601-605. (.pdf)
- Kevin M. Carter, Raviv Raich, William G. Finn, and Alfred O. Hero, “FINE: Fisher Information Non-parametric Embedding,” in review for IEEE Transactions on Pattern Analysis and Machine Intelligence.
- Kevin M. Carter, Raviv Raich, and Alfred O. Hero, “On Local Intrinsic Dimension Estimation and Its Applications,” in review for IEEE Transactions on Signal Processing.
The following links contain details on selected papers which have resulted from this work, as well as Matlab code to reproduce the work:
- Information Geometric Dimensionality Reduction (IGDR) Toolbox
- Local Intrinsic Dimension Estimation
- FINE: Information embedding for document classification