Order Preserving Factor Analysis

A. Tibau-Puig, A. Wiesel, A. Zaas, C. W. Woods, G. S. Ginsburg, G. Fleury and A. O. Hero

Abstract

We present a novel factor analysis method that can be applied to the discovery of common factors shared among trajectories in multivariate time series data. These factors satisfy a precedence-ordering property: certain factors are recruited only after some other factors are activated. Precedence-ordering arises in applications where variables are activated in a specific order, which is unknown. The proposed method is based on a linear model that accounts for each factor’s inherent delays and relative order. We present an algorithm to fit the model in an unsupervised manner using techniques from convex and nonconvex optimization that enforce sparsity of the factor scores and consistent precedence-order of the factor loadings. We illustrate the order-preserving factor analysis (OPFA) method for the problem of extracting precedence-ordered factors from a longitudinal (time course) study of gene expression data.
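
To make the precedence-ordering idea concrete, the following Matlab/Octave sketch builds toy data in which two temporal factors appear with subject-specific delays but always in the same order. It is an illustration only: the sizes, the variable names (F, delays, A, X), the Gaussian-bump factors and the use of a circular shift to model a delay are all assumptions made for this example, not the generator or the data layout used by the package.

```matlab
% Illustrative sketch only: sizes, names and the delay mechanism are assumptions.
n = 50;                      % time points per subject
p = 8;                       % observed variables (e.g. genes)
S = 3;                       % subjects
t = (1:n)';

% Two prototype temporal factors: bumps of different widths, both centred at t = 10.
F = [exp(-((t - 10).^2) / 8), exp(-((t - 10).^2) / 30)];   % n x 2

% Subject-specific delays. The values differ across subjects, but factor 1
% always precedes factor 2, i.e. the precedence order is preserved.
delays = [0 12; 5 20; 3 9];  % S x 2, every row has delays(s,1) < delays(s,2)

X = cell(S, 1);              % one observed data matrix per subject
for s = 1:S
    M = zeros(n, 2);
    for k = 1:2
        % Delayed copy of factor k (the wrapped-around tail is negligible here).
        M(:, k) = circshift(F(:, k), delays(s, k));
    end
    A = randn(2, p) .* (rand(2, p) < 0.5);   % sparse factor scores
    X{s} = M * A + 0.05 * randn(n, p);       % linear model plus noise
end
```

Each X{s} is a time-by-variable matrix generated from the same two factors, so a method of the kind described above would aim to recover F, the per-subject delays and the sparse scores from X alone.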

Papers

  • Tibau-Puig, A., Wiesel, A., Zaas, A.K., Woods, C.W., Ginsburg, G.S., Fleury, G. and Hero, A.O., “Order-Preserving Factor Analysis—Application to Longitudinal Gene Expression,” (IEEE Xplore) IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4447-4458, Sept. 2011.
  • Tibau-Puig, A. and Hero, A.O., “Technical Report cspl-396: Order-Preserving Factor Analysis” (.pdf)

Matlab/Octave Package

We provide a Matlab/Octave package implementing the OPFA and OPFA-C decompositions described in the paper. The code has been tested on Matlab 7.10 (R2010a) running on Fedora 13, and it is provided under the conditions specified in the License section below. To use the OPFA package:

  1. Unzip.
  2. The package contains four folders and seven files in its root folder:
    • ./Data/ : contains the normalized H3N2 gene expression dataset described in the paper.
    • ./Results/ : folder where the results are written.
    • ./Temp/ : an internal temporary folder to store results in case something fails in the middle of a long execution
    • ./Utilities/ : a folder containing the procedures to fit the OPFA model
    • readme.txt : Instructions to use this package
    • license.txt : license information
    • demo_synthetic_data.m : Demo script that runs OPFA on random synthetic data.
    • OPFAvs*.m : the files used to generate the experimental data presented in the paper

A user should only need to interact with the following files. The remaining files are helper functions that are called from them.

  • demo_synthetic_data.m : Demo script that runs OPFA on random synthetic data.
  • Utilities/General/OPFA_Options.m : Creates an Options structure with default options to use the OPFA.m function.
  • Utilities/OPFA.m : Fits an OPFA/OPFA-C model to a cell structure of observed data matrices.
  • Utilities/Plotting/Overview_OPFA_Results.m : shows an overview of the OPFA fit, comparing the observed and the fitted responses and showing the obtained temporal factors and their corresponding scores.
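
A first session might look like the sketch below. It assumes that Matlab or Octave was started in the unzipped package root and that the helper functions need to be put on the search path; whether demo_synthetic_data.m already does this itself is not stated here. The arguments of OPFA_Options and OPFA are not reproduced on this page, so the sketch points to their help text rather than guessing a call.

```matlab
% Quick-start sketch; assumes the current folder is the unzipped package root.
pkg_root = pwd;
util_dir = fullfile(pkg_root, 'Utilities');

if exist(util_dir, 'dir')
    % Add Utilities/ and all of its subfolders (General/, Plotting/, ...) to the path.
    addpath(genpath(util_dir));
end

if exist('demo_synthetic_data', 'file') == 2
    demo_synthetic_data;        % runs OPFA on random synthetic data
    help('OPFA_Options');       % shows how the default options structure is created
    help('OPFA');               % shows the inputs and outputs of the fitting function
else
    disp('demo_synthetic_data.m not found: change to the package root first.');
end
```

The guards make the snippet safe to paste from any folder: outside the package root it only prints a message instead of raising an error.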

To reproduce the numerical results in the IEEE TSP paper:

(WARNING: some of these scripts are very computationally intensive, as they involve multiple Monte Carlo runs, each fitting the OPFA, OPFA-C and SFA models.)

  • OPFAvsSNR.m : Generates Figure 7 in Section IV.A
  • OPFAvsVarDelays.m : Generates Figure 6 in Section IV.A
  • OPFAvsInitialization.m : Generates the data for Table II of Section IV.A
  • OPFA_PHDData.m : Fits the OPFA model to the H3N2 PHD data of Section IV.B

Comments and remarks

This package is under continuous development. If you find any bugs or errors, please report them to the first author of the paper.

License

Copyright © 2011, Arnau Tibau-Puig and Alfred O. Hero III, University of Michigan. All rights reserved.

Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:

  • Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
  • Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.
  • Neither the name of the University of Michigan nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS “AS IS” AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.