Temporal Dynamics of Host Molecular Responses Differentiate Symptomatic and Asymptomatic Influenza A Infection

Y. Huang, A. Zaas, A. Rao, N. Dobigeon, P. Wolfe, T. Veldman, N. Oien, L. Carin, S. Kingsmore, C. Woods, G. S. Ginsburg, A. O. Hero.

Abstract

Exposure to influenza viruses is necessary, but not sufficient, for healthy human hosts to develop symptomatic illness. The host response is an important determinant of disease progression. In order to delineate host molecular responses that differentiate symptomatic and asymptomatic Influenza A infection, we inoculated 17 healthy adults with live influenza (H3N2/Wisconsin) and examined changes in host peripheral blood gene expression at 16 timepoints over 132 hours. Here we present distinct transcriptional dynamics of host responses unique to asymptomatic and symptomatic infections. We show that symptomatic hosts invoke, simultaneously, multiple pattern recognition receptors-mediated antiviral and inflammatory response that may relate to virus-induced oxidative stress. In contrast, asymptomatic subjects tightly regulate these responses and exhibit elevated expression of genes that function in antioxidant responses and cell-mediated responses. We identify biomarkers whose expression patterns discriminate early from late phases of infection and stratify the risk of developing post-infection symptoms. Our results establish a temporal pattern of host molecular responses that differentiates symptomatic from asymptomatic infections and reveals an asymptomatic host-unique non-passive response signature, suggesting novel putative molecular targets both for prognostic assessment and ameliorative therapeutic intervention in seasonal and pandemic influenza.

Reference

Y. Huang, A. Zaas, A. Rao, N. Dobigeon, P. Wolfe, T. Veldman, N. Oien, L. Carin, S. Kingsmore, C. Woods, G. S. Ginsburg, and A. O. Hero. “Temporal dynamics of host molecular responses differentiate symptomatic and asymptomatic influenza A infection,” PLoS Genetics, Aug. 2011.

Code

Disclaimer: The code and data are provided as is. They were tested and fully functional as of 10/01/2011. However, because our analysis pipeline and libraries build on many other R packages, you may run into incompatibilities if your package versions differ from ours. We are happy to provide assistance if necessary; please feel free to contact us by email.

The following R libraries must be loaded into the R environment in order to run most of the R code that produces the figures. Other standard R libraries can be installed as needed.

ZIP file with R custom libraries
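
As a minimal sketch of the loading step, the snippet below sources every R file found in a folder and reports which packages are not yet installed. It assumes the ZIP unpacks into a folder of plain `.R`/`.r` source files; the function names `load_custom_libs` and `missing_packages` and the demo file are hypothetical and are not part of the distributed code.

```r
# Hypothetical helper: source every .R/.r file in a folder into the global environment.
load_custom_libs <- function(lib.dir) {
  files <- list.files(lib.dir, pattern = "\\.[Rr]$", full.names = TRUE)
  for (f in files) source(f, local = FALSE)
  invisible(basename(files))
}

# Hypothetical helper: return the packages from 'pkgs' that are not installed.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Demo with a throwaway folder standing in for the unpacked ZIP (assumption).
demo.dir <- file.path(tempdir(), "library")
dir.create(demo.dir, showWarnings = FALSE)
writeLines("demo_helper <- function() 'loaded'", file.path(demo.dir, "demo.R"))

loaded <- load_custom_libs(demo.dir)
print(loaded)

# Any package listed here can then be installed with install.packages().
print(missing_packages(c("stats", "mboost")))
```

Pointing `load_custom_libs()` at the real unpacked folder instead of `demo.dir` would load the custom libraries in one call, provided the assumption about the file layout holds.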

Figure 1

Figure 1A Data
Figure 1B Data (use standard clustering software and TreeView for heatmap visualization)
Figure 1C Data Code

Figure 1 has essentially three parts:

MCA.zip: Matlab code and data that produce Figure 1A.
2009.09.07.mca.symptom.score.xlsx: A plain Excel file containing the color-formatted version of the clinical symptom data, as shown in Figure 1B.
Code and data for running the logistic boosting algorithm to obtain the genes correlated with the 4 BLU classes. The expression of these genes is shown in the heatmap (Figure 1C).

To run the boosting code, make sure you have loaded all the necessary libraries in the library folder. You will also need the mboost package and the R packages it depends on.

The pairwise boosting is done separately, primarily because R cannot handle all 6 pairwise boosting runs simultaneously. Our original implementation used a smaller set of genes, and all the randomizations ran without any problem. For the final results presented in Figure 1C, however, we chose to run the boosting on all ~12,000 gene candidates, and R took too long to complete. We therefore recommend running the 6 pairwise boosting runs separately, as given in the scripts ‘class.1vs2.r’, ‘class.1vs3.r’, and so on. Ideally, you would have multiple machines or servers and run each script at the same time (which is why much of the code in these 6 scripts is largely identical).

If you would like to try one of the scripts, you can set the boosting parameters small enough that it completes quickly, e.g., n.rand = 5 and n.miter = 10.
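
The 6 runs correspond to the 6 unordered pairs of the 4 classes. The sketch below enumerates those pairs and builds one command per script. Only ‘class.1vs2.r’ and ‘class.1vs3.r’ are named on this page; the remaining file names are assumed to follow the same pattern.

```r
# All unordered pairs of the 4 classes: a 2 x 6 matrix, one pair per column.
classes <- 1:4
pairs <- combn(classes, 2)

# Script names, assuming every file follows the 'class.<i>vs<j>.r' pattern.
scripts <- sprintf("class.%dvs%d.r", pairs[1, ], pairs[2, ])

# One command line per script, e.g. to paste into separate terminals or servers.
commands <- paste("Rscript", scripts)
print(commands)

# To launch them from a single R session as independent background processes:
# for (s in scripts) system2("Rscript", s, wait = FALSE)
```

Running each script in its own R process keeps the runs independent, so a slow or failed comparison does not block the others.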
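
To give a feel for what a quick run with small settings involves, here is a dependency-free sketch on simulated data. It is not the code from the scripts: it uses a bare-bones componentwise gradient boosting with logistic loss written in base R as a stand-in for mboost, and it assumes that n.rand is the number of random subsamples and n.miter the number of boosting iterations (neither meaning is stated on this page). The function `boost_pair`, the step size `nu`, and the simulated genes are hypothetical.

```r
# Hypothetical stand-in for one pairwise run: componentwise gradient boosting
# with logistic loss. Each iteration updates only the best-fitting gene.
boost_pair <- function(x, y, n.miter = 10, nu = 0.1) {
  x <- scale(x)                         # centre and scale each gene
  f <- rep(0, nrow(x))                  # additive predictor on the logit scale
  picked <- character(0)
  for (m in seq_len(n.miter)) {
    u <- y - 1 / (1 + exp(-f))          # negative gradient of the logistic loss
    ss <- colSums(x^2)
    beta <- drop(crossprod(x, u)) / ss  # least-squares slope of u on each gene
    j <- which.max(beta^2 * ss)         # gene giving the largest drop in RSS
    f <- f + nu * beta[j] * x[, j]      # small step along the selected gene
    picked <- c(picked, colnames(x)[j])
  }
  unique(picked)
}

set.seed(1)
n.rand <- 5     # assumed meaning: number of random subsamples
n.miter <- 10   # assumed meaning: boosting iterations per subsample

# Simulated data: 40 samples, 50 genes, with gene1 shifted in the second class.
n <- 40
p <- 50
x <- matrix(rnorm(n * p), n, p, dimnames = list(NULL, paste0("gene", seq_len(p))))
y <- rep(c(0, 1), each = n / 2)
x[y == 1, "gene1"] <- x[y == 1, "gene1"] + 3

# Count how often each gene is selected across the random subsamples.
selected <- lapply(seq_len(n.rand), function(i) {
  idx <- sample(n, size = round(0.8 * n))
  boost_pair(x[idx, ], y[idx], n.miter = n.miter)
})
selection.freq <- sort(table(unlist(selected)), decreasing = TRUE)
print(head(selection.freq))
```

With mboost installed, the inner fit would more naturally be a call to `glmboost()` with `family = Binomial()` and `boost_control(mstop = n.miter)`; the loop over subsamples would stay the same.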

Figure 2

Differential GEP: Input data for running the EDGE time course analysis. We ran EDGE with 1000 iterations, random seed = 178, natural cubic spline fitting, dimension of basis = 4, and bootstrap q-value estimation. The list of 5076 genes that we identified is available here: Gene list.
Figure 2A Data Code
Figure 2B Data (use standard clustering software and TreeView for heatmap visualization)
Figure 2C Data Code
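
For readers unfamiliar with the kind of test behind the differential GEP list, the sketch below illustrates the idea on one simulated gene: fit a natural cubic spline shared by both groups (null) and group-specific splines (alternative), then judge the F statistic against a residual bootstrap. This is a simplified illustration, not EDGE itself or its exact bootstrap procedure. The evenly spaced time grid, the simulated expression values, and the mapping of "dimension of basis = 4" to `df = 4` are assumptions, and the iteration count is reduced from 1000 to keep the example fast.

```r
library(splines)

set.seed(178)   # same seed value as quoted above; the data here are simulated
n.iter <- 200   # the analysis above used 1000 iterations

# Hypothetical design: 16 evenly spaced timepoints over 132 hours, two groups.
time <- rep(seq(0, 132, length.out = 16), times = 2)
group <- factor(rep(c("asymptomatic", "symptomatic"), each = 16))

# Simulated gene: flat in one group, transient induction in the other.
signal <- ifelse(group == "symptomatic", 2 * exp(-((time - 60) / 25)^2), 0)
expr <- signal + rnorm(length(time), sd = 0.3)

# Natural cubic spline basis for time.
basis <- ns(time, df = 4)

# Null: one curve for both groups. Alternative: a separate curve per group.
null.fit <- lm(expr ~ basis)
alt.fit <- lm(expr ~ basis * group)
obs.F <- anova(null.fit, alt.fit)$F[2]

# Residual bootstrap under the null: null fitted values plus resampled residuals.
res <- residuals(alt.fit)
null.mean <- fitted(null.fit)
boot.F <- replicate(n.iter, {
  expr.b <- null.mean + sample(res, replace = TRUE)
  anova(lm(expr.b ~ basis), lm(expr.b ~ basis * group))$F[2]
})

p.value <- (1 + sum(boot.F >= obs.F)) / (1 + n.iter)
print(c(F = obs.F, p = p.value))
```

In a genome-wide setting the same comparison would be repeated per gene and the resulting p-values converted to q-values, which is the step EDGE handles with its bootstrap q-value estimation.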