> MATH DATA SCIENCE SEMINAR
>
>
> SPEAKER: Kai Kang, NIH and MIT
>
> TIME: 2:10 PM-3:10 PM
> ROOM: Ayres 111
>
>
> TITLE: A Bayesian model for dissecting heterogeneous samples using gene expression data
>
> Abstract: Quantifying cell-type proportions and their corresponding gene expression profiles in tissue samples would enhance understanding of the contributions of individual cell types to the physiological states of the tissue. Computational approaches that use expression data from heterogeneous samples are promising, but most current methods estimate either cell-type proportions or cell-type-specific expression profiles and require the other as input. Although such partial deconvolution methods have been successfully applied to tumor samples, the additional input required may be unavailable. We introduce a novel complete deconvolution method, CDSeq, that uses only RNA-seq data from bulk tissue samples to estimate both cell-type proportions and cell-type-specific expression profiles simultaneously. We built a Bayesian model that captures the stochastic nature of RNA-seq data, used a Gibbs sampler for parameter estimation, and developed a strategy to automatically determine the number of cell types present. Using several synthetic and real experimental datasets with known cell-type composition and cell-type-specific expression profiles, we show that CDSeq outperformed existing deconvolution methods. Complete deconvolution using CDSeq represents a substantial technical advance over partial deconvolution approaches and will be useful for studying cell mixtures in tissue samples.