PhD Defense: James Ross


442 Dana

110 Forsyth Street, Boston MA 02115

June 24, 2014, 9:00 am to 11:00 am

Title: Probability Models and Bayesian Nonparametrics for Subtyping Chronic Obstructive Pulmonary Disease



Chronic obstructive pulmonary disease (COPD) is a lung disease characterized by airflow limitation usually associated with an inflammatory response to noxious particles, such as cigarette smoke. COPD is currently the third leading cause of death in the United States and is the only leading cause of death that is increasing in prevalence. It also represents an enormous financial burden to society, costing tens of billions of dollars annually in the U.S. It is widely accepted by the medical community that COPD is a heterogeneous disease, with substantial evidence indicating that genetic variation contributes to varying levels of disease susceptibility. This heterogeneity makes it difficult to predict health decline and develop targeted treatments for better patient care. Although researchers have made several attempts to discover disease subtypes, results have been inconclusive, in part because standard clustering methods have not properly dealt with disease manifestations that may worsen with increased exposure. Additionally, most existing features that describe disease severity only do so at a coarse level, further complicating the search for subtle population differences.


In this thesis, we describe several contributions to the COPD subtyping effort. The first is a novel airway generation labeling algorithm based on a Hidden Markov Tree Model (HMTM). This algorithm assigns anatomical labels to distinct airway branches of segmented airway trees derived from computed tomography (CT) scans. This anatomical labeling enables a more refined assessment of airways disease, one of the major components of COPD. The second contribution introduces a transformative way of looking at the COPD subtyping task. Specifically, we propose a novel clustering-with-constraints method using a Dirichlet process mixture of Gaussian processes in a variational Bayesian nonparametric framework. We claim that individuals should be grouped according to biological and/or genetic similarity regardless of their level of disease severity; therefore, we recast subtyping/clustering as the discovery of associations between individuals and disease trajectories (i.e., grouping individuals based on the similarity of their responses to environmental and/or disease-causing variables). The nonparametric nature of our algorithm allows the unknown number of meaningful trajectories to be learned from the data. Additionally, we acknowledge the usefulness of expert guidance by allowing experts to provide input through must-link and cannot-link constraints, which are encoded with Markov random fields. Inference is performed efficiently using a variational framework.


Lastly, we describe two alternative models for COPD subtyping, both instances of nonparametric overlapping subspace clustering. We are again motivated by the concept of disease trajectories, but we also posit that individuals can be associated with multiple disease types (latent clusters), which we assume are influenced by genetics. Furthermore, we predict that only subsets of the numerous disease-related quantitative features are useful for describing each latent subtype. We propose to model these associations using two separate beta process priors and to again use a variational inference approach. In a separate model, we use constrained Indian Buffet Processes specifically designed to leverage longitudinal data.


Tuesday June 24 9:00-11:00  442 Dana


Advisor:  Professor Jennifer Dy

Committee members:

Professor Dana Brooks

Professor Deniz Erdogmus

Carl-Fredrik (C-F) Westin