University of Pittsburgh

Dissertation Defense: Knowledge discovery with Bayesian Rule Learning methods for actionable

PhD Candidate
Date: Tuesday, August 27, 2019 - 9:00am - 11:00am

Abstract: Discovery of precise biomarkers is crucial for improved clinical diagnostic, prognostic, and therapeutic decision-making. Such biomarkers help improve our understanding of the underlying physiological (and pathophysiological) processes within an individual. To discover precise biomarkers, we must take a personalized medical approach that accounts for an individual's unique clinical, genetic, omic, and environmental information. The molecular-level omic information provides an opportunity to understand complex physiological processes at an unprecedented resolution. Falling costs and improvements in high-throughput technologies, which collect omic data from an individual, have now made it feasible to include a person's omic information as a standard component of their medical record. This information can only be clinically actionable if it is understandable to a clinician and applicable in the correct medical context. Biomarker discovery from omic data is challenging for three reasons: 1) the data are high-dimensional, which increases the chance of false positive discoveries by traditional data mining methods; 2) most diseases are multifactorial, with many factors influencing the disease outcome, which makes them difficult for most data mining algorithms to model while remaining interpretable to a clinician; and 3) traditional data mining methods discover only statistically significant biomarkers without accounting for clinical relevance, so their findings do not translate well into clinical practice.

In this dissertation, I formulate the problem of learning biomarkers that are both statistically significant and clinically relevant as a knowledge discovery problem. In computer science, knowledge discovery in databases is "a non-trivial process of the extraction of valid, novel, potentially useful, and ultimately understandable patterns in data". Clinical practice guidelines in decision support systems are often presented as explicit propositional logic rules because they are easy for a clinician to understand and are often actionable instructions themselves. Bayesian rule learning (BRL) is a rule-learning classifier that learns patterns as a set of probabilistic classification rules. I develop BRL to learn efficiently from high-dimensional data and to obtain a robust set of rules by identifying context-specific independencies in the data. To help model multifactorial diseases, I study various ensemble methods with BRL, collectively called Ensemble Bayesian Rule Learning (EBRL). I also develop a novel ensemble model visualization method, the Bayesian Rule Ensemble Visualization tool (BREVity), to make EBRL more human-readable for a researcher or a clinician. I develop BRL with informative priors (BRLp) to enable BRL to incorporate prior domain knowledge into the model learning process, thereby further reducing the chance of discovering false positives. Finally, I develop BRL for knowledge discovery (BRL-KD), which can incorporate a clinical utility function to learn models that are more clinically relevant. Collectively, I use these BRL methods, developed for the task of biomarker discovery, as the knowledge engine of an intelligent clinical decision support system called Bayesian Rules for Actionable Informed Decisions (BRAID), a conceptual framework that can be deployed in clinical practice.
