University of Pittsburgh

Enabling translational medicine using Bayesian Rule Learning with informative structure priors

graduate student
Friday, March 23, 2018 - 12:30pm - 1:00pm

Translational medicine aims to improve clinical practice by harnessing knowledge from the basic sciences. An important task in clinical practice is developing predictive models of clinical outcomes from biomedical datasets. Typical biomedical datasets are high-dimensional: a large number of candidate variables are available to explain relatively few observations. On such datasets, data mining algorithms can easily get stuck in local optima or infer models containing spurious variables that predict the outcome only by chance. As a result, these algorithms may learn suboptimal or meaningless models. In biomedicine, we often have domain knowledge from the basic sciences in addition to the dataset, and this knowledge can assist data mining. It can come from the domain literature, experts, curated knowledge bases (such as ontologies), or datasets from related studies. Intelligent systems that integrate this domain knowledge into the model learning process are therefore vital for more informed model learning. Beyond meaningful models with good predictive performance, biomedicine also benefits from comprehensible (human-readable) models that a domain expert can subsequently verify.

Bayesian Rule Learning (BRL) is a data mining method that has been shown to learn comprehensible rule models with good predictive performance from high-dimensional biomedical datasets. BRL learns predictive rules from constrained Bayesian networks (BNs) inferred from data by searching the BN model space. In this project, we develop an intelligent system called iBRL that uses BRL's Bayesian framework to incorporate prior domain knowledge through informative structure priors.
We demonstrate iBRL on a real-world dataset to identify differentially expressed genes in lung cancer cells, and then evaluate the impact of incorporating prior domain knowledge about the dataset.
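The abstract does not spell out BRL's scoring function or search, so the following is only a minimal sketch of the general idea under stated assumptions. It treats the constrained BN as the target variable plus a set of parent variables. It scores each candidate parent set with a BDeu marginal likelihood plus a log structure prior, and it assumes an independent per-variable prior in which a hypothetical `prior_prob` dictionary gives each variable's domain-knowledge probability of being predictive. It also uses a simple greedy search and reads IF-THEN rules off the learned conditional counts. All function names, gene names, and prior values here are hypothetical and are not iBRL's implementation.

```python
import math
from collections import Counter
from itertools import product


def bdeu_log_ml(data, target, parents, arity, ess=1.0):
    """BDeu log marginal likelihood of `target` given `parents` (assumed score)."""
    r = arity[target]
    q = math.prod(arity[p] for p in parents)  # number of parent configurations
    a_j, a_jk = ess / q, ess / (q * r)
    counts, parent_counts = Counter(), Counter()
    for row in data:
        j = tuple(row[p] for p in parents)
        counts[(j, row[target])] += 1
        parent_counts[j] += 1
    score = 0.0
    # Unobserved parent configurations contribute exactly zero, so skip them.
    for j, n_j in parent_counts.items():
        score += math.lgamma(a_j) - math.lgamma(a_j + n_j)
        for k in range(r):
            score += math.lgamma(a_jk + counts[(j, k)]) - math.lgamma(a_jk)
    return score


def log_structure_prior(parents, prior_prob):
    """Hypothetical informative prior: each variable is a parent independently
    with probability prior_prob[var] (0 < p < 1), e.g. elicited from literature."""
    return sum(math.log(p) if v in parents else math.log(1.0 - p)
               for v, p in prior_prob.items())


def greedy_search(data, target, arity, prior_prob, max_parents=2):
    """Greedily add the parent that most improves log P(D|M) + log P(M)."""
    def score(ps):
        return bdeu_log_ml(data, target, ps, arity) + log_structure_prior(ps, prior_prob)

    parents = []
    best = score(parents)
    while len(parents) < max_parents:
        scored = [(score(parents + [v]), v) for v in prior_prob if v not in parents]
        if not scored:
            break
        s, v = max(scored)
        if s <= best:  # stop when no addition improves the posterior score
            break
        parents, best = parents + [v], s
    return parents, best


def extract_rules(data, target, parents, arity):
    """Turn each observed parent configuration into an IF-THEN rule (majority class)."""
    counts = Counter((tuple(row[p] for p in parents), row[target]) for row in data)
    rules = []
    for j in product(*(range(arity[p]) for p in parents)):
        dist = [counts[(j, k)] for k in range(arity[target])]
        if sum(dist) == 0:
            continue
        k_best = max(range(arity[target]), key=lambda k: dist[k])
        cond = " AND ".join(f"{p}={v}" for p, v in zip(parents, j)) or "TRUE"
        rules.append(f"IF {cond} THEN {target}={k_best}")
    return rules


if __name__ == "__main__":
    # Toy discretized data (hypothetical): gene_A determines the class, gene_B is noise.
    data = [{"gene_A": a, "gene_B": b, "cls": a}
            for a in (0, 1) for b in (0, 1) for _ in range(5)]
    arity = {"gene_A": 2, "gene_B": 2, "cls": 2}
    prior_prob = {"gene_A": 0.7, "gene_B": 0.2}  # hypothetical domain-knowledge prior
    parents, best = greedy_search(data, "cls", arity, prior_prob)
    for rule in extract_rules(data, "cls", parents, arity):
        print(rule)
```

Because the score is a log posterior, the prior term is simply added to the data term. Raising a variable's `prior_prob` raises the score of every model that includes it, so domain knowledge can tip the search toward biologically supported variables when the data alone are ambiguous. Strong evidence in the data can still override the prior.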
