University of Pittsburgh

KARL: Knowledge Augmented Rule Learning for Biological Pattern Discovery

graduate student
Friday, March 22, 2019 - 12:30pm - 1:00pm

Background: Ongoing molecular profiling studies enabled by advances in biomedical technologies are producing vast amounts of 'omic' data for early detection, monitoring, and prognosis of diverse diseases. A major common limitation is the scarcity of biological samples from which biomarker measurements are made and evaluated using case-control designs, necessitating integrative modeling frameworks that can make optimal use of all available data for any particular disease classification task. Related data sets are often available from different studies within and across laboratories, but may have been generated using different technology platforms. There is thus a critical need for flexible modeling methods that can handle data from diverse sources to facilitate discovery of robust biological patterns that underlie disease regulatory processes.

Results: This paper develops and evaluates a novel framework called Knowledge Augmented Rule Learning (KARL), which is based on transfer learning of classification rules and incorporates lookup tables to augment the use of prior knowledge when learning interpretable predictive models from data. Classification rules facilitate the extraction of robust biological patterns characterized by their statistical evidence, given both knowledge and data. In this work, KARL models are generated on twenty-five publicly available gene expression data sets, five each for cancers of the brain, breast, colon, lung, and prostate. These models are evaluated for completeness and consistency, along with the positive or negative impact of knowledge transfer, using ten-fold cross-validation measures of Balanced Accuracy (BAcc), which captures the sensitivity-specificity trade-off.
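For readers unfamiliar with the metric, the following is a minimal sketch of Balanced Accuracy, assuming the standard definition as the mean of sensitivity and specificity for a binary case-control task. The function name `balanced_accuracy` and the toy labels are illustrative assumptions, not taken from the paper.

```python
def balanced_accuracy(y_true, y_pred, positive=1):
    """Mean of sensitivity and specificity for binary labels.

    Assumes the standard definition of balanced accuracy; the function
    name and label encoding are illustrative, not from the paper.
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == positive:
            if pred == positive:
                tp += 1  # case correctly called a case
            else:
                fn += 1  # case missed
        else:
            if pred == positive:
                fp += 1  # control wrongly called a case
            else:
                tn += 1  # control correctly called a control
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both classes must be present in y_true")
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


# Toy example: 2 cases and 8 controls (hypothetical labels).
y_true = [1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

# A classifier that always predicts "control" is 80% accurate
# but has a balanced accuracy of only 0.5.
always_control = [0] * 10
print(balanced_accuracy(y_true, always_control))  # 0.5
```

Under this definition, a model that ignores the minority class scores no better than chance, which is why the measure suits small, imbalanced case-control data sets.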

Conclusions: Our results show that, across the 25 publicly available gene expression data sets, knowledge augmented rule learning with KARL produces, on average, rule models that are more robust classifiers than baseline rule learning (RL) without any background knowledge. Moreover, KARL produces biologically interpretable rule patterns with complementary classification rules, and detects unique and consistent behavior for gene families that are discriminative for the cancer data sets studied herein. Future work will extend KARL to handle hierarchical knowledge in order to derive more general hypotheses to drive biomedicine.

