Yan Cui and Team Are Developing an Innovative Artificial Intelligence Approach to Address Biomedical Data Inequality


Yan Cui, PhD, associate professor in the UTHSC Department of Genetics, Genomics, and Informatics, recently received a $1.7 million grant from the National Cancer Institute for a study titled “Algorithm-based prevention and reduction of cancer health disparity arising from data inequality.”

Dr. Cui’s project aims to prevent and reduce health disparities caused by ethnically biased data in cancer-related genomic and clinical omics studies. His objective is to establish a new machine learning paradigm for use with multiethnic clinical omics data.

For nearly 20 years, scientists have been using genome-wide association studies, known as GWAS, and clinical omics studies to detect the molecular basis of diseases. But statistics show that over 80% of the data used in GWAS come from people of predominantly European descent.

As artificial intelligence (AI) is increasingly applied to biomedical research and clinical decisions, this European-centric skew is set to exacerbate long-standing disparities in health. With less than 20% of genomic samples coming from people of non-European descent, underrepresented populations are at a severe disadvantage in data-driven, algorithm-based biomedical research and health care.

“Biomedical data-disadvantage has become a significant health risk for the vast majority of the world’s population,” Dr. Cui said. “AI-powered precision medicine is set to be less precise for the data-disadvantaged populations including all the ethnic minority groups in the U.S. We are committed to addressing the health disparities arising from data inequality.”

The project is innovative in the type of machine learning technique it will use. Multiethnic machine learning normally relies on mixture learning, which trains a single model on data pooled from all ethnic groups, or independent learning, which trains a separate model for each group. Dr. Cui’s project will instead use transfer learning.

Transfer learning works much the same way as human learning. When faced with a new task, instead of starting the learning process from scratch, the algorithm leverages patterns learned from solving a related task. This approach greatly reduces the resources and amount of data required for developing new models.
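The general idea can be illustrated with a minimal sketch. This is not the method used in Dr. Cui's project; it is a toy example on synthetic data, and every name, cohort size, and setting in it is an assumption made for illustration. A simple logistic regression model is first trained on a large "source" cohort, then briefly fine-tuned on a small, related "target" cohort, and compared with a model trained on the small cohort from scratch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def log_loss(w, X, y):
    # Average logistic (cross-entropy) loss of weights w on data (X, y).
    p = np.clip(sigmoid(X @ w), 1e-12, 1 - 1e-12)
    return float(-np.mean(y * np.log(p) + (1 - y) * np.log(1 - p)))

def train(X, y, w_init, lr=0.1, steps=200):
    # Plain gradient descent on the logistic loss, starting from w_init.
    w = w_init.copy()
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

def make_cohort(rng, n, w_true):
    # Synthetic cohort (illustrative only): random features, binary outcome.
    X = rng.normal(size=(n, w_true.size))
    y = (rng.random(n) < sigmoid(X @ w_true)).astype(float)
    return X, y

rng = np.random.default_rng(0)
n_features = 5

# Assumption: the two cohorts follow related but not identical patterns.
w_source_true = rng.normal(size=n_features)
w_target_true = w_source_true + 0.3 * rng.normal(size=n_features)

X_src, y_src = make_cohort(rng, 2000, w_source_true)  # data-rich cohort
X_tgt, y_tgt = make_cohort(rng, 40, w_target_true)    # data-disadvantaged cohort

# Step 1: learn from the large source cohort.
w_pretrained = train(X_src, y_src, np.zeros(n_features))

# Step 2 (transfer): start from the source weights, fine-tune on the target.
w_transfer = train(X_tgt, y_tgt, w_pretrained, steps=50)

# Baseline (independent learning): train on the small cohort from scratch.
w_independent = train(X_tgt, y_tgt, np.zeros(n_features), steps=50)

print("target loss, source model only:", log_loss(w_pretrained, X_tgt, y_tgt))
print("target loss, after fine-tuning:", log_loss(w_transfer, X_tgt, y_tgt))
print("target loss, from scratch:     ", log_loss(w_independent, X_tgt, y_tgt))
```

The point of the sketch is the starting position: the fine-tuned model begins from weights that already encode patterns shared between the cohorts, so it needs only a few updates on the small cohort. Whether this helps in practice depends on how closely the two cohorts are related.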

Using large-scale cancer clinical omics data and genotype-phenotype data, Dr. Cui’s lab will examine how and to what extent transfer learning improves machine learning on data-disadvantaged cohorts. In tandem with this, the team aims to create an open resource system for unbiased multiethnic machine learning to prevent or reduce new health disparities.

Neil Hayes, MD, MPH, assistant dean for Cancer Research in the UTHSC College of Medicine and director of the UTHSC Center for Cancer Research, and Athena Starlard-Davenport, PhD, associate professor in the Department of Genetics, Genomics, and Informatics, are co-investigators on the grant. Yan Gao, PhD, a postdoctoral scholar working with Dr. Cui, is a machine learning expert on the team. A pilot study for this project, funded by the UT Center for Integrative and Translational Genomics and the UTHSC Office of Research, has been published in Nature Communications.