The Division of Biostatistics at the Department of Preventive Medicine invites you to attend the following seminar.
Time: Monday, October 30, 2:00 PM-3:00 PM CDT
Location: 4th Floor Conference Room 400 in the Doctors Office Building at 66 N. Pauline Street, Memphis, TN 38105.
Please park in the multi-story parking garage adjacent to the Doctors Office Building, and bring your parking ticket with you so we can validate it.
ZOOM Virtual Room Connection: Register in advance for this meeting
Seminar Website: https://www.eventcreate.com/e/biostatisticsseminar
Speaker Bio: https://kiphandwerker.github.io/
A Clustering Approach to Non-Equal Length Joint Pattern Genetic and Epigenetics Factors Weighted by Covariates
Joseph (Kip) Handwerker, M.S.
Department of Preventive Medicine, University of Tennessee Health Science Center
School of Public Health, University of Memphis
Clustering analysis is a popular approach to gaining insight into the structure of data, especially on a large scale. Two of the most popular approaches are the K-means and K-prototype algorithms, which are partitioning methods that use distance measures to assign observations to groups. While these methods work well, especially for large datasets, when applied to genetic data they fail to consider potential joint effects and require the same dimensionality across variables. The Vector in Partition (VIP) algorithm fills this gap with a distance measure designed to partition genetic and epigenetic data of non-equal dimensions; specifically, gene expression (GE), DNA methylation (CpG), and single nucleotide polymorphisms (SNP). The VIP extension method builds on this framework by adding another layer: the complex joint effects of genetic and epigenetic data with other potential health-related variables are allowed to dictate clustering. The extension algorithm performs well on simulated data when the clustering of the covariates follows the same clustering scheme as the genetic data. As with other distance-based methods, when the data do not follow a clear clustering scheme the algorithm tends to underperform, especially on numeric data. The results highlight many aspects of the algorithm's performance capabilities, as well as multiple areas for future improvement.
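For readers unfamiliar with partitioning methods, the sketch below illustrates the general idea of assigning subjects to the nearest cluster center when each subject carries several data blocks of different lengths. It is not the VIP algorithm or its extension: the per-block averaging, the function names block_distance and assign, and the toy values are all assumptions made purely for illustration.

```python
from math import fsum


def block_distance(subject, centroid):
    """Sum of per-block mean squared differences.

    Averaging within each block keeps a long block (e.g. many CpG sites)
    from dominating a short one. This weighting is an illustrative
    assumption, not the VIP distance measure.
    """
    total = 0.0
    for name, values in subject.items():
        center = centroid[name]
        total += fsum((v - c) ** 2 for v, c in zip(values, center)) / len(values)
    return total


def assign(subjects, centroids):
    """Return the index of the nearest centroid for each subject."""
    return [
        min(range(len(centroids)), key=lambda k: block_distance(s, centroids[k]))
        for s in subjects
    ]


# Hypothetical toy data: blocks have different lengths
# (2 GE values, 3 CpG values, 1 SNP value per subject).
subjects = [
    {"ge": [0.1, 0.2], "cpg": [0.0, 0.1, 0.1], "snp": [0]},
    {"ge": [2.0, 2.1], "cpg": [0.9, 1.0, 0.8], "snp": [2]},
]
centroids = [
    {"ge": [0.0, 0.0], "cpg": [0.0, 0.0, 0.0], "snp": [0]},
    {"ge": [2.0, 2.0], "cpg": [1.0, 1.0, 1.0], "snp": [2]},
]

labels = assign(subjects, centroids)
print(labels)
```

A full K-means-style procedure would alternate this assignment step with recomputing each centroid from its members; the seminar covers how VIP defines the distance and how covariates are weighted in.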