Using Machine Learning to identify microRNA biomarkers for predisposition to Huntington’s Disease
Author(s): Patel K, Sheridan C, Chandrasegaran S, Shanley DP
Huntington’s disease (HD) is an autosomal dominant disease caused by an expansion of CAG trinucleotide repeats in the HTT gene. The CAG expansion correlates with the age of disease onset in HD; however, clinical markers of HD can be seen in patients years before clinical symptoms appear. Thus, it would be of interest to identify molecular biomarkers which indicate predisposition to the development of HD, and because microRNAs (miRNAs) circulate in bio-fluids, they would be particularly useful biomarkers. We explored a large HD miRNA-mRNA expression dataset (GSE65776) using bioinformatics and machine learning (ML) techniques. We sought sets of features (mRNAs or miRNAs) that predict HD or wild-type (WT) genotype in aged or young mouse cortex samples, and we asked whether a set of features could predict predisposition to HD or WT genotypes by training on aged samples and testing on young samples. Several models were created, and the best-performing models were further analysed using AUC analysis and PCA plots. Finally, genes used to train our miRNA-based predisposition model were mined from HD patient bio-fluid samples. We generated several excellent age-based models with testing accuracies >80% and AUC scores >90%. Our mRNA-based predisposition model performed well (>80% test accuracy) while using two novel predicted protein-coding genes (Gm5067, Gm6089) as features. Our miRNA-based predisposition model also performed reasonably well (>70% test accuracy) when trained on many miRNAs, including six that were differentially expressed (p < 0.05) homologues of miRNAs detected in HD patient blood samples (mmu-miR-154-5p, mmu-miR-181a-5p, mmu-miR-212-3p, mmu-miR-378b, mmu-miR-382-5p and mmu-miR-770-5p).
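The predisposition design described above (fit a classifier on aged samples, then score it only on young samples) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the data are synthetic rather than drawn from GSE65776, the nearest-centroid classifier is a stand-in because the abstract does not name the models used, and all function and variable names are hypothetical.

```python
import random

def make_samples(rng, n_per_class, n_features, n_informative, shift):
    """Synthetic expression matrix (illustrative only): label 1 = HD, 0 = WT."""
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            if label == 1:
                # Shift the first few features to mimic differential expression.
                for j in range(n_informative):
                    row[j] += shift
            X.append(row)
            y.append(label)
    return X, y

def fit_centroids(X, y):
    """Return the mean feature vector of each class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def hd_score(centroids, x):
    """Positive when x lies closer to the HD centroid than to the WT centroid."""
    d_wt = sum((a - b) ** 2 for a, b in zip(x, centroids[0]))
    d_hd = sum((a - b) ** 2 for a, b in zip(x, centroids[1]))
    return d_wt - d_hd

def accuracy(y_true, scores):
    """Fraction of samples whose sign-thresholded score matches the label."""
    preds = [1 if s > 0 else 0 for s in scores]
    return sum(p == t for p, t in zip(preds, y_true)) / len(y_true)

def roc_auc(y_true, scores):
    """Probability that a random HD sample outscores a random WT sample."""
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

rng = random.Random(0)
# Assumption: the genotype signal is weaker in young animals than in aged ones.
X_aged, y_aged = make_samples(rng, 20, 30, 5, shift=2.0)
X_young, y_young = make_samples(rng, 20, 30, 5, shift=1.0)

centroids = fit_centroids(X_aged, y_aged)                     # train on aged only
young_scores = [hd_score(centroids, x) for x in X_young]      # test on young only

print("young-sample accuracy:", accuracy(y_young, young_scores))
print("young-sample AUC:", roc_auc(y_young, young_scores))
```

The point of the split is that no young sample influences the fitted model, so any accuracy on the young set reflects a genotype signal that is already present before the aged phenotype develops.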