grouppred: A novel machine learning approach for gene module identification and prediction via a co-expression network of single-cell sequencing data
Gene co-expression network analysis is widely used to analyze microarray and RNA sequencing data. It groups genes according to their co-expression network, and genes within the same group are inferred to share a function or to be co-regulated within a pathway.
In the literature, approaches to grouping genes are mainly unsupervised, which may introduce instability and variation across datasets. Inspired by ensemble learning, we propose a novel approach that combines supervised and unsupervised learning techniques and addresses two tasks simultaneously during data analysis: gene module identification and phenotype prediction. The gene modules identified by this approach could suggest additional candidate genes for the original pathway, and those genes are potential biomarkers for pathway-related diseases. In addition, the approach improves phenotype prediction accuracy.
The algorithm can be used as a general-purpose prediction algorithm. Because it is specifically designed to handle large samples, it is well suited to single-cell data with many cells. We showcase its use in automatic cell-type annotation of single-cell data.
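To make the two tasks concrete, the following is a minimal sketch of generic co-expression grouping followed by module-based prediction. It is not the grouppred algorithm, whose details are not given here. The correlation threshold, the connected-components grouping, the mean-expression module score, the nearest-centroid classifier, and all function and variable names are assumptions chosen for illustration, and the expression matrix is simulated.

```python
import numpy as np

def coexpression_modules(expr, threshold=0.6):
    """Assign genes to modules (hypothetical helper, not from the abstract).

    expr has shape (n_cells, n_genes). Two genes are linked when the absolute
    Pearson correlation of their expression is at least `threshold`; modules
    are the connected components of that graph.
    """
    corr = np.corrcoef(expr, rowvar=False)        # gene-by-gene correlation
    adjacency = np.abs(corr) >= threshold
    n_genes = expr.shape[1]
    labels = np.full(n_genes, -1)
    current = 0
    for start in range(n_genes):
        if labels[start] != -1:
            continue
        labels[start] = current
        stack = [start]
        while stack:                              # depth-first traversal
            gene = stack.pop()
            for neighbor in np.flatnonzero(adjacency[gene]):
                if labels[neighbor] == -1:
                    labels[neighbor] = current
                    stack.append(neighbor)
        current += 1
    return labels

def module_scores(expr, labels):
    """Mean expression of each module in each cell: (n_cells, n_modules)."""
    n_modules = labels.max() + 1
    return np.column_stack(
        [expr[:, labels == m].mean(axis=1) for m in range(n_modules)]
    )

# Simulated data: 200 cells, 10 genes, two planted co-expression modules.
rng = np.random.default_rng(0)
n_cells = 200
factor_a = rng.normal(size=n_cells)               # drives genes 0-4
factor_b = rng.normal(size=n_cells)               # drives genes 5-9
expr = np.column_stack(
    [factor_a + 0.3 * rng.normal(size=n_cells) for _ in range(5)]
    + [factor_b + 0.3 * rng.normal(size=n_cells) for _ in range(5)]
)
cell_type = (factor_a > 0).astype(int)            # toy phenotype label

# Unsupervised step: group genes into modules.
labels = coexpression_modules(expr)
scores = module_scores(expr, labels)

# Supervised step: nearest-centroid prediction on module scores.
train, test = slice(0, 100), slice(100, 200)
centroids = np.stack(
    [scores[train][cell_type[train] == c].mean(axis=0) for c in (0, 1)]
)
distances = np.linalg.norm(
    scores[test][:, None, :] - centroids[None, :, :], axis=2
)
predicted = distances.argmin(axis=1)
accuracy = (predicted == cell_type[test]).mean()
print("module labels:", labels)
print("held-out accuracy:", accuracy)
```

In this sketch the grouping and the prediction run one after the other, so the labels never influence the modules. The approach described above instead works on both tasks simultaneously.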
Date and Time
-
Language of the oral presentation
English
Language of the visual materials
English