Ali Sharifi Zarchi

Research Associate, Colorado State University, Fort Collins


Using Deep Neural Networks to Understand the Cell Identity by Expression Fingerprints


Understanding cell identity is a critically important task in many biomedical areas, such as regenerative medicine and cancer research. The expression patterns of a few marker genes have been used to assign cells to a limited number of cell types. This approach has two limitations: for many cell types no markers are known that characterize them accurately, and some markers are expressed in more than one cell type. A possible answer is to use whole-genome gene expression profiles (GEPs), but it has been computationally challenging to decide which genes characterize cell identity most accurately. Classical machine learning approaches, such as simple classification or clustering algorithms, have been applied to this problem as well as to many other biological problems. Many aspects of biology, however, are too sophisticated to be modelled accurately with such simple approaches. During the past few years, deep learning methods have provided promising results in learning patterns in games, images, video and other domains. Their application in biology and health, however, has been limited. Here we analyzed a massive number of gene and miRNA expression profiles, measured on both microarray and Next Generation Sequencing (NGS) platforms, to learn the more sophisticated biological properties of the data. After analyzing different architectures, we identified a specific deep autoencoder architecture that can compress a whole gene expression profile into a small gene expression fingerprint (GEF) consisting of as few as 30 numeric values, which can reproduce the expression values of tens of thousands of genes with an accuracy comparable to that of technical replicates of the same experiment. We show that the scalars of the GEFs represent different biological pathways or processes, which are learned in an unsupervised manner. Furthermore, cell identity can be inferred from the GEFs with very high accuracy, comparable to that of state-of-the-art tools that work on the whole GEP.
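
The compression idea in the abstract can be illustrated with a toy example. The sketch below is not the architecture presented in the talk: it is a minimal linear autoencoder in plain Python, trained on synthetic six-value "profiles" that are generated from two hidden factors and squeezed through a two-value code. The function names (make_toy_profiles, train_autoencoder, encode, decode, mean_error), the data, the learning rate and the code size are all assumptions made for illustration; the autoencoders described above are deep, and their fingerprints hold about 30 values for tens of thousands of genes.

```python
import random


def make_toy_profiles(n_samples, seed=0):
    """Synthetic 'expression profiles': 6 values driven by 2 hidden factors."""
    rng = random.Random(seed)
    profiles = []
    for _ in range(n_samples):
        a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
        profiles.append([a, b, a + b, a - b, 0.5 * a, -0.5 * b])
    return profiles


def encode(enc, x):
    """Map a full profile to its short code (the 'fingerprint')."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in enc]


def decode(dec, code):
    """Reconstruct a full profile from its short code."""
    return [sum(w * c for w, c in zip(row, code)) for row in dec]


def train_autoencoder(profiles, code_size, epochs=200, lr=0.01, seed=0):
    """Fit encoder/decoder weights by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    n_genes = len(profiles[0])
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(n_genes)] for _ in range(code_size)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(code_size)] for _ in range(n_genes)]
    for _ in range(epochs):
        for x in profiles:
            code = encode(enc, x)
            recon = decode(dec, code)
            err = [r - xi for r, xi in zip(recon, x)]
            # Gradient of 0.5 * ||recon - x||^2 with respect to the code,
            # computed before the decoder weights are changed.
            grad_code = [sum(dec[g][k] * err[g] for g in range(n_genes))
                         for k in range(code_size)]
            for g in range(n_genes):
                for k in range(code_size):
                    dec[g][k] -= lr * err[g] * code[k]
            for k in range(code_size):
                for g in range(n_genes):
                    enc[k][g] -= lr * grad_code[k] * x[g]
    return enc, dec


def mean_error(enc, dec, profiles):
    """Mean squared reconstruction error per value over all profiles."""
    total, count = 0.0, 0
    for x in profiles:
        recon = decode(dec, encode(enc, x))
        total += sum((r - xi) ** 2 for r, xi in zip(recon, x))
        count += len(x)
    return total / count


if __name__ == "__main__":
    data = make_toy_profiles(30, seed=1)
    enc, dec = train_autoencoder(data, code_size=2)
    print("fingerprint of first profile:", encode(enc, data[0]))
    print("mean squared reconstruction error:", mean_error(enc, dec, data))
```

Because the toy profiles really depend on only two hidden factors, a two-value code can in principle reconstruct them well. The abstract makes the analogous claim for real expression data: that roughly 30 learned values suffice to reproduce a whole profile.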


Ali is a gold medalist of the International Olympiad in Informatics (IOI). He holds a B.Sc. and an M.Sc. in Computer Engineering from Sharif Univ. of Tech. and a Ph.D. in Bioinformatics from the University of Tehran, and was previously a research fellow at the Max Planck Institute for Molecular Biomedicine and a postdoc in Bioinformatics at the Chitsaz Lab, Colorado State University. He is currently a research associate in Bioinformatics at Colorado State University, an invited instructor at the Computer Engineering department of Sharif Univ. of Tech., the head of the Bioinformatics lab at the Royan Institute for Stem Cell Biology and Technology, and most importantly a husband and father of two.


Download presentation from here.


Watch video from here.