DeepGenGrep

Gene regulation in eukaryotes is tightly controlled by various genome signals and regions (GSRs), such as promoters, transcription start sites, polyadenylation signals (PAS), splice sites and translation initiation sites (TIS). Recognizing GSRs in DNA helps us understand gene structure, regulation and function, and in turn improves gene annotation. Many GSR-specific identification methods based on machine learning models have been proposed, but they depend heavily on sets of handcrafted features tailored to the particular GSR being identified. Some researchers have therefore developed generalized deep learning-based models that require no specialized features and can recognize various GSRs, but the performance of these methods still needs to be improved. It is thus highly desirable to develop a general deep learning-based model that identifies various GSRs more accurately. In this work, we propose such a model, termed DeepGenGrep, for improved prediction of various GSRs. DeepGenGrep uses a hybrid neural network to extract the essential features needed to recognize different types of GSRs. Ablation experiments show that the architecture of DeepGenGrep is effective for GSR recognition. We further evaluate DeepGenGrep on the recognition of PAS, TIS and splice sites across different species. The results show that DeepGenGrep outperforms state-of-the-art GSR-specific predictors as well as generalizable predictors. Moreover, cross-species GSR recognition experiments show that DeepGenGrep transfers knowledge between two species better than state-of-the-art predictors. Performance evaluations with deep transfer learning further indicate that DeepGenGrep is an effective framework for GSR recognition.