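Recently, the international bioinformatics journal Bioinformatics published online the latest research from Professor Hu Xuehai's group in our university's biostatistics team, under the title "TSPTFBS: a docker image for Trans-Species Prediction of Transcription Factor Binding Sites in plants". The article reports a tool for predicting transcription factor binding sites (TFBSs) in plants, together with its Docker image.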
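Finally, when the authors applied transfer learning to tackle the current difficulties of plant TFBS research computationally, they found that its performance varied greatly across plant species. In rice, three of the ten TFs were predicted relatively well: BZIP23, ERF48 and MADS29 reached a PPV (positive predictive value) of 0.752, 0.951 and 0.816, respectively. When the models were transferred to maize and soybean, however, the predictions were unsatisfactory. This indicates that transfer learning is feasible to some extent for cross-species TFBS prediction in plants, but that more effective transfer learning strategies still need to be designed.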
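PPV, also called precision, is the fraction of predicted binding sites that are true binding sites: PPV = TP / (TP + FP). The minimal sketch below computes it from model scores. The 0.5 decision threshold and the example labels and scores are hypothetical, not values from the paper.

```python
def ppv(labels, scores, threshold=0.5):
    """Positive predictive value: TP / (TP + FP).

    labels: 1 = true binding site, 0 = non-binding sequence.
    scores: model outputs; a score >= threshold counts as a predicted site
    (the 0.5 default is an assumption for illustration).
    """
    predicted = [s >= threshold for s in scores]
    tp = sum(1 for y, p in zip(labels, predicted) if p and y == 1)
    fp = sum(1 for y, p in zip(labels, predicted) if p and y == 0)
    if tp + fp == 0:
        raise ValueError("no positive predictions, so PPV is undefined")
    return tp / (tp + fp)


# Hypothetical labels and scores for eight test sequences
labels = [1, 1, 0, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
print(ppv(labels, scores))  # 0.75 (3 true positives, 1 false positive)
```

Note that PPV ignores missed sites (false negatives), so it is usually reported alongside recall or AUC.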
Motivation: The lack or limited availability of experimental data on transcription factor binding sites (TFBS) in plants, together with the independent evolution of plant TFs, has left computational approaches for identifying plant TFBSs lagging behind the corresponding research in humans. Observing that TFs are highly conserved among plant species, we first employ a deep convolutional neural network (DeepCNN) to build 265 Arabidopsis TFBS prediction models based on available DAP-seq (DNA affinity purification sequencing) datasets, and then transfer them to homologous TFs in other plants.
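CNN-based TFBS predictors commonly take one-hot encoded DNA as input, and each first-layer convolutional filter acts as a motif detector scanned along the sequence. The pure-Python sketch below shows the forward pass of a single such filter (convolution, ReLU, global max pooling). It is not the TSPTFBS architecture: the helper names `one_hot` and `conv_max` and the hand-set kernel rewarding the plant G-box CACGTG are hypothetical, whereas a real DeepCNN learns many filters from DAP-seq data.

```python
BASES = "ACGT"


def one_hot(seq):
    """Encode DNA as rows of 4 floats in A, C, G, T order; other bases (e.g. N) become all zeros."""
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv_max(encoded, kernel, bias=0.0):
    """Slide a (width x 4) kernel along the sequence, apply ReLU, then take the global max."""
    width = len(kernel)
    activations = []
    for start in range(len(encoded) - width + 1):
        z = bias
        for offset in range(width):
            for j in range(4):
                z += kernel[offset][j] * encoded[start + offset][j]
        activations.append(max(0.0, z))  # ReLU
    return max(activations) if activations else 0.0


# Hypothetical hand-set filter: +1 for the matching base, -1 otherwise, at each position
MOTIF = "CACGTG"
kernel = [[1.0 if b == m else -1.0 for b in BASES] for m in MOTIF]

print(conv_max(one_hot("TTCACGTGAA"), kernel))  # 6.0, an exact CACGTG match
print(conv_max(one_hot("AAAAAAAAAA"), kernel))  # 0.0, no window scores above zero
```

Max pooling makes the filter's output independent of where the motif occurs, which suits TFBS detection in fixed-length DAP-seq peak sequences.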
Results: DeepCNN not only outperforms gkm-SVM and MEME on Arabidopsis TFBS prediction, but also, as evidence of its biological interpretability, learns the known motif for most Arabidopsis TFs as well as cooperative TF motifs supported by PPI (protein-protein interaction) evidence. Under the transfer learning framework, trans-species prediction performance on ten TFs from three other plants, Oryza sativa, Zea mays and Glycine max, demonstrates the feasibility of the current strategy.
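Known motifs, such as those reported by MEME, are usually summarised as position weight matrices (PWMs), and checking whether a model has "learned the known motif" amounts to comparing its filters against such matrices. To show how a PWM scores a sequence, the sketch below builds a log-odds PWM from aligned sites and finds the best-scoring window. The example sites, the pseudocount of 0.5, the uniform 0.25 background and the helper names are assumptions for illustration, not values or code from the paper.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=0.5, background=0.25):
    """Log2-odds PWM from equal-length aligned sites, assuming a uniform background."""
    width = len(sites[0])
    pwm = []
    for i in range(width):
        column = [s[i].upper() for s in sites]
        total = len(column) + 4 * pseudocount
        pwm.append({b: math.log2((column.count(b) + pseudocount) / total / background)
                    for b in BASES})
    return pwm


def best_hit(seq, pwm):
    """Return (start, score) of the highest-scoring window; unknown bases contribute 0."""
    seq = seq.upper()
    width = len(pwm)
    best = None
    for start in range(len(seq) - width + 1):
        score = sum(pwm[i].get(seq[start + i], 0.0) for i in range(width))
        if best is None or score > best[1]:
            best = (start, score)
    return best


# Hypothetical aligned binding sites for a G-box-like motif
pwm = build_pwm(["CACGTG", "CACGTG", "CACGTT", "CACGTG"])
start, score = best_hit("AAGTCACGTGCA", pwm)
print(start)  # 4
```

Positive scores mean the window looks more like the motif than like random background sequence.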
Availability and implementation: The 265 trained Arabidopsis TFBS prediction models were packaged in a Docker image named TSPTFBS, which is freely available on DockerHub at https://hub.docker.com/r/vanadiummm/tsptfbs. Source code and documentation are available on GitHub at: https://github.com/liulifenyf/TSPTFBS.
Contact: huxuehai@mail.hzau.edu.cn