The protein-coding genes were predicted with Tiberius in ab initio mode, using only the soft-masked genome as input. The command was:
tiberius.py --genome genome.fa --out tiberius.gtf
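Because the only input is the soft-masked genome, it can be worth confirming that a FASTA file really is soft-masked (repeats in lowercase) before running the command. The following is a minimal sketch, not part of Tiberius; the function name `softmasked_fraction` and the demo sequence are hypothetical.

```python
import io


def softmasked_fraction(fasta_handle):
    """Return the fraction of lowercase (soft-masked) bases in a FASTA stream.

    Hypothetical helper for illustration; it is not part of Tiberius.
    """
    masked = 0
    total = 0
    for line in fasta_handle:
        if line.startswith(">"):
            continue  # skip FASTA header lines
        seq = line.strip()
        total += len(seq)
        masked += sum(1 for base in seq if base.islower())
    return masked / total if total else 0.0


# Tiny in-memory demo: 8 of the 18 bases are lowercase.
demo = io.StringIO(">chr_demo\nACGTacgtNN\nacgtACGT\n")
print(f"{softmasked_fraction(demo):.2f}")  # prints 0.44
```

For a real genome, pass an open file handle instead, for example `open("genome.fa")`. A fraction of exactly 0 suggests that the file is unmasked or hard-masked rather than soft-masked.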
A table with the predicted coordinates, protein sequences, and coding sequences of all mammals is available.
Download code and see accuracy statistics on the Tiberius GitHub page.
Tiberius is a deep learning model that combines an HMM layer with other sequence-to-sequence models (convolutional neural networks and LSTMs).
Tiberius was trained on 32 mammalian genomes, none of which belong to Hominidae (see the supplement of the preprint below).
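To give an intuition for what an HMM layer can add on top of per-position scores, here is a toy Viterbi decoder in plain Python. It is a generic sketch, not Tiberius's implementation: the two states (0 = intergenic, 1 = coding), all probabilities, and the function name `viterbi` are assumptions made up for illustration, and the per-position scores merely stand in for the output of an upstream network.

```python
import math


def viterbi(log_emissions, log_trans, log_start):
    """Return the most likely state path for per-position log scores.

    Toy illustration only; states and numbers are hypothetical.
    """
    n_states = len(log_start)
    # score[s]: best log-probability of any path ending in state s
    score = [log_start[s] + log_emissions[0][s] for s in range(n_states)]
    backptrs = []
    for emis in log_emissions[1:]:
        new_score, ptr = [], []
        for s in range(n_states):
            best_prev = max(range(n_states),
                            key=lambda p: score[p] + log_trans[p][s])
            ptr.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s] + emis[s])
        backptrs.append(ptr)
        score = new_score
    # Trace back from the best final state.
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for ptr in reversed(backptrs):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


def to_log(rows):
    """Convert a table of probabilities to log space."""
    return [[math.log(p) for p in row] for row in rows]


# Hypothetical per-position class probabilities (intergenic, coding).
scores = [[0.9, 0.1], [0.8, 0.2], [0.3, 0.7], [0.1, 0.9],
          [0.6, 0.4], [0.1, 0.9], [0.2, 0.8]]
trans = [[0.9, 0.1], [0.1, 0.9]]  # "sticky" transitions between states
start = [0.6, 0.4]

path = viterbi(to_log(scores), to_log(trans), to_log([start])[0])
print(path)  # prints [0, 0, 1, 1, 1, 1, 1]
```

Taking the best class independently at each position would give `[0, 0, 1, 1, 0, 1, 1]`. The transition probabilities make the single weakly intergenic position inside the coding run unlikely, so the decoded path stays in the coding state. This is the kind of structural consistency an HMM can enforce.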
Questions should be directed to Lars Gabriel or Mario Stanke.