Datasets used to train/test DeepMito

The SM424-18 dataset

This dataset is derived from UniProtKB/Swiss-Prot release 2018_02 and is the main dataset used to train DeepMito. It contains 424 non-redundant protein sequences with experimentally determined sub-mitochondrial localization: 74 outer membrane, 190 inner membrane, 25 intermembrane space and 135 matrix proteins.
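
As a quick consistency check on the figures above, the following minimal sketch sums the four compartment counts and reports each compartment's share of the dataset. The dictionary and function names are illustrative assumptions and are not part of DeepMito; only the counts come from the description above.

```python
# Compartment counts as stated in the SM424-18 description.
# The names below are hypothetical, chosen only for this illustration.
SM424_18_COUNTS = {
    "outer membrane": 74,
    "inner membrane": 190,
    "intermembrane space": 25,
    "matrix": 135,
}


def class_fractions(counts):
    """Return each compartment's share of the dataset as a fraction of the total."""
    total = sum(counts.values())
    return {name: n / total for name, n in counts.items()}


total = sum(SM424_18_COUNTS.values())
fractions = class_fractions(SM424_18_COUNTS)

print(f"total proteins: {total}")
for name, frac in fractions.items():
    print(f"{name:>20}: {SM424_18_COUNTS[name]:>3} ({frac:.1%})")
```

The counts add up to 424, and the compartments are unevenly represented: inner membrane proteins make up roughly 45% of the dataset, while intermembrane space proteins account for about 6%.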

The SubMitoPred dataset

This dataset comprises 570 protein sequences. It was generated by the SubMitoPred authors and is described in the following publication:

Kumar et al. (2018) Proteome-wide prediction and annotation of mitochondrial and sub-mitochondrial proteins by incorporating domain information. Mitochondrion, 42, 11-22.

The Human Cell Atlas mitochondrial dataset

This dataset comprises 1050 human mitochondrial proteins extracted from the Human Cell Atlas database.