The positive dataset used to construct the AMP classifier was collected from several AMP databases, including APD3, ADAM, ParaPep, AVPdb, CancerPPD, MLACP, AntiCP, AntiFP, and DRAMP, for a total of 6,766 sequences. The negative dataset of non-AMP sequences was obtained from AmPEP4, which was collected from UniProt28 with lengths of 5-255 amino acid residues, and then filtered to remove sequences containing the nonstandard amino acid codes B, J, O, U, X, or Z. After reducing homology bias and redundancy, the training set contained 1,686 AMP sequences and the testing set contained 723.
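The length and residue filter described for the negative set can be illustrated with a minimal sketch. The function names `keep_sequence` and `filter_candidates` are hypothetical, sequences are assumed to be plain one-letter amino acid strings, and the homology and redundancy reduction step is not shown because the text does not specify how it was done.

```python
# Minimal sketch of the negative-set filter, under the assumptions stated above.
# The bounds and excluded codes come from the text; the function names are illustrative.
MIN_LEN, MAX_LEN = 5, 255
NONSTANDARD_CODES = set("BJOUXZ")


def keep_sequence(seq: str) -> bool:
    """Return True if the sequence has 5-255 residues and no excluded codes."""
    seq = seq.strip().upper()
    if not MIN_LEN <= len(seq) <= MAX_LEN:
        return False
    # Reject any sequence that contains at least one excluded code.
    return not (set(seq) & NONSTANDARD_CODES)


def filter_candidates(sequences):
    """Keep only the sequences that pass the filter, preserving input order."""
    return [s.strip().upper() for s in sequences if keep_sequence(s)]
```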
Training set, positive | Training set, negative | Testing set, positive | Testing set, negative |
---|---|---|---|
AMP | AMP | AMP | AMP |
The datasets for the class-specific classifiers were constructed from the aforementioned AMP databases. For a given activity, a sequence annotated with that activity is assigned to the positive set; otherwise, it is assigned to the negative set.
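The labeling rule for the class-specific datasets can be sketched as follows. This is an illustration only: the function `build_class_dataset`, the record layout (a sequence paired with a set of activity labels), and the activity names in the usage example are assumptions, not the authors' data format.

```python
# Minimal sketch of the class-specific labeling rule, under the assumptions above.
def build_class_dataset(records, activity):
    """Split records into (positives, negatives) for one target activity.

    records: iterable of (sequence, activities) pairs, where activities is a
    set of activity labels annotated for that sequence (hypothetical layout).
    """
    positives, negatives = [], []
    for sequence, activities in records:
        # A sequence annotated with the target activity is positive;
        # every other sequence is negative for this classifier.
        if activity in activities:
            positives.append(sequence)
        else:
            negatives.append(sequence)
    return positives, negatives


# Usage with made-up sequences and illustrative activity names.
example_records = [
    ("GLFDIVKKVV", {"antibacterial", "antifungal"}),
    ("KKLLKKLLKK", {"antiviral"}),
    ("FLPIIAKLLG", {"antibacterial"}),
]
pos, neg = build_class_dataset(example_records, "antibacterial")
```

Because the rule is applied once per activity, a sequence with several annotated activities can appear in the positive set of more than one class-specific classifier.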