
Supporting data and materials for "The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches".

Dataset type: Software, Proteomic
Data released on September 15, 2015

Khan IK; Wei Q; Chapman S; KC DB; Kihara D (2015): Supporting data and materials for "The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches". GigaScience Database. https://doi.org/10.5524/100161

DOI: 10.5524/100161

Functional annotation of novel proteins is one of the central problems in bioinformatics. With the ever-increasing development of genome sequencing technologies, more and more sequence information is becoming available to analyze and annotate. To achieve fast and automatic function annotation, many computational (automated) function prediction (AFP) methods have been developed. To objectively evaluate the performance of such methods on a large scale, community-wide assessment experiments have been conducted. The second round of the Critical Assessment of Function Annotation (CAFA) experiment was held in 2013-2014. Evaluation of participating groups was reported in a special interest group meeting at the Intelligent Systems for Molecular Biology (ISMB) conference in Boston in 2014. Our group participated in both CAFA1 and CAFA2 using multiple in-house AFP methods. Here, we report benchmark results of our methods obtained while preparing for CAFA2, before we submitted function predictions for the CAFA2 targets.
For CAFA2, we updated the annotation databases used by our methods, protein function prediction (PFP) and extended similarity group (ESG), and benchmarked their function prediction performance using the original (older) and updated databases. Performance evaluations for PFP with different settings and for ESG are discussed. We also developed two ensemble methods that combine function predictions from six independent, sequence-based AFP methods. We further analyzed the performance of our prediction methods by enriching the predictions with the prior distribution of gene ontology (GO) terms. Examples of predictions by the ensemble methods are discussed.

Additional details

Read the peer-reviewed publication(s):

  • Khan, I. K., Wei, Q., Chapman, S., KC, D. B., & Kihara, D. (2015). The PFP and ESG protein function prediction methods in 2014: effect of database updates and ensemble approaches. GigaScience, 4(1). https://doi.org/10.1186/s13742-015-0083-4

Additional information:

http://kiharalab.org/web/pfp.php

http://kiharalab.org/web/esg.php

http://pfam.xfam.org/

http://bioinf.cs.ucl.ac.uk/psipred/?ffpred=1

http://blast.ncbi.nlm.nih.gov/Blast.cgi?CMD=Web&PAGE=Proteins&PROGRAM=blastp&RUN_PSIBLAST=on

http://toolkit.tuebingen.mpg.de/hhblits


Description | Data Type | File Format | Size | Release Date | File Attributes
Readme file describing the files included as the supplementary material | Readme | TEXT | 1.59 kB | 2015-08-26 | MD5 checksum: 192e3706373c52878581aab6de51839e
File with true annotations of the benchmark dataset (2055 proteins) | Annotation | TEXT | 139.73 kB | 2015-08-26 | MD5 checksum: 005d5e5a61f610384222bb758afeaa7d
File with predicted annotations of the benchmark dataset by the PFP function prediction method | Annotation | TEXT | 7.10 MB | 2015-08-26 | MD5 checksum: fde3afae9134c770a04121288ba31134
File with predicted annotations of the benchmark dataset by the ESG function prediction method | Annotation | TEXT | 580.25 kB | 2015-08-26 | MD5 checksum: 225a2bf4ab0dcfcfe36ed5e42c724449
File with predicted annotations of the benchmark dataset by the ensemble method CONS | Annotation | TEXT | 15.84 MB | 2015-08-26 | MD5 checksum: fa4c5daa71a4b3c2149b827d9e757b89
File with predicted annotations of the benchmark dataset by the ensemble method FPM | Annotation | TEXT | 361.43 kB | 2015-08-26 | MD5 checksum: d9bd1d410e80912dd485ec612f4e91bc
Amino acid sequences of the 2055 benchmark proteins | Protein sequence | FASTA | 824.52 kB | 2015-08-26 | MD5 checksum: f32ef7877b1b76ab845ed4d038ad96dd
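
Each file above is listed with an MD5 checksum, which can be used to confirm that a download completed without corruption. The following is a minimal sketch of such a check in Python, not part of the dataset itself. The function names `md5_of_file` and `matches_checksum` and the local file name `readme.txt` are hypothetical, since the page does not preserve every file name; the expected value is the checksum listed for the readme file.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hexadecimal MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(path, expected_md5):
    """Compare a file's MD5 digest with an expected value, ignoring case."""
    return md5_of_file(path) == expected_md5.strip().lower()


if __name__ == "__main__":
    # Hypothetical local file name; substitute the name of the downloaded file.
    local_path = "readme.txt"
    # Checksum listed on this page for the readme file.
    expected = "192e3706373c52878581aab6de51839e"
    if os.path.exists(local_path):
        status = "OK" if matches_checksum(local_path, expected) else "MISMATCH"
        print(local_path, status)
    else:
        print(local_path, "not found; download it first")
```

Several files were updated on September 24, 2015, so a mismatch may mean that a different file version was downloaded rather than that the transfer was corrupted.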
Funding body | Awardee | Award ID | Comments
National Institute of General Medical Sciences | | R01GM097528 |
National Science Foundation | | IIS1319551 |
National Science Foundation | | DBI1262189 |
National Science Foundation | | DBI-0939454 | Dukka B. KC
National Science Foundation | | IOS1127027 |
National Research Foundation of Korea | | NRF-2011-220-C00004 |
Date | Action
September 24, 2015 | File Benchmark_True_Annotation.txt updated
September 15, 2015 | Dataset publish
September 15, 2015 | Manuscript Link added: 10.1186/s13742-015-0081-6
September 15, 2015 | Manuscript Link added: 10.1186/s13742-015-0083-4
September 24, 2015 | File CONS_pred_formatted.txt updated
September 24, 2015 | File UniRef50_size1500_nonEmpty_annotations.fasta updated