Why do the species-specific gene association files (based on the International Protein Index, IPI) differ from the multi-species (UniProt) gene association file?
The UniProtKB-GOA UniProt gene association file contains all manual and electronic annotations that UniProtKB-GOA has assigned to UniProtKB entries. This dataset covers more than 120,000 different species (http://www.ebi.ac.uk/GOA/uniprot_release.html). It is redundant for electronic annotations in cases where two different electronic methods have assigned the same, or a less granular, GO term. The IPI project (http://www.ebi.ac.uk/IPI/) provides the UniProtKB-GOA group with minimally redundant yet maximally complete sets of proteins for a number of species; each set is assembled from protein sequence information taken from UniProtKB, RefSeq, Ensembl, TAIR, H-InvDB and Vega. The UniProtKB-GOA IPI files may contain extra InterPro2GO electronic annotations, because IPI sequences are automatically passed through the InterPro pipeline and some IPI entries are not present in the UniProtKB database. In addition, while entries in the UniProt Knowledgebase (Swiss-Prot and TrEMBL) representing protein