Why do the sizes of Ensembl and IPI data sets differ so much?
IPI is built to provide maximum coverage of the major publicly available protein (and gene) databases while minimizing the redundancy of this large body of data: more than 200,000 source database entries are reduced to 56,000 entries in IPI human v3.12. This is done by merging different source database entries into a single IPI entry when there is evidence that these source entries represent the same protein (i.e. a particular gene product).
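The effect of this merging on data set size can be illustrated with a small sketch. It is not the actual IPI pipeline: the identifiers are made up, the function name merge_entries is hypothetical, and the "evidence" is reduced to a plain list of pairs assumed to represent the same protein. The grouping uses a simple union-find, chosen here only for illustration.

```python
from collections import defaultdict


def merge_entries(entries, same_protein_pairs):
    """Group source entries into clusters, one cluster per merged entry.

    entries: iterable of source database identifiers (hypothetical here).
    same_protein_pairs: pairs of identifiers for which there is assumed
    evidence that both describe the same protein.
    """
    # Each entry starts out as its own cluster.
    parent = {entry: entry for entry in entries}

    def find(x):
        # Follow parent links to the cluster representative,
        # shortening the path along the way.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # Join the clusters of every pair linked by evidence.
    for a, b in same_protein_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    clusters = defaultdict(list)
    for entry in entries:
        clusters[find(entry)].append(entry)
    return sorted(sorted(members) for members in clusters.values())


# Made-up identifiers standing in for entries from three source databases.
source_entries = [
    "ENSP_A", "REFSEQ_A", "UNIPROT_A",
    "ENSP_B", "UNIPROT_B",
    "REFSEQ_C",
]
evidence = [
    ("ENSP_A", "REFSEQ_A"),
    ("REFSEQ_A", "UNIPROT_A"),
    ("ENSP_B", "UNIPROT_B"),
]

merged = merge_entries(source_entries, evidence)
print(len(source_entries), "source entries ->", len(merged), "merged entries")
```

In this toy example six source entries collapse into three merged entries, which mirrors on a small scale why the number of IPI entries is far below the number of source database entries.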