Why isn't the net similarity (objective function) maximized when every data point is an exemplar?
Because the gain in similarity a data point achieves by assigning itself to an existing exemplar is usually higher than the preference value, which is all it would contribute to the net similarity as its own exemplar. If the shared preference value is larger than all similarities, then every data point will become an exemplar. This can happen unwittingly if all similarities are negative (e.g. when using negative Euclidean distance) and the preference is set to zero. With lower preference values, the algorithm finds better configurations (higher net similarity) in which data points are assigned to clusters with other exemplars.
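As a rough illustration of the point above, the following sketch computes the net similarity by hand for a toy one-dimensional data set. It does not run affinity propagation itself. The helper names (similarity, net_similarity), the six sample points, the candidate exemplar set, the use of negative squared Euclidean distance, and the preference values 0 and -50 are all assumptions chosen for the example.

```python
def similarity(a, b):
    """Negative squared Euclidean distance between two 1-D points (assumed measure)."""
    return -(a - b) ** 2


def net_similarity(points, exemplars, preference):
    """Net similarity of a configuration given a set of exemplar indices.

    Each exemplar contributes the shared preference; every other point
    contributes its similarity to the most similar exemplar.
    """
    total = 0.0
    for i, p in enumerate(points):
        if i in exemplars:
            total += preference
        else:
            total += max(similarity(p, points[k]) for k in exemplars)
    return total


# Hypothetical toy data: two well-separated groups of three points.
points = [0.0, 1.0, 2.0, 10.0, 11.0, 12.0]
all_exemplars = set(range(len(points)))  # every point is its own exemplar
two_clusters = {1, 4}                    # one exemplar in the middle of each group

# Preference 0 is larger than every (negative) similarity,
# so making every point an exemplar scores best.
zero_pref_all = net_similarity(points, all_exemplars, 0.0)   # 0.0
zero_pref_two = net_similarity(points, two_clusters, 0.0)    # -4.0

# With a lower preference, sharing exemplars scores better.
low_pref_all = net_similarity(points, all_exemplars, -50.0)  # -300.0
low_pref_two = net_similarity(points, two_clusters, -50.0)   # -104.0

print(zero_pref_all, zero_pref_two, low_pref_all, low_pref_two)
```

With the preference at zero, the all-exemplar configuration wins (0 versus -4). With the preference at -50, the two-cluster configuration wins (-104 versus -300), because each extra exemplar now costs more than the similarity lost by joining a neighbouring exemplar.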