What data-mining method to use?
If you are less interested in finding relationships in the data and more interested in making accurate predictions, it is advisable to use an ensemble. An ensemble is simply a collection of models (trees in RDS) that makes a prediction by taking a collective vote among all of its models. The predictive performance of an ensemble depends, among other things, on the number of models it contains, on how accurate each individual model is, and on how much the models differ from one another; in general, however, an ensemble clearly outperforms each of its individual models. One drawback of ensembles is that they typically contain so many models that it is not meaningful to display them as they are. In RDS, however, one may get a good overall picture of an ensemble by looking at the variable importance graph. Fig. 4 shows this graph for an ensemble generated from the Cleveland heart disease data. The two most important variables according to this, the number of major ves