
What data-mining method to use?

April 26, 2017

Tags: analytics, Data, data analysis, data mining, method, mining, prediction, predictive, predictive analysis



1 Answer


If you are less interested in finding relationships in the data and more interested in making accurate predictions, it is advisable to use an ensemble. An ensemble is simply a collection of models (trees in RDS) that makes a prediction by taking a collective vote across all the models it contains. The predictive performance of an ensemble depends on, among other things, the number of models it contains, how accurate each individual model is, and how much the models differ from one another; in general, though, the ensemble outperforms any of its individual models by a wide margin. One drawback of ensembles is that they typically contain so many models that displaying them directly is not meaningful. In RDS, however, one can get a good overall picture of an ensemble by looking at the variable importance graph. Fig. 4 shows this graph for an ensemble generated from the Cleveland heart disease data. The two most important variables according to this, the number of major ves
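As a rough illustration of the collective vote described in this answer, here is a minimal Python sketch. It is not how RDS builds or evaluates its ensembles; the function names (`ensemble_predict`, `accuracy`), the three toy models, and the toy data are all hypothetical and chosen only to show that several imperfect models that err on different cases can vote their way to a better prediction.

```python
from collections import Counter
from itertools import product


def ensemble_predict(models, x):
    """Return the label that receives the most votes from the models."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]


def accuracy(predict, rows, labels):
    """Fraction of rows for which predict(row) equals the true label."""
    hits = sum(predict(x) == y for x, y in zip(rows, labels))
    return hits / len(labels)


# Toy data (assumption): three binary features, and the true label is 1
# when at least two of the three features are 1.
rows = list(product([0, 1], repeat=3))
labels = [1 if sum(x) >= 2 else 0 for x in rows]

# Three deliberately weak models: each looks at a single feature only,
# so each one is wrong on a different pair of rows.
models = [
    lambda x: x[0],
    lambda x: x[1],
    lambda x: x[2],
]

# An odd number of models avoids tied votes when there are two classes.
individual = [accuracy(m, rows, labels) for m in models]
combined = accuracy(lambda x: ensemble_predict(models, x), rows, labels)

print("individual accuracies:", individual)
print("ensemble accuracy:", combined)
```

On this toy data each single-feature model is right on 6 of the 8 rows, while the vote is right on all 8, because the models make their mistakes on different rows. Real ensembles rarely show so clean a gain, and the size of the improvement depends on how accurate and how diverse the individual models are.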