Why is data validation so important?
In another topic, Data Collection, reference was made to validation. Data must be validated because a sample taken from a population reliably represents only those items which make up the sample. Because none of the other items in the population have been considered, a relatively small sample cannot be expected to reproduce the population mean exactly. If this is so, then how reliable are the results of the sample? The process of estimating that reliability is known as validation. We must accept that the results of a sample do not represent the population exactly, but at least we can assess the extent of the error due to sampling, i.e. the statistical error. From this, it is clear that accepting the results of a sample is dangerous unless we appreciate the size of the statistical error.
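One common way to put a number on the statistical error of a sample mean is the standard error, which is the sample standard deviation divided by the square root of the sample size. The sketch below is illustrative only: the function names and the measurements are hypothetical, it assumes a simple random sample, and the multiplier 1.96 assumes a normal approximation for a roughly 95% interval, which is rough for small samples.

```python
import math
import statistics


def standard_error(sample):
    """Estimate the standard error of the sample mean.

    Uses the sample standard deviation (n - 1 in the denominator)
    divided by the square root of the sample size.
    """
    n = len(sample)
    if n < 2:
        raise ValueError("need at least two observations")
    return statistics.stdev(sample) / math.sqrt(n)


def approx_interval(sample, z=1.96):
    """Return an approximate interval for the population mean.

    z = 1.96 is an assumption: it corresponds to roughly 95% coverage
    under a normal approximation.
    """
    mean = statistics.mean(sample)
    margin = z * standard_error(sample)
    return mean - margin, mean + margin


if __name__ == "__main__":
    # Hypothetical measurements, invented purely for illustration.
    sample = [12.1, 11.8, 12.4, 12.0, 11.9, 12.3, 12.2, 11.7]
    low, high = approx_interval(sample)
    print(f"sample mean    = {statistics.mean(sample):.3f}")
    print(f"standard error = {standard_error(sample):.3f}")
    print(f"approx. range  = {low:.3f} to {high:.3f}")
```

Because the standard error shrinks with the square root of the sample size, a larger sample gives a narrower range. This is why a small sample should be accepted only with its statistical error in view.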