training data - used by one or more learning schemes
validation data - used to optimize parameters
test data - used to calculate the performance metric
How large is "large enough"? Ideally, the training and test data sets should be representative of the data that will be observed in practice. This means that the data should span the possible cases, and the distribution of cases should match the one observed in practice. In general, it is impossible to know whether the data is representative. If one assumes that the full data set is representative, the remaining requirement when selecting training and test data is that the split be stratified, that is, that the distribution of cases is the same in the training data and the test data.
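A stratified split can be sketched in a few lines of Python. This is only a minimal illustration using the standard library; the function name `stratified_split`, the `test_fraction` parameter, and the toy labels are assumptions made for this example and do not come from any particular toolkit.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices) so that each class keeps
    roughly the same proportion in both parts.

    Hypothetical helper written for illustration only.
    """
    rng = random.Random(seed)

    # Group instance indices by their class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train, test = [], []
    for indices in by_class.values():
        # Shuffle within the class, then cut off the test share.
        rng.shuffle(indices)
        n_test = round(len(indices) * test_fraction)
        test.extend(indices[:n_test])
        train.extend(indices[n_test:])
    return sorted(train), sorted(test)


# Toy data set: 70 instances of class "a" and 30 of class "b".
labels = ["a"] * 70 + ["b"] * 30
train_idx, test_idx = stratified_split(labels, test_fraction=0.3)

train_counts = Counter(labels[i] for i in train_idx)
test_counts = Counter(labels[i] for i in test_idx)
print(train_counts, test_counts)
```

In this toy case both parts keep the 70/30 class ratio of the full data set. With small or unevenly sized classes the rounding step makes the proportions only approximately equal.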
where $i$ is the actual class of the instance. This function gives the information (in bits) required to express the actual class $i$ with respect to the probability distribution $\mathbf{p}$; that is, if one knows the distribution, this is the number of bits required to communicate a specific class when the encoding is optimal.
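As a rough numerical illustration, the sketch below assumes that the function takes the usual form $-\log_2 p_i$, which matches the description as an optimal code length in bits. The function name `informational_loss` and the example distributions are hypothetical.

```python
import math


def informational_loss(probabilities, actual_class):
    """Bits needed to encode the actual class under the predicted
    distribution, assuming the loss has the form -log2(p_i)."""
    p = probabilities[actual_class]
    if p == 0:
        # A class predicted as impossible cannot be encoded at all.
        return math.inf
    return -math.log2(p)


# Two equally likely classes: one bit identifies the actual class.
loss_even = informational_loss({"yes": 0.5, "no": 0.5}, "yes")

# The actual class was given probability 0.25: two bits are needed.
loss_quarter = informational_loss({"yes": 0.25, "no": 0.75}, "yes")

# The actual class was given probability 0: the loss is infinite.
loss_zero = informational_loss({"yes": 0.0, "no": 1.0}, "yes")

print(loss_even, loss_quarter, loss_zero)
```

The lower the probability assigned to the class that actually occurs, the larger the loss. A prediction that assigns probability zero to the actual class has infinite loss under this form.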