An introduction to performance evaluation in machine learning, focusing on precision, recall, error rate, and other criteria for assessing the practical utility of classifiers. It covers the basic variables, correct versus incorrect classification, error rate versus rejection rate, and the interplay between precision and recall. The notes also discuss the ROC curve, the F-measure, sensitivity, specificity, and performance in multi-label domains.
Number of correctly classified examples: N_c
Number of misclassified examples: N_e
Total number of examples: N = N_c + N_e

Error rate: E = N_e / N
Classification accuracy: Acc = N_c / N
Note that Acc = 1 - E.
Consider a heavily imbalanced set of examples:
◦ E.g., 970 examples are pos and 30 are neg
◦ Consider a classifier that labels all examples as pos
◦ The error rate is only 3%, but the classifier is useless
Such domains are quite common. Therefore we need criteria capable of quantifying the classifier's practical utility, as the sketch below illustrates.
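A minimal sketch of this computation, assuming the 970/30 toy data from the bullet list and a classifier that always answers pos (the variable names are illustrative, not from the original notes):

```python
# Toy illustration: the error rate can look excellent on imbalanced
# data even when the classifier is useless.
labels = ["pos"] * 970 + ["neg"] * 30     # heavily imbalanced ground truth
predictions = ["pos"] * len(labels)       # classifier that labels everything pos

n_correct = sum(p == y for p, y in zip(predictions, labels))
error_rate = 1 - n_correct / len(labels)

print(f"accuracy = {n_correct / len(labels):.1%}, error rate = {error_rate:.1%}")
# accuracy = 97.0%, error rate = 3.0% -- yet the classifier never detects a neg
```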
A classifier has been applied to a set of examples.

Precision: percentage of truly pos examples among those labeled as such by the classifier:
  Precision = TP / (TP + FP)

Recall: percentage of pos examples labeled as such by the classifier among all truly positive examples:
  Recall = TP / (TP + FN)

where TP, FP, and FN denote the numbers of true positives, false positives, and false negatives, respectively.
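A sketch of these two formulas in code, assuming binary labels encoded as the strings "pos" and "neg" (the helper function is an illustrative assumption):

```python
def precision_recall(predictions, labels):
    """Compute precision and recall for the pos class."""
    tp = sum(p == "pos" and y == "pos" for p, y in zip(predictions, labels))
    fp = sum(p == "pos" and y == "neg" for p, y in zip(predictions, labels))
    fn = sum(p == "neg" and y == "pos" for p, y in zip(predictions, labels))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# The all-pos classifier from above: perfect recall, but precision
# merely equals the fraction of pos examples in the data.
labels = ["pos"] * 970 + ["neg"] * 30
predictions = ["pos"] * 1000
print(precision_recall(predictions, labels))   # (0.97, 1.0)
```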
Which of the two matters more depends on the concrete application:
◦ Recommender systems:
  - Need high precision, to make sure the customer is rarely disappointed
  - Recall is here unimportant (no need to identify all relevant movies)
◦ Medical diagnosis:
  - Usually, recall is more important
  - Precision can be improved by follow-up tests
Different types of error can often be influenced by the classifier's parameters, e.g., a decision threshold. Varying such a parameter and plotting the resulting trade-off yields the ROC curve, which makes it possible to compare two classifiers across their operating points.
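A minimal sketch of how sweeping a decision threshold trades one error type for the other, assuming a classifier that outputs a score for the pos class (the scores and labels below are made up for illustration):

```python
def roc_points(scores, labels, thresholds):
    """For each threshold, predict pos when score >= threshold and
    return the (false-positive rate, true-positive rate) pair."""
    n_pos = sum(y == "pos" for y in labels)
    n_neg = len(labels) - n_pos
    points = []
    for t in thresholds:
        tp = sum(s >= t and y == "pos" for s, y in zip(scores, labels))
        fp = sum(s >= t and y == "neg" for s, y in zip(scores, labels))
        points.append((fp / n_neg, tp / n_pos))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3, 0.2, 0.1]
labels = ["pos", "pos", "neg", "pos", "neg", "pos", "neg", "neg"]
for fpr, tpr in roc_points(scores, labels, [0.0, 0.25, 0.5, 0.75, 1.0]):
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Lowering the threshold catches more true positives (higher TPR) at the price of more false alarms (higher FPR); the ROC curve is exactly this set of points.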
Sensitivity (recall measured on the positive examples):
  Sensitivity = TP / (TP + FN)
Specificity (recall measured on the negative examples):
  Specificity = TN / (TN + FP)
When per-class results are averaged, each class is weighted according to its frequency among the examples.
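A sketch of such frequency-weighted averaging, here applied to per-class recall (the helper and the two-class example are illustrative assumptions; in the binary case the two per-class recalls are exactly sensitivity and specificity):

```python
from collections import Counter

def per_class_recall(predictions, labels):
    """Recall computed separately for each class."""
    counts = Counter(labels)
    return {c: sum(p == y == c for p, y in zip(predictions, labels)) / n
            for c, n in counts.items()}

labels = ["pos"] * 970 + ["neg"] * 30
predictions = ["pos"] * 1000                 # the useless all-pos classifier

recalls = per_class_recall(predictions, labels)
freqs = {c: n / len(labels) for c, n in Counter(labels).items()}

weighted = sum(freqs[c] * recalls[c] for c in recalls)   # 0.97
macro = sum(recalls.values()) / len(recalls)             # 0.50
print(f"frequency-weighted recall = {weighted:.2f}, unweighted = {macro:.2f}")
```

The unweighted (macro) average exposes the useless classifier, while frequency weighting lets the dominant class mask the failure on the rare one.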
Unless the set of pre-classified examples is really big, the results can be unreliable. Therefore, specific methodologies of repeated trials are used (see the sketch after this list):
◦ Random subsampling
◦ N-fold cross-validation
◦ Stratified versions of these approaches
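A minimal sketch of N-fold cross-validation, assuming a caller-supplied train_and_eval function that fits a classifier on the training part and returns a score on the test part (the function name, fold construction, and seed handling are illustrative assumptions):

```python
import random

def n_fold_cross_validation(examples, n_folds, train_and_eval, seed=0):
    """Shuffle the examples, split them into n_folds parts, let each part
    serve once as the test set, and return the mean of the fold scores."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::n_folds] for i in range(n_folds)]
    scores = []
    for i, test in enumerate(folds):
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        scores.append(train_and_eval(train, test))
    return sum(scores) / n_folds
```

A stratified version would split each class separately before forming the folds, so that every fold preserves the class proportions of the full set.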
Let T be the set of pre-classified examples