Scholomance Academy

As a student of the Scholomance Academy, you are studying a course called \textit{Machine Learning}. You are currently working on your course project: training a binary classifier.

A binary classifier is an algorithm that predicts the class of each instance, which may be positive $(+)$ or negative $(-)$. A typical binary classifier consists of a scoring function $S$ that assigns a score to every instance and a threshold $\theta$ that determines the class. Specifically, if the score of an instance $x$ satisfies $S(x) \geq \theta$, then $x$ is classified as positive; otherwise, it is classified as negative. Clearly, different thresholds may yield different classifiers.
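The decision rule above can be sketched in a few lines of Python. The function name `classify` and the sample scores are illustrative assumptions, not part of the problem.

```python
def classify(score, theta):
    """Return '+' when score >= theta, otherwise '-' (the rule stated above)."""
    return '+' if score >= theta else '-'

# Hypothetical scores: the same instances under two different thresholds.
scores = [10, 30, 50]
print([classify(s, 30) for s in scores])  # ['-', '+', '+']
print([classify(s, 40) for s in scores])  # ['-', '-', '+']
```

Note that an instance whose score equals the threshold is classified as positive, since the comparison is non-strict.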

Of course, a binary classifier may misclassify instances: it could classify a positive instance as negative (a false negative) or classify a negative instance as positive (a false positive).

Given a dataset and a classifier, we may define the true positive rate ($TPR$) and the false positive rate ($FPR$) as follows:

${TPR} = \frac{\# {TP}} {\# {TP} + \# {FN}}, \quad {FPR} = \frac{\# {FP}} {\# {TN} + \# {FP}}$

where $\# TP$ is the number of true positives in the dataset; $\# FP, \#TN, \#FN$ are defined likewise.
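As a minimal sketch of these definitions, the following Python function counts the four outcomes for a given threshold and returns both rates. The name `rates`, the `'+'`/`'-'` label encoding, and the eight-instance dataset are assumptions made for illustration; the sketch also assumes the dataset contains at least one positive and one negative instance, so neither denominator is zero.

```python
def rates(labels, scores, theta):
    """Return (TPR, FPR) at threshold theta; labels are '+' or '-'."""
    tp = fp = tn = fn = 0
    for label, score in zip(labels, scores):
        predicted_positive = score >= theta
        if label == '+':
            if predicted_positive:
                tp += 1   # positive instance classified as positive
            else:
                fn += 1   # positive instance classified as negative
        else:
            if predicted_positive:
                fp += 1   # negative instance classified as positive
            else:
                tn += 1   # negative instance classified as negative
    return tp / (tp + fn), fp / (tn + fp)

# Hypothetical dataset (not taken from the problem's tests).
labels = ['+', '+', '+', '+', '-', '-', '-', '-']
scores = [40, 35, 30, 10, 50, 30, 20, 5]
print(rates(labels, scores, 30))  # (0.75, 0.5)
```

With threshold 30 this made-up dataset has 3 true positives, 1 false negative, 2 false positives, and 2 true negatives, which gives the printed rates.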

Now you have trained a scoring function, and you want to evaluate the performance of your classifier. The classifier may exhibit different TPR and FPR values as the threshold $\theta$ changes. Let $TPR(\theta)$ and $FPR(\theta)$ denote the TPR and FPR when the threshold is $\theta$, and define the \textit{area under curve} ($AUC$) as
$AUC = \int_{0}^{1} \max_{\theta \in \mathbb{R}} \{TPR(\theta) \mid FPR(\theta) \leq r\} \, dr$
where the integrand, called the \textit{receiver operating characteristic} (ROC), is the maximum possible $TPR$ subject to $FPR \leq r$.

Given the actual classes and predicted scores of the instances in a dataset, can you compute the ${AUC}$ of your classifier?
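One way to evaluate the integral, offered here as a sketch rather than the intended solution, is to note that the ROC is a step function. With $N$ negatives, it is constant for $r$ in $[k/N, (k+1)/N)$: the constraint $FPR \leq r$ forces the threshold strictly above the $(k+1)$-th largest negative score, so the best achievable $TPR$ counts the positives that score strictly higher than that negative. Summing over all negatives and dividing by (number of positives) × (number of negatives) gives the area. The function name `auc`, the label encoding, and the sample data are assumptions; both classes are assumed to be present.

```python
from bisect import bisect_right

def auc(labels, scores):
    """Area under the ROC step curve; labels are '+' or '-'."""
    positives = sorted(s for l, s in zip(labels, scores) if l == '+')
    negatives = [s for l, s in zip(labels, scores) if l == '-']
    total = 0
    for s in negatives:
        # Number of positives scoring strictly above this negative.
        total += len(positives) - bisect_right(positives, s)
    return total / (len(positives) * len(negatives))

# Hypothetical dataset (not taken from the problem's tests).
labels = ['+', '+', '+', '+', '-', '-', '-', '-']
scores = [40, 35, 30, 10, 50, 30, 20, 5]
print(auc(labels, scores))  # 0.5625
```

Sorting the positives once and using binary search keeps the work at $O(n \log n)$. A positive that ties with a negative contributes nothing, because any threshold that admits the positive also admits the negative.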

For example, consider the third test case. If we set the threshold $\theta = 30$, there are 3 true positives, 2 false positives, 2 true negatives, and 1 false negative; hence, $TPR(30) = \frac{3}{3 + 1} = 0.75$ and $FPR(30) = \frac{2}{2 + 2} = 0.5$. Also, as $\theta$ varies, we may plot the ROC curve and compute the AUC accordingly, as shown in Figure 1.
