列表

详情


NC223861. ScholomanceAcademy

描述

As a student of the Scholomance Academy, you are studying a course called \textit{Machine Learning}. You are currently working on your course project: training a binary classifier.

A binary classifier is an algorithm that predicts the classes of instances, which may be positive or negative . A typical binary classifier consists of a scoring function that gives a score for every instance and a threshold that determines the category. Specifically, if the score of an instance , then the instance is classified as positive; otherwise, it is classified as negative. Clearly, choosing different thresholds may yield different classifiers.

Of course, a binary classifier may have misclassification: it could either classify a positive instance as negative (false negative) or classify a negative instance as positive (false positive).

Given a dataset and a classifier, we may define the true positive rate () and the false positive rate () as follows:


where is the number of true positives in the dataset; are defined likewise.

Now you have trained a scoring function, and you want to evaluate the performance of your classifier. The classifier may exhibit different TPR and FPR if we change the threshold . Let be the when the threshold is , define the  () as

where the integrand, called  (ROC), means the maximum possible of given that .

Given the actual classes and predicted scores of the instances in a dataset, can you compute the of your classifier?

For example, consider the third test data. If we set threshold , there are 3 true positives, 2 false positives, 2 true negatives, and 1 false negative; hence, and . Also, as varies, we may plot the ROC curve and compute the AUC accordingly, as shown in Figure 1.

输入描述

The first line contains a single integer  , the number of instances in the dataset. Then follow  lines, each line containing a character  and an integer  , denoting the actual class and the predicted score of an instance.

It is guaranteed that there is at least one instance of either class.

输出描述

Print the  of your classifier within an absolute error of no more than .

示例1

输入:

3
+ 2
- 3
- 1

输出:

0.5

示例2

输入:

6
+ 7
- 2
- 5
+ 4
- 2
+ 6

输出:

0.888888888888889

示例3

输入:

8
+ 34
+ 33
+ 26
- 34
- 38
+ 39
- 7
- 27

输出:

0.5625

说明:

ROC and {AUC} of the third sample data.

原站题解

上次编辑到这里,代码来自缓存 点击恢复默认模板

C++ 解法, 执行用时: 454ms, 内存消耗: 17064K, 提交时间: 2021-07-25 17:01:24

#include<bits/stdc++.h>
using namespace std;
int n, a, dat[2][1123456], cnt[2];
double ans;
char c;
signed main()
{
	cin >> n;
	for(int i = 0; i < n; i++)
	{
		cin >> c >> a;
		dat[c=='+'][cnt[c=='+']++] = -a;
	}
	for(int i = 0; i < 2; i++)
		sort(dat[i], dat[i]+cnt[i]);
	for(int i = 0; i < cnt[0]; i++)
		ans += upper_bound(dat[1], dat[1]+cnt[1], dat[0][i]-1) - dat[1];
	printf("%.10f", ans / cnt[0] / cnt[1]);
}

Python3 解法, 执行用时: 3750ms, 内存消耗: 48888K, 提交时间: 2021-09-30 00:14:50

n=int(input())
neg=[]
pos=[]
for i in range(n):
    ch,p=input().split(' ')
    p=int(p.strip())
    if ch=='-':
        neg.append(p)
    else:
        pos.append(p)
neg.sort()
pos.sort()
d=len(pos)*len(neg)
k=-1
cnt=0
for i in range(len(pos)):
    while k+1<len(neg) and neg[k+1]<pos[i]:
        k=k+1
    cnt+=(k+1)
    
print(cnt/d)

上一题