# The Azimuth Project Receiver Operating Characteristic analysis (Rev #1)

## Idea

When creating any system that maps inputs to a classification output, and machine learning classifiers in particular, there are multiple kinds of error (false positives, false negatives, and so on). It is desirable to be able to say which classifiers are “better” even without fully committing to the relative importance of the various kinds of error. Receiver Operating Characteristic (ROC) analysis is a method for doing this to the extent possible.

## Details

Note: for simplicity, we describe the case where the prior probabilities of the classes are equal; analogous results hold when the priors are unequal.

### Kinds of errors

Consider a system where each item has a feature vector $f \in F$ and a class $c \in C$. Given a classifier $\xi : F \rightarrow C$ and a data set $\{(f_i,c_i)\}$, there will in general be some misclassifications where $\xi(f_j) = d_j \ne c_j$; each is an instance of misclassifying $c_j$ as $d_j$. In the two-class case, to which we specialise until further notice, these get the special names false positive (fp) ($d=true$ when $c=false$) and false negative (fn) ($d=false$ when $c=true$), along with the correct classifications true positive (tp) and true negative (tn). Interpreting tp, fp, tn and fn as rates (probabilities in the limit of a large data set), each conditional on the item's true class, there are the relations

$$tp + fn = 1 \quad \text{and} \quad tn + fp = 1 \qquad (1)$$

so that only two of the four quantities are independent, and one has some freedom in choosing which pair to work with.
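As an illustration, here is a minimal Python sketch of how the four quantities might be computed from a labelled data set. It assumes that each rate is normalised by the number of items in the corresponding true class, which is the usual convention in ROC analysis. The function name `confusion_rates` and the toy data are hypothetical and chosen only for this example.

```python
from collections import Counter

def confusion_rates(true_classes, predicted_classes):
    """Return (tp, fp, tn, fn) as rates normalised per true class.

    Assumption for this sketch: classes are the booleans True/False, and
    each rate is divided by the number of items whose true class is the
    one in question.
    """
    # Count every (true class, predicted class) pair.
    counts = Counter(zip(true_classes, predicted_classes))
    n_pos = counts[(True, True)] + counts[(True, False)]
    n_neg = counts[(False, True)] + counts[(False, False)]
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one item of each true class")
    tp = counts[(True, True)] / n_pos    # c = true,  d = true
    fn = counts[(True, False)] / n_pos   # c = true,  d = false
    fp = counts[(False, True)] / n_neg   # c = false, d = true
    tn = counts[(False, False)] / n_neg  # c = false, d = false
    return tp, fp, tn, fn

# Hypothetical toy data with equal numbers of true and false items.
c = [True, True, True, True, False, False, False, False]
d = [True, True, True, False, True, False, False, False]
tp, fp, tn, fn = confusion_rates(c, d)
print(tp, fp, tn, fn)  # 0.75 0.25 0.75 0.25
```

Under this normalisation the two rates that share a true class sum to one, so reporting one rate per true class (for example tp and fp) is enough to recover the other two.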