Target audience: Advanced
Estimated reading time: 5'
Introduction
The Naive Bayes approach is a generative, supervised learning method based on a simplistic hypothesis: it assumes that the presence of a specific feature of a class is unrelated to the presence of any other feature. This assumption of independence between the model features is what keeps the classification simple and tractable.
Mathematically, Bayes' theorem gives the relationship between the probabilities of A and B, p(A) and p(B), and the conditional probabilities of A given B and B given A, p(A|B) and p(B|A).
In its most common form, Bayes' formula is defined for a proposition (or class) A and evidence (or observation) B as \[p(A|B)= \frac{p(B|A)\,p(A)}{p(B)}\]
- p(A), the prior, is the initial degree of belief in A.
- p(A|B), the posterior, is the degree of belief having accounted for B.
- p(B|A)/p(B) represents the support B provides for A.
The case above can be extended to a network of cause-effect conditional probabilities p(X|Y).
In the case where the features of the model are known to be independent, the probability that an observation x = (x1, ..., xi, ..., xn) belongs to a class C is computed as \[p(C|\vec{x})=\frac{\prod_{i} p(x_{i}|C)\,p(C)}{p(\vec{x})}\] It is usually more convenient to compute the maximum likelihood of the probability that a new observation belongs to a specific class by taking the logarithm of the formula above: \[log\,p(C|\vec{x}) = \sum_{i} log\,p(x_{i}|C) + log\,p(C) - log\,p(\vec{x})\]
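Since the evidence term is the same for all classes, the classifier simply selects the class that maximizes the sum of the log conditional probabilities and the log of the class probability: \[\hat{C} = \arg\max_{C}\left(\sum_{i} log\,p(x_{i}|C) + log\,p(C)\right)\]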
Note: For the sake of readability of the implementation of the algorithms, all non-essential code such as error checking, comments, exceptions, validation of class and method arguments, scoping qualifiers and imports is omitted.
Software design
The class in the example below implements a basic version of the Naive Bayes algorithm. The model and its features are defined by the nested class NClass. This model class defines the feature parameters (mean and variance of the prior observations) and the class probability p(C). The computation of the mean and variance of the priors is implemented in the NClass.computeStats method. Some of the methods, setters, getters, comments and conditional tests on arguments are omitted for the sake of clarity. The kernel function is selected at run-time. This implementation supports any number of features and classes.
public final class NaiveBayes implements Classifier {

  public final class NClass {
    private double[] _params = null;           // sum, then mean, of each feature
    private double[] _paramsVariance = null;   // sum of squares, then variance, of each feature
    private double _classProb = 0.0;           // observation count, then class probability p(C)

    public NClass(int numParams) {
      _params = new double[numParams];
    }

    // Accumulate the sum and sum of squares of the features of the prior
    // observations; the first value of each record is skipped (presumably
    // a non-feature value such as the class label).
    private void add(double[] data) {
      int numObservations = 0;
      _paramsVariance = new double[_params.length];

      for(int j = 0; j < data.length; ) {
        j++;
        for(int k = 0; k < _params.length; k++, j++) {
          _params[k] += data[j];
          _paramsVariance[k] += data[j]*data[j];
        }
        numObservations++;
      }
      _classProb = numObservations;   // temporarily holds the class observation count
    }

    // Convert the accumulated sums into the mean and variance of each feature,
    // and the observation count into the class probability p(C).
    private void computeStats() {
      double inv = 1.0/_classProb;

      for(int k = 0; k < _params.length; k++) {
        _params[k] /= _classProb;
        _paramsVariance[k] = _paramsVariance[k]*inv - _params[k]*_params[k];
      }
      // _numObservations is the total observation count across all classes,
      // maintained by the enclosing NaiveBayes instance
      _classProb /= _numObservations;
    }
  }
}
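For reference, computeStats converts the accumulated sum and sum of squares of the n prior observations of a class into the per-feature mean and variance: \[\bar{x}_{k} = \frac{1}{n}\sum_{i=1}^{n} x_{k,i} \quad\quad \sigma_{k}^{2} = \frac{1}{n}\sum_{i=1}^{n} x_{k,i}^{2} - \bar{x}_{k}^{2}\]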
Kernel functions can be used to improve the classification of observations by increasing the distance between the prior observations belonging to different classes during the training phase. In the case of two classes (Bernoulli classification) C1 and C2, the kernel algorithm increases the distance between the mean values m1 and m2 of the prior observations of the two classes, adjusted for their variance.
As Java does not support local functions or closures, we need to create a class hierarchy to implement the different kernel (discriminant) functions. The example below defines simple linear and logistic (sigmoid) kernel functions implemented by nested classes. \[y = \theta x \quad and \quad y =\frac{1}{1+e^{-x}}\]
public interface Discriminant {
  public double estimate(double value);
}

// Nested class that implements a linear discriminant y = theta*x
public static class DiscriminantKernel implements Discriminant {
  private double _theta = 1.0;

  public DiscriminantKernel(double theta) {
    _theta = theta;
  }

  public double estimate(double value) {
    return value*_theta;
  }
}

// Nested class that implements the sigmoid (logistic) function as a kernel
public static class SigmoidKernel implements Discriminant {
  public double estimate(double value) {
    return 1.0/(1.0 + Math.exp(-value));
  }
}
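A minimal usage sketch, assuming the interface and the two kernel classes are visible from the calling scope:

// Hypothetical usage of the two discriminant (kernel) functions
Discriminant linear = new DiscriminantKernel(2.0);
double y1 = linear.estimate(3.0);    // 6.0
Discriminant logistic = new SigmoidKernel();
double y2 = logistic.estimate(0.0);  // 0.5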
Ultimately, the NaiveBayes class implements the key components of the learning algorithm:
- Training: train
- Run-time classification: classify
First let's define the NaiveBayes class and its constructors.
public final class NaiveBayes implements Classifier {
  public final class NClass { /* defined above */ }

  private CDoubleArray[] _valuesArray = null;   // prior (training) observations for each class
  private NClass[] _classes = null;             // model classes (mean, variance, p(C))
  private int _numObservations = 0;
  private int _step = 1;
  private Discriminant _kF = null;              // kernel (discriminant) function

  public NaiveBayes() { this(0, 0); }

  public NaiveBayes(int numParams, int numClasses) {
    // Default kernel: linear discriminant with theta = 1
    this(numParams, numClasses, new DiscriminantKernel(1.0));
  }

  public NaiveBayes(
      int numParams,
      int numClasses,
      final Discriminant kf) {
    _classes = new NClass[numClasses];
    _valuesArray = new CDoubleArray[numClasses];

    for(int k = 0; k < numClasses; k++) {
      _classes[k] = new NClass(numParams);
      _valuesArray[k] = new CDoubleArray();
    }
    _kF = kf;
    this.discretize(0, numClasses);   // ancillary method, not shown
  }
  // ...
}
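For instance, a hypothetical instantiation of a model with four features and two classes, using the kernels defined above:

// Hypothetical example: 4 features, 2 classes, sigmoid kernel as discriminant
NaiveBayes nb = new NaiveBayes(4, 2, new SigmoidKernel());
// Hypothetical example: same model with the default linear discriminant (theta = 1)
NaiveBayes nbLinear = new NaiveBayes(4, 2);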
Training
Next, the training method, train, is defined. The method merely computes the statistics on the historical data, _valuesArray, and assigns them to the predefined classes _classes.
public int train() throws ClassifierException {
  double[] values = null;

  // Accumulate the sum and sum of squares of the prior observations of each class
  for(int j = 0; j < _valuesArray.length; j++) {
    values = _valuesArray[j].currentValues();
    _classes[j].add(values);
  }
  // Convert the accumulated statistics into mean, variance and p(C)
  for(int j = 0; j < _classes.length; j++) {
    _classes[j].computeStats();
  }
  return values.length;
}
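A sketch of how training would be invoked, assuming the prior observations have already been loaded into _valuesArray (the loading method is not covered in this post):

// Hypothetical training sketch for a model with 2 features and 2 classes
NaiveBayes nb = new NaiveBayes(2, 2, new DiscriminantKernel(1.0));
// ... load the prior observations of each of the 2 classes into _valuesArray (not shown) ...
int numValues = nb.train();   // number of values of the last class processed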
Classification
The run-time classification method classify uses the prior conditional probability to assign a new observation to an existing class. It generates the class id for a set of values or observations.
public int classify(double[] values) {
  // Compute the normalizing denominator: for each feature, the largest
  // distance between the observed value and the class means
  double[] normalizedPriorProb = new double[values.length];
  double prob = 0.0;

  for(int valueIndex = 0; valueIndex < values.length; valueIndex++) {
    for(int classid = 0; classid < _classes.length; classid++) {
      prob = Math.abs(values[valueIndex] - _classes[classid]._params[valueIndex]);
      if( prob > normalizedPriorProb[valueIndex]) {
        normalizedPriorProb[valueIndex] = prob;
      }
    }
  }
  return logP(values, normalizedPriorProb);
}
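Continuing the training sketch above, a trained model would classify a new observation as follows (hypothetical values):

// Hypothetical classification of a new 2-feature observation;
// nb is the trained NaiveBayes model from the previous sketch
double[] newObservation = {0.7, 1.4};
int classId = nb.classify(newObservation);   // index of the best matching class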
A new observation, values, is assigned to the most appropriate class according to its likelihood, or log of conditional probability, by the method logP.
logP computes the likelihood for each value and applies the Naive Bayes formula to the logarithm of the prior probability and the logarithm of the class probability.
private int logP(double[] values, double[] denominator) {
  double score = 0.0,
         adjustedValue = 0.0,
         prior = 0.0,
         bestScore = -Double.MAX_VALUE;
  int bestMatchedClass = -1;

  // Walk through all the classes defined in the model
  for(int classid = 0; classid < _classes.length; classid++) {
    double[] classParameters = _classes[classid]._params;

    score = 0.0;
    for(int k = 0; k < values.length; k++) {
      // Apply the kernel (discriminant) function to the observed value
      adjustedValue = _kF.estimate(values[k]);
      prior = Math.abs(adjustedValue - classParameters[k])/denominator[k];
      score += Math.log(1.0 - prior);
    }
    // Add the logarithm of the class probability p(C)
    score += Math.log(_classes[classid]._classProb);

    if(score > bestScore) {
      bestScore = score;
      bestMatchedClass = classid;
    }
  }
  return bestMatchedClass;
}
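In other words, the score computed by logP for a class C is \[score(C) = \sum_{k} log\left(1 - \frac{|f(x_{k}) - m_{k}|}{d_{k}}\right) + log\,p(C)\] where f is the kernel (discriminant) function, m_k the mean value of feature k for class C, and d_k the normalizing denominator computed in classify; the class with the highest score is returned.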
Some of the ancillary private methods are omitted for the sake of clarity. We will look at the implementation of the same classifier in Scala in a later post.