A function that performs one-hot encoding for class labels.
from mlxtend.preprocessing import one_hot
Typical supervised machine learning algorithms for classification assume that the class labels are nominal (a special case of categorical where no order is implied). A typical example of a nominal feature would be "color", since we can't say (in most applications) that "orange > blue > red".
The `one_hot` function provides a simple interface to convert class label integers into a so-called one-hot array, where each unique label is represented as a column in the new array.
For example, let's assume we have 5 data points from 3 different classes: 0, 1, and 2.
y = [0, # sample 1, class 0
1, # sample 2, class 1
0, # sample 3, class 0
2, # sample 4, class 2
2] # sample 5, class 2
After one-hot encoding, we then obtain the following array (note that the index position of the "1" in each row denotes the class label of this sample):
y = [[1, 0, 0], # sample 1, class 0
[0, 1, 0], # sample 2, class 1
[1, 0, 0], # sample 3, class 0
[0, 0, 1], # sample 4, class 2
[0, 0, 1] # sample 5, class 2
]
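The mapping above can be sketched in a few lines of plain Python. This is an illustration of the idea only, not mlxtend's implementation. The helper names `one_hot_sketch` and `decode_sketch` are hypothetical, and the sketch assumes that labels are non-negative integers and that the number of columns, when not given, is the largest label plus one.

```python
def one_hot_sketch(y, num_labels=None):
    """Return a list of rows, each with a single 1 at the label's index."""
    # Assumption: labels are integers 0, 1, 2, ...; if the column count is
    # not given, use the largest label plus one.
    if num_labels is None:
        num_labels = max(y) + 1
    rows = []
    for label in y:
        row = [0] * num_labels
        row[label] = 1
        rows.append(row)
    return rows


def decode_sketch(rows):
    """Recover each label as the index position of the 1 in its row."""
    return [row.index(1) for row in rows]


y = [0, 1, 0, 2, 2]
encoded = one_hot_sketch(y)
print(encoded)
print(decode_sketch(encoded))
```

Reading the position of the "1" back out of each row returns the original label list, which is why no information is lost by the encoding.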
from mlxtend.preprocessing import one_hot
import numpy as np
y = np.array([0, 1, 2, 1, 2])
one_hot(y)
array([[ 1., 0., 0.],
       [ 0., 1., 0.],
       [ 0., 0., 1.],
       [ 0., 1., 0.],
       [ 0., 0., 1.]])
from mlxtend.preprocessing import one_hot
y = [0, 1, 2, 1, 2]
one_hot(y)
array([[ 1., 0., 0.],
       [ 0., 1., 0.],
       [ 0., 0., 1.],
       [ 0., 1., 0.],
       [ 0., 0., 1.]])
from mlxtend.preprocessing import one_hot
y = [0, 1, 2, 1, 2]
one_hot(y, dtype='int')
array([[1, 0, 0],
       [0, 1, 0],
       [0, 0, 1],
       [0, 1, 0],
       [0, 0, 1]])
from mlxtend.preprocessing import one_hot
y = [0, 1, 2, 1, 2]
one_hot(y, num_labels=10)
array([[ 1., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [ 0., 0., 1., 0., 0., 0., 0., 0., 0., 0.]])
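One situation where fixing the number of columns can help is when labels are encoded in separate batches and one batch happens to lack the highest class. The dependency-free sketch below illustrates this under the assumption that the column count is otherwise inferred as the largest label plus one. `encode_batch` is a hypothetical helper, not part of mlxtend.

```python
def encode_batch(y, num_labels=None):
    # Hypothetical helper: infer the width from the largest label unless given.
    width = max(y) + 1 if num_labels is None else num_labels
    return [[1 if col == label else 0 for col in range(width)] for label in y]


batch_a = [0, 1, 2]
batch_b = [0, 1, 1]  # class 2 does not occur in this batch

# Inferred widths differ between the batches, so the rows do not line up.
width_a_auto = len(encode_batch(batch_a)[0])
width_b_auto = len(encode_batch(batch_b)[0])
print(width_a_auto, width_b_auto)

# A fixed width gives both batches the same number of columns.
width_a_fixed = len(encode_batch(batch_a, num_labels=3)[0])
width_b_fixed = len(encode_batch(batch_b, num_labels=3)[0])
print(width_a_fixed, width_b_fixed)
```

With a fixed column count, arrays produced from different batches share the same shape along the label axis and can be stacked or compared directly.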
## one_hot

*one_hot(y, num_labels='auto', dtype='float')*

One-hot encoding of class labels

**Parameters**

- `y` : array-like, shape = [n_classlabels]

    Python list or numpy array consisting of class labels.

- `num_labels` : int or 'auto'

    Number of unique labels in the class label array. Infers the number of unique labels from the input array if set to 'auto'.

- `dtype` : str

    NumPy array type (float, float32, float64) of the output array.

**Returns**

- `ary` : numpy.ndarray, shape = [n_classlabels]

    One-hot encoded array, where each sample is represented as a row vector in the returned array.

**Examples**

For usage examples, please see [https://rasbt.github.io/mlxtend/user_guide/preprocessing/one_hot/](https://rasbt.github.io/mlxtend/user_guide/preprocessing/one_hot/)