A categorical attribute with $n$ distinct values is mapped into $n$ binary attributes.
It is also possible to map into $n-1$ binary attributes. In that case, a row in which all binary attributes are equal to zero represents the last categorical value, which has no attribute of its own.
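The difference between the $n$ and $n-1$ encodings can be sketched with base R alone. This is a minimal illustration that uses `model.matrix`, not the daltoolbox API, and the variable names `full` and `reduced` are chosen here for the example. Note that base R's default treatment contrasts take the first level (setosa), not the last, as the all-zeros reference.

```r
# Base R only; no daltoolbox functions are used in this sketch.
iris <- datasets::iris

# n binary attributes: removing the intercept (- 1) keeps one column per level.
full <- model.matrix(~ Species - 1, data = iris)

# n-1 binary attributes: with the intercept present, the first level (setosa)
# becomes the reference and is encoded as all zeros. Drop the intercept column.
reduced <- model.matrix(~ Species, data = iris)[, -1, drop = FALSE]

print(colnames(full))
print(colnames(reduced))

# Every setosa row is all zeros in the reduced encoding.
print(unique(reduced[iris$Species == "setosa", , drop = FALSE]))
```

In the full encoding each row contains exactly one 1; in the reduced encoding a row contains at most one 1, and the all-zeros row stands for the reference level.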
# DAL ToolBox
# version 1.0.777
source("https://raw.githubusercontent.com/cefet-rj-dal/daltoolbox/main/jupyter.R")
#loading DAL
load_library("daltoolbox")
Loading required package: daltoolbox
Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo
Attaching package: ‘daltoolbox’
The following object is masked from ‘package:base’:
    transform
iris <- datasets::iris
head(iris)
| | Sepal.Length | Sepal.Width | Petal.Length | Petal.Width | Species |
|---|---|---|---|---|---|
| | <dbl> | <dbl> | <dbl> | <dbl> | <fct> |
| 1 | 5.1 | 3.5 | 1.4 | 0.2 | setosa |
| 2 | 4.9 | 3.0 | 1.4 | 0.2 | setosa |
| 3 | 4.7 | 3.2 | 1.3 | 0.2 | setosa |
| 4 | 4.6 | 3.1 | 1.5 | 0.2 | setosa |
| 5 | 5.0 | 3.6 | 1.4 | 0.2 | setosa |
| 6 | 5.4 | 3.9 | 1.7 | 0.4 | setosa |
cm <- categ_mapping("Species")
iris_cm <- transform(cm, iris)
print(head(iris_cm))
  Speciessetosa Speciesversicolor Speciesvirginica
1             1                 0                0
2             1                 0                0
3             1                 0                0
4             1                 0                0
5             1                 0                0
6             1                 0                0
The mapping can also be applied to a single column, but the input must still be a data frame, which is why the selection below uses `drop=FALSE`.
diris <- iris[,"Species", drop=FALSE]
head(diris)
| | Species |
|---|---|
| | <fct> |
| 1 | setosa |
| 2 | setosa |
| 3 | setosa |
| 4 | setosa |
| 5 | setosa |
| 6 | setosa |
iris_cm <- transform(cm, diris)
print(head(iris_cm))
  Speciessetosa Speciesversicolor Speciesvirginica
1             1                 0                0
2             1                 0                0
3             1                 0                0
4             1                 0                0
5             1                 0                0
6             1                 0                0
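The data-frame requirement for single-column input comes down to how base R subsetting behaves. The short check below, with the illustrative names `as_vector` and `as_frame`, shows what `drop=FALSE` changes; it uses only base R and makes no claim about how `categ_mapping` validates its input.

```r
iris <- datasets::iris

# Without drop = FALSE, selecting one column collapses it to a plain factor.
as_vector <- iris[, "Species"]

# With drop = FALSE, the result stays a one-column data frame
# and the column name "Species" is preserved.
as_frame <- iris[, "Species", drop = FALSE]

print(is.data.frame(as_vector))
print(is.data.frame(as_frame))
print(names(as_frame))
```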