One-hot Encoding

What is One-hot Encoding?

One-hot encoding is a technique for representing a categorical variable as binary vectors. A variable with k distinct categories is converted into k binary features, one per category. For each observation, the feature that corresponds to its category is set to 1 and every other feature is set to 0, so exactly one feature is "hot" at a time. One-hot encoding is commonly used in machine learning because many algorithms expect numerical inputs and cannot work directly with text or nominal values.
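The conversion described above can be sketched in a few lines of plain Python. This is a minimal illustration, not a library implementation: the function name one_hot_encode and the sample "sizes" data are hypothetical, and the choice to sort the categories alphabetically is an assumption made only so the feature order is reproducible.

```python
def one_hot_encode(values):
    """Encode a list of categorical values as binary vectors.

    Returns the category order used and one vector per input value.
    """
    # Collect the k distinct categories in a stable (sorted) order.
    # Sorting is an assumption; any fixed order works.
    categories = sorted(set(values))
    # Map each category to the position of its binary feature.
    index = {category: i for i, category in enumerate(categories)}

    encoded = []
    for value in values:
        vector = [0] * len(categories)  # k features, all "cold"
        vector[index[value]] = 1        # switch on the matching feature
        encoded.append(vector)
    return categories, encoded


# Hypothetical sample data with k = 3 distinct categories.
sizes = ["small", "large", "medium", "small"]
categories, encoded = one_hot_encode(sizes)
print(categories)
for value, vector in zip(sizes, encoded):
    print(value, vector)
```

Each resulting vector has length k and sums to 1, which is the defining property of a one-hot representation.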

Example of One-hot Encoding

Suppose we have a dataset with a categorical variable “color” containing three categories: “red”, “green”, and “blue”. One-hot encoding converts this variable into three binary features, one per category. With the features ordered as (red, green, blue), each category maps to the following vector:

  • Red: [1, 0, 0]
  • Green: [0, 1, 0]
  • Blue: [0, 0, 1]
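The mapping above can be reproduced with a short Python sketch. The names COLORS, encode_color, and decode_color are hypothetical helpers introduced for illustration, and raising an error for an unknown category is one possible design choice among several.

```python
# Fixed feature order, matching the example: red, green, blue.
COLORS = ["red", "green", "blue"]


def encode_color(color):
    """Return the one-hot vector for a known color."""
    if color not in COLORS:
        # Assumption: unknown categories are rejected rather than
        # silently encoded as an all-zero vector.
        raise ValueError(f"unknown category: {color!r}")
    return [1 if color == c else 0 for c in COLORS]


def decode_color(vector):
    """Recover the color name from its one-hot vector."""
    return COLORS[vector.index(1)]


for color in COLORS:
    print(color, encode_color(color))
# red [1, 0, 0]
# green [0, 1, 0]
# blue [0, 0, 1]
```

Because exactly one position holds a 1, the encoding is reversible: decode_color(encode_color("green")) returns "green".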

Resources

To learn more about one-hot encoding and how to implement it in various programming languages and with different tools, check out these resources: