When a machine learning model performs poorly on both the training and test data, we say the model underfits.
Underfitting refers to a model that doesn’t capture the underlying pattern or relationship in the dataset. This results in poor performance on both the training data and test data: the model fails to learn the important features and characteristics of the data, leading to low accuracy and poor generalization.
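To make this concrete, here is a minimal sketch (using NumPy and synthetic data, purely illustrative) of an underfit model: a straight line fitted to clearly quadratic data scores poorly on both the training and the test set:

```python
import numpy as np

rng = np.random.default_rng(42)

def r2_score(y_true, y_pred):
    """Coefficient of determination: 1.0 is a perfect fit, ~0 is no better than the mean."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Synthetic quadratic relationship with mild noise
x_train = np.linspace(-3, 3, 60)
y_train = x_train ** 2 + rng.normal(0, 0.3, x_train.size)
x_test = np.linspace(-2.8, 2.8, 20)
y_test = x_test ** 2

# A degree-1 polynomial (a straight line) cannot represent the curvature,
# so it underfits: low scores on BOTH train and test data.
line = np.poly1d(np.polyfit(x_train, y_train, deg=1))
print(f"train R^2: {r2_score(y_train, line(x_train)):.2f}")
print(f"test  R^2: {r2_score(y_test, line(x_test)):.2f}")
```

Both scores come out near zero, the signature of underfitting: the model is no better than guessing the mean, on data it has seen and on data it has not.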
Underfitting is very likely to occur when there isn’t enough data to train the model, when the choice of algorithm is poor, or when the dataset is unclean.
Reasons for Underfitting:
- Lack of data: If we don’t have enough data for the model to learn from, it may not be able to capture the complexities and relationships of the data.
- Poor choice of algorithm: Using an algorithm that is too simple for a non-linear dataset is very likely to cause underfitting. For example, in a housing price dataset with features such as number of bedrooms, location, and whether there is a swimming pool, a simple linear regression model may not be able to capture the complex, non-linear relationships between the features and the house prices.
- Unclean dataset: When a dataset is noisy, with incorrect values, outliers, and null values, it becomes difficult for the machine learning model to understand the underlying pattern in the dataset, which will result in underfitting.
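The algorithm-choice point above can be sketched with NumPy (synthetic data, illustrative only): the same curved data underfits under a straight-line model but is captured by a model of adequate capacity.

```python
import numpy as np

rng = np.random.default_rng(0)

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Non-linear (cubic) relationship with mild noise
x = np.linspace(-2, 2, 80)
y = 0.5 * x ** 3 - x + rng.normal(0, 0.1, x.size)

# Too-simple model: a straight line underfits the curve
underfit = np.poly1d(np.polyfit(x, y, deg=1))
# Adequate model: a cubic polynomial captures the pattern
better = np.poly1d(np.polyfit(x, y, deg=3))

print(f"linear fit R^2: {r2_score(y, underfit(x)):.2f}")
print(f"cubic  fit R^2: {r2_score(y, better(x)):.2f}")
```

The linear fit scores far lower than the cubic fit, even though both were trained on exactly the same data: the problem is the model, not the dataset.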
Strategies to reduce underfitting:
- Quality dataset: Increasing the volume of quality training data gives the model more examples from which to learn complex patterns.
- Clean data: Removing noise, incorrect values, outliers, and null values will help your model understand the underlying pattern or relationship in the dataset.
- Feature engineering: Feature engineering is a vital step in building a resilient machine learning model. Creating, combining, and selecting the best features will help the model better understand and capture underlying patterns, ultimately improving the model’s performance.
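The feature-engineering strategy can be sketched as follows (NumPy, synthetic data, illustrative only): the raw feature alone leaves a linear model underfit, but adding an engineered squared feature lets the very same linear solver capture the curved relationship.

```python
import numpy as np

rng = np.random.default_rng(1)

def r2_score(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

# Curved relationship that the raw feature alone cannot explain linearly
x = np.linspace(-2, 2, 100)
y = 2.0 + 0.5 * x ** 2 + rng.normal(0, 0.1, x.size)

# Raw design matrix [1, x]: the linear model underfits
X_raw = np.column_stack([np.ones_like(x), x])
w_raw, *_ = np.linalg.lstsq(X_raw, y, rcond=None)

# Engineered feature added, [1, x, x^2]: same linear solver, much better fit
X_eng = np.column_stack([np.ones_like(x), x, x ** 2])
w_eng, *_ = np.linalg.lstsq(X_eng, y, rcond=None)

print(f"raw features     R^2: {r2_score(y, X_raw @ w_raw):.2f}")
print(f"with x^2 feature R^2: {r2_score(y, X_eng @ w_eng):.2f}")
```

Note that nothing about the learning algorithm changed; creating one informative feature was enough to turn an underfit model into a good one.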