Bias vs Variance


Bias and variance are two fundamental concepts in machine learning, particularly in the context of model performance evaluation.


Some basic differences:


Bias: Bias refers to the error introduced by approximating a real-world problem with a simplified model.
Variance: Variance refers to the variability of a model's predictions for a given data point if the model were trained on different datasets.

Bias: A model with high bias pays little attention to the training data and oversimplifies the underlying patterns, leading to systematic errors.
Variance: A model with high variance pays too much attention to the training data and treats noise in the data as if it were genuine patterns.

Bias: High bias can cause the model to underfit the data, meaning it performs poorly on both the training set and new, unseen data.
Variance: High variance can cause the model to overfit the training data, meaning it performs well on the training set but poorly on new, unseen data.

Bias: Examples of high-bias models include linear regression models that are too simple to capture the underlying relationships in the data.
Variance: Examples of high-variance models include complex decision trees, or neural networks with too many parameters, which can fit noise in the training data.

Bias: Bias measures how far the model's average prediction, taken over many possible training sets, is from the true underlying relationship.
Variance: Variance measures how much the model's predictions change from one training dataset to another.
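Both quantities can be estimated empirically by refitting the same kind of model on many training sets and looking at its predictions. The sketch below is an illustration under stated assumptions, not a standard recipe: the true function sin(2πx), the Gaussian noise (standard deviation 0.3), the fixed grid of 20 training inputs, and the helper names true_function and bias_variance are all hypothetical choices. A degree-1 polynomial stands in for a simple, high-bias model and a degree-9 polynomial for a flexible, higher-variance one. For squared-error loss, squared bias plus variance plus the noise variance add up to the expected test error at each point.

```python
import numpy as np


def true_function(x):
    # Hypothetical ground truth chosen only for this illustration.
    return np.sin(2 * np.pi * x)


def bias_variance(degree, n_trials=500, n_train=20, noise_sd=0.3, seed=0):
    """Estimate squared bias and variance of a polynomial fit of `degree`.

    The training inputs stay fixed; each trial draws fresh noise, refits the
    model, and records its predictions on a common test grid.
    """
    rng = np.random.default_rng(seed)
    x_train = np.linspace(0.0, 1.0, n_train)
    x_test = np.linspace(0.0, 1.0, 50)
    preds = np.empty((n_trials, x_test.size))
    for t in range(n_trials):
        y_train = true_function(x_train) + rng.normal(0.0, noise_sd, n_train)
        # Least-squares polynomial fit (NumPy rescales x internally for stability).
        model = np.polynomial.Polynomial.fit(x_train, y_train, degree)
        preds[t] = model(x_test)
    mean_pred = preds.mean(axis=0)
    # Squared bias: gap between the average prediction and the truth.
    bias_sq = np.mean((mean_pred - true_function(x_test)) ** 2)
    # Variance: spread of the predictions around their own average.
    variance = np.mean(preds.var(axis=0))
    return bias_sq, variance


for degree in (1, 9):
    b, v = bias_variance(degree)
    print(f"degree {degree}: bias^2={b:.4f}, variance={v:.4f}")
```

In a run like this the degree-1 fit should show the larger squared bias and the degree-9 fit the larger variance; the exact numbers depend on the seed and the assumed noise level.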


Ideally, you want a balance between bias and variance that yields a model that generalizes well to new, unseen data. Making a model more flexible usually lowers its bias but raises its variance, and vice versa, which is why this balance is called the bias-variance tradeoff. It is commonly managed through regularization, cross-validation, and model selection.
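One common way to locate that balance is to let cross-validation choose model complexity. The sketch below is a minimal illustration, assuming a hypothetical noisy sin(2πx) dataset of 40 points, polynomial degree as the complexity knob, and 5-fold cross-validation; true_function and kfold_cv_mse are illustrative helper names, not a library API. Regularization (for example ridge regression) is an alternative knob that is not shown here.

```python
import numpy as np


def true_function(x):
    # Hypothetical ground truth chosen only for this illustration.
    return np.sin(2 * np.pi * x)


def kfold_cv_mse(x, y, degree, k=5, seed=0):
    """Mean validation MSE of a polynomial fit of `degree` under k-fold CV."""
    rng = np.random.default_rng(seed)
    # Shuffle indices once, then split them into k roughly equal folds.
    folds = np.array_split(rng.permutation(len(x)), k)
    errors = []
    for i in range(k):
        val_idx = folds[i]
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        model = np.polynomial.Polynomial.fit(x[train_idx], y[train_idx], degree)
        errors.append(np.mean((model(x[val_idx]) - y[val_idx]) ** 2))
    return float(np.mean(errors))


rng = np.random.default_rng(42)
x = rng.uniform(0.0, 1.0, 40)
y = true_function(x) + rng.normal(0.0, 0.3, x.size)

# Too low a degree underfits (high bias); too high a degree overfits (high
# variance). The degree with the lowest validation error is kept.
cv_errors = {d: kfold_cv_mse(x, y, d) for d in range(1, 11)}
best_degree = min(cv_errors, key=cv_errors.get)
for d, err in cv_errors.items():
    print(f"degree {d:2d}: CV MSE = {err:.4f}")
print("selected degree:", best_degree)
```

Validation error is used rather than training error because training error keeps falling as the degree grows, so it would always favor the highest-variance model.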


