Mean Absolute Error (MAE): Robustness and Median Connection
When choosing a loss function for regression tasks, you often encounter both Mean Absolute Error (MAE) and Mean Squared Error (MSE). The MAE is defined as the average of the absolute differences between true values (y) and predicted values (ŷ). Its mathematical formula is:
$$L_{\mathrm{MAE}}(y, \hat{y}) = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$

import numpy as np
import matplotlib.pyplot as plt

# Compare the shape of the two losses over a range of errors
errors = np.linspace(-4, 4, 400)
mae = np.abs(errors)
mse = errors**2

plt.plot(errors, mae, label="MAE")
plt.plot(errors, mse, label="MSE")
plt.title("MAE vs MSE Loss Functions")
plt.xlabel("Error (y - ŷ)")
plt.ylabel("Loss")
plt.legend()
plt.show()
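To make the formula concrete, here is a minimal sketch of the same computation in NumPy. The helper name mean_absolute_error is our own choice for illustration, not a library import:

import numpy as np

def mean_absolute_error(y_true, y_pred):
    # Average of the absolute differences between targets and predictions
    return np.mean(np.abs(y_true - y_pred))

y_true = np.array([3.0, -0.5, 2.0, 7.0])
y_pred = np.array([2.5, 0.0, 2.0, 8.0])
print(mean_absolute_error(y_true, y_pred))  # (0.5 + 0.5 + 0.0 + 1.0) / 4 = 0.5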
Unlike MSE, which squares the error, MAE simply takes the absolute value. This difference has important consequences for how each loss function responds to large errors. While MSE penalizes large errors more heavily due to the squaring, MAE treats all errors in direct proportion to their magnitude. This means that the influence of any single, very large error is much less pronounced with MAE than with MSE.
import numpy as np

errors = np.array([1, 2, 3, 20])  # outlier = 20

# The outlier contributes proportionally to MAE but quadratically to MSE
print("MAE:", np.mean(np.abs(errors)))  # 6.5
print("MSE:", np.mean(errors**2))       # 103.5
MAE is less sensitive to outliers than MSE, making it a robust choice when your data contains extreme values or follows a heavy-tailed distribution. This robustness helps prevent a few large errors from dominating the loss and distorting your model's learning process.
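To see this robustness numerically, consider the simplest possible "model": a single constant prediction. The sketch below is our own illustration (the helper best_constant and the grid-search approach are not from any library); it brute-force searches for the constant that minimizes each loss, with and without an outlier:

import numpy as np

def best_constant(data, loss):
    # Brute-force search for the constant prediction c minimizing the loss
    candidates = np.linspace(data.min(), data.max(), 10001)
    losses = [loss(data, c) for c in candidates]
    return candidates[np.argmin(losses)]

mae = lambda data, c: np.mean(np.abs(data - c))
mse = lambda data, c: np.mean((data - c) ** 2)

clean = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
noisy = np.array([1.0, 2.0, 3.0, 4.0, 50.0])  # last point replaced by an outlier

for name, loss in [("MAE", mae), ("MSE", mse)]:
    print(name, "optimal constant:",
          round(best_constant(clean, loss), 2), "->",
          round(best_constant(noisy, loss), 2))

The MSE-optimal constant jumps from 3 to 12 when the outlier appears, while the MAE-optimal constant stays at 3.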
Mathematically, the connection between MAE and the median emerges when you try to find the constant value that minimizes the MAE over a set of data points. If you want to choose a single value that minimizes the sum of absolute differences to all points, the optimal choice is the median of the data. The reason is that moving the estimate toward the side with more points always decreases the total absolute deviation, so the optimum lies where the data is split evenly, with half the points above and half below. In contrast, minimizing MSE leads to the mean as the optimal estimator. Therefore, using MAE as a loss function encourages your model's predictions to align with the median of the target distribution, rather than the mean.
import numpy as np

data = np.array([1, 2, 5, 8, 50])  # outlier = 50

mean = np.mean(data)      # 13.2
median = np.median(data)  # 5.0

# Total absolute deviation from each candidate constant
mae_mean = np.sum(np.abs(data - mean))      # 73.6
mae_median = np.sum(np.abs(data - median))  # 55.0

print("Mean:", mean, "| Total abs deviation:", mae_mean)
print("Median:", median, "| Total abs deviation:", mae_median)
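You can also watch this happen during optimization. The following sketch (our own illustration, with an arbitrary learning rate and step count) minimizes the MAE of a single constant prediction by gradient descent. Since the subgradient of |d - c| with respect to c is -sign(d - c), each step nudges c toward whichever side currently holds more data points, and the prediction settles at the median rather than the mean:

import numpy as np

data = np.array([1.0, 2.0, 5.0, 8.0, 50.0])
c = np.mean(data)  # start at the mean (13.2)
lr = 0.05

for _ in range(2000):
    # Subgradient of mean(|data - c|) with respect to c
    grad = np.mean(np.sign(c - data))
    c -= lr * grad

print("Converged constant:", round(c, 1))  # 5.0 (within the step size)
print("Median of data:", np.median(data))  # 5.0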