AIC vs BIC – Difference and Comparison

 What is AIC?

The Akaike information criterion, abbreviated AIC, was developed by the Japanese statistician Hirotugu Akaike, who initially referred to it simply as "an information criterion." Akaike first presented it in English at a conference in 1971, the proceedings of which were published in 1973, and the first formal paper followed in 1974. Because the approach is grounded in the concept of entropy from information theory, Akaike called it the "entropy maximization principle."

The goal of AIC is to find the model that best explains the variation in the dependent variable using the fewest independent variables. As a result, it rewards choosing a simpler model over a needlessly complicated one.

According to AIC, the best-fitting model is the one that explains the most variance with the fewest independent variables. AIC now underpins a widely used paradigm for the foundations of statistics and is routinely employed in statistical inference.

When assessing how much information a model loses, AIC weighs the trade-off between the model's goodness of fit and its simplicity. In other words, AIC addresses both the risk of overfitting and the risk of underfitting.
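As a rough illustration of that trade-off, the sketch below computes AIC directly from a model's maximized log-likelihood; the log-likelihood and parameter-count values are hypothetical and not taken from the article.

```python
def aic(log_likelihood: float, n_params: int) -> float:
    """AIC = 2k - 2 ln(L_hat), where k is the number of estimated parameters."""
    return 2 * n_params - 2 * log_likelihood

# Hypothetical fits: a simpler model with a slightly worse fit versus a
# more complex model with a slightly better fit.
aic_simple = aic(log_likelihood=-120.0, n_params=3)    # 246.0
aic_complex = aic(log_likelihood=-118.5, n_params=8)   # 253.0

# The lower AIC is preferred: here the five extra parameters of the complex
# model do not pay for its small improvement in fit.
print(aic_simple, aic_complex)
```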

What is BIC?

The Bayesian information criterion, abbreviated BIC, is also known as the Schwarz information criterion. Gideon E. Schwarz developed the criterion as an asymptotic approximation to a transformation of a candidate model's Bayesian posterior probability.

Schwarz developed and published the criterion in 1978. Its popularity stems from its computational simplicity and its effective performance across a wide range of modeling frameworks, including Bayesian applications in which prior distributions are hard to specify.

In Bayesian statistics, the Bayesian information criterion is used to choose among two or more candidate models. The likelihood of a model can always be raised by adding parameters, but doing so may result in overfitting. BIC counters this by adding a penalty term that grows with the model's parameter count; compared with AIC, this penalty is larger.

BIC seeks the model with the highest approximate posterior probability given the data. When choosing among several models, those with lower BIC values are favoured. A lower BIC does not always mean that one model is genuinely superior to another: because it relies on approximations, the BIC is only a heuristic. In particular, differences in BIC should not be treated as transformed Bayes factors.
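A minimal sketch, using the same hypothetical log-likelihoods as above and an assumed sample size of n = 200, shows how the BIC penalty k ln(n) grows with the number of observations while the AIC penalty stays at 2k:

```python
import math

def bic(log_likelihood: float, n_params: int, n_obs: int) -> float:
    """BIC = k ln(n) - 2 ln(L_hat): the penalty grows with the sample size n."""
    return n_params * math.log(n_obs) - 2 * log_likelihood

# The same two hypothetical fits as above, now assumed to use n = 200 observations.
for name, (ll, k) in {"3-parameter model": (-120.0, 3),
                      "8-parameter model": (-118.5, 8)}.items():
    print(name, round(bic(ll, k, n_obs=200), 1))

# With n = 200, ln(n) is about 5.3, so each extra parameter costs roughly 5.3
# BIC points versus 2 AIC points; the model with the lower BIC is favoured.
```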

Difference Between AIC and BIC

  1. AIC stands for the Akaike Information Criterion, whereas BIC stands for the Bayesian Information Criterion.
  2. Hirotugu Akaike published the Akaike Information Criterion in 1974, while Gideon E. Schwarz published the Bayesian Information Criterion in 1978.
  3. The AIC can be thought of as a measure of the goodness of fit of an estimated statistical model, whereas the BIC is a method for model selection among a class of parametric models with differing numbers of parameters.
  4. In the Bayesian Information Criterion, the penalty for additional parameters is larger than in the Akaike Information Criterion (illustrated in the sketch after this list).
  5. Under the assumption that the "true model" is not in the candidate set, AIC is asymptotically optimal for selecting the model with the smallest mean squared error, while BIC is not asymptotically optimal.
  6. The Akaike Information Criterion aims to approximate an unknown, potentially high-dimensional reality as closely as possible; the Bayesian Information Criterion, by contrast, aims to identify the true model, assuming it is among the candidates.
  7. The Bayesian Information Criterion is consistent, while the Akaike Information Criterion is not.
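The following sketch, which uses synthetic data and a Gaussian-likelihood assumption (neither taken from the article), fits polynomials of increasing degree to data generated from a quadratic and computes both criteria, illustrating how BIC's harsher ln(n) penalty pushes it toward smaller models:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: the "true model" is a quadratic with Gaussian noise.
n = 200
x = np.linspace(-2, 2, n)
y = 1.0 + 0.5 * x - 1.5 * x**2 + rng.normal(scale=1.0, size=n)

def gaussian_log_likelihood(residuals: np.ndarray) -> float:
    """Maximized Gaussian log-likelihood as a function of the residual sum of squares."""
    m = residuals.size
    rss = float(np.sum(residuals**2))
    return -0.5 * m * (np.log(2 * np.pi) + np.log(rss / m) + 1)

for degree in range(1, 7):
    coeffs = np.polyfit(x, y, degree)
    resid = y - np.polyval(coeffs, x)
    ll = gaussian_log_likelihood(resid)
    k = degree + 2          # polynomial coefficients plus the error variance
    aic = 2 * k - 2 * ll
    bic = k * np.log(n) - 2 * ll
    print(f"degree {degree}: AIC={aic:8.1f}  BIC={bic:8.1f}")

# Because BIC's k*ln(n) penalty is harsher than AIC's 2k, BIC tends to settle
# on the true (quadratic) degree as n grows, illustrating its consistency.
```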

Comparison Between AIC and BIC

| Parameters of Comparison | AIC | BIC |
|---|---|---|
| Definition | AIC is an estimate of the relative information lost when a candidate model is used to represent the process that generated the data. | BIC is an asymptotic approximation to a transformation of the candidate model's Bayesian posterior probability. |
| Developed by | Hirotugu Akaike | Gideon E. Schwarz |
| Published in | 1974 | 1978 |
| Formula | AIC = 2k - 2 ln(L̂) | BIC = k ln(n) - 2 ln(L̂) |
| Model-selection goal | AIC seeks the model that minimizes the Kullback–Leibler divergence between the fitted model and the true data-generating distribution. | BIC seeks the model with the highest approximate posterior probability given the data. |
| Assumptions | Does not assume that the true model is among the candidates. | Assumes that the true model is among the candidates. |
| Penalty term | 2k | k ln(n) |
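As a quick worked example with made-up numbers: suppose a model has k = 3 estimated parameters, is fitted to n = 100 observations, and attains a maximized log-likelihood of ln(L̂) = -50. Then AIC = 2(3) - 2(-50) = 106, while BIC = 3 ln(100) - 2(-50) ≈ 113.8. The gap comes entirely from the penalty terms: ln(n) exceeds 2 whenever n > e² ≈ 7.4, so for all but very small samples BIC penalizes extra parameters more heavily than AIC.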
