Misfit and Bayes Score¶

In MC_fit the Nelder-Mead algorithm is used to minimize the difference between observational data and a thermodynamic model of the data as a function of the inversion parameters. This function is referred to as the misfit function. The misfit function is directly related to the likelihood function that is central to Bayesian inversion methods. Specifically:

(7)¶\[\text{likelihood} = \exp(-\text{misfit}),\]

thus, the minimum misfit model is identical to the maximum likelihood model.

By default, MC_fit identifies the minimum misfit model as the best model and the inversion parameters that produce this model are taken to be the best estimate of the unknown parameters. This choice is appropriate when the prior probability distribution of the inversion parameters is uniform, i.e., the parameters are equally likely to lie anywhere within their specified ranges. Both Bingo-Antidote [Duesterhoeft2020] and AvPT+ [Green2025] also rely on misfit to identify the best model (comparison).

If it is known, or assumed a priori, that the probability distribution of the inversion parameters is non-uniform, then, according to Bayes’ theorem, the most probable estimator for the parameters must account for this prior information. In Bayesian inversion, this accounting is done by maximizing the posterior probability, which is the product of the likelihood and the prior probability distributions. The likelihood distribution is estimated from an unbiased sample of the misfit as a function of the inversion parameters, as is done in the MCMC thermobarometry program ([Mackay2025]). There are two reasons why this is not done in MC_fit: (1) The Nelder-Mead algorithm does not yield an unbiased sample of the likelihood distribution. (2) Because there is no specific model at the maximum of the posterior probability distribution there is no guarantee that the thermodynamic model will predict the observed phase assemblage at these parameter values. Instead, MC_fit can be made to emulate Bayesian inversion by using Bayes scores to identify the best model. Bayes scoring, detailed below, applies a penalty to models for which the inversion parameters are far from the respective midpoints of their specified ranges.

In summary, MC_fit allows one of two criteria for identifying the best model:

The model that minimizes misfit (default).
The model that maximizes Bayes score.

The misfit criterion is appropriate when there is no prior information about the inversion parameters. The Bayes score criterion is appropriate when there is confidence in the initial estimates for the inversion parameters.

Note

Regardless of the criterion used to identify the best model, the objective function of the Nelder-Mead optimization is always the misfit function. Thus, the criterion chosen has no influence on the randomized sampling strategy. Using Bayes score as the objective function of the Nelder-Mead optimization would incorporate prior information about the inversion parameters into the optimization process. Such an implementation requires only minor modifications to MC_fit and will be undertaken upon an expression of interest.

—

Misfit¶

The misfit function is the sum of four components:

(8)¶\[\text{misfit} = h_{\text{missed}} + h_{\text{extraneous}} + h_{\text{composition}} + h_{\text{mode}}\]

\(h_{\text{missed}}\) accounts for the number of observed phases correctly predicted, \(\hat{n}_\phi\).
\(h_{\text{extraneous}}\) accounts for the number of phases predicted but not observed, \(\tilde{n}_\phi\).
\(h_{\text{composition}}\) and \(h_{\text{mode}}\) account for discrepancies between the compositions and modes of the observed and correctly predicted phases.

—

Missed Phase Component¶

The missed phase component is:

(9)¶\[h_{\text{missed}} = w_{\text{missed}} \left( 1 - \frac{\hat{n}_\phi}{n_\phi} \right)\]

where:

\(n_\phi\) is the number of observed phases.
\(w_{\text{missed}}\) is a weighting factor that can be adjusted to increase or decrease the importance of predicting all observed phases.

Because \(\hat{n}_\phi \leq n_\phi\), the second term in Eq 9 lies within [0, 1].

—

Extraneous Phase Component¶

The extraneously predicted phase component is:

(10)¶\[h_{\text{extraneous}} = w_{\text{extraneous}} \sum_{m=1}^{\tilde{n}_\phi} \tilde{m}_m^2\]

where:

\(\tilde{n}_\phi\) is the number of phases predicted but not observed.
\(\tilde{m}_m\) is the volumetric mode of the extraneous phase \(m\).
\(w_{\text{extraneous}}\) is an adjustable weighting factor.

The penalty for small amounts of extraneous phases should not be excessive for two reasons:

Thermodynamic solution models are simplifications incapable of perfectly reproducing natural mineral compositions. The consequent discrepancies often manifest as small amounts of extraneous phases.
If modal data are uncertain by \(\epsilon_m\), \(w_{\text{extraneous}} \epsilon_m^2\) should be insignificant in comparison to the next largest component of the misfit function.

—

Composition and Mode Components¶

Both \(h_{\text{composition}}\) and \(h_{\text{mode}}\) may be implemented as Least Squares (LSQ), Chi-squared (Chi), or weighted Chi-squared (wChi) functions. The implementation is specified by the misfit_composition and misfit_mode options. Although the weighted Chi-squared function is theoretically preferred, the MC_fit default is Chi-squared because it does not require knowledge of the analytical uncertainties. The form and characteristics of these functions are summarized below.

—

Least-Squares¶

(11)¶\[h_{\text{LSQ}} = w_{\text{LSQ}} \sum (\hat{x}_j - x_j)^2\]

where \(x_j\) and \(\hat{x}_j\) are observed and predicted values.

Appropriate when all observed values have the same absolute uncertainty.
Minimizing LSQ finds the best fit by minimizing the overall error.
Typically \(h_{\text{LSQ}}/w_{\text{LSQ}} \ll 1\).
Inconsistent with the likelihood function as in Eq 7.

—

Chi-Squared¶

(12)¶\[h_{\text{Chi}} = w_{\text{Chi}} \sum \frac{(\hat{x}_j - x_j)^2}{\hat{x}_j}\]

Appropriate when all observed values have the same relative uncertainty.
Increases the influence of minor compositions and/or phases present in small amounts.
Typically \(h_{\text{Chi}}/w_{\text{Chi}} \lt 1\).
Inconsistent with the likelihood function as in Eq 7.

—

Weighted Chi-Squared¶

(13)¶\[h_{wChi} = w_{\text{wChi}} \sum \left( \frac {\hat{x}_j - x_j}{\epsilon_j} \right) ^2\]

where \(\epsilon_j\) is the uncertainty of the observed value \(x_j\).

Appropriate when the uncertainty of the observed values is known.
Strictly related to the likelihood function as in Eq 8.
Typically \(h_{\text{wChi}}/w_{\text{wChi}} \lt 1\).

—

Choosing \(w_{\text{missed}}\), \(w_{\text{extraneous}}\), \(w_{\text{composition}}\), and \(w_{\text{mode}}\)¶

The weighting factors control the relative importance of the four terms of the misfit function (Eq 8) and can be modified via the w_missing, w_extra, w_composition, and w_mode options. In thermobarometric problems, the priorities, in order of importance, are usually to:

Predict all observed phases.
Avoid predicting large amounts, i.e., \(\tilde{m}_m > \epsilon_m\) of extraneous phases.
Accurately predict the compositions and modes of the observed phases.

This ordering requires that if the first two priorities are not met, \(h_{\text{missed}} > h_{\text{extraneous}} > \{h_{\text{composition}}, h_{\text{mode}} \}\) dominate the misfit function. The default values of the weighting factors have been chosen to reflect these priorities for a number of test cases. When modeling a new assemblage, it is prudent to verify that the components of the misfit function have the desired relative importance by monitoring the console output during the optimization or by examining the my_project.out file. See discussion above for additional considerations when setting \(w_{\text{extraneous}}\) specifically and the expected magnitudes of \(h_i / w_i\) generally.

In treating experimental data, the above priorities must be modified because placing too much emphasis on predicting all observed phases may lead to a model that predicts phase boundaries that, though consistent only with all observations, do not accurately reflect the observed compositions and/or modes. For example, given experimental measurements on a binary solvus, a high value for \(w_{\text{missed}}\) will favor a model that predicts a solvus that matches the narrowest observed compositional gap between coexisting phases.

—

Bayes Score¶

When the Bayes option is set to T, MC_fit identifies the best fit model as the model with the highest Bayes score:

(14)¶\[\text{Bayes-score} = -\ln(\text{misfit}) \prod pp_j\]

where \(pp_j\), the prior probability of inversion parameter j, is computed assuming a normal distribution as:

(15)¶\[pp_j = \frac{1}{\sqrt{2 \pi \sigma_j^2}} \exp \left[ -\frac{1}{2} \left( \frac{p_j - p_{j,\text{mid}}}{\sigma_j} \right)^2 \right]\]

where \(p_j\) is the value of parameter \(j\), \(p_{j,\text{mid}}\) is the midpoint of the specified range of \(p_j\). The coordinate of the centroid of the inversion parameter ranges (\(\{p_{1,\text{mid}} ... {p_{n,\text{mid}}\)) represents the prior estimate of the most probable inversion parameter values. In MC_fit_plot output this coordinate is identified asthe MPP coordinate (black diamond, Fig. 1).

The standard deviation of \(p_j\), \(\sigma_j\), is estimated from the specified range for the parameter, \(d_j\), as:

(16)¶\[\sigma_j = \frac{d_j}{c}\]

where the constant \(c\) is set by the range_to_sigma option. The default value of \(c\) is 4, which implies:

\(d_j\) corresponds to ±2 \(\sigma_j\)
there is a ca 95% prior probability that the true value of \(p_j\) lies within its specified range.
\(pp_j\) drops from 0.399 at \(p_{j,\text{mid}}\) to 0.054 at the limits of its specified range.

Bayes scoring discourages the selection of models with extreme parameter values that may fit the data well but are less plausible a priori. The strength of the penalty is controlled by the range_to_sigma option. By default, the prior_probability_PT_only option dictates that the summation in Eq 15 includes only the prior probabilities of pressure and temperature. Bayes scoring should only be used when there is good reason to believe that the true values of the inversion parameters lie near the center of their specified ranges.

—

Multiple Assemblages and/or Experiments¶

When multiple phase assemblages and/or experiments are modeled simultaneously, the misfit of each assemblage/experiment is calculated separately, and MC_fit optimizes the total to identify the best fit model.