Title: | Bayesian BIN (Bias, Information, Noise) Model of Forecasting |
---|---|
Description: | A recently proposed Bayesian BIN model disentangles the underlying processes that enable forecasters and forecasting methods to improve, decomposing forecasting accuracy into three components: bias, partial information, and noise. By describing the differences between two groups of forecasters, the model allows the user to carry out useful inference, such as calculating the posterior probabilities of the treatment reducing bias, diminishing noise, or increasing information. It also provides insight into how much tamping down bias and noise in judgment or enhancing the efficient extraction of valid information from the environment improves forecasting accuracy. This package provides easy access to the BIN model. For further information refer to the paper Ville A. Satopää, Marat Salikhov, Philip E. Tetlock, and Barbara Mellers (2021) "Bias, Information, Noise: The BIN Model of Forecasting" <doi:10.1287/mnsc.2020.3882>. |
Authors: | Ville Satopää [aut, cre]
|
Maintainer: | Ville Satopää <[email protected]> |
License: | GPL-3 |
Version: | 0.2.0 |
Built: | 2025-02-23 05:37:47 UTC |
Source: | https://github.com/cran/BINtools |
A DESCRIPTION OF THE PACKAGE
Stan Development Team (2020). RStan: the R interface to Stan. R package version 2.21.2. https://mc-stan.org
This function uses the return value of a call to the function estimate_BIN
and produces a full BIN analysis based on that object.
complete_summary(full_bayesian_fit)
full_bayesian_fit |
The return value of a call to function estimate_BIN |
List containing the parameter estimates of the model, the posterior inferences, and the analysis of predictive performance.
The elements of the list are as follows.
Parameter Estimates: Posterior means, standard deviations, and different quantiles of the model parameters and their differences. The parameter values represent the following quantities.
mu_star: Base rate of the event outcome in the probit scale. E.g., if mu_star = 0, then, in expectation, Phi(0)*100% = 50% of the events happen, where Phi(.) is the CDF of a standard Gaussian random variable.
mu_0: The level of bias in the control group. This can be any number in the real-line. E.g., if mu_0 = 0.1, then the control group believes the base rate to be Phi(mu_star + 0.1).
mu_1: The level of bias in the treatment group; otherwise the interpretation is the same as above for mu_0.
gamma_0: The level of information in the control group. This is a number between 0 and 1, where 0 represents no information and 1 represents full information.
gamma_1: The level of information in the treatment group; otherwise the interpretation is the same as above for gamma_0.
delta_0: The level of noise in the control group. This is a positive value, with higher values indicating higher levels of noise. Noise is in the same scale as information. E.g., delta_0 = 1.0 says that the control group uses as many irrelevant signals as there are relevant signals in the universe. In this sense it represents a very high level of noise.
delta_1: The level of noise in the treatment group; otherwise the interpretation is the same as above for delta_0.
rho_0: The level of within-group dependence between forecasts of the control group. This is a positive value, with higher values indicating higher levels of dependence. The dependence can be interpreted to stem from shared irrelevant (noise) and/or relevant (information) signals.
rho_1: The level of within-group dependence between forecasts of the treatment group; otherwise the interpretation is the same as above for rho_0.
rho_01: The level of inter-group dependence between forecasts of the control and treatment groups; otherwise the interpretation is the same as above for rho_0.
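As a small illustration of the probit scale used for mu_star, mu_0, and mu_1, the following base-R sketch converts the parameter values from this manual's examples into probabilities with pnorm(), the standard Gaussian CDF. The variable names are chosen here for illustration and are not part of the package.

```r
# Probit-scale values taken from the simulation examples in this manual.
mu_star <- -0.8  # base rate of the event outcome (probit scale)
mu_0    <- -0.5  # bias of the control group
mu_1    <-  0.2  # bias of the treatment group

# pnorm() is Phi(.), the CDF of a standard Gaussian random variable.
true_base_rate   <- pnorm(mu_star)         # share of events expected to happen
control_belief   <- pnorm(mu_star + mu_0)  # base rate as believed by the control group
treatment_belief <- pnorm(mu_star + mu_1)  # base rate as believed by the treatment group

round(c(true = true_base_rate,
        control = control_belief,
        treatment = treatment_belief), 3)
```

With these values the control group underestimates the base rate and the treatment group overestimates it slightly.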
Posterior Inferences: Posterior probabilities of events. Compared to the control group, does the treatment group have: (i) less bias, (ii) more information, and (iii) less noise? Intuitively, one can think of these probabilities as the Bayesian analogs of the p-values in classical hypothesis testing – the closer the probability is to 1, the stronger the evidence for the hypothesis.
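A posterior probability of this kind can be read as the share of posterior draws in which the statement holds. The sketch below illustrates that idea in base R. The draws are hypothetical stand-ins generated at random, not output of estimate_BIN, and it is an assumption here that "less bias" means a smaller absolute value of mu.

```r
set.seed(1)
n_draws <- 4000

# Hypothetical posterior draws, for illustration only.
draws <- data.frame(
  mu_0    = rnorm(n_draws, mean = -0.5, sd = 0.1),
  mu_1    = rnorm(n_draws, mean =  0.2, sd = 0.1),
  gamma_0 = rbeta(n_draws, 10, 90),
  gamma_1 = rbeta(n_draws, 30, 70),
  delta_0 = rlnorm(n_draws, meanlog = log(0.1), sdlog = 0.2),
  delta_1 = rlnorm(n_draws, meanlog = log(0.3), sdlog = 0.2)
)

# Share of draws in which the treatment group is better on each component.
posterior_probs <- c(
  less_bias        = mean(abs(draws$mu_1) < abs(draws$mu_0)),  # assumed definition
  more_information = mean(draws$gamma_1 > draws$gamma_0),
  less_noise       = mean(draws$delta_1 < draws$delta_0)
)
posterior_probs
```

In this made-up setting the first two probabilities are close to 1 and the third is close to 0, which would indicate strong evidence for less bias and more information but against less noise.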
Control,Treatment: This compares the control group against the treatment group. The value of the contribution gives
the mean Brier score of the control group;
the mean Brier score of the treatment group;
how the difference can be explained in terms of bias, noise, or information; and
in percentage terms, how the change in bias, noise, or information (from the control to the treatment group) changes the Brier score.
Control,Perfect Accuracy: This compares the control group against a treatment group with perfect accuracy; otherwise the interpretation is the same as above for 'Control,Treatment.'
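For readers unfamiliar with the Brier score, the following base-R sketch computes a mean Brier score from data in the Outcomes / list-of-vectors format used by this package. The helper mean_brier is hypothetical, the toy numbers are made up, and averaging within each event before averaging across events is an assumption that may differ from what complete_summary does internally.

```r
# Toy inputs: three events, two control-group forecasts per event.
Outcomes <- c(1, 0, 1)
Control  <- list(c(0.8, 0.6), c(0.3, 0.1), c(0.5, 0.9))

# Hypothetical helper: squared error averaged within each event,
# then averaged across events.
mean_brier <- function(outcomes, forecasts) {
  per_event <- mapply(function(y, p) mean((p - y)^2), outcomes, forecasts)
  mean(per_event)
}

control_brier <- mean_brier(Outcomes, Control)
# A forecaster with perfect accuracy predicts the outcome itself.
perfect_brier <- mean_brier(Outcomes, as.list(Outcomes))

c(control = control_brier, perfect = perfect_brier)
```

Lower values are better, and a perfectly accurate group scores 0.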
## An example with one group
# a) Simulate synthetic data:
synthetic_data = simulate_data(list(mu_star = -0.8, mu_0 = -0.5, mu_1 = 0.2,
                                    gamma_0 = 0.1, gamma_1 = 0.3,
                                    rho_0 = 0.05, delta_0 = 0.1,
                                    rho_1 = 0.2, delta_1 = 0.3, rho_01 = 0.05),
                               300, 100, 0)
# b) Estimate the BIN-model on the synthetic data:
full_bayesian_fit = estimate_BIN(synthetic_data$Outcomes, synthetic_data$Control,
                                 warmup = 500, iter = 1000)
# c) Analyze the results:
complete_summary(full_bayesian_fit)

## An example with two groups
# a) Simulate synthetic data:
synthetic_data = simulate_data(list(mu_star = -0.8, mu_0 = -0.5, mu_1 = 0.2,
                                    gamma_0 = 0.1, gamma_1 = 0.3,
                                    rho_0 = 0.05, delta_0 = 0.1,
                                    rho_1 = 0.2, delta_1 = 0.3, rho_01 = 0.05),
                               300, 100, 100)
# b) Estimate the BIN-model on the synthetic data:
full_bayesian_fit = estimate_BIN(synthetic_data$Outcomes, synthetic_data$Control,
                                 synthetic_data$Treatment, warmup = 500, iter = 1000)
# c) Analyze the results:
complete_summary(full_bayesian_fit)
This function allows the user to compare two groups (treatment and control) of forecasters in terms of their bias, information, and noise levels. Model estimation is performed with a Markov Chain Monte Carlo (MCMC) approach called Hamiltonian Monte Carlo.
estimate_BIN(
  Outcomes,
  Control,
  Treatment = NULL,
  initial = list(mu_star = 0, mu_0 = 0, mu_1 = 0, gamma_0 = 0.4, gamma_1 = 0.4,
                 delta_0 = 0.5, rho_0 = 0.27, delta_1 = 0.5, rho_1 = 0.27, rho_01 = 0.1),
  warmup = 2000,
  iter = 4000,
  seed = 1
)
Outcomes |
Vector of binary values indicating the outcome of each event. The j-th entry is equal to 1 if the j-th event occurs and equal to 0 otherwise. |
Control |
List of vectors containing the predictions made for each event by forecasters in the control group. The j-th vector contains predictions for the j-th event. |
Treatment |
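List of vectors containing the predictions made for each event by forecasters in the treatment group. The j-th vector contains predictions for the j-th event. (Default: NULL) |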
initial |
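A list containing the initial values for the parameters mu_star, mu_0, mu_1, gamma_0, gamma_1, delta_0, rho_0, delta_1, rho_1, and rho_01.
(Default: list(mu_star = 0, mu_0 = 0, mu_1 = 0, gamma_0 = 0.4, gamma_1 = 0.4, delta_0 = 0.5, rho_0 = 0.27, delta_1 = 0.5, rho_1 = 0.27, rho_01 = 0.1)) |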
warmup |
The number of initial iterations used for burn-in.
These iterations are not included in the analysis of the model. (Default: 2000) |
iter |
Total number of iterations.
Must be larger than warmup. (Default: 4000) |
seed |
Seed for random number generation. (Default: 1) |
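The expected shape of the Outcomes, Control, and Treatment arguments can be sketched in base R as follows. The toy values and the check_inputs helper are hypothetical and not part of the package, and the checks reflect only what the argument descriptions above state (binary outcomes, one vector of probability predictions per event).

```r
# Toy data: three events, with a varying number of forecasts per event.
Outcomes  <- c(1, 0, 1)
Control   <- list(c(0.8, 0.6), c(0.3, 0.1), c(0.5, 0.9))
Treatment <- list(c(0.9), c(0.2, 0.1, 0.05), c(0.7, 0.8))

# Hypothetical helper: stops with an error if the inputs do not match
# the documented format, and returns TRUE invisibly otherwise.
check_inputs <- function(Outcomes, Control, Treatment = NULL) {
  stopifnot(all(Outcomes %in% c(0, 1)))
  groups <- list(Control = Control)
  if (!is.null(Treatment)) groups$Treatment <- Treatment
  for (g in groups) {
    # One vector of predictions per event.
    stopifnot(is.list(g), length(g) == length(Outcomes))
    p <- unlist(g)
    # Predictions are probabilities.
    stopifnot(is.numeric(p), all(p >= 0 & p <= 1))
  }
  invisible(TRUE)
}

check_inputs(Outcomes, Control, Treatment)
```

Running a check like this before the (comparatively slow) MCMC estimation can catch format mistakes early.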
Model estimation is performed with the statistical programming language called Stan. The return object is a Stan model. This way the user can apply available diagnostics tools in other packages, such as rstan, to analyze the final results.
simulate_data, complete_summary
## An example with one group
# a) Simulate synthetic data:
synthetic_data = simulate_data(list(mu_star = -0.8, mu_0 = -0.5, mu_1 = 0.2,
                                    gamma_0 = 0.1, gamma_1 = 0.3,
                                    rho_0 = 0.05, delta_0 = 0.1,
                                    rho_1 = 0.2, delta_1 = 0.3, rho_01 = 0.05),
                               300, 100, 0)
# b) Estimate the BIN-model on the synthetic data:
full_bayesian_fit = estimate_BIN(synthetic_data$Outcomes, synthetic_data$Control,
                                 warmup = 500, iter = 1000)
# c) Analyze the results:
complete_summary(full_bayesian_fit)

## An example with two groups
# a) Simulate synthetic data:
synthetic_data = simulate_data(list(mu_star = -0.8, mu_0 = -0.5, mu_1 = 0.2,
                                    gamma_0 = 0.1, gamma_1 = 0.3,
                                    rho_0 = 0.05, delta_0 = 0.1,
                                    rho_1 = 0.2, delta_1 = 0.3, rho_01 = 0.05),
                               300, 100, 100)
# b) Estimate the BIN-model on the synthetic data:
full_bayesian_fit = estimate_BIN(synthetic_data$Outcomes, synthetic_data$Control,
                                 synthetic_data$Treatment, warmup = 500, iter = 1000)
# c) Analyze the results:
complete_summary(full_bayesian_fit)
This function allows the user to generate synthetic data of two groups (control and treatment) of forecasters making probability predictions of binary events. The function is mostly useful for testing and illustration purposes.
simulate_data(parameters, N, N_0, N_1, rho_o = 0)
parameters |
A list containing the true values of the parameters: mu_star, mu_0, mu_1, gamma_0, gamma_1, rho_0, delta_0, rho_1, delta_1, and rho_01. |
N |
Number of events |
N_0 |
Number of forecasters in the control group |
N_1 |
Number of forecasters in the treatment group |
rho_o |
The level of dependence between event outcomes. (Default: the events are independent conditional on the model parameter values. This sets rho_o = 0.) |
See complete_summary for a description of the model parameters.
Not all combinations of parameters are possible.
In particular, the covariance parameters gamma and rho are dependent on each other and must result in a positive semi-definite covariance matrix for the outcomes and predictions.
To find a feasible set of parameters, we recommend that users experiment: begin with the desired levels of mu, gamma, and delta and with values of rho close to zero, and then increase rho until data can be generated without errors.
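The feasibility requirement can be explored before calling simulate_data with a rough positive semi-definiteness check. The sketch below is an illustration under stated assumptions, not the package's internal check: it assumes the latent outcome has variance 1, each forecast has variance gamma + delta and covariance gamma with the outcome, forecasts within a group have covariance rho_0 or rho_1, and forecasts across groups have covariance rho_01. The helpers bin_covariance and is_feasible are hypothetical names.

```r
# Hypothetical helper: covariance of (outcome, n0 control forecasts,
# n1 treatment forecasts) under the ASSUMED structure described above.
bin_covariance <- function(p, n0 = 2, n1 = 2) {
  k  <- 1 + n0 + n1
  i0 <- 1 + seq_len(n0)        # rows/columns of control forecasts
  i1 <- 1 + n0 + seq_len(n1)   # rows/columns of treatment forecasts
  S  <- matrix(0, k, k)
  S[i0, i0] <- p$rho_0
  S[i1, i1] <- p$rho_1
  S[i0, i1] <- p$rho_01
  S[i1, i0] <- p$rho_01
  S[1, i0]  <- p$gamma_0
  S[i0, 1]  <- p$gamma_0
  S[1, i1]  <- p$gamma_1
  S[i1, 1]  <- p$gamma_1
  diag(S)   <- c(1, rep(p$gamma_0 + p$delta_0, n0), rep(p$gamma_1 + p$delta_1, n1))
  S
}

# A symmetric matrix is positive semi-definite if no eigenvalue is negative
# (up to a small numerical tolerance).
is_feasible <- function(p, n0 = 2, n1 = 2, tol = 1e-8) {
  ev <- eigen(bin_covariance(p, n0, n1), symmetric = TRUE, only.values = TRUE)$values
  min(ev) >= -tol
}

params <- list(mu_star = -0.8, mu_0 = -0.5, mu_1 = 0.2, gamma_0 = 0.1, gamma_1 = 0.3,
               rho_0 = 0.05, delta_0 = 0.1, rho_1 = 0.2, delta_1 = 0.3, rho_01 = 0.05)

is_feasible(params)                                      # parameters from the examples
is_feasible(modifyList(params, list(rho_0 = 0.5)))       # rho_0 exceeds the forecast variance
is_feasible(modifyList(params, list(rho_0 = 0)), n0 = 100)  # rho_0 too small for a large group
```

Under these assumptions, rho can be infeasible both when it is too large relative to gamma + delta and when it is too close to zero for a large group, which is consistent with the advice to start near zero and increase rho.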
List containing the simulated data. The elements of the list are as follows.
Outcomes: Vector containing binary values that indicate the outcome of each event. The j-th entry is equal to 1 if the j-th event occurs and equal to 0 otherwise.
Control: List of vectors (one for each event) containing probability predictions made by the forecasters in the control group.
Treatment: List of vectors (one for each event) containing probability predictions made by the forecasters in the treatment group.
estimate_BIN, complete_summary
simulate_data(list(mu_star = -0.8, mu_0 = -0.5, mu_1 = 0.2,
                   gamma_0 = 0.1, gamma_1 = 0.3,
                   rho_0 = 0.05, delta_0 = 0.1,
                   rho_1 = 0.2, delta_1 = 0.3, rho_01 = 0.05),
              300, 100, 100)