hypothesis: Non-linear hypothesis testing

Description.

  • Summary statistics of the posterior distributions related to the hypotheses.


An Introduction to brms

Using brms for parameter estimation: A walkthrough

I am using the native pipe operator, which is new in R 4.1.0. This pipe operator is written as a | followed by a > . In this document, the operator is printed as |> , because I am using font ligatures. If the pipe doesn't work for you, simply replace it with the older pipe %>% .

In this post, I’ll show how to use brms to infer the means of two independent normally distributed samples. I’ll try to follow the steps illustrated in the previous post on a principled Bayesian workflow .

Generate data

First, we’ll generate two independent normally distributed samples. These will correspond to two levels of a grouping variable, so let’s call them group A and group B.

Group A will have a mean \(\mu_A = 20\) and a standard deviation \(\sigma_A = 2\) , whereas group B has the parameters \(\mu_B = 16\) and \(\sigma_B = 1.5\) .

We now draw 10 observations for each group.
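A minimal sketch of this step (the seed, the data frame name d, and the variable names group and score are assumptions chosen to match the rest of the post):

```r
library(tidyverse)

set.seed(12)  # assumed seed, for reproducibility

d <- tibble(
  group = rep(c("A", "B"), each = 10),
  score = c(rnorm(10, mean = 20, sd = 2),    # group A: mu = 20, sigma = 2
            rnorm(10, mean = 16, sd = 1.5))  # group B: mu = 16, sigma = 1.5
)
```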

Since we know the true values that generated the data, we can check whether we are able to successfully recover them. Of course, the sample means and standard deviations will differ slightly from the true values.

Probabilistic model

We assume the data are conditionally normally distributed

\[ y_{ij} \sim \mathcal{N}(\mu_j, \sigma_j), \quad \text{for } j = 1, 2 \]

We will initially assume that the two groups have equal standard deviations (SD), so that we need only estimate one common SD parameter. We therefore need to estimate three parameters in total, \(\mu_A, \mu_B, \sigma\) (you can allow both mean and standard deviation to vary in assignment 1 ).

Linear model

Using a linear model, we have several possibilities for choosing our contrast coding. We will use treatment coding, in which we choose one of the groups as the reference category. This will be represented by the intercept. The other group will not be estimated directly. Instead, the second parameter will represent the difference between this group and the reference category.

We can check the levels of the grouping variable. The first level will be chosen as the reference group.

Another possibility is to omit the intercept, and then just estimate both group means independently.

For the first approach, we use the R formula
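A sketch, continuing with the assumed names d, score, and group from above:

```r
score ~ group
```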

For the second parameterization, we write
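That is, we suppress the intercept:

```r
score ~ 0 + group
```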

Prior distributions

We can check for which parameters we need to set priors, and what the default priors are, using the get_prior() function.
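A sketch of the call (using the hypothetical data frame d from above):

```r
library(brms)

get_prior(score ~ group, data = d)
```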

The output doesn’t look very appealing, so we can show just the first four columns:
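One way to do this (a sketch):

```r
get_prior(score ~ group, data = d)[, 1:4]
```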

The three parameters are groupB , which represents the difference between group B and the reference category; Intercept , which represents group A; and sigma , the common standard deviation.

Both Intercept and sigma are given Student-t priors. The first parameter of this distribution can be considered a "normality" parameter: the higher it is, the more normal the distribution looks. The prior on the intercept has a mean of 16.9, which is based on the median of the response variable ( median(d$score) ), and a standard deviation of 2.5. The default priors are guesses intended to ensure that the posterior is in the right range, while making it unlikely that the prior biases the inferences.

Something that is not apparent is that the prior on sigma is actually a folded Student-t distribution: the distribution is folded in half, because the parameter sigma is constrained to be positive (a standard deviation must be \(> 0\)).

The prior on the groupB parameter is flat. This is basically never a good idea—you should always choose your own prior, instead of using the default flat prior.

For the second parameterization, we get
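A sketch, using the intercept-free formula from above:

```r
get_prior(score ~ 0 + group, data = d)
```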

Here, we get the same standard deviation parameter, but instead of an intercept we get two parameters, one for each level of the grouping variable. Both have flat priors.

One important difference between the two is that in the second parameterization both levels are treated in the same manner, whereas in the first approach the reference category gets a prior, and the non-reference category is coded as Intercept + groupB . Therefore the mean of group B will be estimated with more uncertainty than that of group A. While this makes sense for hypothesis testing, for estimation it is questionable. McElreath ( 2020 ) generally recommends the second approach.

We will ignore McElreath's advice for now, and estimate the mean of group B as Intercept + groupB .

Since we already know from the summary above that the difference between groups cannot be very large, we set a normal(0, 4) prior on the group difference. This expresses the belief that we are about 95% certain that the parameter will lie between \(-8\) and \(8\) .

We can use the brms function prior() to do this.
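A sketch of how this might look (the object name priors is an assumption):

```r
priors <- prior(normal(0, 4), class = b, coef = groupB)
```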

The priors on the intercept and group difference look like this:

[Figure: prior densities of the intercept and the group difference]

Prior predictive distribution

In order to get the prior predictive distribution, we can first sample from the prior distributions using the sample_prior argument set to "only" . If we do this, we are running the same model that we will later use to obtain the posterior distribution, but we are ignoring the data.

This model will do three things: 1) provide prior distributions of the parameters, 2) provide distributions of the conditional means, i.e. the values of the linear predictor and 3) provide samples from the prior predictive distribution.
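A sketch of such a prior-only fit (the object name fit_prior is an assumption):

```r
fit_prior <- brm(
  score ~ group,
  data = d,
  prior = priors,
  sample_prior = "only"
)
```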

We can visualize the distribution of parameter values that our model expects using the mcmc_plot() function.
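For example:

```r
mcmc_plot(fit_prior)
```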


These distributions just reflect the prior distributions, i.e. they are sampled from each parameter’s prior distribution. It is very helpful, though, to plot the conditional means, i.e. the expected means conditioned on group membership.
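One way to do this is with brms' conditional_effects() helper (a sketch; the original post may have used a different function):

```r
conditional_effects(fit_prior)
```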


Both groups are expected to have similar means, because that is what we expressed with our prior distribution on the group difference.

Prior predictive checks

We can then add additional variance by incorporating the residual error. This can be achieved by using the posterior_predict() function and then processing the output; however, it is often far simpler to use the built-in function pp_check() (the pp stands for posterior predictive). This function can perform a variety of posterior predictive checks; here we are simply plotting the density of the data ( \(y\) ) along with densities obtained from generated data ( \(y_{rep}\) ).

If we sample from the posterior, then pp_check() performs posterior predictive checks. If we sample from the prior only, then pp_check() performs prior predictive checks.
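A minimal sketch (the number of simulated data sets can be increased via pp_check()'s draws argument, named ndraws or nsamples depending on your brms version):

```r
pp_check(fit_prior)
```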

This plot can give us a good idea of what kind of data our model expects, and we can compare those to the actual data we obtained.


We can also group by our grouping variable to compare the generated data separately by group.
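A sketch, assuming a bayesplot version that provides the grouped density type:

```r
pp_check(fit_prior, type = "dens_overlay_grouped", group = "group")
```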


Posterior inference

If we are happy with our model, we can sample from the posterior, using the same model from above, but omitting the sample_prior argument. As above, brms generates Stan code, which is then compiled to C++. Once the model is compiled, Stan runs 4 independent Markov chains, each of which explores the posterior distribution. The number of chains can be specified, but it is rarely necessary to change the default setting of 4.
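A sketch of the call (the object name fit is an assumption):

```r
fit <- brm(
  score ~ group,
  data = d,
  prior = priors
)
```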

It is a good idea to use as many cores as possible. Modern computers have multi-core processors, and Stan can run the chains in parallel on them, which results in a huge speed-up. You can use the argument cores = parallel::detectCores() inside brm() to set this. It is advisable to set this in the R options, so that you do not have to do this every time you call brm() .
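For example:

```r
options(mc.cores = parallel::detectCores())
```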

Before we look at the parameter estimates, it is essential to check that the 4 chains have converged, i.e. that they are sampling from the same posterior. Calling the plot() method on the fitted object will show trace plots (on the right of the plot), which are the estimates (on the y axis) plotted against the sample number.


Another way of getting these is with the function mcmc_trace() from the bayesplot package.


The plots for each parameter show the 4 chains (in different shades of blue). They should not be easily distinguishable from each other, and should resemble a “fat hairy caterpillar.”

Apart from visual inspection, we can check for convergence of the chains by looking at the Rhat values in the output. There is one for each estimated parameter, and these values should be approximately \(1.0\) and should not exceed \(1.05\) . The Rhat statistic measures the ratio of the average variance of samples within each chain to the variance of the pooled samples across chains. If these values are close to \(1.0\) , this means that the chains have mixed well and are sampling from the same posterior.

We can now look at the estimated parameters. Here, we get Population-Level Effects and Family Specific Parameters . The population-level effects are our intercept and group difference; the family-specific parameter is the residual standard deviation. For each parameter we are shown the mean of the posterior distribution ( Estimate ), the standard deviation of the posterior distribution ( Est.Error ) as well as two-sided 95% credible intervals ( l-95% CI and u-95% CI ) based on quantiles. The Bulk and Tail ESS (effective sample size) are estimates of how many independent draws would contain the same amount of information as the correlated draws of the posterior (Markov chains produce correlated draws).

Parameter estimates

The three parameters are Intercept , groupB and sigma . The latter represents the standard deviation, which according to our model is not allowed to vary between groups (our model is thus mis-specified, as we know that sigma differs between groups). The posterior mean is 1.7, with a 95% CI of [1.24, 2.4]. We are thus 95% certain that the standard deviation lies within that interval.

Intercept and groupB represent the expected mean of the reference group, which is A in this case, and the difference between groups, respectively.

The intercept has a mean of 18.99, with a 95% CI of [17.94, 20.02], and the difference between groups has a mean of -3.24 with a 95% CI of [-4.68, -1.72].

These are merely summaries of the posterior distributions. It is also very important to look at the full posterior distributions. These can be plotted with the function mcmc_plot() .
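For example (a sketch; the type argument is passed on to bayesplot):

```r
mcmc_plot(fit, type = "dens")
```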


We can also choose which parameters to plot:
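A sketch (newer brms versions use the variable argument; older versions use pars):

```r
mcmc_plot(fit, type = "dens", variable = c("b_Intercept", "b_groupB"))
```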


Conditional means

A simple way to obtain the predicted conditional means is to use the add_fitted_draws() function from the tidybayes package.

This requires a grid of values for which we want the conditional means. In this case we use the data_grid() function from the modelr package to create it.

We can then use add_fitted_draws() to obtain the values of the linear predictor, which in this case will be either the Intercept for group A, or Intercept + groupB for group B.

These can then be plotted using the stat_pointinterval() function, which takes a .width argument to specify the width of the credible interval.
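Putting these steps together (a sketch, assuming the data frame d and the fitted object fit from above):

```r
library(modelr)
library(tidybayes)

d |>
  data_grid(group) |>
  add_fitted_draws(fit) |>
  ggplot(aes(x = group, y = .value)) +
  stat_pointinterval(.width = c(0.66, 0.95))
```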


Posterior predictive check

Similarly to above, we can use pp_check() , which will now perform posterior predictive checks (because we have sampled from the posterior).
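For example:

```r
pp_check(fit)
```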


It is apparent that while our model can adequately represent the shape of the data, the predictions vary quite a lot, which is due to there not being enough data (this is only a toy model, after all).

Using the functions from the tidybayes package, we can plot the conditional expected means, the posterior predictions for the data, and the actual data, all in one plot.


The blue band shows the posterior predictive density for new data (what data does our model predict?), the black dots within the blue bands show the actual data points, and the intervals underneath show the expected conditional means (values of the linear predictor).


How to calculate contrasts from a fitted brms model

Answer more questions with your estimated parameters, without refitting the model.


brms (Bayesian Regression Models using Stan) is an R package that allows fitting complex (multilevel, multivariate, mixture, …) statistical models with straightforward R modeling syntax, while using Stan for Bayesian inference under the hood. You will find many uses of that package on this blog. I am particularly fond of brms' helper functions for post-processing (visualizing, summarizing, etc.) the fitted models. In this post, I will show how to calculate and visualize arbitrary contrasts (aka "(general linear) hypothesis tests") with brms, with full uncertainty estimates.

Models and contrasts

Here, we will discuss linear models, which regress an outcome variable on a weighted combination of predictors, while allowing the weights to vary across individuals (hierarchical linear regression). After fitting the model, you will have estimates of the weights ("beta weights", or simply regression parameters) that typically consist of an intercept (estimated level of the outcome variable when all predictors are zero) and slopes, which indicate how the outcome variable changes as a function of one-unit changes of the predictors, when the other predictors are at 0.

However, we are often interested in further questions (contrasts, “general linear hypothesis tests”). For example, your model output may report one group’s change over time, and the difference of that slope between groups, but you are particularly interested in the other group’s slope. To find that slope, you’d need to calculate an additional contrast from your model. This is also commonly called “probing interactions” or sometimes “post hoc testing”.

Example data

To make this concrete, let’s consider a hypothetical example data set from Bolger and Laurenceau (2013) : Two groups’ ( treatment : 0/1) self-reported intimacy was tracked over 16 days ( time ). The dataset contains data from a total of 50 (simulated) individuals.

We might be interested in how the two groups’ feelings of intimacy developed over time, and how their temporal trajectories of intimacy differed. To be more specific, we have three questions:

  • Q1: How did intimacy develop over time for group 0?
  • Q2: How did intimacy develop over time for group 1?
  • Q3: How different were these two time-courses?

To answer these questions, we model intimacy as a function of time, treatment, and their interaction. The hierarchical model includes varying intercepts and effects of time across participants.
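A sketch of such a model (the object name fit and the data set name dat are assumptions; the variable names follow the description above):

```r
library(brms)

fit <- brm(
  intimacy ~ time * treatment + (time | id),
  data = dat  # dat is a hypothetical name for the example data set
)
```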

Interpreting the model’s parameters

Let’s then answer our questions by looking at the model’s summary, and interpreting the estimated population-level parameters (the posterior means and standard deviations).

The first lesson is that most models are simply too complex to interpret by just looking at the numerical parameter estimates. Therefore, we always draw figures to help us interpret what the model thinks is going on. The figure below shows example participants’ data (left) and the model’s estimated effects on the right.


Then, we can begin interpreting the parameters. First, the intercept indicates estimated intimacy when time and treatment were at their respective baseline levels (0). It is always easiest to interpret the parameters by eyeballing the right panel of the figure above and trying to connect the numbers to the figure. This estimate is the left-most point of the red line.

The estimated time parameter describes the slope of the red line (Q1); treatment1 is the difference between the two lines at time zero (Q3). However, we cannot immediately answer Q2 from the parameters, although we can see that the slope of the blue line is about 0.05 + 0.06. To get the answer to Q2, or more generally, any contrast or “general linear hypothesis test” from a brms model, we can use the hypothesis() method.

hypothesis()

hypothesis() truly is an underappreciated method of the brms package. It can be very useful in probing complex models. It allows us to calculate, visualize, and summarize, with full uncertainty estimates, any transformation of the model’s parameters. These transformations are often called “contrasts” or “general linear hypothesis tests”. But really, they are just transformations of the joint posterior distribution of the model’s parameters.

To answer Q2, then, we encode our question into a combination of the model's parameters:

The slope of group 1 is calculated from the model's parameters by adding the slope of group 0 ( time ) and the interaction term time:treatment1 . = 0 indicates that we are contrasting the resulting estimate with zero ("testing against zero" or even "testing the null hypothesis"). Then, we pass this named string to hypothesis() , and observe the results.
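Concretely (a sketch; the label and the object name q2 are arbitrary):

```r
q2 <- hypothesis(fit, c("group 1 slope" = "time + time:treatment1 = 0"))
q2
```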

The output indicates that the estimated answer to Question 2 is 0.11 with a standard error of 0.02. I will return to Evid.Ratio and Post.Prob shortly.

The results can also be visualized.
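For example:

```r
plot(q2)
```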


That figure shows the (samples from the) posterior distribution of the answer to Question 2.

More contrasts

With hypothesis() you can answer many additional questions about your model, beyond the parameter estimates. To illustrate, say we are interested in the groups' difference in intimacy at the end of the study (day 15; Question 4). (The difference at time 0 is reported by the treatment1 parameter.)
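A sketch: the day-15 difference is the time-0 difference plus 15 days of slope difference.

```r
hypothesis(fit, "treatment1 + time:treatment1 * 15 = 0")
```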

Directional hypotheses and posterior probabilities

We can also ask directional questions. For example, what is the probability that group 0's slope is greater than 0 (Q5)?
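In hypothesis() syntax, a one-sided question replaces = with an inequality:

```r
hypothesis(fit, "time > 0")
```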


We can now return to Evid.Ratio and Post.Prob : The latter indicates the posterior probability that the parameter of interest is greater than zero ( > 0 ). (More accurately, the proportion of samples from the posterior that are greater than zero.) That should correspond to what you see in the figure above. The former is the ratio of the hypothesis and its complement (the ratio of time > 0 and time < 0 ). I find posterior probabilities more intuitive than evidence ratios, but they both return essentially the same information. Perhaps of interest, with uniform priors, posterior probabilities will exactly correspond (numerically, not conceptually) to frequentist one-sided p-values ( Marsman & Wagenmakers, 2017 ).

Multiple hypotheses

You can evaluate multiple hypotheses in one function call:
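A sketch with two of the questions from above:

```r
hypothesis(fit, c("time + time:treatment1 = 0",
                  "treatment1 + time:treatment1 * 15 = 0"))
```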

Hierarchical hypotheses

Up to this point, we have "tested" the model's population-level effects (parameters for the average person; "fixed effects"). Because we fit a hierarchical model with varying intercepts and slopes of time, we can also test the individual-specific parameters. For example, we can look at every individual's estimated intercept (intimacy at time 0):
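A sketch of that call (the group and scope arguments are explained below):

```r
x <- hypothesis(fit, "Intercept = 0", group = "id", scope = "coef")
```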

In the above, we asked for the results of the hypothesis test, split by group id (which is the grouping factor in our hierarchical model), and indicated coef as the scope. The latter means that the estimates are the subject-specific deviations with the fixed effect added, as opposed to ranef , which are zero-centered.

The results of this question would be a bit too much information to print on screen, so instead we will draw a figure:
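A sketch, assuming the hypothesis object stores its summaries in x$hypothesis with columns Estimate, CI.Lower, CI.Upper, and Group:

```r
library(ggplot2)

ggplot(x$hypothesis, aes(x = Estimate, y = Group)) +
  geom_point() +
  geom_errorbarh(aes(xmin = CI.Lower, xmax = CI.Upper))
```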


When you find that you have a brms model whose parameters don’t quite answer your questions, hypothesis() will probably give you the answer. For more advanced post-processing of your models, I recommend taking a look at the tidybayes package.


Hypothesis Testing | A Step-by-Step Guide with Easy Examples

Published on November 8, 2019 by Rebecca Bevans . Revised on June 22, 2023.

Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics . It is most often used by scientists to test specific predictions, called hypotheses, that arise from theories.

There are 5 main steps in hypothesis testing:

  • State your research hypothesis as a null hypothesis (H0) and alternate hypothesis (Ha or H1).
  • Collect data in a way designed to test the hypothesis.
  • Perform an appropriate statistical test .
  • Decide whether to reject or fail to reject your null hypothesis.
  • Present the findings in your results and discussion section.

Though the specific details might vary, the procedure you will use when testing a hypothesis will always follow some version of these steps.

Table of contents

  • Step 1: State your null and alternate hypothesis
  • Step 2: Collect data
  • Step 3: Perform a statistical test
  • Step 4: Decide whether to reject or fail to reject your null hypothesis
  • Step 5: Present your findings
  • Frequently asked questions about hypothesis testing

After developing your initial research hypothesis (the prediction that you want to investigate), it is important to restate it as a null (H0) and alternate (Ha) hypothesis so that you can test it mathematically.

The alternate hypothesis is usually your initial hypothesis that predicts a relationship between variables. The null hypothesis is a prediction of no relationship between the variables you are interested in.

  • H0: Men are, on average, not taller than women.
  • Ha: Men are, on average, taller than women.


For a statistical test to be valid , it is important to perform sampling and collect data in a way that is designed to test your hypothesis. If your data are not representative, then you cannot make statistical inferences about the population you are interested in.

There are a variety of statistical tests available, but they are all based on the comparison of within-group variance (how spread out the data is within a category) versus between-group variance (how different the categories are from one another).

If the between-group variance is large enough that there is little or no overlap between groups, then your statistical test will reflect that by showing a low p -value . This means it is unlikely that the differences between these groups came about by chance.

Alternatively, if there is high within-group variance and low between-group variance, then your statistical test will reflect that with a high p -value. This means it is likely that any difference you measure between groups is due to chance.

Your choice of statistical test will be based on the type of variables and the level of measurement of your collected data . For the height example, a t test comparing the two groups would give you:

  • an estimate of the difference in average height between the two groups.
  • a p -value showing how likely you are to see this difference if the null hypothesis of no difference is true.

Based on the outcome of your statistical test, you will have to decide whether to reject or fail to reject your null hypothesis.

In most cases you will use the p -value generated by your statistical test to guide your decision. And in most cases, your predetermined level of significance for rejecting the null hypothesis will be 0.05 – that is, when there is a less than 5% chance that you would see these results if the null hypothesis were true.

In some cases, researchers choose a more conservative level of significance, such as 0.01 (1%). This minimizes the risk of incorrectly rejecting the null hypothesis ( Type I error ).

The results of hypothesis testing will be presented in the results and discussion sections of your research paper , dissertation or thesis .

In the results section you should give a brief summary of the data and a summary of the results of your statistical test (for example, the estimated difference between group means and associated p -value). In the discussion , you can discuss whether your initial hypothesis was supported by your results or not.

In the formal language of hypothesis testing, we talk about rejecting or failing to reject the null hypothesis. You will probably be asked to do this in your statistics assignments.

However, when presenting research results in academic papers we rarely talk this way. Instead, we go back to our alternate hypothesis (in this case, the hypothesis that men are on average taller than women) and state whether the result of our test did or did not support the alternate hypothesis.

If your null hypothesis was rejected, this result is interpreted as “supported the alternate hypothesis.”

These are superficial differences; you can see that they mean the same thing.

You might notice that we don’t say that we reject or fail to reject the alternate hypothesis . This is because hypothesis testing is not designed to prove or disprove anything. It is only designed to test whether a pattern we measure could have arisen spuriously, or by chance.

If we reject the null hypothesis based on our research (i.e., we find that it is unlikely that the pattern arose by chance), then we can say our test lends support to our hypothesis . But if the pattern does not pass our decision rule, meaning that it could have arisen by chance, then we say the test is inconsistent with our hypothesis .


Hypothesis testing is a formal procedure for investigating our ideas about the world using statistics. It is used by scientists to test specific predictions, called hypotheses , by calculating how likely it is that a pattern or relationship between variables could have arisen by chance.

A hypothesis states your predictions about what your research will find. It is a tentative answer to your research question that has not yet been tested. For some research projects, you might have to write several hypotheses that address different aspects of your research question.

A hypothesis is not just a guess — it should be based on existing theories and knowledge. It also has to be testable, which means you can support or refute it through scientific research methods (such as experiments, observations and statistical analysis of data).

Null and alternative hypotheses are used in statistical hypothesis testing . The null hypothesis of a test always predicts no effect or no relationship between variables, while the alternative hypothesis states your research prediction of an effect or relationship.


Generalised Linear Models with brms

Intro to Bayesian (multilevel) generalised linear models (GLM) in R with brms

Qixiang Fang and Rens van de Schoot

Last modified: 14 October 2019

This tutorial provides an introduction to Bayesian GLM (generalised linear models) with non-informative priors using the brms package in R. If you have not followed the Intro to Frequentist (Multilevel) Generalised Linear Models (GLM) in R with glm and lme4 tutorial, we highly recommend that you do so, because it offers more extensive information about GLM. If you are not familiar with Bayesian inference, we also recommend that you read the tutorial Building a Multilevel Model in BRMS Tutorial: Popularity Data prior to using this tutorial.

The current tutorial specifically focuses on the use of Bayesian logistic regression in both binary-outcome and count/proportion-outcome scenarios, and the respective approaches to model evaluation. The tutorial uses the Thai Educational Data example in Chapter 6 of the book Multilevel analysis: Techniques and applications . Furthermore, the tutorial briefly demonstrates the multilevel extension of Bayesian GLM models.

This tutorial follows this structure: 1. Preparation; 2. Introduction to GLM; 3. Thai Educational Data; 4. Data Preparation; 5. Bayesian Binary (Bernoulli) Logistic Regression; 6. Bayesian Binomial Logistic Regression; 7. Bayesian Multilevel Logistic Regression.

Note that this tutorial is meant for beginners and therefore does not delve into technical details and complex models. For a detailed introduction into frequentist multilevel models, see this LME4 Tutorial . For an extensive overview of GLM models, see here . If you want to use the Bayesian approach for your own research, we recommend that you follow the WAMBS-checklist .

1. Preparation

This tutorial expects:

  • Installation of the R package brms for Bayesian (multilevel) generalised linear models (this tutorial uses version 2.9.0). Because of some special dependencies, for brms to work, you still need to install a couple of other things. See this tutorial on how to install brms. Note that at the time of writing, brms only worked with R 3.5.3 or an earlier version;
  • Installation of the R package tidyverse for data manipulation and plotting with ggplot2;
  • Installation of the R package haven for reading sav format data;
  • Installation of the R package ROCR for calculating the area under the curve (AUC);
  • Installation of the R package sjstats for calculating the intra-class correlation (ICC). Remember to install version 0.17.5 (using the command install_version("sjstats", version = "0.17.5") after loading the devtools package), because the latest version of sjstats no longer supports the ICC function;
  • Installation of the R package modelr for data manipulation;
  • Installation of the R package tidybayes for extraction, manipulation, and visualisation of posterior draws from Bayesian models;
  • Basic knowledge of hypothesis testing and statistical inference;
  • Basic knowledge of Bayesian statistical inference;
  • Basic knowledge of coding in R;
  • Basic knowledge of plotting and data manipulation with tidyverse.

2. Introduction to Generalised Linear Models (GLM)

If you are already familiar with generalised linear models (GLM), you can proceed to the next section. Otherwise, click "Read More" to learn about GLM.

Recall that in a linear regression model, the objective is to model the expected value of a continuous variable, \(Y\) , as a linear function of the predictor, \(\eta = X\beta\) . The model structure is thus: \(E(Y) = X\beta + e\) , where \(e\) refers to the residual error term. The linear regression model assumes that \(Y\) is continuous and comes from a normal distribution, that \(e\) is normally distributed, and that the relationship between the linear predictor \(\eta\) and the expected outcome \(E(Y)\) is strictly linear. However, these assumptions are easily violated in many real-world data examples, such as those with binary or proportional outcome variables and those with non-linear relationships between the predictors and the outcome variable. In these scenarios, where linear regression models are clearly inappropriate, generalised linear models (GLM) are needed.

The GLM is the generalised version of linear regression that allows for deviations from the assumptions underlying linear regression. The GLM generalises linear regression by assuming the dependent variable \(Y\) to be generated from a particular distribution in an exponential family (a large class of probability distributions that includes the normal, binomial, Poisson and gamma distributions, among others). In this way, the distribution of \(Y\) does not necessarily have to be normal. In addition, the GLM allows the linear predictor \(\eta\) to be connected to the expected value of the outcome variable, \(E(Y)\) , via a link function \(g(.)\) . The outcome variable, \(Y\) , therefore, depends on \(\eta\) through \(E(Y) = g^{-1}(\eta) = g^{-1}(X\beta)\) . In this way, the model does not assume a linear relationship between \(E(Y)\) and \(\eta\) ; instead, the model assumes a linear relationship between \(E(Y)\) and the transformed \(g^{-1}(\eta)\) .

This tutorial focuses on the Bayesian version of probably the most popular example of GLM: logistic regression . Logistic regression has two variants: the well-known binary logistic regression that is used to model binary outcomes (1 or 0; "yes" or "no"), and the less-known binomial logistic regression suited to model count/proportion data.

Binary logistic regression assumes that \(Y\) comes from a Bernoulli distribution, where \(Y\) only takes a value of 1 (target event) or 0 (non-target event). Binary logistic regression connects \(E(Y)\) and \(\eta\) via the logit link \(\eta = logit(\pi) = log(\pi/(1-\pi))\) , where \(\pi\) refers to the probability of the target event ( \(Y = 1\) ).

Binomial logistic regression, in contrast, assumes a binomial distribution underlying \(Y\) , where \(Y\) is interpreted as the number of target events, can take on any non-negative integer value and is binomially distributed with regards to \(n\) number of trials and \(\pi\) probability of the target event. The link function is the same as that of binary logistic regression.

The next section details the example data ( Thai Educational Data ) used in this tutorial, followed by demonstrations of Bayesian binary logistic regression, Bayesian binomial logistic regression and Bayesian multilevel binary logistic regression. For the frequentist versions of these models, see the Intro to Frequentist (Multilevel) Generalised Linear Models (GLM) in R with glm and lme4 tutorial.

3. Thai Educational Data

The data used in this tutorial is the Thai Educational Data that is also used as an example in Chapter 6 of Multilevel analysis: Techniques and applications . The data can be downloaded from here .

The data stems from a national survey of primary education in Thailand (Raudenbush & Bhumirat, 1992). Each row in the data refers to a pupil. The outcome variable REPEAT is a dichotomous variable indicating whether a pupil has repeated a grade during primary education. The SCHOOLID variable indicates the school of a pupil. The person-level predictors include: SEX (0 = female, 1 = male) and PPED (having had preschool education, 0 = no, 1 = yes). The school-level predictor is MSESC , representing school mean SES (socio-economic status) scores.

The main research questions that this tutorial seeks to answer using the Thai Educational Data are:

  • Ignoring the clustering structure of the data, what are the effects of gender and preschool education on whether a pupil repeats a grade?
  • Ignoring the clustering structure of the data, what is the effect of school mean SES on the proportion of pupils repeating a grade?
  • Considering the clustering structure of the data, what are the effects of gender, preschool education and school mean SES on whether a pupil repeats a grade?

These three questions are answered by using the following models, respectively: Bayesian binary logistic regression, Bayesian binomial logistic regression, and Bayesian multilevel binary logistic regression.

4. Data Preparation

4.1. Load Necessary Packages

4.2. Import Data

Alternatively, you can download the data directly from here and import it locally.
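A sketch of these two steps (the file name is a placeholder, not the actual name of the downloaded file):

```r
library(tidyverse)
library(haven)
library(brms)

data <- read_sav("thai_education.sav")  # placeholder file name
```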

4.3. Data Processing

4.4. Inspect Missing Data

The data has 1066 observations missing for the MSESC variable. The treatment of missing data is a complicated topic on its own. For the sake of convenience, we simply list-wise delete the cases with missing data in this tutorial.
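A minimal sketch of the list-wise deletion:

```r
data <- na.omit(data)  # drop rows with any missing values
```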

5. Bayesian Binary Logistic Regression (with Non-Informative Priors)

5.1. Explore Data: Number of REPEAT by SEX and PPED

It seems that the number of pupils who repeated a grade differs quite a bit between the two genders, with more male pupils having to repeat a grade. More pupils who did not have preschool education repeated a grade. This observation suggests that SEX and PPED might be predictive of REPEAT .

5.2. Fit a Bayesian Binary Logistic Regression Model

The brm function from the brms package performs Bayesian GLM. The brm function has three basic arguments that are identical to those of the glm function: formula , family and data . However, note that in the family argument, we need to specify bernoulli (rather than binomial ) for a binary logistic regression. The brm function has a few additional (and necessary) arguments that glm does not offer: warmup specifies the burn-in period (i.e. the number of iterations that should be discarded); iter specifies the total number of iterations (including the burn-in iterations); chains specifies the number of chains; inits specifies the starting values of the iterations (normally you can either use the maximum likelihood estimates of the parameters as starting values, or simply ask the algorithm to start with zeros); cores specifies the number of cores used for the algorithm; seed specifies the random seed, allowing for replication of results.

See below the specification of the binary logistic regression model with two predictors, without using informative priors.
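A sketch of such a call (the iteration settings and seed are illustrative, not necessarily the tutorial's exact values):

```r
model_bernoulli <- brm(
  REPEAT ~ SEX + PPED,
  family = bernoulli(link = "logit"),
  data = data,
  warmup = 500,   # burn-in iterations to discard
  iter = 2000,    # total iterations per chain
  chains = 2,
  inits = "0",    # start all chains at zero
  cores = 2,
  seed = 123
)
```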

5.3. Model Convergence

Before looking at the model summary, we should check whether there is evidence of non-convergence for the two chains. To do so, we can use the stanplot function from the brms package.

First, we plot the caterpillar plot for each parameter of interest.
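A sketch (stanplot() was brms' plotting helper at the time; it was later renamed mcmc_plot()):

```r
stanplot(model_bernoulli, type = "trace")
```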


The plot only shows the iterations after the burn-in period. The two chains mix well for all of the parameters; therefore, we conclude that there is no evidence of non-convergence.

We can also check autocorrelation, considering that the presence of strong autocorrelation would bias variance estimates.
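A sketch (acf_bar is one of the bayesplot autocorrelation plot types):

```r
stanplot(model_bernoulli, type = "acf_bar")
```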


The plot shows no evidence of autocorrelation for all model variables in both chains, as the autocorrelation parameters all quickly diminish to around zero.

5.4. Interpretation

Now, we can safely proceed to the interpretation of the model. Below is the model summary of the Bayesian binary logistic regression model.

For comparison, below is the model summary of the frequentist binary logistic regression model.
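A sketch of the frequentist counterpart:

```r
model_glm <- glm(REPEAT ~ SEX + PPED,
                 family = binomial(link = "logit"),
                 data = data)
summary(model_glm)
```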

From the model summary above, we can see that the Bayesian model estimates are almost identical to those of the frequentist model. The interpretation of these estimates is the same in both frequentist and Bayesian models. Nevertheless, note that the interpretation of the uncertainty intervals is not the same between the two models. In the frequentist model, the idea behind using a 95% uncertainty interval (confidence interval) is that, under repeated sampling, 95% of the resulting uncertainty intervals would cover the true population value. That allows us to say that, for a given 95% confidence interval, we are 95% confident that this confidence interval contains the true population value. However, it does not allow us to say that there is a 95% chance that the confidence interval contains the true population value (i.e. frequentist uncertainty intervals are not probability statements). In contrast, in the Bayesian model, the 95% uncertainty interval (called a credibility interval), which is more interpretable, states that there is a 95% chance that the true population value falls within this interval. When the 95% credibility intervals do not contain zero, we conclude that the respective model parameters are likely meaningful.

Let’s visualise the point estimates and their associated uncertainty intervals, using the stanplot function.
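A sketch (the plot type and interval width are assumptions based on the plot described below):

```r
stanplot(model_bernoulli, type = "areas", prob = 0.95)
```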


The plot above shows the densities of the parameter estimates. The dark blue line in each density represents the point estimate, while the light-blue area indicates the 95% credibility interval. We can easily see that both SEX and PPED are meaningful predictors, as their credibility intervals do not contain zero and their densities have a very narrow shape. SEX positively predicts a pupil's probability of repeating a grade, while PPED predicts it negatively. Specifically, boys are more likely than girls to repeat a grade, and pupils with preschool education are less likely to repeat a grade than those without, assuming everything else stays constant.

To interpret the value of the parameter estimates, we need to exponentiate the estimates. See below.
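A sketch, using fixef() to extract the posterior summaries of the population-level effects:

```r
exp(fixef(model_bernoulli)[, c("Estimate", "Q2.5", "Q97.5")])
```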

We can also plot densities of these parameter estimates. For this, we again use the stanplot function from brms .


Note that the interpretation of the parameter estimates is linked to the odds rather than probabilities. The definition of odds is: P(event occurring)/P(event not occurring). In this analysis, assuming everything else stays the same, being a boy increases the odds of repeating a grade by 54%, in comparison to being a girl; having preschool education lowers the odds of repeating a grade by about 46% (1 - 0.54 = 0.46), in comparison to not having preschool education, assuming everything else stays constant. The baseline odds (indicated by the intercept term) of repeating a grade, namely for a girl with no preschool education, are about 0.17.

5.4. Visualisation of Parameter Effects

We can plot the marginal effects (i.e. estimated probabilities of repeating a grade) of the variables in the model. Below, we show how different combinations of SEX and PPED result in different probability estimates. The advantage of this approach is that probabilities are more interpretable than odds.
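A sketch (marginal_effects() was the brms helper at the time; it was later renamed conditional_effects()):

```r
plot(marginal_effects(model_bernoulli, effects = "SEX:PPED"))
```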


As we can see, being a male pupil with no preschool education has the highest probability (~0.21), followed by being a girl with no preschool education (~0.15), being a boy with preschool education (~0.13), and lastly, being a girl with preschool education (~0.09). Note that both 68% (thicker inner lines) and 95% (thinner outer lines) credibility intervals for the estimates are included to give us some idea of the uncertainties of the estimates.

5.5. Model Evaluation

In the Intro to Frequentist (Multilevel) Generalised Linear Models (GLM) in R with glm and lme4 tutorial, we learn that we can use the likelihood ratio test and AIC to assess the goodness of fit of the model(s). However, these two approaches do not apply to Bayesian models. Instead, Bayesian models make use of so-called Posterior Predictive P-values (PPPs) to assess the fit of the model. In addition, many also use Bayes factors to quantify support from the data for the model. This tutorial does not delve into PPPs or Bayes factors because of the complexity of the topics.

The other two measures mentioned in Intro to Frequentist (Multilevel) Generalised Linear Models (GLM) in R with glm and lme4 are correct classification rate and area under the curve (AUC) . They are model-agnostic, meaning they can be applied to both frequentist and Bayesian models.

5.5.1. Correct Classification Rate

The percentage of correct classification is a useful measure to see how well the model fits the data.
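A sketch of this computation (for a Bernoulli model, predict() returns the posterior predictive probability of the target event in the Estimate column):

```r
pred <- predict(model_bernoulli)
pred_class <- ifelse(pred[, "Estimate"] > 0.5, 1, 0)
table(predicted = pred_class, observed = as.numeric(data$REPEAT))  # confusion matrix
mean(pred_class == as.numeric(data$REPEAT))                        # correct classification rate
```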

We can see that the model correctly classifies 85.8% of all the observations. However, a closer look at the confusion matrix reveals that the model predicts all of the observations to belong to class “0”, meaning that all pupils are predicted not to repeat a grade. Given that the majority category of the REPEAT variable is 0 (No), the model does not perform better in classification than simply assigning all observations to the majority class 0 (No).

5.5.2. AUC (Area Under the Curve)

An alternative to using the correct classification rate is the Area under the Curve (AUC) measure. The AUC measures discrimination, that is, the ability of the test to correctly classify those with and without the target response. In the current data, the target response is repeating a grade. We randomly pick one pupil from the "repeating a grade" group and one from the "not repeating a grade" group. The pupil with the higher predicted probability should be the one from the "repeating a grade" group. The AUC is the percentage of randomly drawn pairs for which this is true. This procedure sets the AUC apart from the correct classification rate because the AUC is not dependent on the imbalance of the proportions of classes in the outcome variable. A value of 0.50 means that the model does not classify better than chance. A good model should have an AUC score much higher than 0.50 (preferably higher than 0.80).
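A sketch using the ROCR package:

```r
library(ROCR)

pred_probs <- predict(model_bernoulli)[, "Estimate"]
rocr_pred <- prediction(pred_probs, as.numeric(data$REPEAT))
performance(rocr_pred, measure = "auc")@y.values[[1]]
```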

With an AUC score of close to 0.60, the model does not discriminate well.

6. Bayesian Binomial Logistic Regression (with Non-Informative Priors)

As explained in the Intro to Frequentist (Multilevel) Generalised Linear Models (GLM) in R with glm and lme4 tutorial, logistic regression can also be used to model count or proportion data. Binary logistic regression assumes that the outcome variable comes from a Bernoulli distribution (a special case of the binomial distribution) where the number of trials \(n\) is 1 and thus the outcome variable can only be 1 or 0. In contrast, binomial logistic regression assumes that the number of target events follows a binomial distribution with \(n\) trials and probability \(\pi\) . In this way, binomial logistic regression allows the outcome variable to take any non-negative integer value and thus is capable of handling count data.

The Thai Educational Data records information about individual pupils that are clustered within schools. By aggregating the number of pupils who repeated a grade by school, we obtain a new data set where each row represents a school, with information about the proportion of pupils repeating a grade in that school. The MSESC (mean SES score) is also on the school level; therefore, it can be used to predict proportion or count of pupils who repeat a grade in a particular school. See below.

6.1. Transform Data

In this new data set, REPEAT refers to the number of pupils who repeated a grade; TOTAL refers to the total number of students in a particular school.
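A sketch of the aggregation (the name data_binom is an assumption):

```r
data_binom <- data %>%
  group_by(SCHOOLID) %>%
  summarise(REPEAT = sum(REPEAT),
            TOTAL = n(),
            MSESC = mean(MSESC))  # MSESC is constant within a school
```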

6.2. Explore Data


We can see that the proportion of students who repeated a grade is (moderately) negatively related to the inverse-logit of MSESC . Note that we model the variable MSESC as its inverse-logit because in a binomial regression model, we assume a linear relationship between the inverse-logit of the linear predictor and the outcome (i.e. proportion of events), not linearity between the predictor itself and the outcome.

6.3. Fit a Binomial Logistic Regression Model

To fit a Bayesian binomial logistic regression model, we also use the brm function, as we did with the previous Bayesian binary logistic regression model. There are, however, two differences: first, to specify the outcome variable in the formula, we need to specify both the number of target events ( REPEAT ) and the total number of trials ( TOTAL ) wrapped in trials() , separated by | . Second, the family should be "binomial" instead of "bernoulli".
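A sketch of the call:

```r
model_binom <- brm(
  REPEAT | trials(TOTAL) ~ MSESC,
  family = binomial(link = "logit"),
  data = data_binom
)
```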

The frequentist model (for comparison):
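A sketch, using the standard successes/failures formulation for glm():

```r
model_glm_binom <- glm(cbind(REPEAT, TOTAL - REPEAT) ~ MSESC,
                       family = binomial(link = "logit"),
                       data = data_binom)
summary(model_glm_binom)
```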

We can see that the model estimates between the Bayesian and the frequentist binomial logistic regression models are very similar. Note that we skipped the step of checking model convergence, for the sake of keeping this tutorial shorter. You can use the same codes we showed before (with the binary logistic regression model) to check the convergence of this model.

6.4. Interpretation

The parameter interpretation in a binomial regression model is the same as that in a binary logistic regression model. We know from the model summary above that the mean SES score of a school is negatively related to the odds of students repeating a grade in that school. To enhance interpretability, we again calculate the exponentiated coefficient estimate of MSESC . Since MSESC is a continuous variable, we can standardise the exponentiated estimate (by multiplying the original estimate by the SD of the variable, and then exponentiating the resulting number).
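A sketch of that calculation:

```r
exp(fixef(model_binom)["MSESC", c("Estimate", "Q2.5", "Q97.5")] *
      sd(data_binom$MSESC, na.rm = TRUE))
```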

We can see that with a SD increase in MSESC , the odds of students repeating a grade are lowered by about 15% (1 - 0.85 = 0.15). "Q2.5" and "Q97.5" refer to the lower bound and the upper bound of the uncertainty interval, respectively. This credibility interval does not contain zero, suggesting that the variable is likely meaningful.

We can visualise the effect of MSESC .
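For example:

```r
plot(marginal_effects(model_binom, effects = "MSESC"))
```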


The plot above shows the expected influence of MSESC on the probability of a pupil repeating a grade. Holding everything else constant, as MSESC increases, the probability of a pupil repeating a grade lowers (from 0.19 to 0.08). The grey shaded areas indicate the 95% credibility intervals of the predicted values at each value of MSESC .

6.5. Model Evaluation

Similar to the Bayesian binary logistic regression model, we can use PPPs and Bayes factors (which are not discussed in this tutorial) to evaluate the fit of a Bayesian binomial logistic regression model. Correct classification rate and AUC are not suited here, as the model is not concerned with classification.

7. Bayesian Multilevel Binary Logistic Regression (with Non-Informative Priors)

The Bayesian binary logistic regression model introduced earlier is limited to modelling the effects of pupil-level predictors; the Bayesian binomial logistic regression is limited to modelling the effects of school-level predictors. To incorporate both pupil-level and school-level predictors, we can use multilevel models, specifically, Bayesian multilevel binary logistic regression. If you are unfamiliar with multilevel models, you can use Multilevel analysis: Techniques and applications for reference and this tutorial for a good introduction to multilevel models with the lme4 package in R.

In addition to the motivation above, there are more reasons to use multilevel models. For instance, as the data are clustered within schools, it is likely that pupils from the same school are more similar to each other than to those from other schools. Because of this, in one school the probability of a pupil repeating a grade may be high, while in another school it may be low. Furthermore, even the relationship between the outcome (i.e. repeating a grade) and the predictor variables (e.g. gender, preschool education, SES) may be different across schools. Also note that there are missing values in the MSESC variable. Using multilevel models can appropriately address these issues.

See the following plot as an example. The plot shows the proportions of students repeating a grade across schools. We can see vast differences across schools. Therefore, we need multilevel models.


We can also plot the relationship between SEX and REPEAT by SCHOOLID , to see whether the relationship between gender and repeating a grade differs by school.


In the plot above, different colors represent different schools. We can see that the relationship between SEX and REPEAT appears to be quite different across schools.

We can make the same plot for PPED and REPEAT .


The relationship between PPED and REPEAT also appears to be quite different across schools. However, we can also see that most of the relationships follow a downward trend, going from 0 (no previous schooling) to 1 (with previous schooling), indicating a negative relationship between PPED and REPEAT .

Because of the observations above, we can conclude that there is a need for multilevel modelling in the current data, with not only a random intercept ( SCHOOLID ) but potentially also random slopes for SEX and PPED .

7.1. Center Variables

Prior to fitting a multilevel model, it is necessary to center the predictors by using an appropriately chosen centering method (i.e. grand-mean centering or within-cluster centering), because the centering approach matters for the interpretation of the model estimates. Following the advice of Enders and Tofighi (2007), we should use within-cluster centering for the first-level predictors SEX and PPED , and grand-mean centering for the second-level predictor MSESC .
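A sketch of the centering step (the *_c variable names are assumptions; SEX and PPED are assumed to be numeric 0/1):

```r
data <- data %>%
  group_by(SCHOOLID) %>%
  mutate(SEX_c  = SEX - mean(SEX),         # within-cluster centering
         PPED_c = PPED - mean(PPED)) %>%
  ungroup() %>%
  mutate(MSESC_c = MSESC - mean(MSESC, na.rm = TRUE))  # grand-mean centering
```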

7.2. Intercept Only Model

To specify a multilevel model, we again use the brm function from the brms package. Note that the random effect term should be included in parentheses. In addition, within the parentheses, the random slope term(s) and the cluster terms should be separated by | .

We start by specifying an intercept-only model, in order to assess the impact of the clustering structure of the data. Note that we will skip the step of model convergence diagnostics.
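A sketch of the intercept-only model:

```r
model_intercept <- brm(
  REPEAT ~ 1 + (1 | SCHOOLID),
  family = bernoulli(link = "logit"),
  data = data
)
```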

Below we calculate the ICC (intra-class correlation) of the intercept-only model. Note that for non-Gaussian Bayesian models (e.g. logistic regression), we need to set “ppd = T” such that the variance calculation is based on the posterior predictive distribution.
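A sketch, using the icc() function from sjstats 0.17.5:

```r
library(sjstats)

icc(model_intercept, ppd = TRUE)
```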

A variance ratio (comparable to the ICC) of 0.29 means that 29% of the variation in the outcome variable can be accounted for by the clustering structure of the data. This provides evidence that a multilevel model may make a difference to the model estimates, in comparison with a non-multilevel model. The use of multilevel models is therefore necessary and warranted.

7.3. Full Model

It is good practice to build a multilevel model step by step. However, as this tutorial's focus is not on multilevel modelling, we go directly from the intercept-only model to the full model that we are ultimately interested in. In the full model, we include not only the fixed-effect terms of SEX , PPED and MSESC and a random intercept term, but also random slope terms for SEX and PPED . Note that we specify family = bernoulli(link = "logit") , as this model is essentially a binary logistic regression model.
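A sketch of the full model (using the centered predictors from above):

```r
model_full <- brm(
  REPEAT ~ SEX_c + PPED_c + MSESC_c + (1 + SEX_c + PPED_c | SCHOOLID),
  family = bernoulli(link = "logit"),
  data = data
)
```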

We can plot the densities of the relevant model parameter estimates.


The results (pertaining to the fixed effects) are similar to the results of the previous Bayesian binary logistic regression and binomial logistic regression models. On the pupil level, SEX has a positive influence on the odds of a pupil repeating a grade, while PPED has a negative influence. On the school level, MSESC has a negative effect on the outcome variable. Among the three predictors, SEX and PPED have credibility intervals (indicated by the shaded light blue regions in the densities) that clearly do not contain zero; therefore, they should be treated as meaningful predictors. In contrast, although the 95% credibility interval of MSESC does not contain zero, its upper bound is very close to zero and its density nearly touches zero. Because of this, MSESC is likely a less relevant predictor than SEX and PPED .

Now let's look at the random effect terms ( sd(Intercept) , sd(SEX) and sd(PPED) ). The density of sd(Intercept) in the plot is clearly away from zero, indicating the relevance of including this random intercept term in the model. The variance of the random slope of SEX is \(0.38^2 = 0.14\) , and that of PPED is \(0.26^2 = 0.07\) . Both variances are not negligible. However, if we look at the density plot, the lower bounds of the credibility intervals of both sd(SEX) and sd(PPED) are very close to zero, and their densities are also not clearly separated from zero. This suggests that including these two random slope terms may not be necessary.

We can also plot the random effect terms across schools.


Again, we can see that the posterior distributions of the random intercepts vary substantially across schools, and quite a number of them are clearly away from zero. Therefore, we can conclude that the inclusion of the random intercept is necessary. In comparison, the posterior distributions of the SEX and PPED random slopes all go through zero, suggesting that there is probably no need to include the two random slopes in the model.

To interpret the fixed-effect terms, we can calculate the exponentiated coefficient estimates.
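One way to do this, assuming the full model sketched above, is to exponentiate the population-level estimates returned by fixef():

```r
# odds ratios for the population-level (fixed) effects
exp(fixef(model_full))
```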

We can see that the effects of SEX , PPED , and MSESC are very similar to the previous model results.

Bürkner, P. (2017). brms: An R Package for Bayesian Multilevel Models Using Stan. Journal of Statistical Software, 80 (1), 1-28. doi:10.18637/jss.v080.i01

Enders, C. K., & Tofighi, D. (2007). Centering predictor variables in cross-sectional multilevel models: A new look at an old issue. Psychological Methods, 12 (2), 121-138. doi:10.1037/1082-989X.12.2.121

Kay, M. (2019). tidybayes: Tidy Data and Geoms for Bayesian Models. doi:10.5281/zenodo.1308151 , R package version 1.1.0, http://mjskay.github.io/tidybayes/ .

Lüdecke, D. (2019). sjstats: Statistical Functions for Regression Models (Version 0.17.5). doi: 10.5281/zenodo.1284472

Raudenbush, S. W., & Bhumirat, C. (1992). The distribution of resources for primary education and its consequences for educational achievement in Thailand. International Journal of Educational Research, 17 (2), 143-164. doi:10.1016/0883-0355(92)90005-Q

Sing, T., Sander, O., Beerenwinkel, N., & Lengauer, T. (2005). ROCR: visualizing classifier performance in R. Bioinformatics, 21 (20), 3940-3941. http://rocr.bioinf.mpi-sb.mpg.de

Wickham, H. (2017). tidyverse: Easily Install and Load the ‘Tidyverse’. R package version 1.2.1. https://CRAN.R-project.org/package=tidyverse


How to properly compare interacting levels

I am new to brms , trying to figure out its behaviour in details. I was trying to run essentially an anova-like model with random effects (anova-like in the sense that all IVs are factors). For example,

conditional_effects() and hypothesis() are truly amazing analytic tools. However, when you deal with factors and, in particular, their interactions, that’s a bugger… It’s really hard to help students wrap their heads around it. In addition to that, I am a bit stubborn and I really despise when people, for practical purposes, re-level their factors to figure out which contrasts are significant. There is no need for that. In a frequentist framework, one could make use of Wald tests etc. Here, I sense, the two functions can be of much use, but one still needs to be careful around the intercept: all other levels are expressed as changes relative to that reference.

If one has a simple dummy variable, 0 + Intercept + Dummy + ... is of help (another nice one!). But when you have factors with more levels and their interactions, things are far from straightforward.

What would be the brms way of making this less painful? Running another model without the intercept and then using hypothesis() ? Something else?

  • Operating System: macOS
  • brms Version: 2.14.4

Hey @striatum, it’s difficult to follow your question. What exactly are you trying to make less painful? It might help if you provide a minimal reproducible example.

Apologies for the lack of clarity. Let me try again, this time with an example. (I am simplifying here, ignoring the possibility to have random effects, some covariates etc.)

This gives the summary table (it could be different in another run, given random number generators):

Now, what causes pains to grasp and work through are the various comparisons, since the levels and combinations in the table are relative to the reference levels: F1==A; F2==I; F3==M .

I’ve found it almost funny (and creative too) when some students approached this by running comparable models with the Intercept removed. Given the example above, something like:
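Something along these lines (a sketch with priors and details omitted, not their exact code):

```r
fit_nointercept <- brm(y ~ 0 + F1 * F2 * F3,
                       data   = dat,
                       family = gaussian())
```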

And then they apply hypothesis() . For example:
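For instance (the coefficient names are illustrative; they mirror the b_ names brms assigns, with the class prefix dropped):

```r
# is the mean for F1 == B higher than the mean for F1 == A?
hypothesis(fit_nointercept, "F1B > F1A")
```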

This is rather clumsy and inelegant, if not altogether wrong. Yet, I see the struggle and I’m running out of ideas on how to help… In addition to this, it appears that what hypothesis() tells you “contradicts” what you observe in the plot from conditional_effects() . In fact, is there a way to get the BF directly from the conditional_effects() table?

This is clearer. Thank you.

A few things:

It is possible to have multiple factor variables where each level of each factor is expressed in its own terms. One way is with the non-linear syntax. I have an example here, model b5.11 . For more on the non-linear syntax, see Paul’s vignette.

Another way is with the multilevel ANOVA approach. I have examples here, and possibly more to your interest here with model fit19.1 and here with fit24.1 .

As to contrasts, I typically do these by computing the relevant difference scores, working directly with the output from posterior_samples() or possibly with fitted() . However, if you use the multilevel ANOVA approach, you can use the compare_levels() function from the tidybayes package (see here).

As far as the hypothesis() approach and Bayes factors, I’m personally not a fan of those methods. Someone else will have to offer guidance with those.

Thanks so much for this! I am trying to make the model work, but I am getting an error message:

Error: Factors with more than two levels are not allowed as covariates. So, following my example above, this is my actual model:

So, is the interaction the problem here? Or something else?

Consider a simplified version of your model from above. Here we’re modeling the criterion y with two factor variables, F1 and F2 , and their interaction.
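One option is to give every F1 by F2 cell its own parameter, so that no level serves as a reference category. A sketch (data and priors omitted; not necessarily the most elegant way):

```r
# cell-means-style parameterization: one parameter per F1 x F2 cell
fit_cells <- brm(y ~ 0 + F1:F2,
                 data   = dat,
                 family = gaussian())

# contrasts are then simple differences between cell parameters,
# computed directly from the posterior draws (names are illustrative)
post <- posterior_samples(fit_cells)
contrast_AI_BI <- post[["b_F1A:F2I"]] - post[["b_F1B:F2I"]]
quantile(contrast_AI_BI, probs = c(.025, .5, .975))
```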

I concede there might be more elegant ways to express this. But this works.

Thanks so much! May I ask, please, for a template that would also contain some random effects?

I haven’t fit a model with that specification before.

@Solomon, thanks so much! You’ve been super-patient and helpful. To show my gratitude, I’d like to give back a bit of what I’ve learned.

This is one of the more (if not the most) elaborate models we’ve been working on recently. And, now, it works under the above specification/approach:

Apart from four factors (fixed effects), there is one covariate: the order of trials in the experiment. Since it is scaled (z-transformed), a Gaussian prior with SD = 1.0 is appropriate.

Thanks again!


An Introduction to Data Analysis

13.6 Testing hypotheses

The brms package also contains a useful function to address hypotheses about model parameters. The function brms::hypothesis can compute Bayes factors for point-valued hypotheses using the Savage-Dickey method. It also computes a binary test of whether a point-valued hypothesis is credible based on inclusion in a Bayesian credible interval. For interval-valued hypotheses \(\theta \in [a;b]\) , the function brms::hypothesis computes the posterior odds (called evidence ratio in the context of this function): 63 \[ \frac{P(\theta \in [a;b] \mid D)}{P(\theta \not \in [a;b] \mid D)} \]

Computing Bayes factors for point-valued hypotheses with brms::hypothesis requires proper priors for all parameters that are part of the hypothesis. It also requires taking samples from the priors of parameters. 64 So, here is a function call of brms::brm which (i) specifies a reasonably unconstrained but proper prior for the slope coefficient of year and (ii) also collects samples from the prior (by setting the option sample_prior = "yes" ):
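The call might look roughly like this (the data set, outcome name, and exact prior are placeholders for those used in the chapter):

```r
library(brms)

fit_temperature <- brm(
  avg_temp ~ year,
  data  = world_temp,
  # a vague but proper prior on the slope of year
  prior = prior(student_t(1, 0, 1), coef = year),
  # also draw samples from the prior, needed for Savage-Dickey Bayes factors
  sample_prior = "yes"
)
```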

Before addressing hypotheses about the slope parameter for year , let’s remind ourselves of the summary statistics for the posterior:

The main “research hypothesis” of interest is whether the slope for year is credibly positive. This is an interval-valued hypothesis and we can test it like so:
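Assuming the fit sketched above, that might look like:

```r
# interval-valued (directional) hypothesis: is the slope positive?
hypothesis(fit_temperature, "year > 0")
```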

The table shows the estimate for the slope of year , together with an estimated error, lower and upper bounds of a credible interval (95% by default). All of these numbers are rounded. It also shows the “Evidence ratio” which, for an interval-valued hypothesis, is not the Bayes factor, but the posterior odds (see above). In the present case, an evidence ratio of Inf means that all posterior samples for the slope coefficient were positive. This is also expressed in the posterior probability (“Post.Prob” in the table) for the proposition that the interval-valued hypothesis is true (given data and model).

The following tests a point-valued hypothesis:
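Again assuming the fit from above:

```r
# point-valued hypothesis: is the slope exactly zero?
hypothesis(fit_temperature, "year = 0")
```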

For this point-valued hypothesis, the estimate (and associated error and credible interval) are calculated as a comparison against 0, as shown in the “Hypothesis” column. The evidence ratio given in the results table is the Bayes factor of the point-valued hypothesis against the embedding model (the full regression model with the prior we specified), as calculated by the Savage-Dickey method. As before, the posterior probability is also shown. The “Star” in this table indicates that the point-valued hypothesis is excluded from the computed credible interval, so that - if we adopted the (controversial) binary decision logic discussed in Chapter 11 - we would reject the tested hypothesis.

Notice that for priors where \(P(\theta \in [a;b]) = 0.5\) , the posterior odds equal the Bayes factor. For other priors, we’d need to correct the posterior odds by the priors to obtain Bayes factors, something that the brms package does not (presently seem to) do, unfortunately. ↩︎

It may seem unnecessary to take prior samples for parameters, because, after all, couldn’t we just look at the (closed-form) definition of the prior for that parameter? Well, that only works for top-level parameters, but not parameters in a hierarchical model which depend on the value of other parameters and which therefore have no (easily accessible) closed-form prior to look up. ↩︎

Statistical Rethinking with brms, ggplot2, and the tidyverse

7 Interactions

Every model so far in [McElreath’s text] has assumed that each predictor has an independent association with the mean of the outcome. What if we want to allow the association to be conditional?… To model deeper conditionality—where the importance of one predictor depends upon another predictor—we need interaction. Interaction is a kind of conditioning, a way of allowing parameters (really their posterior distributions) to be conditional on further aspects of the data. (p. 210)

7.1 Building an interaction.

“Africa is special” (p. 211). Let’s load the rugged data to see one of the reasons why.

And here we switch out rethinking for brms.

We’ll continue to use tidyverse-style syntax to wrangle the data.

The first two models predicting log_gdp are univariable.

In the text, McElreath more or less dared us to figure out how to make Figure 7.2. Here’s the brms-relevant data wrangling.

For this chapter, we’ll take our plot theme from the ggthemes package .

Here’s the plot code for our version of Figure 7.2.


7.1.1 Adding a dummy variable doesn’t work.

Here’s our model with all the countries, but without the cont_africa dummy.

Now we’ll add the dummy.
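Sketches of the two fits (the data object dd is assumed from the wrangling above; the names b7.3 and b7.4 follow the text):

```r
b7.3 <- brm(log_gdp ~ 1 + rugged,
            data = dd, family = gaussian())

b7.4 <- brm(log_gdp ~ 1 + rugged + cont_africa,
            data = dd, family = gaussian())
```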

Using the skills from Chapter 6, let’s compute the information criteria for the two models. Note how with the add_criterion() function, you can compute both the LOO and the WAIC at once.
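A sketch of those calls:

```r
b7.3 <- add_criterion(b7.3, c("loo", "waic"))
b7.4 <- add_criterion(b7.4, c("loo", "waic"))
```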

Here we’ll compare the models with the loo_compare() function, first by the WAIC and then by the LOO.
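Possibly like so:

```r
loo_compare(b7.3, b7.4, criterion = "waic")
loo_compare(b7.3, b7.4, criterion = "loo")
```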

Happily, the WAIC and the LOO are in agreement. The model with the dummy, b7.4 , fit the data much better. Here are the WAIC model weights.
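One way to get them:

```r
model_weights(b7.3, b7.4, weights = "waic")
```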

As in the text, almost all the weight went to the multivariable model, b7.4 . Before we can plot that model, we need to wrangle a bit.

Behold our Figure 7.3.


7.1.2 Adding a linear interaction does work.

Yes, it sure does. But before we fit, here’s the equation:

Fit the model.
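A sketch, continuing with the assumed dd object:

```r
b7.5 <- brm(log_gdp ~ 1 + rugged * cont_africa,
            data = dd, family = gaussian())
```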

For kicks, we’ll just use the LOO to compare the last three models.

And recall, if we want those LOO difference scores in the traditional metric like McElreath displayed in the text, we can do a quick conversion with algebra and cbind() .

And we can weight the models based on the LOO rather than the WAIC, too.

7.1.2.1 Overthinking: Conventional form of interaction.

The conventional equation for the interaction model might look like:

Instead of the y ~ 1 + x1*x2 approach, which will work fine with brm() , you can use this more explicit syntax.
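That is, spell the interaction out term by term (a generic sketch):

```r
fit <- brm(y ~ 1 + x1 + x2 + x1:x2,
           data   = dat,
           family = gaussian())
```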

From here on, I will default to this style of syntax for interactions.

Since this is the same model, it yields the same information criteria estimates within simulation error. Here we’ll confirm that with the LOO.

7.1.3 Plotting the interaction.

Here’s our prep work for the figure.

And here’s the code for our version of Figure 7.4.


7.1.4 Interpreting an interaction estimate.

Interpreting interaction estimates is tricky. It’s trickier than interpreting ordinary estimates. And for this reason, I usually advise against trying to understand an interaction from tables of numbers alone. Plotting implied predictions does far more for both our own understanding and for our audience’s. (p. 219)

7.1.4.1 Parameters change meaning.

In a simple linear regression with no interactions, each coefficient says how much the average outcome, \(\mu\) , changes when the predictor changes by one unit. And since all of the parameters have independent influences on the outcome, there’s no trouble in interpreting each parameter separately. Each slope parameter gives us a direct measure of each predictor variable’s influence. Interaction models ruin this paradise. (p. 220)

Return the parameter estimates.

“Since \(\gamma\) (gamma) doesn’t appear in this table—it wasn’t estimated—we have to compute it ourselves” (p. 221). Like in the text, we’ll do so first by working with the point estimates.

7.1.4.2 Incorporating uncertainty.

To get some idea of the uncertainty around those \(\gamma\) values, we’ll need to use the whole posterior. Since \(\gamma\) depends upon parameters, and those parameters have a posterior distribution, \(\gamma\) must also have a posterior distribution. Read the previous sentence again a few times. It’s one of the most important concepts in processing Bayesian model fits. Anything calculated using parameters has a distribution. (p. 212)

Like McElreath, we’ll avoid integral calculus in favor of working with posterior_samples() .
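A sketch of that computation, assuming the interaction fit b7.5 and the usual brms coefficient names:

```r
post <- posterior_samples(b7.5)

# gamma: the slope of rugged, conditional on cont_africa
gamma_africa     <- post$b_rugged + post$`b_rugged:cont_africa`
gamma_not_africa <- post$b_rugged
```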

And here is our version of Figure 7.5.


What proportion of these differences is below zero?
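Continuing the sketch above:

```r
# proportion of posterior draws in which the Africa slope is the smaller one
mean(gamma_africa - gamma_not_africa < 0)
```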

7.2 Symmetry of the linear interaction.

Consider for example the GDP and terrain ruggedness problem. The interaction there has two equally valid phrasings. How much does the influence of ruggedness (on GDP) depend upon whether the nation is in Africa? How much does the influence of being in Africa (on GDP) depend upon ruggedness? While these two possibilities sound different to most humans, your golem thinks they are identical. (p. 223)

7.2.1 Buridan’s interaction.

Recall the original equation.

Next McElreath replaced \(\gamma_i\) with the expression for \(\mu_i\) .

And now we’ll factor together the terms containing \(\text{cont_africa}_i\) .

\[ \mu_i = \alpha + \beta_1 \text{rugged}_i + \underbrace{(\beta_2 + \beta_3 \text{rugged}_i)}_G \times \text{cont_africa}_i \]

And just as in the text, our \(G\) term looks a lot like the original \(\gamma_i\) term.

7.2.2 Africa depends upon ruggedness.

Here is our version of McElreath’s Figure 7.6.


7.3 Continuous interactions

Though continuous interactions can be more challenging to interpret, they’re just as easy to fit as interactions including dummies.

7.3.1 The data.

Look at the tulips .

7.3.2 The un-centered models.

The likelihoods for the next two models are

Here we continue with McElreath’s very-flat priors for the multivariable and interaction models.

Much like in the text, these models yielded divergent transitions. Here, we’ll try to combat them by following Stan’s advice and “[increase] adapt_delta above 0.8.” While we’re at it, we’ll put better priors on \(\sigma\) .
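A sketch of the refit for the interaction model (the exact prior and adapt_delta value are assumptions; the name b7.7 follows the text):

```r
b7.7 <- brm(blooms ~ 1 + water + shade + water:shade,
            data    = d,
            family  = gaussian(),
            # a weakly-informative prior on sigma instead of the flat default
            prior   = prior(cauchy(0, 1), class = sigma),
            control = list(adapt_delta = 0.9))
```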

Increasing adapt_delta did the trick. Instead of coeftab() , we can also use posterior_summary() , which gets us most of the way there.

This is an example where HMC yielded point estimates notably different from MAP. However, look at the size of those posterior standard deviations (i.e., ‘Est.Error’ column)! The MAP estimates are well within a fraction of those \(SD\) s.

Anyway, let’s look at WAIC.

Here we use our cbind() trick to convert the difference from the \(\text{elpd}\) metric to the more traditional WAIC metric.

Why not compute the WAIC weights?

As in the text, almost all the weight went to the interaction model, b7.7 .

7.3.3 Center and re-estimate.

To center a variable means to create a new variable that contains the same information as the original, but has a new mean of zero. For example, to make centered versions of shade and water , just subtract the mean of the original from each value. (p. 230, emphasis in the original)

Here’s a tidyverse way to center the predictors.
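For example, assuming the tulips data live in d:

```r
d <- d %>%
  mutate(shade_c = shade - mean(shade),
         water_c = water - mean(water))
```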

Now refit the models with our shiny new centered predictors.

Check out the results.

And okay fine, if you really want a coeftab() -like summary, here’s a way to do it.

Anyway, centering helped a lot. Now not only do our results match up better with those in the text, but the ‘Est.Error’ values are uniformly smaller.

7.3.3.1 Estimation worked better.

Nothing to add, here.

7.3.3.2 Estimates changed less across models.

On page 231, we read:

The interaction parameter always factors into generating a prediction. Consider for example a tulip at the average moisture and shade levels, 2 in each case. The expected blooms for such a tulip is:

\[\mu_i | \text{shade}_{i = 2}, \text{water}_{i = 2} = \alpha + \beta_\text{water} (2) + \beta_\text{shade} (2) + \beta_{\text{water} \times \text{shade}} (2 \times 2)\]

So to figure out the effect of increasing water by 1 unit, you have to use all of the \(\beta\) parameters. Plugging in the [HMC] values for the un-centered interaction model, [ b7.7 ], we get:

\[\mu_i | \text{shade}_{i = 2}, \text{water}_{i = 2} = -107.1 + 159.9 (2) + 44.0 (2) -43.2 (2 \times 2)\]

With our brms workflow, we use fixef() to compute the predictions.
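A sketch of that computation, pulling the posterior means out of the un-centered interaction model b7.7:

```r
# rows of fixef() are Intercept, water, shade, and water:shade
k <- fixef(b7.7)
k[1, 1] + k[2, 1] * 2 + k[3, 1] * 2 + k[4, 1] * 2 * 2
```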

Even though our HMC parameters differ a bit from the MAP estimates McElreath reported in the text, the value they predict matches the one in the text quite closely. Same thing for the next one.

Here are the coefficient summaries for the centered model.

7.3.4 Plotting implied predictions.

Now we’re ready for the bottom row of Figure 7.7. Here’s our variation on McElreath’s triptych loop code, adjusted for brms and ggplot2.


But we don’t necessarily need a loop. We can achieve all of McElreath’s Figure 7.7 with fitted() , some data wrangling, and a little help from ggplot2::facet_grid() .


7.4 Interactions in design formulas

The brms syntax generally follows the design formulas typical of lm() . Hopefully this is all old hat.

7.5 Summary Bonus: marginal_effects()

The brms package includes the marginal_effects() function as a convenient way to look at simple effects and two-way interactions. Recall the simple univariable model, b7.3 :

We can look at the regression line and its percentile-based intervals like so:
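For instance:

```r
marginal_effects(b7.3)
```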


If we nest marginal_effects() within plot() with a points = T argument, we can add the original data to the figure.
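Like so:

```r
plot(marginal_effects(b7.3), points = T)
```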


We can further customize the plot. For example, we can replace the intervals with a spaghetti plot. While we’re at it, we can use point_args to adjust the geom_jitter() parameters.


With multiple predictors, things get more complicated. Consider our multivariable, non-interaction model, b7.4 .


We got one plot for each predictor, controlling the other predictor at zero. Note how the plot for cont_africa treated it as a continuous variable. This is because the variable was saved as an integer in the original data set:

One way to fix that is to adjust the data set and refit the model.

Using the update() syntax often speeds up the re-fitting process.
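A sketch of both steps:

```r
dd <- dd %>% mutate(cont_africa = factor(cont_africa))

# refit b7.4 with the adjusted data
b7.4_factor <- update(b7.4, newdata = dd)
```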


Now our second marginal plot more clearly expresses the cont_africa predictor as categorical.

Things get more complicated with the interaction model, b7.5 .


The marginal_effects() function defaults to expressing interactions such that the first variable in the term (in this case, rugged ) is on the x axis, and the second variable in the term ( cont_africa , treated as an integer) is depicted as three lines corresponding to its mean and its mean +/- one standard deviation. This is great for continuous variables, but incoherent for categorical ones. The fix is, you guessed it, to refit the model after adjusting the data.

Just for kicks, we’ll use probs = c(.25, .75) to return 50% intervals , rather than the conventional 95%.


With the effects argument, we can just return the interaction effect, which is where all the action’s at. While we’re at it, we’ll use plot() to change some of the settings.


Note, the ordering of the variables matters for the interaction term. Consider our interaction model for the tulips data.

The plot tells a slightly different story, depending on whether you specify effects = "shade_c:water_c" or effects = "water_c:shade_c" .


One might want to evaluate the effects of the second term in the interaction– water_c , in this case–at values other than the mean and the mean +/- one standard deviation. When we reproduced the bottom row of Figure 7.7, we expressed the interaction based on values -1, 0, and 1 for water_c . We can do that, here, by using the int_conditions argument. It expects a list, so we’ll put our desired water_c values in just that.
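A sketch, assuming the centered interaction model is named b7.9 (a hypothetical name):

```r
ic <- list(water_c = c(-1, 0, 1))

plot(marginal_effects(b7.9,
                      effects        = "shade_c:water_c",
                      int_conditions = ic))
```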


McElreath, R. (2016). Statistical rethinking: A Bayesian course with examples in R and Stan. Chapman & Hall/CRC Press.

Session info

Bayesian mixed effects (aka multi-level) ordinal regression models with brms

21 Feb 2017

In the past two years I’ve found myself doing lots of statistical analyses on ordinal response data from a (Likert-scale) dialectology questionnaire. I’ve ended up with a good pipeline to run and compare many ordinal regression models with random effects in a Bayesian way using the handy R formula interface in the brms package.

In defense of ordinal regression

In (applied statistical) practice, ordinal data is often simply fit using linear regression (this seems to be particularly true in contemporary, quantitative grammaticality judgment-based syntax literature). While treating ordinal responses as continuous measures is in principle always wrong (because the scale is definitely not ratio), it can in practice be ok to apply linear regression to it, as long as it is reasonable to assume that the scale can be treated as interval data (i.e. the distances between individual response categories are meaningful/comparable). Situations that completely rule out the use of linear methods are:

  • when there are edge effects in the data (i.e. many data points in the highest/lowest category)
  • when one wants to derive predictions from the resulting statistical models – basing these on linear regression models will often lead to uninterpretable results

If you are happy with simply predicting something like a mean response which at best doesn’t match the actual response scale (whether the predicted response is actually outside the response scale or just between categories), and is at worst inaccurate because it doesn’t take into account the uneven distribution of responses across ordinal categories, that’s ok too. But the most reliable and insightful way of checking the quality/fit of a model is visual inspection of the model predictions, which is why I always want to be able to produce meaningful (ordinal distribution) predictions to match against the original data.

Ordinal regression a.k.a. cumulative link models

Ordinal regression methods are typically generalisations of methods used for modelling categorical (in the minimal case binary outcome) data. The most frequently used ordinal regression, ordered logistic (or, more accurately, ordered logit ) regression is an extension of logistic/logit regression: where in logistic regression you model one coefficient that captures the relative likelihood (in log-odds) of one outcome occurring over another (i.e. 2 outcomes captured by 1 coefficient), ordered logit regression models the relative likelihood of k different outcomes based on k-1 coefficients. The k-1 coefficients in the model capture the cumulative likelihood of responses falling into an expanding set of ordered response categories. In the case of 4 possible outcome categories ( o1 - o4 ), the three intercepts of the model, each of which will be at least as high as the preceding one, capture the baseline log-odds of observing a response in {o1}, in {o1, o2}, and in {o1, o2, o3}, respectively (the probability of a response falling in the full set {o1, …, o4} is 1 and needs no parameter).

As a consequence of this representation of the underlying probabilities, any log-odds coefficients that are added in the linear regression model actually lead to a multiplication of the odds in the model (i.e. the strength of the effect of a predictor on the modelled odds ratios is proportional to the original likelihood of those ratios, this is a consequence of the proportional odds assumption of logit regression).

The formulation and modelling in log-odds that is the result of the logit transformation are specific to ordered logit regression, however several other methods for modelling binary responses (such as probit models) also have ordinal counterparts (i.e. ordered probit ) where the coefficients and cumulative intercepts are interpreted in different ways.

Choosing a link function

So while the idea of representing the cumulative likelihood of an increasing pool of ordinal responses is a general one, there are several possible formats in which those cumulative probabilities can be represented and modelled, which is where link functions come into play. The role of a link function is simply to transform the probability or odds coefficients from the probability space [0,1] to the full linear space [-Inf,Inf] in such a way that the transformed coefficients can be modelled using linear regression in the best possible (read: most accurate/useful) way.

Different statistical packages support different link families, for example the ordinal package (which offers ordinal regression with one random effect) supports the cumulative links “logit”, “probit”, “cloglog”, “loglog” and “cauchit”, while brms (full-on Bayesian multi-level modelling) supports “logit”, “probit”, “probit_approx”, “cloglog” and “cauchit”. The choice of link function is typically not critical and most methods assume the “logit” function (the log-odds transformation that forms the basis of ordered logit regression) by default, but a different choice can be informed by your knowledge of the data.

The most appropriate or useful link function given a particular data set and model of it can be determined from the data by building models with several different link functions and selecting the one that yields the best (i.e. most predictive) fit, as measured by the models’ log-likelihood. While this is quite expensive to do with Bayesian models, here is a simple function making use of the clm function from the ordinal package to quickly fit several possible link functions and determine the one producing the best fit for a given model (note that this does not take any random effects into account; results in Bayesian models might vary even more):
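A sketch of such a helper:

```r
library(ordinal)

# fit the same cumulative link model under several link functions and
# rank them by log-likelihood (higher = better fit)
compare_links <- function(formula, data,
                          links = c("logit", "probit", "cloglog",
                                    "loglog", "cauchit")) {
  lls <- sapply(links, function(l)
    logLik(clm(formula, data = data, link = l)))
  sort(lls, decreasing = TRUE)
}
```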

The cumulative link function’s thresholds parameter can be used to apply additional constraints to the spacing of the model’s intercepts, but you’ll typically want to leave this at “flexible”, its most general default.

Before creating a full-on Bayesian model below, we can already create some simple models to inform our choice of link function for later:
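Using the helper sketched above, that could be as simple as (variable names are placeholders):

```r
compare_links(response ~ age + gender, data = d)
```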

Running a model in brms

While running Bayesian models using brms can be slightly more time-consuming than other R packages (because the Stan models have to be compiled first), its neat lmer() -like formula interface means it’s easy to create a large number of models with different sets of predictors which can then be compared. This maximally transparent way of presenting statistical model results is typical in fields such as sociolinguistics where you don’t just present your final model, but also dwell on insights gained along the way based on discovering that the addition of certain coefficients does not increase the predictiveness of the model.

When no priors over the model’s parameters are supplied explicitly via the prior argument, uninformative priors are assumed.
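A sketch of the model call (response, age, and participant are placeholder names; agemdl is the model object referred to below):

```r
agemdl <- brm(response ~ age + (1 | participant),
              data   = d,
              family = cumulative(link = "logit", threshold = "flexible"))
```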

The link function fit above suggested that the logit link function with flexible thresholds is a good choice – both of these are the default for the cumulative() family, they’re only passed explicitly for the sake of clarity. After running the model, the resulting model fit can be inspected in usual ways using summary() , etc.

The trace of the Bayesian model fit as well as the posterior distribution of the coefficients can be visually inspected by calling plot(agemdl) .

Bayesian model comparison and hypothesis testing

brms models support comparison via the Watanabe-Akaike information criterion as well as Leave-One-Out cross-validation:
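With current brms, that might look as follows (gendermdl is a placeholder for a model with the gender coefficient added):

```r
agemdl    <- add_criterion(agemdl,    c("waic", "loo"))
gendermdl <- add_criterion(gendermdl, c("waic", "loo"))

loo_compare(agemdl, gendermdl, criterion = "waic")
loo_compare(agemdl, gendermdl, criterion = "loo")
```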

The model comparison suggests that neither adding the gender nor the age coefficient is particularly predictive (the standard errors of the models’ improvements are bigger than the improvements themselves), but we can still test hypotheses about specific parameters directly, such as whether the age coefficient is positive (where alpha specifies the size of the credible interval ):
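For example:

```r
# is the age coefficient positive? alpha = 0.05 gives a 95% credible interval
hypothesis(agemdl, "age > 0", alpha = 0.05)
```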

Visual inspection, posteriors, random effects,…

brms provides many other useful functions, from ranef(agemdl) for estimating the relative size of the random effects per group to launch_shiny(agemdl) , which opens an interactive web interface that allows complete exploration of the model results and posterior distributions in your browser.

  • brms website
  • brms documentation
  • introduction to cumulative link models
  • stack exchange on mixed effects regression of centered/standardised ordinal responses vs. ordinal regression
  • stack exchange on the proportional odds assumption and how to test it


hypothesis.brmsfit: Non-Linear Hypothesis Testing in brms

View source: R/hypothesis.R

Non-Linear Hypothesis Testing

Description.

Perform non-linear hypothesis testing for all model parameters.

Among others, hypothesis computes an evidence ratio ( Evid.Ratio ) for each hypothesis. For a one-sided hypothesis, this is just the posterior probability ( Post.Prob ) under the hypothesis against its alternative. That is, when the hypothesis is of the form a > b , the evidence ratio is the ratio of the posterior probability of a > b and the posterior probability of a < b . In this example, values greater than one indicate that the evidence in favor of a > b is larger than the evidence in favor of a < b .

For a two-sided (point) hypothesis, the evidence ratio is a Bayes factor between the hypothesis and its alternative computed via the Savage-Dickey density ratio method: the posterior density at the point of interest divided by the prior density at that point. Values greater than one indicate that evidence in favor of the point hypothesis has increased after seeing the data. In order to calculate this Bayes factor, all parameters related to the hypothesis must have proper priors and the argument sample_prior of function brm must be set to "yes" . Otherwise Evid.Ratio (and Post.Prob ) will be NA . Please note that, for technical reasons, we cannot sample from priors of certain parameter classes. Most notably, these include overall intercept parameters (prior class "Intercept" ) as well as group-level coefficients. When interpreting Bayes factors, make sure that your priors are reasonable and carefully chosen, as the result will depend heavily on the priors. In particular, avoid using default priors.

The Evid.Ratio may sometimes be 0 or Inf , implying very small or very large evidence, respectively, in favor of the tested hypothesis. For one-sided hypothesis pairs, this basically means that all posterior draws are on the same side of the value dividing the two hypotheses. In that sense, instead of 0 or Inf, you may rather read it as an Evid.Ratio smaller than 1/S or greater than S , respectively, where S denotes the number of posterior draws used in the computations.

The argument alpha specifies the size of the credible interval (i.e., Bayesian confidence interval). For instance, if we tested a two-sided hypothesis and set alpha = 0.05 (5%), the credible interval will contain 1 - alpha = 0.95 (95%) of the posterior values. Hence, alpha * 100 % of the posterior values will lie outside of the credible interval. Although this allows testing of hypotheses in a similar manner as in the frequentist null-hypothesis testing framework, we strongly argue against using arbitrary cutoffs (e.g., p < .05 ) to determine the 'existence' of an effect.

Value.

A brmshypothesis object.

Author: Paul-Christian Bürkner [email protected]

brmshypothesis
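Examples.

A usage sketch in the spirit of the package examples (the epilepsy data ship with brms; the default Gaussian family here is just for illustration):

```r
library(brms)

fit <- brm(count ~ zAge + zBase,
           data  = epilepsy,
           prior = prior(normal(0, 1), class = b),
           sample_prior = "yes")

hypothesis(fit, "zAge = 0")  # point hypothesis: Savage-Dickey Bayes factor
hypothesis(fit, "zAge > 0")  # one-sided hypothesis: posterior odds
```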


An R companion to Statistics: data analysis and modelling

Chapter 13 Bayesian hypothesis testing with Bayes factors

In this chapter, we will discuss how to compute Bayes Factors for a variety of General Linear Models using the BayesFactor package ( Morey and Rouder 2023 ) . The package implements the “default” priors discussed in the SDAM book.

13.1 The BayesFactor package

The BayesFactor package implements Bayesian model comparisons for General Linear Models (as well as some other models for e.g. contingency tables and proportions) using JZS-priors for the parameters, or fixing those parameters to 0. Because Bayes Factors are transitive , in the sense that a ratio of Bayes Factors is itself another Bayes factor: \[\begin{align} \text{BF}_{1,2} &= \frac{p(Y_1,\ldots,Y_n|\text{MODEL 1})}{p(Y_1,\ldots,Y_n|\text{MODEL 2})} \\ &= \frac{p(Y_1,\ldots,Y_n|\text{MODEL 1})/p(Y_1,\ldots,Y_n|\text{MODEL 0})} {p(Y_1,\ldots,Y_n|\text{MODEL 2})/p(Y_1,\ldots,Y_n|\text{MODEL 0})} \\ &= \frac{\text{BF}_{1,0}}{\text{BF}_{2,0}} , \end{align}\] you can compute many other Bayes Factors which might not be immediately provided by the package, by simply dividing the Bayes factors that the package does provide. This makes the procedure of model comparison very flexible.

If you haven’t installed the BayesFactor package yet, you need to do so first. Then you can load it as usual by:
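For example:

```r
install.packages("BayesFactor")
library(BayesFactor)
```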

13.1.1 A Bayesian one-sample t-test

A Bayesian alternative to a \(t\) -test is provided via the ttestBF function. Similar to the base R t.test function of the stats package, this function allows computation of a Bayes factor for a one-sample t-test or a two-sample t-test (as well as a paired t-test, which we haven’t covered in the course). Let’s re-analyse the data we considered before, concerning participants’ judgements of the height of Mount Everest. The one-sample t-test we computed before, comparing the judgements to an assumed mean of \(\mu = 8848\) , was:
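Something like (data frame and column names are placeholders):

```r
# frequentist one-sample t-test, for comparison
t.test(everest$judgement, mu = 8848)
```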

The syntax for the Bayesian alternative is very similar, namely:
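A sketch with the same placeholder names:

```r
bf_anchor <- ttestBF(x = everest$judgement, mu = 8848)
```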

This code provides a test of the following models:

\[\begin{align} H_0\!&: \mu = 8848 \\ H_1\!&: \frac{\mu - 8848}{\sigma_\epsilon} \sim \textbf{Cauchy}(r) \end{align}\]

After computing the Bayes factor and storing it in an object bf_anchor , we just see the print-out of the result by typing in the name of the object:

This output is quite sparse, which is by no means a bad thing. It shows a few important things. Under Alt. (which stands for the alternative hypothesis), we first see the scaling factor \(r\) used for the JZS prior distribution on the effect size. We then see the value of the Bayes Factor, which is “extreme” (>100), showing that the data increase the posterior odds ratio for the alternative model over the null model by a factor of 46,902,934,288. Quite clearly, the average judgements differed from the true height of Mount Everest! After the computed value of the Bayes factor, you will find a proportional error for the estimate of the Bayes factor. In general, the marginal likelihoods that constitute the numerator (“top model”) and denominator (“bottom model”) of the Bayes factor cannot be computed exactly, and have to be approximated by numerical integration routines or simulation. This results in some (hopefully small) error in computation, and the error estimate indicates the extent to which the true Bayes factor might differ from the computed one. In this case, the error is (proportionally) very small, and hence we can be assured that our conclusion is unlikely to be affected by error in the approximation.

As we didn’t set the scaling factor explicitly, the default value is used, which is the “medium” scale \(r = \frac{\sqrt{2}}{2} \approx 0.707\) . Note that this is actually different from the default value of \(r=1\) proposed in Rouder et al. ( 2009 ) , which first introduced this version of the Bayesian \(t\) -test to a psychological audience, and which is used to illustrate the method in the SDAM book. Whilst reducing the default value to \(r=0.707\) is probably reasonable given the effect sizes generally encountered in psychological studies, a change in the default prior highlights the subjective nature of the prior distribution in the Bayesian model comparison procedure. You should also realise that different analyses, such as t-tests, ANOVA, and regression models, use different default values for the scaling factor. As shown in the SDAM book, the value of the Bayes factor depends on the choice of the scaling factor. Although the default value may be deemed reasonable, the choice should really be based on a consideration of the magnitude of the effect sizes you (yes, you!) expect in a particular study. This is not always easy, but you should pick one (the default value, for instance, if you can’t think of a better one) before conducting the analysis. If you feel that makes the test too subjective, you may want to check the robustness of the result for different choices of the scaling factor. You can do this by computing the Bayes factor for a range of choices of the scaling factor, and then inspecting whether the strength of the evidence holds up for a reasonable range of values around your choice. The code below provides an example of this:
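One possible version of such a check (the plotting details are one of many options):

```r
rscales <- seq(.1, 2, length.out = 50)

bfs <- sapply(rscales, function(r) {
  extractBF(ttestBF(x = everest$judgement, mu = 8848, rscale = r))$bf
})

plot(rscales, bfs, type = "l", log = "y",
     xlab = "scaling factor r", ylab = "Bayes factor")
```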


Given the scale of the \(y\) -axis (e.g., the first tick mark is at 1e+10 = 10,000,000,000), there is overwhelming evidence against the null-hypothesis for most choices of the scaling factor. Hence, the results seem rather robust to the exact choice of prior.

13.1.2 A Bayesian two-sample t-test

To compare the means of two groups, we can revisit the Tetris study, where we considered whether the number of memory intrusions is reduced after playing Tetris in combination with memory reactivation, compared to just memory reactivation by itself. The ttestBF function allows us to provide the data for one group as the x argument, and the data for the other group as the y argument, so we can perform our model comparison, by subsetting the dependent variable appropriately, as follows:
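For instance (data frame and column names are placeholders):

```r
bf_tetris <- ttestBF(
  x = tetris$intrusions[tetris$condition == "Reactivation"],
  y = tetris$intrusions[tetris$condition == "Tetris_Reactivation"]
)
bf_tetris
```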

This shows evidence for the alternative hypothesis over the null hypothesis that the means are identical: the alternative model is 2.82 times more likely than the null model, which sets the difference between the means to exactly \(\mu_1 - \mu_2 = 0\) rather than allowing other values of this difference through the prior distribution.

A two-sample t-test should really be identical to a two-group ANOVA model, as both concern the same General Linear Model (a model with a single contrast-coding predictor, with e.g. values of \(-\tfrac{1}{2}\) and \(\tfrac{1}{2}\) ). Before fully discussing the way to perform an ANOVA-type analysis with the BayesFactor package, let’s just double-check this is indeed the case:

The results are indeed identical. Note that this is because both the ttestBF and anovaBF function use the same prior distribution for the effect.

13.1.3 A Bayesian ANOVA

More general ANOVA-type models can be tested though the anovaBF function. This function takes the following important arguments:

  • formula : a formula like in the lm function, specifying which factors to include as well as their interactions (e.g. using an * operator to include the main effects as well as their interactions). Note that unlike in the lm function, all terms must be factors.
  • data : a data.frame containing data for all factors in the formula.
  • whichRandom : a character vector specifying which factors are random. Random factors can be used to obtain a model similar to a repeated-measures ANOVA, or a (restricted) set of linear mixed-effects models (with only factors for the fixed effects).
  • whichModels : a character vector specifying which models to compare. The allowed values are "all" , "withmain" (the default), "top" , and "bottom" . Setting whichModels to "all" will test all models that can be created by including or not including a main effect or interaction. "top" will test all models that can be created by removing or leaving in a main effect or interaction term from the full model. "bottom" creates models by adding single factors or interactions to the null model. "withmain" will test all models, with the constraint that if an interaction is included, the corresponding main effects are also included. Setting the argument to "top" produces model comparisons similar to the Type 3 procedure, comparing the full model to a restricted model with each effect removed. Setting the argument to "withmain" produces model comparisons similar to the Type 2 procedure, with model comparisons that respect the “principle of marginality”, such that tests of the main effects do not consider higher-order interactions, whilst a test of any interaction includes the main effects that constitute the elements in the interactions.
  • rscaleFixed : prior scale for fixed effects. The value can be given numerically, or as one of three strings: "medium" ( \(r = \tfrac{1}{2}\) ), "wide" ( \(r = \tfrac{\sqrt{2}}{2}\) ), or "ultrawide" ( \(r=1\) ). The default is "medium" .
  • rscaleRandom : prior scale for random effects. Accepts the same strings as rscaleFixed , and in addition "nuisance" ( \(r = 1\) ). The default is "nuisance" .
  • rscaleEffects : a named vector with prior scales for individual factors.

The anovaBF function will (as far as I can gather) always use contr.sum() contrasts for the factors, so setting your own contrasts will have no effect on the results. The exact contrast should not really matter for omnibus tests, and sum-to-zero contrasts are a reasonable choice in general ( contr.sum implements what we called effect-coding before). 3 While the anovaBF function always uses the JZS prior for any effects, it allows you to specify exactly which scaling factor to use for every effect, if so desired. One perhaps confusing thing is that effect sizes for ANOVA designs (as far as I can gather) are based on standardized treatment effects, whilst those for the t-test designs are based on Cohen’s \(d\) effect sizes. Hence, the values of the scaling factor \(r\) for “medium”, “wide”, and “ultrawide” are different for the Bayesian \(t\) -test and ANOVA models (whilst they provide the same results for models with two conditions).

Let’s see what happens when we use a Bayesian ANOVA-type analysis for the data on experimenter beliefs in social priming. First, let’s load the data, and turn the variables reflecting the experimental manipulations into factors:
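A sketch (file and variable names are placeholders; primeCond and experimenterBelief follow the text):

```r
sp <- read.csv("social_priming.csv")

sp$primeCond          <- factor(sp$primeCond)
sp$experimenterBelief <- factor(sp$experimenterBelief)
```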

We can now use the anovaBF function to compute the Bayes factors:
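Assuming the data frame sp and a placeholder dependent variable ApproachAdvantage:

```r
bf_expB <- anovaBF(ApproachAdvantage ~ primeCond * experimenterBelief,
                   data = sp)
bf_expB
```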

A main thing to note here is that the comparisons of different versions of MODEL G are against the same MODEL R, which is an intercept-only model. We can see that all models which include experimenterBelief receive strong evidence against the intercept-only model, apart from the model which only includes primeCond , which has less evidence than the intercept-only model. Although this indicates that the primeCond effect might be ignorable, the comparisons are different from comparing reduced models to the general MODEL G with all effects included. We can obtain these Type 3 comparisons by setting the whichModels argument to "top" :
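Continuing the sketch:

```r
bf_top <- anovaBF(ApproachAdvantage ~ primeCond * experimenterBelief,
                  data = sp, whichModels = "top")
bf_top
```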

It is very important to realise that the output now concerns the comparison of the reduced model (in the numerator, i.e. the “top model”) against the full model (in the denominator, i.e. the “bottom model”), as is stated in the Against denominator part of the output. So these are \(\text{BF}_{0,1}\) values, rather than \(\text{BF}_{1,0}\) values. That means that low values of the Bayes factor now indicate evidence for the alternative hypothesis that an effect is different from 0. As we find a very low \(\text{BF}_{0,1}\) value for the experimenterBelief effect, this thus shows strong evidence that this effect is different from 0. The \(\text{BF}_{0,1}\) values for the other effects are larger than 1, which indicates more support for the null hypothesis than for the alternative hypothesis.

We can change the output from a \(\text{BF}_{0,1}\) value to a \(\text{BF}_{1,0}\) value by simply inverting the Bayes factors, as follows:
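For example:

```r
1 / bf_top
```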

As we noted before, we again see strong evidence for the effect of experimenterBelief when we remove it from the full model, but not for the other effects.

The transitivity of the Bayes factor means that we can also obtain some of these results through a ratio of the Bayes factors obtained earlier. For instance, a Type 3 test of the effect of experimenterBelief:primeCond interaction can be obtained by comparing a model with all effects included to a model without this interaction. In the analysis stored in bf_expB , we compared a number of the possible models to an intercept-only model. By comparing the Bayes factors of the model which excludes the interaction to a model which includes it, we can obtain the same Bayes factor of that interaction as follows. In the output of bf_expB , the fourth element compared the full model to the intercept-only model, whilst in the third element, a model with only the main effects of experimenterBelief and primeCond are compared to an intercept-only model. The Type 3 test of the interaction can then be obtained through the ratio of these two Bayes factors:
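In code:

```r
# full model (4th element) vs. main-effects-only model (3rd element):
# a Type 3 test of the interaction
bf_expB[4] / bf_expB[3]
```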

which indicates evidence for the null hypothesis that there is no moderation of the effect of experimenterbelief by primeCond , as the Bayes factor is well below 1. We cannot replicate all Type 3 analyses with the results obtained earlier, unless we ask the function to compare every possible model against the intercept-only model, by specifying whichModels = "all" :
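Continuing the sketch:

```r
bf_all <- anovaBF(ApproachAdvantage ~ primeCond * experimenterBelief,
                  data = sp, whichModels = "all")
bf_all
```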

For instance, we can now obtain a Type 3 test for experimenterBelief by comparing the full model (the 7th element in the output) to a model which just excludes this effect (i.e. the 6th element):
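In code:

```r
bf_all[7] / bf_all[6]
```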

which mostly reproduces the result we obtained by setting whichModels = "top" before.

13.1.4 Bayesian regression and ANCOVA

Apart from different default values of the scaling factor \(r\) in the scaled-Cauchy distribution, the BayesFactor package works in the same way for models which include metric predictors. In a multiple regression model with only metric predictors, we can use the convenience function regressionBF . If you want to mix metric and categorical predictors, as in an ANCOVA model, you will have to use the generalTestBF function. All functions discussed so far are really just convenience interfaces to the generalTestBF , which implements Bayes factors for the General Linear Model. These convenience functions are used to determine an appropriate scaling factor for the different terms in the model, but not much else of concern, so you can replicate all the previous analyses through the generalTestBF function, if you’d like.

13.1.5 Bayesian repeated-measures ANOVA

An analysis similar to a repeated-measures ANOVA can also be obtained. Just like the afex package, the BayesFactor package requires data in the long format. Let’s first prepare the data of the Cheerleader-effect experiment:

The way the BayesFactor package deals with repeated-measures designs is a little different from how we treated repeated-measures ANOVA. Rather than computing within-subjects composite variables, the package effectively deals with individual differences by adding random intercepts (like in a linear mixed-effects model). To do this, we add Participant as an additive effect, and then classify it as a random effect through the whichRandom argument. To obtain Type 3 comparisons, we again set whichModels to "top" :
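A sketch with placeholder names (the outcome, second factor, and data frame name are assumptions; Item and Participant follow the text):

```r
bf_rm <- anovaBF(Response ~ Item * Task + Participant,
                 data        = cheer_long,
                 whichRandom = "Participant",
                 whichModels = "top")
bf_rm
```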

In this case, the proportional errors of the results may be deemed too high. We can get more precise results by obtaining more samples (for these complex models, the estimation of the Bayes factor is done with a sampling-based approximation). We can do this, without the need to respecify the model, with the recompute function, where we can increase the number of sampling iterations from the default (10,000 iterations) to something higher:
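For instance:

```r
bf_rm <- recompute(bf_rm, iterations = 1e5)
bf_rm
```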

This provides somewhat better results, although it would be better to increase the number of iterations even more.

As before, the Bayes Factors are for the reduced model compared to the full model, and we can get more easily interpretable results by computing the inverse values:
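```r
1 / bf_rm
```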

We can see that we obtain “extreme” evidence for the main effect of Item. For the other effects, the evidence is more in favour of the null-hypothesis.

13.1.6 Parameter estimates

By default, the Bayes Factor objects just provide the values of the Bayes Factor. We don’t get estimates of the parameters.

To get (approximate) posterior distributions for the parameters, we can first estimate the general MODEL G with the lmBF function. This function is meant to compute a specific General Linear Model (rather than a set of such models). For example, for the Social Priming example, we can estimate the ANOVA model with lmBF as:
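Sticking with the placeholder names from above:

```r
mod_G <- lmBF(ApproachAdvantage ~ primeCond * experimenterBelief,
              data = sp)
```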

We can then use this estimated model to obtain samples from the posterior distribution over the model parameters. This is done with the posterior function of the Bayesfactor package. We can determine the number of samples through the iterations argument. This should generally be a high number, to get more reliable estimates:
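For example:

```r
post_samples <- posterior(mod_G, iterations = 10000)
```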

The post_samples object can be effectively treated as a matrix, with columns corresponding to the different parameters, and in the rows the samples. So we can obtain posterior means as the column-wise averages:
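For example:

```r
colMeans(post_samples)
```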

Here, mu corresponds to the “grand mean” (i.e. the average of averages), which is the intercept in a GLM with sum-to-zero contrasts. The next mean corresponds to the posterior mean of the treatment effect of the high-power prime condition ( primeCond-HPP ), i.e. the marginal mean of the high-power prime condition, compared to the grand mean. The second effect is the posterior mean of the treatment effect of the low-power prime condition ( primeCond-LPP ). As there are only two power-prime conditions, this is exactly the negative value of the posterior mean of the high-power prime treatment effect (the grand mean is exactly halfway between these two treatment effects). We get similar treatment effects for the main effect of experimenter belief, and the interaction between power prime and experimenter belief. The posterior mean labelled sig2 is an estimate of the error variance. The columns after this are values related to the specification of the prior, which we will ignore for now.

We can do more than compute means for these samples from the posterior distribution. For instance, we can plot the (approximate) posterior distributions as well. For example, we can plot the posterior distribution of the high-power prime treatment effect as:
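For example (the column name follows the output described above):

```r
# trace and density plot for the high-power prime treatment effect
plot(post_samples[, "primeCond-HPP"])
```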


13.1.7 Highest-density interval

A convenient way to obtain highest-density intervals is by using the hdi function from the HDInterval package. This function is defined for a variety of objects, including those returned by the BayesFactor::posterior() function. The function has, in addition to the object, one more argument called credMass , which specifies the width of the credible interval ( credMass = .95 is the default). For example, the 95% HDI intervals for the two parameters that we plotted above are obtained as follows:
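A sketch (the column positions are a guess; they depend on the model):

```r
library(HDInterval)

# columns 2 and 3 here are assumed to hold the primeCond and
# experimenterBelief treatment effects
hdi(post_samples[, 2:3], credMass = .95)
```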

The output shows the lower and upper bound for each HDI. We see that the 95% HDI for power prime effect includes 0, whilst the 95% HDI for the experimenter belief effect does not. Again, this corresponds to what we observed earlier, that there is strong evidence for the experimenter belief effect, but not for the power prime effect.

13.2 Bayes Factors for brms models

The BayesFactor package computes Bayes Factors for a number of standard analyses, using the default priors set in the package. We may want to compute Bayes Factors to test hypotheses for more complex models. In general, we can compare any models by first computing the marginal likelihood for each model, and then computing the Bayes Factor as the ratio of these marginal likelihoods. Computing marginal likelihoods is not always straightforward, but a general procedure that often works reasonably well is called “bridge sampling” ( Quentin F. Gronau et al. 2017 ) , and has been implemented in the bridgesampling ( Quentin F. Gronau and Singmann 2021 ) package. Before discussing how to use this, we will first discuss a simpler way to compute Bayes Factors for particular comparisons within the context of a brms model.

For the following examples, we will start with the multiple regression model for the trump2016 data that we also estimated in the previous chapter, but now setting the prior distributions to more informative ones, as is advisable when computing Bayes Factors:
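A sketch of what this model might look like. The two predictors are those named in the text; the data frame `dat`, the outcome name `percent_Trump_votes`, and the specific prior values are illustrative assumptions. sample_prior = TRUE is included here because the brms::hypothesis() examples below require it:

```r
library(brms)

mod_regression <- brm(
  percent_Trump_votes ~ hate_groups_per_million + percent_bachelors_degree_or_higher,
  data = dat,
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(50, 10), class = "Intercept"),
    prior(student_t(3, 0, 10), class = "sigma")
  ),
  sample_prior = TRUE,  # save prior samples (needed for hypothesis() below)
  seed = 2024
)
```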

The results of this model are shown by its summary:
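```r
# posterior summaries of the fitted sketch model
summary(mod_regression)
```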

13.2.1 Testing simple point and directional hypotheses for parameters of brms models

The brms::hypothesis() function can be used to test hypotheses about single parameters of brms models. This requires setting the argument sample_prior=TRUE in the brms::brm() function; brms::hypothesis() will not work properly without it. The brms::hypothesis() function has the following important arguments:

  • x : a model of class brmsfit as fitted by brms::brm() . It can also be a data.frame with posterior draws (in rows) of parameters (in columns).
  • hypothesis : a character vector describing hypotheses to be tested.
  • class : a string specifying the class of parameter(s) tested ( "b" is common and the default, other common options are "sd" and "cor" ). This argument is not necessary, but can be handy to avoid having to specify the full names of the parameters.
  • group : a string characterising the grouping factor to evaluate only group-level effects parameters.
  • alpha : used to specify the width of the HDI (e.g.  alpha = .05 for a 95% HDI)
  • seed : seed for the random number generator to make the results reproducible.

The specification of the hypothesis argument is rather flexible. For example, we can test the (null) hypothesis that the slope of hate_groups_per_million equals 0 by specifying the hypothesis as "hate_groups_per_million = 0" :
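For the model sketched above:

```r
hypothesis(mod_regression, "hate_groups_per_million = 0")
```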

This compares mod_regression to an alternative model in which the prior for the slope of hate_groups_per_million is set to a point-prior at 0 (i.e. only the value 0 is allowed). The output of the function repeats some values that are also provided in the output of the summary() function (the posterior mean, standard deviation, and lower- and upper-bound of the 95% HDI). In addition, we find values of Evid.Ratio and Post.Prob . The Evidence Ratio ( Evid.Ratio ) is the value of the Bayes Factor \(\text{BF}_{01}\) comparing the model specified in the hypothesis (MODEL R) to the less restrictive MODEL G ( mod_regression ). So values larger than 1 indicate that the data provide evidence for the tested hypothesis (MODEL R) over MODEL G. Conversely, values smaller than 1 indicate evidence for MODEL G over MODEL R. The value found here ( \(\text{BF}_{01} = 0.1180559\) , i.e. \(\text{BF}_{10} = 1/0.1180559 \approx 8.47\) ) can be considered “moderate” evidence for MODEL G, which allows hate_groups_per_million to have an effect, compared to MODEL R, which fixes the effect to 0. The Bayes Factor in this procedure is calculated via the so-called Savage-Dickey density ratio ( Wagenmakers et al. 2010 ) . The Posterior Probability ( Post.Prob ) is the posterior probability that the hypothesis is true. For this point-hypothesis, it is the posterior probability of the model with the point-prior (MODEL R), assuming equal prior probabilities for this model and MODEL G.

Directional hypotheses can also be tested. For example, we can test the hypothesis that the slope of hate_groups_per_million is larger than 0 by specifying the hypothesis as "hate_groups_per_million > 0" :
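That is:

```r
hypothesis(mod_regression, "hate_groups_per_million > 0")
```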

Whilst the output is similar to before, for these directional tests a different procedure is used to compute the “evidence ratio”. Here, the evidence ratio is the posterior probability that the parameter is larger than 0, divided by the posterior probability that it is smaller than 0: \[\text{Evidence ratio} = \frac{p(\beta > 0|\text{data})}{p(\beta < 0|\text{data})}\] which is estimated simply as the proportion of posterior samples larger than 0 (which is also stated under Post.Prob ), divided by the proportion of posterior samples smaller than 0 (which equals \(1-\) Post.Prob ). You can also use the procedure to test some “wacky” hypotheses, such as that the slope of hate_groups_per_million is smaller than the slope of percent_bachelors_degree_or_higher :
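That is:

```r
hypothesis(mod_regression,
           "hate_groups_per_million < percent_bachelors_degree_or_higher")
```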

As the scales of hate_groups_per_million and percent_bachelors_degree_or_higher are quite different, this hypothesis does not necessarily make much sense from a scientific viewpoint. The example is mainly meant to show the flexibility of the procedure.

13.2.2 Using bridge sampling to obtain marginal likelihoods

Bridge sampling ( Bennett 1976 ; Meng and Wong 1996 ; Quentin F. Gronau et al. 2017 ) provides a general method to estimate the marginal likelihood from MCMC samples. We won’t go into the details of this algorithm here ( Quentin F. Gronau et al. 2017 provide a relatively readable introduction to this) , but note that it is a sampling-based approximation, and the accuracy of the method will depend on how many samples are used (and whether the MCMC algorithm has converged to sampling from the posterior distribution).

The implementation of bridge sampling in the bridgesampling package, for which the brms::bridge_sampler() function provides a simple wrapper, requires that the posterior draws of all parameters in the model are saved. This can be requested by setting the option save_pars = save_pars(all = TRUE) in the call to brms::brm() . We did not do this before, so to be able to use the brms::bridge_sampler() function, we should first re-estimate the model with:
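Continuing the sketch, with the same assumed data, formula, and priors as above:

```r
mod_regression <- brm(
  percent_Trump_votes ~ hate_groups_per_million + percent_bachelors_degree_or_higher,
  data = dat,
  prior = c(
    prior(normal(0, 1), class = "b"),
    prior(normal(50, 10), class = "Intercept"),
    prior(student_t(3, 0, 10), class = "sigma")
  ),
  sample_prior = TRUE,
  save_pars = save_pars(all = TRUE),  # save all draws for bridge sampling
  seed = 2024
)
```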

Having set save_pars = save_pars(all = TRUE) , we can then call the brms::bridge_sampler() function on the estimated model as:
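Storing the result for later use:

```r
ml_G <- bridge_sampler(mod_regression)
ml_G
```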

This function returns the approximate (natural) logarithm of the marginal likelihood, i.e. \[\widehat{\log p}(\text{data}|\text{MODEL})\] To compute a Bayes Factor, we will also need the (log) marginal likelihood for an alternative model. For example, we can set the prior for the slope of hate_groups_per_million to be a point-prior at 0 by specifying the prior distribution for that parameter as constant(0) :
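A sketch of the restricted model (MODEL R), identical to the one above except for the point-prior. In brms, a coefficient-specific prior takes precedence over the general class = "b" prior for that coefficient:

```r
mod_regression_null <- brm(
  percent_Trump_votes ~ hate_groups_per_million + percent_bachelors_degree_or_higher,
  data = dat,
  prior = c(
    prior(constant(0), class = "b", coef = "hate_groups_per_million"),
    prior(normal(0, 1), class = "b"),
    prior(normal(50, 10), class = "Intercept"),
    prior(student_t(3, 0, 10), class = "sigma")
  ),
  sample_prior = TRUE,
  save_pars = save_pars(all = TRUE),
  seed = 2024
)
```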

We can now use the brms::bridge_sampler() function to compute the (log) marginal likelihood for this estimated model as:
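Again storing the result:

```r
ml_R <- bridge_sampler(mod_regression_null)
ml_R
```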

We now have two (log) marginal likelihoods. To compute the actual Bayes Factor, we can use the fact that: \[\begin{aligned} \log \left( \frac{p(\text{data}|\text{MODEL 1})}{p(\text{data}|\text{MODEL 2})} \right) &= \log p(\text{data}|\text{MODEL 1}) - \log p(\text{data}|\text{MODEL 2}) \\ \frac{p(\text{data}|\text{MODEL 1})}{p(\text{data}|\text{MODEL 2})} &= \exp \left( \log p(\text{data}|\text{MODEL 1}) - \log p(\text{data}|\text{MODEL 2}) \right) \end{aligned}\] So we can compute the Bayes Factor by taking the difference between the log marginal likelihoods, and then exponentiating:
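With the objects computed above:

```r
# BF_10: evidence for MODEL G (unrestricted) over MODEL R (point-prior at 0)
exp(ml_G$logml - ml_R$logml)
```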

where we have used that each object returned by brms::bridge_sampler() is a list with a named element logml equal to the (approximate) log marginal likelihood.

Using this explicit computation can be avoided by calling the bridgesampling::bf() function, which will provide the actual Bayes Factor from the log marginal likelihoods:
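With the two bridge-sampling results from above (the first argument is the numerator model):

```r
# Bayes Factor for MODEL G over MODEL R
bridgesampling::bf(ml_G, ml_R)
```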

Curiously, a scaling factor of \(r = \tfrac{1}{2}\) in this case corresponds to a scaling factor of \(r = \tfrac{\sqrt{2}}{2}\) , which is something I don’t immediately understand, and will require further investigation.
