A Workflow For Bayesian Modelling and Reporting in R

class: title-slide, bottom

# A Workflow For Bayesian Modelling and Reporting in R

<br>

### Mikhail Popov

<i class="fas  fa-envelope "></i> mikhail @ mpopov.com

<i class="fab  fa-twitter "></i> [bearloga](https://twitter.com/bearloga)

<i class="fab  fa-github "></i> [bearloga](https://github.com/bearloga)

<i class="fts" alt="Porkchop"></i> bearloga

<i class="fas  fa-code "></i> [github.com/bearloga/fts-bayesian-r-workflow-2019](https://github.com/bearloga/fts-bayesian-r-workflow-2019)

---

# Disclaimers

## What this talk isn't

- A lesson in Bayesian statistics and probability theory
    - [Bayesian Basics](https://m-clark.github.io/bayesian-basics/) by Michael Clark (free introduction)
    - [Bayesian Methods for Hackers](http://camdavidsonpilon.github.io/Probabilistic-Programming-and-Bayesian-Methods-for-Hackers/) by Cam Davidson-Pilon (notebooks)
    - [Statistical Rethinking](http://xcelab.net/rm/statistical-rethinking/) by Richard McElreath (book)
- An attempt to convince anyone to use Bayesian inference

## What this talk is

- A brief lesson in using tidyverse in your workflow
- Tips for Bayesian modeling with Stan in R
- Suggestions for nice presentation with R Markdown

---

_**However**_, Bayesian inference in a nutshell as a refresher:

.pull-left[
Models are *generative*
- How the data came to be
- Probability statements made about parameters

Prior beliefs are formalized
- Which values are more likely than others
- Constraint/support specification

Models are explicit
- All assumptions are explicit
- Uncertainty captured before **and** after data
- Yield distributions for parameters
]

.pull-right[
  ![:scale 100%](images/bayes.png)
]

.footnote[.smaller[Diagram by [Jason C. Doll and Stephen J. Jacquemin](http://jason-doll.com/wordpress/?page_id=127)]]

---

background-image: url(images/bayonce.jpg)
background-size: cover
background-position: center

---

# Packages

## Essentials

- `tidyverse` (especially `dplyr`, `tidyr`, `purrr`,` broom`)
- `rstan` for modelling
- `bridgesampling` for model comparison
- `kableExtra` for table formatting
- `rmarkdown` for final output

## Suggestions

- [`memor`](https://hebrewseniorlife.github.io/memor/) for a customizable template, which [`wmfpar`](https://github.com/bearloga/wmf-product-analytics-report/) is based on
- [`tinytex`](https://yihui.name/tinytex/) for LaTeX installation

---

# Brief intro to tidyverse

- `%>%` ("then do/apply") is used to pipe data between functions

- **tibbles** are super data frames where cells can be objects and lists

- **dplyr** verbs manipulate data, **tidyr** verbs reshape it

- **purrr** enhances functional programming
  - `map()`, `map2()`, and `pmap()` iterate over 1, 2, or multiple lists/vector
    - if using a formula, `.x` refers to element from first argument
  - all functions are type-stable
    - `map()` returns a list
    - `map_dbl()`/`map_int()` return numeric/integer vectors
    - `map_df()` returns a data frame

- **broom** extracts core results from model objects into tidy data

---

# Example

```r
library(tidyverse)
set.seed(0)
example <- tibble(n_sample = 3:5) %>%
  mutate(
    sampled_states = map(n_sample, sample, x = state.abb),
    concat_states = map_chr(sampled_states, paste0, collapse = ", ")
  )
example
```

```
## # A tibble: 3 x 3
##   n_sample sampled_states concat_states     
##      <int> <list>         <chr>             
## 1        3 <chr [3]>      VT, IN, LA        
## 2        4 <chr [4]>      NH, VT, GA, TX    
## 3        5 <chr [5]>      WV, NC, NM, AZ, GA
```

---

```R
example %>%
  pull(sampled_states) %>%
  map_dfr(~ tibble(state = .x)) %>%
  group_by(state) %>%
  tally() %>%
  ...
```

<table class="table table-striped" style="margin-left: auto; margin-right: auto;">
 <thead>
  <tr>
   <th style="text-align:left;"> state </th>
   <th style="text-align:right;"> n </th>
  </tr>
 </thead>
<tbody>
  <tr>
   <td style="text-align:left;"> AZ </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> GA </td>
   <td style="text-align:right;"> 2 </td>
  </tr>
  <tr>
   <td style="text-align:left;"> IN </td>
   <td style="text-align:right;"> 1 </td>
  </tr>
</tbody>
</table>

```R
  ... %>%
  spread(state, n)
```

---

# Probabilistic programming overview

With probabilistic programming languages (PPLs) you
- Write any generative probabilistic model as a program
- Use algorithms like [MCMC](https://en.wikipedia.org/wiki/Markov_chain_Monte_Carlo) and [VI](https://en.wikipedia.org/wiki/Variational_Bayesian_methods) to draw samples from posterior

PPLs recommended for R users:
- BUGS and [JAGS](http://mcmc-jags.sourceforge.net/) via rjags interface
- [Stan](http://mc-stan.org/) via RStan interface (also has a Python interface PyStan)
- [greta](https://greta-stats.org/) is a PPL in R, powered by TensorFlow

PPLs recommended for Python users:
- [PyMC3](https://docs.pymc.io/) (uses Theano as backend)
- [Edward2](https://github.com/tensorflow/probability/tree/master/tensorflow_probability/python/edward2), included in [TensorFlow Probability](https://www.tensorflow.org/probability/)
- [Pyro](http://pyro.ai/) (uses [PyTorch](https://pytorch.org/) as backend)
- [pomegranate](https://pomegranate.readthedocs.io/en/latest/)

PPL recommended for C# and F# users: [Infer.NET](https://dotnet.github.io/infer/)

---

# Probabilistic programming example

**Goal**: estimate a coin's hidden probability after observing outcome of flips

.small[

```r
library(greta)

x <- rbinom(10, 1, 0.8) # flips a very unfair coin 10 times

theta <- beta(3, 3) # creates a variable following a beta distribution

y <- as_data(x) # creates greta data array
distribution(y) <- binomial(1, theta) # specifies likelihood

m <- model(theta) # creates a greta model object
```
]

`plot(m)` produces a visual representation:

<div id="htmlwidget-7555a6427ac415e056d6" style="width:100%;height:144px;" class="grViz html-widget"></div>
<script type="application/json" data-for="htmlwidget-7555a6427ac415e056d6">{"x":{"diagram":"digraph {\n\ngraph [layout = \"dot\",\n       outputorder = \"edgesfirst\",\n       bgcolor = \"white\",\n       rankdir = \"LR\"]\n\nnode [fontname = \"Helvetica\",\n      fontsize = \"10\",\n      shape = \"circle\",\n      fixedsize = \"true\",\n      width = \"0.5\",\n      style = \"filled\",\n      fillcolor = \"aliceblue\",\n      color = \"gray70\",\n      fontcolor = \"gray50\"]\n\nedge [fontname = \"Helvetica\",\n     fontsize = \"8\",\n     len = \"1.5\",\n     color = \"gray80\",\n     arrowsize = \"0.5\"]\n\n  \"1\" [label = \"theta\n\", fontcolor = \"#000000\", fontsize = \"12\", penwidth = \"2\", shape = \"circle\", color = \"#B2B2B2\", width = \"0.6\", height = \"0.48\", fillcolor = \"#E5E5E5\"] \n  \"2\" [label = \"beta\", fontcolor = \"#000000\", fontsize = \"12\", penwidth = \"2\", shape = \"diamond\", color = \"#4C4C4C\", width = \"1\", height = \"0.8\", fillcolor = \"#B2B2B2\"] \n  \"3\" [label = \"3\", fontcolor = \"#000000\", fontsize = \"12\", penwidth = \"2\", shape = \"square\", color = \"#B2B2B2\", width = \"0.5\", height = \"0.4\", fillcolor = \"#FFFFFF\"] \n  \"4\" [label = \"3\", fontcolor = \"#000000\", fontsize = \"12\", penwidth = \"2\", shape = \"square\", color = \"#B2B2B2\", width = \"0.5\", height = \"0.4\", fillcolor = \"#FFFFFF\"] \n  \"5\" [label = \"binomial\", fontcolor = \"#000000\", fontsize = \"12\", penwidth = \"2\", shape = \"diamond\", color = \"#4C4C4C\", width = \"1\", height = \"0.8\", fillcolor = \"#B2B2B2\"] \n  \"6\" [label = \"1\", fontcolor = \"#000000\", fontsize = \"12\", penwidth = \"2\", shape = \"square\", color = \"#B2B2B2\", width = \"0.5\", height = \"0.4\", fillcolor = \"#FFFFFF\"] \n  \"7\" [label = \"y\n\", fontcolor = \"#000000\", fontsize = \"12\", penwidth = \"2\", shape = \"square\", color = \"#B2B2B2\", width = \"0.5\", height = \"0.4\", fillcolor = \"#FFFFFF\"] \n\"1\"->\"5\" [color = \"Gainsboro\", fontname = \"Helvetica\", fontcolor = \"gray\", fontsize = \"11\", penwidth = \"3\", label = \"prob\", style = \"solid\"] \n\"2\"->\"1\" [color = \"Gainsboro\", fontname = \"Helvetica\", fontcolor = \"gray\", fontsize = \"11\", penwidth = \"3\", penwidth = \"3\", style = \"dashed\"] \n\"3\"->\"2\" [color = \"Gainsboro\", fontname = \"Helvetica\", fontcolor = \"gray\", fontsize = \"11\", penwidth = \"3\", label = \"shape1\", style = \"solid\"] \n\"4\"->\"2\" [color = \"Gainsboro\", fontname = \"Helvetica\", fontcolor = \"gray\", fontsize = \"11\", penwidth = \"3\", label = \"shape2\", style = \"solid\"] \n\"5\"->\"7\" [color = \"Gainsboro\", fontname = \"Helvetica\", fontcolor = \"gray\", fontsize = \"11\", penwidth = \"3\", penwidth = \"3\", style = \"dashed\"] \n\"6\"->\"5\" [color = \"Gainsboro\", fontname = \"Helvetica\", fontcolor = \"gray\", fontsize = \"11\", penwidth = \"3\", label = \"size\", style = \"solid\"] \n}","config":{"engine":"dot","options":null}},"evals":[],"jsHooks":[]}</script>

---

# Priors

**tinydensR** can be used to play around with probability distributions and their parameters in RStudio IDE, which is especially useful for picking priors:

```R
# install.packages("devtools")
devtools::install_github("bearloga/tinydensR")
```

.left-column[![](images/addin.png)]

.right-column[.center[![:scale 55%](images/beta_prior.png)]]

---

Once we have a model, we can perform statistical inference:

```r
draws <- mcmc(m, verbose = FALSE) # performs inference
```

```r
bayesplot::mcmc_dens(draws) # plots posterior of theta
```

---

`tidyMCMC` works with `mcmc.list` objects (such as those produced by **greta**), **rjags** objects, and **stanfit** objects.

```r
broom::tidyMCMC(draws, conf.int = TRUE)
```

```
## # A tibble: 1 x 5
##   term  estimate std.error conf.low conf.high
##   <chr>    <dbl>     <dbl>    <dbl>     <dbl>
## 1 theta    0.624     0.118    0.378     0.835
```

By default the point estimate is the mean and the [credible interval](https://en.wikipedia.org/wiki/Credible_interval) is computed via quantiles. Alternatively we can get the median and the interval with the *highest 95% posterior density*:

```r
broom::tidyMCMC(draws, conf.int = TRUE,
                estimate.method = "median",
                conf.method = "HPDinterval")
```

```
## # A tibble: 1 x 5
##   term  estimate std.error conf.low conf.high
##   <chr>    <dbl>     <dbl>    <dbl>     <dbl>
## 1 theta    0.631     0.118    0.388     0.843
```

.footnote[These will soon be moved to the **broom.mixed** 📦 which is not yet on CRAN]

---

# Case study: advertisement campaign

.pull-left[.center[![](images/plot-1.png)]]

.pull-right[
- We manage a store
- We get 300 customers/day on average
- We run an ad for 30 days to increase business
- Ad takes a few days to yield full effect
- Afterward there's a gradual change to new normal
]

<br>

.center[.large[**What were the short-term and long-term effects of the ad?**]]

.footnote[For simulation code see [report.Rmd#L84-L121](https://github.com/fts-bayesian-r-workflow-2019/blob/master/report.Rmd#L84-L121); for example report see [report.pdf](https://github.com/bearloga/fts-bayesian-r-workflow-2019/blob/master/report.pdf)]

---

# Table formatting example

.left-column[![](images/table_params.png)]

.right-column[.smaller[
```R
data_frame(
   parameter = c(
    "$\\lambda_1$ (old normal)",
    ⋮
    "$N$ (total days)",
    ⋮
    "$d_2$ (time to new normal)"
  ),
  value = c(
    lambda_1, lambda_2, lambda_3, total_days,
    change_point_1, change_point_2,
    transition_period_1, transition_period_2
  )
) %>%
  kable(
    escape = FALSE, booktabs = TRUE,
    caption = "Simulation parameters"
  ) %>%
  # extra formatting with kableExtra:
  kable_styling(latex_options = "hold_position") %>%
  group_rows(index = c("Rates (unknown)" = 3,
                       "Other parameters" = 5))
```
]]

---

# Model specification in Stan

.small[
```Stan
data {
  int<lower=0> N; // number of observations
  int y[N];       // observed data
  real T[2];      // changepoint days
}
parameters {
  real<lower=0> lambda[3];
}
model {
  lambda ~ normal(300, 100); // prior
  for (i in 1:N) {
    if (i > T[1]) {
      if (i <= T[2]) {
        y[i] ~ poisson(lambda[2]); // likelihood during ad
      } else {
        y[i] ~ poisson(lambda[3]); // likelihood after ad
      }
    } else {
      y[i] ~ poisson(lambda[1]);   // likelihood pre-ad
    }
  }
}
```
]

---

# Model inference with RStan

```R
library(rstan)

models <- list(
  model_1 = stan_model("models/model_1.stan", "immediate"),
  model_2 = stan_model("models/model_2.stan", "gradual")
)
```

`stan_model()` translates the model spec to C++ and then compiles it into an executable which is used for inference with `sampling()` (if `data` is provided):

```R
data <- list(
  N = total_days, y = customers$customers,
  T = c(change_point_1, change_point_2),
  d = c(transition_period_1, transition_period_2)
)

draws <- map(models, sampling, data = data, refresh = 0)
```

.footnote[**Note**: if each model required a different dataset, we can have a "datas" list and use `map2(models, datas, sampling)` to iterate through each model-data combination]

---

# Model comparison

![](images/table_models-hl.png)

```R
extract_ci = function(draws, ...) {
  extracted_ci <- draws %>%
    tidyMCMC(pars = "lambda", conf.int = TRUE, ...) %>%
    rename(est = estimate) %>%
    mutate(est = sprintf("%.1f (%.1f--%.1f)", est,
                         conf.low, conf.high)) %>%
    select(term, est) %>%
    spread(term, est)
  return(extracted_ci)
}
```

---

`bridgesampling::bridge_sampler` calculates marginal likelihood `$p(\mathcal{D}|\mathcal{M})$`\* and is best used with `stanfit` models

```R
library(bridgesampling)

bridge_samples <- map(draws, bridge_sampler, silent = TRUE)
```

```R
posterior_probabilities <- data_frame(
  model = names(bridge_samples),
  post_prob = post_prob(map_dbl(bridge_samples, logml))
)
```

**Note**: `post_prob` also accepts a `prior_prob` argument (by default all models are equally likely a priori)

```R
extracted_cis <- map_dfr(draws, extract_ci, .id = "model")
```

.footnote[\[\*\] well, technically `bridge_sampler` computes the log marginal likelihood]

---

.center[![](images/table_models.png)]

.small[
```R
posterior_probabilities %>%
  left_join(extracted_cis, by = "model") %>%
  mutate(
    post_prob = sprintf("%.4f", post_prob),
    model = sub("model_([1-3])", "$\\\\mathcal{M}_\\1$", model),
  ) %>%
  kable(
    col.names = c("Model", "$\\text{Pr}(\\mathcal{M}|\\mathcal{D})$",
                  sprintf("$\\lambda_%i$ (%.0f)", 1:3,
                          c(lambda_1, lambda_2, lambda_3))),
    booktabs = TRUE, escape = FALSE, align = c("l", "r", "c", "c", "c")
  ) %>%
  kable_styling(latex_options = c("hold_position")) %>%
  add_header_above(header = c(" " = 2,
                              "Estimate (95% Credible Interval)" = 3))
```
]

---

# Inline Bayes factor

From the example report:

![Bayes factor inline](images/inline_bf.png)

Made with inline R code in R Markdown:

.small[
```
Using **bridgesampling** [@bridgesampling] allows us to calculate
marginal likelihoods -- $p(\mathcal{D}|\mathcal{M})$ -- of Stan
models really easily and therefore compute the BF (see `?bf`),
which -- for $\mathcal{M}_1$ compared to $\mathcal{M}_2$, for
example -- comes out to be `r
round(bayes_factor(bridge_samples$model_1, bridge_samples$model_2)$bf, 2)`,
meaning there is no evidence for choosing model 1 over model 2.
```
]

---

# Presentation of results

.pull-left[
.bold[.tiny[
```R
prepared_table <- draws$model_2 %>%
  tidyMCMC(
    pars = c(
      "lambda", "diff21",
      "diff32", "diff31"
    ),
    estimate.method = "median",
    conf.int = TRUE,
    conf.method = "HPDinterval"
  ) %>%
  mutate(
    term = c(
      paste0("$\\lambda_", 1:3, "$"),
      "$\\lambda_2 - \\lambda_1$",
      "$\\lambda_3 - \\lambda_2$",
      "$\\lambda_3 - \\lambda_1$"
    ),
    truth = c(lambda_1, lambda_2,
              lambda_3, differences),
    conf.int = sprintf("(%.1f, %.1f)",
                       conf.low, conf.high)
  ) %>%
  select(term, truth, estimate, conf.int)
```
]]
]

.pull-right[
![Table of results](images/table_results.png)

.bold[.tiny[
```R
kable(prepared_table,
      escape = FALSE, booktabs = TRUE,
      digits = 1, align = c("l", "r", "r", "c"),
      col.names = c(
       "Parameter", "Truth", "Point Estimate",
       "95\\% Credible Interval"),
      caption = "Inference using $\\mathcal{M}_2$.") %>%
  kable_styling(
    latex_options = c("striped", "hold_position")
  ) %>%
  group_rows(index = c("Rates" = 3,
                       "Differences" = 3))
```
]]
]

---
class: center, middle

# Thank you

Questions? mikhail @ mpopov.com

or @bearloga in <i class="fab  fa-slack "></i>

<br>

Source code: [github.com/bearloga/fts-bayesian-r-workflow-2019](https://github.com/bearloga/fts-bayesian-r-workflow-2019)

Slides created with [rmarkdown](https://rmarkdown.rstudio.com/) & [xaringan](https://github.com/yihui/xaringan)