PyCon Ireland
Data Science @ Personio
and Open Source Contributor @ PyMC
11/11/23
Predictive models are inadequate decision-making tools in product demand analysis.
Accept no substitutes for Bayesian discrete choice models and causal inference.
A brief survey of historical use-cases and the kinds of problems Bayesian discrete choice models can solve.
“Transport projects involve sinking money in expensive capital investments, which have a long life and wide repercussions. There is no escape from the attempt both to estimate the demand for their services over twenty or thirty years and to assess their repercussions on the economy as a whole.” - Denys Munby, Transport, 1968
We develop the basic multinomial choice model and highlight a problem with it.
Heating alternatives such as gas central (gc) and electric central (ec) heating are described by their installation cost (ic) and operating cost (oc).
choice_id | chosen | ic_gc | oc_gc | … | oc_ec |
---|---|---|---|---|---|
1 | gc | 866 | 200 | … | 542 |
2 | ec | 802 | 195 | … | 510 |
3 | er | 759 | 203 | … | 495 |
4 | gr | 789 | 220 | … | 502 |
\[ \begin{split} \begin{pmatrix} \color{red}{u_{gc}} \\ \color{purple}{u_{gr}} \\ \color{orange}{u_{ec}} \\ \color{teal}{u_{er}} \\ 0 \end{pmatrix} = \begin{pmatrix} \color{red}{\alpha_{gc}} + \color{blue}{\beta_{ic}}gc_{ic} + \color{blue}{\beta_{oc}}gc_{oc} \\ \color{purple}{\alpha_{gr}} + \color{blue}{\beta_{ic}}gr_{ic} + \color{blue}{\beta_{oc}}gr_{oc} \\ \color{orange}{\alpha_{ec}} + \color{blue}{\beta_{ic}}ec_{ic} + \color{blue}{\beta_{oc}}ec_{oc} \\ \color{teal}{\alpha_{er}} + \color{blue}{\beta_{ic}}er_{ic} + \color{blue}{\beta_{oc}}er_{oc} \\ 0 + 0 + 0 \end{pmatrix} \end{split} \]
Utility determines the probability of choice:
\[\text{softmax}(\color{green}{u})_{j} = \frac{\exp(\color{green}{u_{j}})}{\sum_{q=1}^{J}\exp(\color{green}{u_{q}})}\]
Choices determine market share, where:
\[ s_{j}(\mathbf{\color{blue}{\beta}}) = P(\color{green}{u_{j}} > \color{green}{u_{k}} \ \forall k \neq j) \]
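As a quick numeric sketch (with made-up utilities), the softmax maps any utility vector to a valid probability vector:

import numpy as np

u = np.array([1.2, 0.8, 0.3, -0.1, 0.0])  # hypothetical utilities, outside good pinned at 0
p = np.exp(u) / np.exp(u).sum()
p.round(3)  # array([0.377, 0.253, 0.153, 0.103, 0.114]); shares sum to 1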
The model is traditionally estimated with maximum likelihood methods.
\[ L(\color{blue}{\beta}) = \prod s_{j}(\mathbf{\color{blue}{\beta}}) \]
or taking the log:
\[ l(\color{blue}{\beta}) = \sum \log(s_{j}(\mathbf{\color{blue}{\beta}})) \] \[ \text{We find: } \underset{\color{blue}{\beta}}{\mathrm{argmax}} \text{ } l(\color{blue}{\beta}) \]
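As a sketch of this classical routine (assuming a hypothetical design array X of shape (n_obs, n_alts, n_features) and integer-coded chosen alternatives), the negative log-likelihood can be handed to a generic optimiser:

import numpy as np
from scipy.optimize import minimize
from scipy.special import log_softmax

def neg_log_lik(beta, X, chosen_idx):
    u = X @ beta                   # linear utilities: (n_obs, n_alts)
    logp = log_softmax(u, axis=1)  # log choice probabilities
    return -logp[np.arange(len(chosen_idx)), chosen_idx].sum()

# result = minimize(neg_log_lik, x0=np.zeros(n_features), args=(X, chosen_idx))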
Results are often brittle!
To evaluate the integrals in the Bayesian model, we use MCMC to draw samples from the posterior: the conditional distribution of the parameters given the data.
\[\underbrace{\color{blue}{\beta}}_{\text{prior draws}} \sim Normal(0, 1) \]
\[ \underbrace{p(\color{blue}{\beta} \mid D)}_{\text{posterior draws}} = \frac{p(\color{blue}{\beta})\,p(D \mid \color{blue}{\beta})}{\int p(D \mid \color{blue}{\beta})\,p(\color{blue}{\beta})\,d\color{blue}{\beta}} \]
Priors can be used flexibly to regularise and improve the reliability of estimation across structural causal models.
import numpy as np
import pymc as pm

# Assumes: wide_heating_df (one row per choice situation), N = len(wide_heating_df),
# observed = integer-coded chosen alternatives, and coords defining the
# "alts_intercepts", "alts_probs" and "obs" dimensions.
with pm.Model(coords=coords) as model_2:
    ## Priors for the beta coefficients
    beta_ic = pm.Normal("beta_ic", 0, 1)
    beta_oc = pm.Normal("beta_oc", 0, 1)
    alphas = pm.Normal("alpha", 0, 1, dims="alts_intercepts")
    ## Construct the utility matrix with an intercept per alternative,
    ## pivoting against an outside good fixed at zero
    u0 = alphas[0] + beta_ic * wide_heating_df["ic.ec"] + beta_oc * wide_heating_df["oc.ec"]
    u1 = alphas[1] + beta_ic * wide_heating_df["ic.er"] + beta_oc * wide_heating_df["oc.er"]
    u2 = alphas[2] + beta_ic * wide_heating_df["ic.gc"] + beta_oc * wide_heating_df["oc.gc"]
    u3 = alphas[3] + beta_ic * wide_heating_df["ic.gr"] + beta_oc * wide_heating_df["oc.gr"]
    u4 = np.zeros(N)  # Outside good
    s = pm.math.stack([u0, u1, u2, u3, u4]).T
    ## Apply softmax transform
    p_ = pm.Deterministic("p", pm.math.softmax(s, axis=1), dims=("obs", "alts_probs"))
    ## Likelihood
    choice_obs = pm.Categorical("y_cat", p=p_, observed=observed, dims="obs")
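With the model specified, fitting is one more line; a typical invocation might look like:

with model_2:
    idata_m2 = pm.sample(random_seed=100)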
The beta coefficients in the model are interpreted as weights of utility. However, the scale of these latent terms is only identified relative to the variance of unobserved factors.
The utility scale is not fixed, but the ratio \(\frac{\beta_{ic}}{\beta_{oc}}\) is invariant.
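For instance, the invariant ratio can be read straight off the posterior draws (using the idata_m2 trace sketched above):

post = idata_m2.posterior
ratio = post["beta_ic"] / post["beta_oc"]  # per-draw ratio of installation to operating cost weights
ratio.mean().item()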
The model tries to account for pathological patterns of substitution between goods by incorporating covariance structures.
\[ \boldsymbol{\alpha} \sim MvNormal(\mathbf{0}, \color{brown}{\Gamma}) \]
\[ \begin{split} \begin{pmatrix} \color{red}{u_{gc}} \\ \color{purple}{u_{gr}} \\ \color{orange}{u_{ec}} \\ \color{teal}{u_{er}} \\ 0 \end{pmatrix} = \begin{pmatrix} \color{red}{\alpha_{gc}} + \color{blue}{\beta_{ic}}gc_{ic} + \color{blue}{\beta_{oc}}gc_{oc} \\ \color{purple}{\alpha_{gr}} + \color{blue}{\beta_{ic}}gr_{ic} + \color{blue}{\beta_{oc}}gr_{oc} \\ \color{orange}{\alpha_{ec}} + \color{blue}{\beta_{ic}}ec_{ic} + \color{blue}{\beta_{oc}}ec_{oc} \\ \color{teal}{\alpha_{er}} + \color{blue}{\beta_{ic}}er_{ic} + \color{blue}{\beta_{oc}}er_{oc} \\ 0 + 0 + 0 \end{pmatrix} \end{split} \]
\[ \begin{split} \color{brown}{\Gamma} = \begin{pmatrix} \color{red}{1} & \gamma & \gamma & \gamma \\ \gamma & \color{blue}{1} & \gamma & \gamma \\ \gamma & \gamma & \color{orange}{1} & \gamma \\ \gamma & \gamma & \gamma & \color{teal}{1} \end{pmatrix} \end{split} \]
# Assumes the same wide_heating_df, N, observed and coords as before.
with pm.Model(coords=coords) as model_3:
    beta_ic = pm.Normal("beta_ic", 0, 1)
    beta_oc = pm.Normal("beta_oc", 0, 1)
    beta_income = pm.Normal("beta_income", 0, 1, dims="alts_intercepts")
    ## Correlated intercepts across the alternatives
    chol, corr, stds = pm.LKJCholeskyCov(
        "chol", n=4, eta=2.0,
        sd_dist=pm.Exponential.dist(1.0, shape=4)
    )
    alphas = pm.MvNormal("alpha", mu=0, chol=chol, dims="alts_intercepts")
    ## Data containers for the electric operating costs
    ## (needed so pm.set_data can update them in the counterfactual below)
    oc_ec = pm.MutableData("oc_ec", wide_heating_df["oc.ec"])
    oc_er = pm.MutableData("oc_er", wide_heating_df["oc.er"])
    u0 = (
        alphas[0]
        + beta_ic * wide_heating_df["ic.ec"]
        + beta_oc * oc_ec
        + beta_income[0] * wide_heating_df["income"]
    )
    u1 = (
        alphas[1]
        + beta_ic * wide_heating_df["ic.er"]
        + beta_oc * oc_er
        + beta_income[1] * wide_heating_df["income"]
    )
    u2 = (
        alphas[2]
        + beta_ic * wide_heating_df["ic.gc"]
        + beta_oc * wide_heating_df["oc.gc"]
        + beta_income[2] * wide_heating_df["income"]
    )
    u3 = (
        alphas[3]
        + beta_ic * wide_heating_df["ic.gr"]
        + beta_oc * wide_heating_df["oc.gr"]
        + beta_income[3] * wide_heating_df["income"]
    )
    u4 = np.zeros(N)  # Outside good (pivot)
    s = pm.math.stack([u0, u1, u2, u3, u4]).T
    p_ = pm.Deterministic("p", pm.math.softmax(s, axis=1), dims=("obs", "alts_probs"))
    choice_obs = pm.Categorical("y_cat", p=p_, observed=observed, dims="obs")
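Fitting this model gives the idata_m3 trace used in the counterfactual below:

with model_3:
    idata_m3 = pm.sample(random_seed=100)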
Models as laboratories for experimentation
With a fitted PyMC model we can counterfactually reset the values of the input data and regenerate the posterior predictive distribution, holding all else in the data generating process equal.
# Update the data containers with a counterfactual 20%
# price increase in operating costs for the electric options
with model_3:
    pm.set_data({"oc_ec": wide_heating_df["oc.ec"] * 1.2,
                 "oc_er": wide_heating_df["oc.er"] * 1.2})
    # Use the updated values to predict outcomes and probabilities
    idata_new_policy = pm.sample_posterior_predictive(
        idata_m3,
        var_names=["p", "y_cat"],
        return_inferencedata=True,
        predictions=True,
        extend_inferencedata=False,
        random_seed=100,
    )
idata_new_policy
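We can then compare the implied market shares under the old and new prices, e.g. by averaging the choice probabilities over draws and observations (note the counterfactual probabilities land in the predictions group):

old_shares = idata_m3.posterior["p"].mean(dim=["chain", "draw", "obs"])
new_shares = idata_new_policy.predictions["p"].mean(dim=["chain", "draw", "obs"])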
Inducing covariance structures by allowing individual deviations from mean utilities.
person_id | choice_id | chosen | nabisco_price | keebler_price |
---|---|---|---|---|
1 | 1 | nabisco | 3.40 | 2.00 |
1 | 2 | nabisco | 3.45 | 2.50 |
1 | 3 | keebler | 3.60 | 2.70 |
2 | 1 | keebler | 3.48 | 2.20 |
2 | 2 | keebler | 3.30 | 2.25 |
\[ \begin{split} \begin{pmatrix} \color{red}{u_{i, nb}} \\ \color{purple}{u_{i, kb}} \\ \color{orange}{u_{i, sun}} \\ 0 \end{pmatrix} = \begin{pmatrix} (\color{red}{\alpha_{nb}} + \beta_{i, nb}) + \color{blue}{\beta_{p}}p_{nb} + \color{green}{\beta_{disp}}d_{nb} \\ (\color{purple}{\alpha_{kb}} + \beta_{i, kb}) + \color{blue}{\beta_{p}}p_{kb} + \color{green}{\beta_{disp}}d_{kb} \\ (\color{orange}{\alpha_{sun}} + \beta_{i, sun}) + \color{blue}{\beta_{p}}p_{sun} + \color{green}{\beta_{disp}}d_{sun} \\ 0 + 0 + 0 \end{pmatrix} \end{split} \]
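Before specifying the model we need an integer index mapping each observation to its individual, plus coordinates for the model dimensions; a minimal sketch, assuming c_df carries a person_id column:

import pandas as pd

person_indx, unique_persons = pd.factorize(c_df["person_id"])
coords = {
    "individuals": unique_persons,
    "alts_intercepts": ["sunshine", "keebler", "nabisco"],
    "alts_probs": ["sunshine", "keebler", "nabisco", "other"],
    "obs": range(len(c_df)),
}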
# Assumes: long-format cracker data c_df, N = len(c_df), observed choices,
# and person_indx mapping each row to an individual (see the sketch above).
with pm.Model(coords=coords) as model_4:
    beta_feat = pm.TruncatedNormal("beta_feat", 0, 1, upper=10, lower=0)
    beta_disp = pm.TruncatedNormal("beta_disp", 0, 1, upper=10, lower=0)
    ## Stronger prior on price to ensure
    ## an increase in price negatively impacts utility
    beta_price = pm.TruncatedNormal("beta_price", 0, 1, upper=0, lower=-10)
    alphas = pm.Normal("alpha", 0, 1, dims="alts_intercepts")
    ## Individual-level deviations from the mean alternative intercepts
    beta_individual = pm.Normal("beta_individual", 0, 0.05,
                                dims=("individuals", "alts_intercepts"))
    u0 = (
        (alphas[0] + beta_individual[person_indx, 0])
        + beta_disp * c_df["disp.sunshine"]
        + beta_feat * c_df["feat.sunshine"]
        + beta_price * c_df["price.sunshine"]
    )
    u1 = (
        (alphas[1] + beta_individual[person_indx, 1])
        + beta_disp * c_df["disp.keebler"]
        + beta_feat * c_df["feat.keebler"]
        + beta_price * c_df["price.keebler"]
    )
    u2 = (
        (alphas[2] + beta_individual[person_indx, 2])
        + beta_disp * c_df["disp.nabisco"]
        + beta_feat * c_df["feat.nabisco"]
        + beta_price * c_df["price.nabisco"]
    )
    u3 = np.zeros(N)  # Outside good
    s = pm.math.stack([u0, u1, u2, u3]).T
    ## Apply softmax transform
    p_ = pm.Deterministic("p", pm.math.softmax(s, axis=1), dims=("obs", "alts_probs"))
    ## Likelihood
    choice_obs = pm.Categorical("y_cat", p=p_, observed=observed, dims="obs")
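Sampling then proceeds as before, and the individual-level deviations can be inspected directly:

import arviz as az

with model_4:
    idata_m4 = pm.sample(random_seed=100)

az.summary(idata_m4, var_names=["alpha", "beta_individual"])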
“Models… [are] like sonnets for the poet, [a] means to express accounts of life in exact, short form using languages that may easily abstract or analogise, and involve imaginative choices and even a certain degree of playfulness in expression” - Mary Morgan in The World in the Model
Discrete Choice with PyMC