Discrete Choice Models in PyMC-Marketing

A Choose-Your-Own-Adventure in Bayesian Consumer Choice Modeling

Nathaniel Forde

Preliminaries

Who am I?

Code or it didn’t Happen

The worked examples used here can be found here

My Website

The Pitch: Choice Matters

Consumer choice is everywhere — from cereal to cars to climate systems. Business success hinges on understanding these choices: pricing, product design, market segmentation.

Classic models fall short with unrealistic assumptions and pricing experiments can be costly.

Use better models with PyMC-Marketing and simulate pricing interventions safely.

Agenda

  • The Choice Setting and Bayesian Background
  • Choice Data and PyMC-Marketing
  • Mulitnomial Logit and the IIA Problem
  • Nested Logit and Market Structure
  • Choose your own Adventure

The Choice Setting and Bayesian Background

The Adventure Begins:

Choosing our Own Path

  • 21 Discrete Possible Endings!
  • The Luxury of being able to explore all paths
  • Deterministic Routing and preferred Endings
  • Peeking ahead and “cheating” fate.

The Scaling Difficulty

Genuine Uncertainty and the Multiplicity of Paths

Growing up is grappling with uncertainty
  • Continuum Possible Endings
  • Impossibility of precise Survey and Oversight
  • Probabilistic Choice over Paths
  • Certainty (if any) only in Expectation and Breadth of Possibility

Sampling the Path Trajectories

Bayesian Inference over Unrealised Worlds

Inference: What is the most plausible world given the data?

\[ p(\theta_{w_{i}} | Y) = \dfrac{p(\theta_{w_{i}})p(Y | \theta_{i})}{\sum_{j}^{N} p(\theta_{w_j})p(Y | \theta_{w_j}) }\]

Counterfactual Inference: What plausibly happens in nearby worlds?

  • \(\mathbf{\theta_{w_{1}}} \rightsquigarrow\)

  • \(\mathbf{\theta_{w_{2}}} \rightsquigarrow\)

  • \(\mathbf{\theta_{w_{3}}} \rightsquigarrow\)

  • \(f(\alpha_{w_1}, \beta_{w_1}^{0}, \beta_{w_1}^{1}) \rightsquigarrow\)

  • \(f(\alpha_{w_2}, \beta_{w_2}^{0}, \beta_{w_2}^{1}) \rightsquigarrow\)

  • \(f(\alpha_{w_3}, \beta_{w_3}^{0}, \beta_{w_3}^{1}) \rightsquigarrow\)

  • \(p(Y |w_1)\) Small image \(\downarrow\)
  • \(p(Y | w_2)\) Small image \(\downarrow\)
  • \(p(Y | w3)\) Small image\(\downarrow\)

The Path Before Us

The Consumer’s Dilemma: What drives us to Choose in the face such vast Possibility?

Revealed Preference

Utility driven Choice

Choice

The Consumer’s Dilemma

“What drives human choice?”

  • Utility maximization
  • Rational decision-making
  • Expected value calculations
  • Impulse and advertising?

Each purchase tells a story of preference…

World State Informs Choice

The Mathematical Foundation of Decision

The utility function forms the cornerstone of choice modeling:

\[\color{red}U_{ij} = \color{blue}\alpha_{ij} + \color{blue}\beta_{ij}^{1} \color{black}\cdot X_{ij}^{1} + \color{blue} \beta_{ij}^{2} \color{black}\cdot X_{ij}^{2} \]

\[P_{ij} = \frac{\exp(\color{red}U_{ij})}{\sum_{k=1}^{J} \exp(\color{red}U_{ik})} \]

Where:

  • \(U_{ij}\) represents the systematic utility for individual \(i\) choosing alternative \(j\) at a particular world state \(w = \{ \color{blue}\alpha_{ij}, \color{blue}\beta_{ij}^{1}, \color{blue}\beta_{ij}^{2} X_{ij}^{1}, X_{ij}^{2} \}\)
  • \(X_{ij}\) represents observed covariates (product attributes) and \(\color{blue}\theta\) structural parameters within the world \(w\).
  • Collectively, the mathematical structure and world-state are combined to make a theory of the data generating process for the world \(w\).

Relative Utility and Marginal Effects

Paths Diverge

Do you value the company of others? Do you fear it? What about the average cave dweller?

Choice

\[ u(\text{Light shaft + Silence}) - u(\text{Glowing Fire + Conversational Echoes}) > 0?\]

Choice Data and PyMC-Marketing

PyMC-Marketing

Choice

Bayesian Marketing Analytics

  • Uncertainty-aware decisions: model not just what people choose, but how sure we are.
  • Causal inference ready: run interventions, not just predictions.
  • Modern Bayesian engine for scalability, flexibility, and transparency.
  • Intuitive syntax with powerful underlying math.

Choice Data

Choice Scenarios specified with attributes and choice outcomes for each discrete alternative

Utility Matrix

Intuitive Formula Specification

\[ \begin{split} \begin{split} \begin{pmatrix} u_{gc} \\ u_{gr} \\ u_{ec} \\ u_{er} \\ u_{hp} \\ \end{pmatrix} = \begin{pmatrix} gc_{ic} & gc_{oc} \\ gr_{ic} & gr_{oc} \\ ec_{ic} & ec_{oc} \\ er_{ic} & er_{oc} \\ hp_{ic} & hp_{oc} \\ \end{pmatrix} \begin{pmatrix} \color{blue}\beta_{ic} \\ \color{blue}\beta_{oc} \\ \end{pmatrix} \end{split} \end{split} \]


utility_formulas = [
    "gc ~ ic_gc + oc_gc ",
    "gr ~ ic_gr + oc_gr ",
    "ec ~ ic_ec + oc_ec ",
    "er ~ ic_er + oc_er ",
    "hp ~ ic_hp + oc_hp ",
]

mnl = MNLogit(df, utility_formulas, "depvar", covariates=["ic", "oc"])
mnl

Multinomial Logit and the IIA Problem

Multinomial Logit: The Simple Path Forward

The probability of choosing alternative \(j\) follows the elegant logistic form:

\[\frac{\exp(\color{red}U_{ij})}{\sum_{k=1}^{J} \exp(\color{red}U_{ik})} = P_{ij} \Rightarrow s_{j}(\color{blue}\theta_{w})=P(u_{j}>u_{k};\forall_{k̸=j})\]

Key Aspects

  • New Model Class: MNLogit with Wilkinson-style formula interface
  • Alternative-Specific Attributes: Price, quality, brand effects
  • Individual Fixed Attributes: Income, demographics, preferences
  • Causal Inference Ready: Built-in counterfactual analysis

PyMC-Marketing Implementation

Trace Paths through Possible Worlds


utility_formulas = [
    "gc ~ ic_gc + oc_gc | income + rooms + agehed",
    "gr ~ ic_gr + oc_gr | income + rooms + agehed",
    "ec ~ ic_ec + oc_ec | income + rooms + agehed",
    "er ~ ic_er + oc_er | income + rooms + agehed",
    "hp ~ ic_hp + oc_hp | income + rooms + agehed",
]

mnl = MNLogit(df, utility_formulas, "depvar", covariates=["ic", "oc"])
mnl.sample()

The IIA Problem - A Plot Twist

  • 50% Red Bus 50% Car
  • What happens if we introduce a Blue Bus?
  • 33% Red Bus, 33% Blue Bus, 33% Car

The Multinomial Logit enforces the Indepdence of Irrelevant Alternatives property into preference calculations.

\[\dfrac{P_{j}}{P_{i}} = \dfrac{ \dfrac{e^{U_{j}}}{\sum_{i}^{n}e^{U_{k}}}}{\dfrac{e^{U_{i}}}{\sum_{i}^{n}e^{U_{k}}}} = \dfrac{e^{U_{j}}}{e^{U_{i}}} = e^{U_{j} - U_{k}}\]

Key Take-away: The Model Ignores Market Structure

Counterfactual Inference

Counterfactual Plausibility as Criteria of Adequacy

new_policy_df = df.copy()
new_policy_df[["ic_ec", "ic_er"]] = new_policy_df[["ic_ec", "ic_er"]] * 1.5

## Posterior Predictive Forecast under counterfactual setting
idata_new_policy = mnl.apply_intervention(new_choice_df=new_policy_df)

## Compare Old and New Policy Settings
change_df = mnl.calculate_share_change(mnl.idata, mnl.intervention_idata)
change_df

Proportional Substitution Patterns

Product Switching follows prior Market share irrespective of product type

Here be Dragons

Choice

The Nested Logit and Market Structure

Considered Choice

Branching Probability Trees

  • \(U = Y + W\)

  • \(P(i) \text{ when } i \in Alts\)

  • \(P(\text{choose nest B}) \cdot P(\text{choose i} | \text{ i} \in \text{B})\)

  • \(P(\text{choose nest B}) = \dfrac{e^{W + \lambda_{k}I_{k}}}{\sum_{l=1}^{K} e^{W + \lambda_{l}I_{l}}}\)

  • \(P(\text{choose i} | \text{ i} \in \text{B}) = \dfrac{e^{Y_{i} / \lambda_{k}}}{\sum_{j \in B_{k}} e^{Y_{j} / \lambda_{k}}}\)

\(I_{k} = ln \sum_{j \in B_{k}} e^{Y_{j} / \lambda_{k}} \\ \text{ and } \lambda_{k} \sim Beta(1, 1)\)

The log-sum component allows for the utility of any alternatives within a nest to “bubble up” and influence the attractiveness of the overall nest.

Choice

Nesting Structure in PyMC-Marketing

utility_formulas = [
    "gc ~ ic_gc + oc_gc | income + rooms ",
    "ec ~ ic_ec + oc_ec | income + rooms ",
    "gr ~ ic_gr + oc_gr | income + rooms ",
    "er ~ ic_er + oc_er | income + rooms ",
    "hp ~ ic_hp + oc_hp | income + rooms ",
]

nesting_structure = {"central": ["gc", "ec"], "room": ["hp", "gr", "er"]}

nstL_1 = NestedLogit(
    df,
    utility_formulas,
    "depvar",
    covariates=["ic", "oc"],
    nesting_structure=nesting_structure,
    model_config={
        "alphas_": Prior("Normal", mu=0, sigma=5, dims="alts"),
        "betas": Prior("Normal", mu=0, sigma=1, dims="alt_covariates"),
        "betas_fixed_": Prior("Normal", mu=0, sigma=1, dims="fixed_covariates"),
        "lambdas_nests": Prior("Beta", alpha=2, beta=2, dims="nests"),
    },
)
nstL_1

Behaviourial Insight in Product Preference

The relative importance of product attributes implied by our observed data

The relative importance of installation costs versus operating costs might suggest where to impose a novel pricing strategy?

Credible Counterfactuals

Non-Proportional Substitution

new_policy_df = df.copy()
new_policy_df[["ic_ec", "ic_er"]] = new_policy_df[["ic_ec", "ic_er"]] * 1.5

idata_new_policy_1 = nstL_1.apply_intervention(new_choice_df=new_policy_df)
change_df_1 = nstL_1.calculate_share_change(nstL_1.idata, nstL_1.intervention_idata)
change_df_1

Nested Logit allows for patterns of Non-Proportional Substitution under counterfactual settings

Market Interventions and Implied Worlds

Conclusion: Explore the Branching Worlds

\[ w = \{ \alpha, \beta^{1}, \beta_{2}, X^{1}, X^{2} \} \\ \Rightarrow w^{*} = \{ \alpha, \beta^{*}, \beta_{2}, X^{*}, X^{2} \} \]

with pm.do(
    model,
    {"X1": np.ones(len(df)), 
    "beta1": 0.5},
    prune_vars=True,
) as counterfactual_model:
    idata_trt = pm.sample_posterior_predictive(idata, 
    var_names=["like", "p"])

Causal Inference with the Do-Operator modifies world-state and data alike.

  • Bayesian Models offer a data-informed tool for simulation experiments for ceteris-paribus inference.
  • Plausible counterfactual inference depends on compelling world structure encoded in the model.
  • Nested Logit models in PyMC-Marketing allows us to encode market structures for compelling intervention studies.
  • Revealing behaviorial insights and unlocking new marketing strategies for your adventure!