A Choose-Your-Own-Adventure in Bayesian Consumer Choice Modeling
Who am I?
Code or it didn’t Happen
The worked examples used here can be found here
My Website
Consumer choice is everywhere, from cereal to cars to climate systems. Business success hinges on understanding how these choices are driven by pricing, product design, and market segmentation strategies.
PyMC-Marketing allows you to simulate product interventions safely and learn the expected impact of new product strategies.
Inference: What is the most plausible world given the data?
\[ p(\theta_{w_{i}} | Y) = \dfrac{p(\theta_{w_{i}})p(Y | \theta_{w_{i}})}{\sum_{j}^{N} p(\theta_{w_j})p(Y | \theta_{w_j}) }\]
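To make the formula concrete, here is a minimal sketch of Bayes' rule over a finite set of candidate worlds. The function name, the three worlds, and the prior and likelihood values are all made up for illustration; they do not come from the worked examples.

```python
def posterior_over_worlds(priors, likelihoods):
    """Normalise prior * likelihood over a finite set of candidate worlds."""
    joint = [p * l for p, l in zip(priors, likelihoods)]
    evidence = sum(joint)  # the denominator: sum over all worlds
    return [j / evidence for j in joint]


# Hypothetical example: three worlds with equal prior mass
priors = [1 / 3, 1 / 3, 1 / 3]
likelihoods = [0.02, 0.10, 0.08]  # made-up values of p(Y | theta_w)

posterior = posterior_over_worlds(priors, likelihoods)
most_plausible = max(range(len(posterior)), key=posterior.__getitem__)
```

With equal priors the ranking of worlds is driven entirely by how well each one explains the data.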
Counterfactual Inference: What plausibly happens in nearby worlds?
\(\mathbf{\theta_{w_{1}}} \rightsquigarrow\)
\(\mathbf{\theta_{w_{2}}} \rightsquigarrow\)
\(\mathbf{\theta_{w_{3}}} \rightsquigarrow\)
\(f(\alpha_{w_1}, \beta_{w_1}^{0}, \beta_{w_1}^{1}) \rightsquigarrow\)
\(f(\alpha_{w_2}, \beta_{w_2}^{0}, \beta_{w_2}^{1}) \rightsquigarrow\)
\(f(\alpha_{w_3}, \beta_{w_3}^{0}, \beta_{w_3}^{1}) \rightsquigarrow\)
Worlds can be ranked in terms of:
probability
desirability
Learning the drivers of desirability helps determine the probability of human choice and action.
Fixing the attributes of different alternatives allows us to estimate their desirability and the probable course of choice.
“How can we learn what drives human choice?”
Each path taken tells a story of preference and reveals something about how the attributes of each alternative tempt or repel the chooser
The utility function forms the cornerstone of choice modeling:
\[\color{red}U_{ij} = \color{blue}\alpha_{ij} + \color{blue}\beta_{ij}^{1} \color{black}\cdot X_{ij}^{1} + \color{blue} \beta_{ij}^{2} \color{black}\cdot X_{ij}^{2} \]
\[P_{ij} = \frac{\exp(\color{red}U_{ij})}{\sum_{k=1}^{J} \exp(\color{red}U_{ik})} \Rightarrow Y_{ij} \sim \text{Categorical}(P_{ij}) \]
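Where \(U_{ij}\) is the utility of alternative \(j\) for individual \(i\), \(\alpha_{ij}\) is an intercept, \(\beta_{ij}^{1}\) and \(\beta_{ij}^{2}\) are coefficients on the observed attributes \(X_{ij}^{1}\) and \(X_{ij}^{2}\), and \(P_{ij}\) is the probability that individual \(i\) chooses alternative \(j\).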
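The two equations can be sketched in a few lines of plain Python: a linear utility per alternative, a softmax to turn utilities into probabilities, and a categorical draw for the observed choice. The alternatives "A", "B", "C" and all coefficient and attribute values are hypothetical.

```python
import math
import random


def utility(alpha, betas, xs):
    """Linear utility: intercept plus coefficient-weighted attributes."""
    return alpha + sum(b * x for b, x in zip(betas, xs))


def choice_probabilities(utilities):
    """Softmax over utilities (max subtracted for numerical stability)."""
    m = max(utilities)
    exps = [math.exp(u - m) for u in utilities]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical coefficients: attribute 1 is a cost, attribute 2 a benefit
betas = [-0.5, 1.0]
alternatives = {
    "A": utility(0.0, betas, [2.0, 1.0]),
    "B": utility(0.5, betas, [3.0, 1.5]),
    "C": utility(-0.2, betas, [1.0, 0.2]),
}

probs = choice_probabilities(list(alternatives.values()))

# One simulated choice: Y ~ Categorical(P)
rng = random.Random(0)
choice = rng.choices(list(alternatives), weights=probs, k=1)[0]
```

Only differences in utility matter here: adding the same constant to every alternative leaves the probabilities unchanged.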
Do you value the company of others? Do you fear it? What about the average cave dweller?
\[ u(\text{Light shaft + Silence}) - u(\text{Glowing Fire + Conversational Echoes}) > 0?\]
Choice Scenarios specified with attributes and choice outcomes for each discrete alternative
\[ \begin{split} \begin{pmatrix} u_{gc} \\ u_{gr} \\ u_{ec} \\ u_{er} \\ u_{hp} \\ \end{pmatrix} = \begin{pmatrix} gc_{ic} & gc_{oc} \\ gr_{ic} & gr_{oc} \\ ec_{ic} & ec_{oc} \\ er_{ic} & er_{oc} \\ hp_{ic} & hp_{oc} \\ \end{pmatrix} \begin{pmatrix} \color{blue}\beta_{ic} \\ \color{blue}\beta_{oc} \\ \end{pmatrix} \end{split} \]
MNLogit with Wilkinson-style formula interface
The probability of choosing alternative \(j\) follows the elegant logistic form:
\[\frac{\exp(\color{red}U_{ij})}{\sum_{k=1}^{J} \exp(\color{red}U_{ik})} = P_{ij} \Rightarrow s_{j}(\color{blue}\theta_{w})=P(u_{j}>u_{k} \; \forall k \neq j)\]
A simple model with a compelling interpretation. Too simple?
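# Import path assumed from recent pymc-marketing releases; adjust for your installed version
from pymc_marketing.customer_choice.mnl_logit import MNLogit

# One formula per alternative: alternative-specific costs | individual-level covariates
utility_formulas = [
    "gc ~ ic_gc + oc_gc | income + rooms + agehed",
    "gr ~ ic_gr + oc_gr | income + rooms + agehed",
    "ec ~ ic_ec + oc_ec | income + rooms + agehed",
    "er ~ ic_er + oc_er | income + rooms + agehed",
    "hp ~ ic_hp + oc_hp | income + rooms + agehed",
]
# "depvar" is the column recording the chosen alternative
mnl = MNLogit(df, utility_formulas, "depvar", covariates=["ic", "oc"])
mnl.sample()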
The Multinomial Logit builds the Independence of Irrelevant Alternatives (IIA) property into preference calculations: the odds between any two alternatives do not depend on what else is on offer.
\[\dfrac{P_{j}}{P_{i}} = \dfrac{ \dfrac{e^{U_{j}}}{\sum_{k}^{n}e^{U_{k}}}}{\dfrac{e^{U_{i}}}{\sum_{k}^{n}e^{U_{k}}}} = \dfrac{e^{U_{j}}}{e^{U_{i}}} = e^{U_{j} - U_{i}}\]
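The ratio identity can be checked numerically. The sketch below uses the five heating alternatives as labels, but the utility values are invented for illustration. It makes one alternative less attractive and confirms that the odds between two others do not move, so every remaining alternative gains share by the same factor.

```python
import math


def mnl_probs(utilities):
    """Multinomial logit shares from a dict of alternative -> utility."""
    m = max(utilities.values())
    exps = {k: math.exp(u - m) for k, u in utilities.items()}
    total = sum(exps.values())
    return {k: e / total for k, e in exps.items()}


# Hypothetical utilities, not estimates from the heating data
utilities = {"gc": 1.0, "ec": 0.2, "hp": -0.5, "gr": 0.4, "er": 0.0}

base = mnl_probs(utilities)
# Make "ec" less attractive, e.g. through a higher installation cost
shocked = mnl_probs({**utilities, "ec": utilities["ec"] - 1.0})

odds_before = base["gc"] / base["gr"]
odds_after = shocked["gc"] / shocked["gr"]

# Relative gain in share for every alternative other than "ec"
growth = {k: shocked[k] / base[k] for k in utilities if k != "ec"}
```

All entries of `growth` are equal: the share lost by `ec` is redistributed in proportion to existing shares, regardless of which alternatives are close substitutes.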
Key Take-away: The Model Ignores Market Structure
new_policy_df = df.copy()
new_policy_df[["ic_ec", "ic_er"]] = new_policy_df[["ic_ec", "ic_er"]] * 1.5
## Posterior Predictive Forecast under counterfactual setting
idata_new_policy = mnl.apply_intervention(new_choice_df=new_policy_df)
## Compare Old and New Policy Settings
change_df = mnl.calculate_share_change(mnl.idata, mnl.intervention_idata)
change_df
\(P(i) \text{ when } i \in Alts\)
\(P(\text{choose nest B}) \cdot P(\text{choose i} | \text{ i} \in \text{B})\)
\(I_{k} = \ln \sum_{j \in B_{k}} e^{Y_{j} / \lambda_{k}}\) and \(\lambda_{k} \sim \text{Beta}(1, 1)\)
The log-sum component allows for the utility of any alternatives within a nest to “bubble up” and influence the attractiveness of the overall nest.
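As a rough illustration of the log-sum mechanics, the sketch below computes nested logit probabilities as P(nest) times P(alternative | nest), using the standard textbook form without nest-level covariates. This is an assumption for illustration and not necessarily how PyMC-Marketing implements it. The utilities and the value of \(\lambda\) are made up.

```python
import math


def nested_logit_probs(utilities, nests, lambdas):
    """P(i) = P(nest containing i) * P(i | nest), via inclusive values."""
    # Inclusive value I_k = log sum_{j in B_k} exp(U_j / lambda_k)
    inclusive = {
        nest: math.log(sum(math.exp(utilities[j] / lambdas[nest]) for j in members))
        for nest, members in nests.items()
    }
    denom = sum(math.exp(lambdas[n] * inclusive[n]) for n in nests)
    probs = {}
    for nest, members in nests.items():
        lam = lambdas[nest]
        p_nest = math.exp(lam * inclusive[nest]) / denom
        for j in members:
            p_within = math.exp(utilities[j] / lam - inclusive[nest])
            probs[j] = p_nest * p_within
    return probs


nests = {"central": ["gc", "ec"], "room": ["hp", "gr", "er"]}
# Hypothetical utilities and nest parameters
utilities = {"gc": 1.0, "ec": 0.2, "hp": -0.5, "gr": 0.4, "er": 0.0}
lambdas = {"central": 0.5, "room": 0.5}

base = nested_logit_probs(utilities, nests, lambdas)
# Make "ec" less attractive and see who picks up its share
shocked = nested_logit_probs({**utilities, "ec": utilities["ec"] - 1.0}, nests, lambdas)

gain_same_nest = shocked["gc"] / base["gc"]
gain_other_nest = shocked["gr"] / base["gr"]
```

In this toy setting `gain_same_nest` exceeds `gain_other_nest`: the nest-mate of `ec` absorbs disproportionately more of its lost share. Setting every \(\lambda_k = 1\) collapses the model back to the plain multinomial logit.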
# Import paths assumed from recent pymc-marketing releases; adjust for your installed version
from pymc_marketing.customer_choice.nested_logit import NestedLogit
from pymc_marketing.prior import Prior

utility_formulas = [
    "gc ~ ic_gc + oc_gc | income + rooms",
    "ec ~ ic_ec + oc_ec | income + rooms",
    "gr ~ ic_gr + oc_gr | income + rooms",
    "er ~ ic_er + oc_er | income + rooms",
    "hp ~ ic_hp + oc_hp | income + rooms",
]
# Group alternatives into nests of closer substitutes
nesting_structure = {"central": ["gc", "ec"], "room": ["hp", "gr", "er"]}
nstL_1 = NestedLogit(
    df,
    utility_formulas,
    "depvar",
    covariates=["ic", "oc"],
    nesting_structure=nesting_structure,
    model_config={
        "alphas_": Prior("Normal", mu=0, sigma=5, dims="alts"),
        "betas": Prior("Normal", mu=0, sigma=1, dims="alt_covariates"),
        "betas_fixed_": Prior("Normal", mu=0, sigma=1, dims="fixed_covariates"),
        "lambdas_nests": Prior("Beta", alpha=2, beta=2, dims="nests"),
    },
)
nstL_1
The relative importance of product attributes implied by our observed data
The relative importance of installation costs versus operating costs can suggest where to introduce a new pricing strategy.
new_policy_df = df.copy()
new_policy_df[["ic_ec", "ic_er"]] = new_policy_df[["ic_ec", "ic_er"]] * 1.5
idata_new_policy_1 = nstL_1.apply_intervention(new_choice_df=new_policy_df)
change_df_1 = nstL_1.calculate_share_change(nstL_1.idata, nstL_1.intervention_idata)
change_df_1
Nested Logit allows for patterns of Non-Proportional Substitution under counterfactual settings
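import numpy as np
import pymc as pm

# model and idata are assumed to be the fitted PyMC model and its posterior
with pm.do(
    model,
    {"X1": np.ones(len(df)), "beta1": 0.5},  # intervene on data and parameter alike
    prune_vars=True,
) as counterfactual_model:
    idata_trt = pm.sample_posterior_predictive(idata, var_names=["like", "p"])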
Causal Inference with the Do-Operator modifies world-state and data alike, allowing for compelling intervention studies about consumer behaviour.
\[ \begin{split} w &= \{ \alpha, \beta^{1}, \beta^{2}, X^{1}, X^{2} \} \\ \Rightarrow w^{*} &= \{ \alpha, \beta^{*}, \beta^{2}, X^{*}, X^{2} \} \end{split} \]
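The move from \(w\) to \(w^{*}\) can be mimicked without any library. The sketch below is a deliberately simplified stand-in: a binary logit rather than the multinomial models above, with randomly generated "posterior draws" in place of a fitted model. Every name and value is hypothetical. It overrides one attribute and one coefficient in each world, keeps everything else, and compares predicted choice probabilities.

```python
import math
import random


def choice_prob(world):
    """Binary-logit probability of choosing the alternative in a given world."""
    u = world["alpha"] + world["beta1"] * world["x1"] + world["beta2"] * world["x2"]
    return 1.0 / (1.0 + math.exp(-u))


def intervene(world, **overrides):
    """Return w*: a copy of the world with the chosen entries replaced."""
    return {**world, **overrides}


rng = random.Random(42)
# Hypothetical stand-in for posterior draws, paired with observed attributes
worlds = [
    {
        "alpha": rng.gauss(0.0, 0.2),
        "beta1": rng.gauss(-0.5, 0.1),
        "beta2": rng.gauss(1.0, 0.1),
        "x1": 2.0,
        "x2": 1.0,
    }
    for _ in range(500)
]

p_observed = [choice_prob(w) for w in worlds]
# Intervene on both data (x1) and parameter (beta1), leaving the rest untouched
p_counterfactual = [choice_prob(intervene(w, x1=1.0, beta1=0.5)) for w in worlds]

mean_shift = sum(p_counterfactual) / len(worlds) - sum(p_observed) / len(worlds)
```

Because the comparison is made draw by draw, uncertainty in the untouched parameters carries through to the counterfactual prediction.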