Structuring Your Work with Structural Equation Models
Probabilistic programming languages (PPLs) provide a structure for articulating assumptions and relationships, and a scaffold for exploring uncertainty.
Structural equation models (SEMs) formalize scientific theory as a system of statistical relationships.
The Bayesian workflow binds these together into a compelling practice: an iterative conversation between theory and data, disciplined by the rigours of probabilistic programming.
Agenda
The Idea of a Workflow:
Craft versus Checklist
Job Satisfaction
Bayesian Workflow with SEMs:
Confirmatory Factor Structures
Adding Structural Relations
Adding Covariance Structure
Sensitivity Analysis:
Adding Hierarchical Structure
Parameter Recovery and Model Validation
Conclusion
Craft and Statistical Workflow
The Idea of a Workflow
Craft in Statistical Modelling
Embraces process, imperfection, and iteration.
Aims at the acquisition of scientific knowledge.
Supports generalisable findings and solutions.
Restores ownership: you shape, test, and refine, and the model carries your imprint.
Checklists in Statistical Modelling
Reduce inquiry to compliance: ticking boxes replaces genuine understanding.
Create the illusion of rigor while bypassing uncertainty and context.
Confuse progress with throughput: more boxes checked ≠ better science.
Promote shallow engagement and infantilise the management class, hindering effective decision making.
Strip away ownership: you don’t make something, you just complete a task.
Job Satisfaction Data
Constructive Thought Strategies (CTS): Thought patterns that are positive or helpful, such as:
Self-Talk (positive internal dialogue): ST
Mental Imagery (visualizing successful performance or outcomes): MI
Richly Parameterised Regressions with Expressive Encodings of Measurement Error and Latent Constructs
The SEM workflow
Start with Confirmatory Factor Analysis (CFA):
Validate that our measurement model holds.
Ensure latent constructs are reliably represented by observed indicators.
Layer Structural Paths:
Add theoretically-motivated regressions between constructs.
Assess whether hypothesized relationships improve model fit.
Refine with Residual Covariances:
Account for specific shared variance not captured by factors.
Keep structure transparent while improving realism.
Iterative Validation:
Each step asks: Does this addition honor theory? Improve fit?
Workflow = constant negotiation between parsimony and fidelity.
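The three modelling steps above can be sketched through the covariance matrix each model implies for the observed indicators. This is a minimal NumPy illustration, not the PyMC model from the talk. The symbols Lambda (loadings), B (structural paths), Psi (latent covariance) and Theta (residual covariance) follow common SEM notation and are not taken from the source. All numeric values are hypothetical, and grouping JW1 to JW3 under a single job-satisfaction factor is an assumption based on the column names.

```python
import numpy as np

# Indicators: ST and MI load on CTS (as in the text); JW1-JW3 are assumed
# to load on a job-satisfaction factor (JS). All values are hypothetical.
indicators = ["ST", "MI", "JW1", "JW2", "JW3"]
Lambda = np.array([
    [1.0, 0.0],   # ST  <- CTS
    [0.8, 0.0],   # MI  <- CTS
    [0.0, 1.0],   # JW1 <- JS
    [0.0, 0.9],   # JW2 <- JS
    [0.0, 0.7],   # JW3 <- JS
])
Psi = np.eye(2)                               # latent disturbance covariance
Theta = np.diag([0.5, 0.6, 0.4, 0.5, 0.6])    # residual variances


def implied_cov(Lambda, B, Psi, Theta):
    """Sigma = Lambda (I - B)^-1 Psi (I - B)^-T Lambda^T + Theta."""
    k = B.shape[0]
    inv = np.linalg.inv(np.eye(k) - B)
    latent_cov = inv @ Psi @ inv.T
    return Lambda @ latent_cov @ Lambda.T + Theta


# Step 1: CFA only, no regressions between the latent factors.
B_cfa = np.zeros((2, 2))
sigma_cfa = implied_cov(Lambda, B_cfa, Psi, Theta)

# Step 2: add a structural path CTS -> JS with coefficient 0.5.
B_sem = np.array([[0.0, 0.0],
                  [0.5, 0.0]])
sigma_sem = implied_cov(Lambda, B_sem, Psi, Theta)

# Step 3: add a residual covariance between JW1 and JW2.
Theta_rc = Theta.copy()
Theta_rc[2, 3] = Theta_rc[3, 2] = 0.2
sigma_rc = implied_cov(Lambda, B_sem, Psi, Theta_rc)

# Covariance between ST and JW1 under steps 1 and 2.
print(sigma_cfa[0, 2], sigma_sem[0, 2])
```

With uncorrelated factors, the CFA step implies no covariance between ST and JW1. The structural path introduces one. The residual covariance changes only the JW1 and JW2 entry, which is why it keeps the structure transparent.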
Job Satisfaction Data
         JW1        JW2        JW3        UF1        UF2        FOR        DA1        DA2        DA3        EBA         ST         MI
0  -1.046719  -1.472334  -1.649844  -0.740886  -0.573890  -0.992347   1.269461   1.805128   1.230402  -0.039732   1.618562  -0.169659
1  -1.649857  -1.908889  -1.841327   0.120929  -0.939917   0.401440   0.177058   0.126041  -0.004604  -0.806541   0.930899  -0.438887
2   0.429099   1.826533   0.341107   1.033988   1.287623   0.490457  -0.627370  -0.717461  -0.246633   0.261212   0.913639   0.496846
3   0.257582  -0.315831   1.258474   0.241065  -0.548987  -0.247273   0.858847   0.964730   1.233870  -0.251100   0.466743   0.169622
4  -0.875969  -0.263046  -0.947966  -0.231731  -0.850588   0.860900   0.989963   0.671778   0.438236  -0.129382   2.266723  -0.951899
The data for SEM modelling form a multivariate structure in which theory-driven groups of observed variables each reflect a latent factor measured with error.
Complex models require proper validation. Parameter recovery, in which the model is fitted to data simulated from known parameter values, is a direct test of its ability to identify the correct effects.
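The idea of parameter recovery can be illustrated with a small simulation. This sketch assumes a single latent factor with three indicators and hypothetical true loadings. To stay dependency-free it swaps the Bayesian fit for a simple method-of-moments estimator based on the sample covariances, so it shows the logic of recovery rather than the PyMC workflow itself.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 20_000

# Hypothetical "true" parameters for a one-factor model, factor variance 1.
true_loadings = np.array([0.9, 0.7, 0.5])
true_resid_sd = np.array([0.4, 0.5, 0.6])

# Simulate: y_j = loading_j * eta + residual_j
eta = rng.normal(size=n)
y = eta[:, None] * true_loadings + rng.normal(size=(n, 3)) * true_resid_sd

# With one factor, cov(y_i, y_j) = loading_i * loading_j for i != j, so
# loading_1^2 = s12 * s13 / s23, and similarly for the other indicators.
S = np.cov(y, rowvar=False)
est_loadings = np.sqrt(np.array([
    S[0, 1] * S[0, 2] / S[1, 2],
    S[0, 1] * S[1, 2] / S[0, 2],
    S[0, 2] * S[1, 2] / S[0, 1],
]))

recovery_error = est_loadings - true_loadings
print("true:     ", true_loadings)
print("estimated:", est_loadings.round(3))
```

In a Bayesian workflow the same comparison would be made between the known simulation values and the posterior distributions, checking that the posteriors concentrate around the truth.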
Workflow and Craft
Craft as Discovery
“Abandon the idea of predetermination, the shaping force of your intentions…rely less on the priority of your intentions and more on the immediacy of writing… You’ll see that some of your sentences are still conjectural… start noticing the thoughts and implications surrounding them.” - Verlyn Klinkenborg in Several Short Sentences about Writing
Modeling, like writing, is an act of exploration.
Expect surprises and anomalies—they teach more than preconceptions.
Embrace uncertainty; allow the data to guide the process.
Craft as Discipline
“[T]he goal is to represent the systematic relationships between the variables and between the variables and the parameters … Discrepancies between the model and data can be used to learn about the ways in which the model is inadequate for the scientific purposes at hand, and thus to motivate expansions and changes to the model … a model is a story of how the data could have been generated; the fitted model should therefore be able to generate synthetic data that look like the real data; failures to do so in important ways indicate faults in the model.” - Gelman & Shalizi in Philosophy and the practice of Bayesian statistics
Building statistical models is inherently iterative and expansionary.
Assumptions are encoded transparently and their implications are assessed for cogency.
Where our assumptions fail, they are revised or rejected, building confidence and clarity.
The process yields compelling, justifiable conclusions worthy of your work.
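The check described in the quotation, whether a fitted model can generate synthetic data that look like the real data, can be sketched as a posterior predictive check. This is a minimal illustration under stated assumptions. The "observed" data are simulated and deliberately skewed, and the model is a normal distribution under the standard noninformative prior, chosen because its posterior can be sampled directly with NumPy. The skewness statistic is a hypothetical choice of discrepancy measure.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical "observed" data: right-skewed, so a normal model is
# mis-specified on purpose.
y = rng.exponential(scale=1.0, size=200)
n = y.size


def skewness(x, axis=-1):
    """Sample skewness along the given axis."""
    centred = x - x.mean(axis=axis, keepdims=True)
    m2 = (centred ** 2).mean(axis=axis)
    m3 = (centred ** 3).mean(axis=axis)
    return m3 / m2 ** 1.5


# Posterior draws for a normal model with prior p(mu, sigma^2)
# proportional to 1 / sigma^2:
#   sigma^2 | y     ~ (n - 1) * s^2 / chi2(n - 1)
#   mu | sigma^2, y ~ Normal(mean(y), sigma^2 / n)
draws = 2000
sigma2 = (n - 1) * y.var(ddof=1) / rng.chisquare(n - 1, size=draws)
mu = rng.normal(y.mean(), np.sqrt(sigma2 / n))

# One replicated data set per posterior draw.
y_rep = rng.normal(mu[:, None], np.sqrt(sigma2)[:, None], size=(draws, n))

# Compare the observed statistic with its replicated distribution.
obs_stat = skewness(y)
rep_stat = skewness(y_rep, axis=1)
p_value = (rep_stat >= obs_stat).mean()
print(f"observed skewness: {obs_stat:.2f}, predictive p-value: {p_value:.3f}")
```

A predictive p-value near 0 or 1 signals that the model cannot reproduce this feature of the data, which is the kind of failure that motivates expanding or changing the model.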
Conclusion: Workflow as Craft
“Here, in short, is what I want to tell you. Know what each sentence says, What it doesn’t say, And what it implies. Of these, the hardest is knowing what each sentence actually says.” - V. Klinkenborg
In modelling, as in writing, clarity emerges through revision.
The Bayesian workflow with PyMC teaches us to listen to our models — to read them aloud through simulation, recovery, and critique.
Each iteration reveals what the model truly says, what it hides, and what it implies.
Craft lies in that attention — in resisting flattening automation, and choosing understanding over throughput.
Through this care, our models become not only more compelling, but more robust — resilient to noise, misfit, and misuse.