We begin our workshop by providing a conceptual overview of what exponential random graph models (ERGMs) are and what they do.
- Statistically Modelling Relational Systems
- What are ERGMs?
Our task is to study a set of relational outcomes that’s part of a social or political (or other) system.
- hypothesis testing
- outcome prediction
An interdependent set of relationships/interactions among units in a system.
- conflict among states or other actors in the international system
- collaboration among policy actors in a region, country, or the world
- citations between academic papers or judicial court cases
- friendship, influence, and learning among students in a classroom
We can understand and represent these systems as networks
This is a simple network plot of some conflicting actors in the post-Cold War Levant region.
How would we begin to analyze this system? What predicts conflict tie formation?
Let’s use Israel and Syria as an example:
- geographical proximity
- history of conflict
- mixed regime dyad
But what about the role of Lebanon in all this?
Think about the concept of preferential attachment:
- ties are more likely to form where there are already ties
- in conflict studies, this is sometimes called the “pile-on” effect (e.g. sanctions)
- it becomes less costly to attack/sanction a target when others are already doing it
- when the ties are positive, this is sometimes called the popularity effect (esp. when ties are directed inwards)
- preferential attachment is just one example of this kind of interdependence
Think in terms of the logistic regression.
- what are the variables we can put into the model?
- Everything on the dyad (i.e. node- or dyad-level variables)
In this case, the predictors are a vector of unit-level (i.e. dyad or directed-dyad) variables.
- We are missing parts of the data generating process: Omitted variable bias.
- We are not accounting for interdependence: over-rejection of the null hypothesis because the standard errors are too small.
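In symbols, the dyadic logistic regression discussed above can be sketched as follows. The notation ($y_{ij}$ for the tie between actors $i$ and $j$, $x_{ij}$ for the vector of node- and dyad-level predictors, $\beta$ for the coefficients) is assumed here for illustration and is not taken from the workshop materials.

```latex
\[
\operatorname{logit} \Pr(y_{ij} = 1 \mid x_{ij}) = \beta^{\top} x_{ij},
\qquad
y_{ij} \perp\!\!\!\perp y_{kl} \mid x \quad \text{for all } (i,j) \neq (k,l).
\]
```

The second statement is the independence assumption that interdependent systems violate: nothing about the other ties in the network enters the right-hand side.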
We need a statistical model that can:
- model the interdependence between units in these systems that affects observed outcomes
- account for the lack of independence between units when fitting the model
This brings us to the exponential random graph model.
The exponential random graph model is a family of models used for statistically modelling the generative features of an observed network.
The ERGM can be expressed in two ways:
As a network:
These are equivalent. Let’s go through each of the different components of the expressions with a focus on the “relational outcomes” and the “generative features”.
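In the notation commonly used for ERGMs, the network-level and dyad-level expressions are usually written as below. The symbols are assumptions made here and may differ from the workshop's own notation: $y$ is the observed set of relational outcomes, $g(y)$ is the vector of generative-feature counts, $\theta$ is the coefficient vector, $\mathcal{Y}$ is the set of all possible networks on the same nodes, and $y^{+}_{ij}$ and $y^{-}_{ij}$ denote the network with the tie $ij$ present and absent.

```latex
\[
\Pr(Y = y \mid \theta) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^{\top} g(y')\}
\]

\[
\operatorname{logit} \Pr(Y_{ij} = 1 \mid Y_{-ij} = y_{-ij}, \theta) = \theta^{\top} \delta_{ij}(y),
\qquad
\delta_{ij}(y) = g(y^{+}_{ij}) - g(y^{-}_{ij})
\]
```

The dyad-level form looks like a logistic regression, except that the predictors $\delta_{ij}(y)$ depend on the rest of the network rather than on the dyad alone.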
These are our outcome variables. They are the conflicts, collaboration, citations, friendship, influence, or learning we want to study.
One important aspect of the ERGM is that it treats the set of outcomes as a multivariate distribution.
- This is how it deals with the problem of interdependence between units
- This also means that, in some sense, we only have one observation
- Assumption: The observed network is the expected outcome given the model (this is similar to the assumption from classical regression that the observed values in a data set are representative of the population)
Generative features are factors/effects/variables that contribute to the formation of a tie on the network.
In ERGMs, they are specified as local network configurations.
network: total count over the network
dyad: number of configurations the dyad contributes to/is a part of
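The difference between the network-level and dyad-level views of a configuration can be made concrete with a small sketch. The toy network, the node labels, and the function names below are hypothetical, chosen only for illustration; the three configurations counted (edges, 2-stars, triangles) are common examples rather than a required set.

```python
from itertools import combinations

def make_graph(nodes, edges):
    """Build an undirected graph as a dict mapping each node to its neighbour set."""
    adj = {v: set() for v in nodes}
    for i, j in edges:
        adj[i].add(j)
        adj[j].add(i)
    return adj

def network_stats(adj):
    """Network level: total count of each configuration over the whole network."""
    degrees = [len(nbrs) for nbrs in adj.values()]
    triangles = sum(
        1 for a, b, c in combinations(sorted(adj), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )
    return {
        "edges": sum(degrees) // 2,
        "two_stars": sum(d * (d - 1) // 2 for d in degrees),
        "triangles": triangles,
    }

def change_stats(adj, i, j):
    """Dyad level: how many configurations the tie i-j would contribute to."""
    deg_i = len(adj[i] - {j})          # ties of i other than the one to j
    deg_j = len(adj[j] - {i})          # ties of j other than the one to i
    shared = len(adj[i] & adj[j])      # partners that i and j have in common
    return {"edges": 1, "two_stars": deg_i + deg_j, "triangles": shared}

# Hypothetical four-actor network.
toy = make_graph("ABCD", [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")])
print(network_stats(toy))
print(change_stats(toy, "A", "D"))
```

Adding the tie A-D to this toy network raises each network-level count by exactly the corresponding dyad-level value, which is what links the two views.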
They are the observable manifestations of tie combinations given your theorized social process.
- ERGMs are generative models, meaning that they can be used to simulate systems that share the (modelled) generative features of the observed system.
- Assumption: networks with the same counts of local network configurations in the specified model will have equal probability of being observed (this is equivalent to the assumption from classical regression analysis that there are no omitted variables - we have properly captured the generative process in our model)
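To illustrate what "generative" means in practice, here is a minimal sketch of simulating a network from a model with an edge term and a triangle term, using a simple Metropolis sampler that toggles one dyad at a time. The function names, the coefficient values, and the choice of sampler are assumptions made for illustration; this is not the algorithm of any particular ERGM software.

```python
import math
import random

def simulate_ergm(n, theta_edges, theta_triangles, steps=5000, seed=0):
    """Draw one undirected network on n nodes by Metropolis sampling (illustrative only)."""
    rng = random.Random(seed)
    adj = [set() for _ in range(n)]            # start from the empty network
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)         # propose toggling one dyad
        shared = len(adj[i] & adj[j])          # triangles the tie i-j closes
        # Change in theta' g(y) when the tie i-j is added.
        delta = theta_edges + theta_triangles * shared
        present = j in adj[i]
        log_ratio = -delta if present else delta
        # Accept the toggle with probability min(1, exp(log_ratio)).
        if rng.random() < math.exp(min(0.0, log_ratio)):
            if present:
                adj[i].discard(j)
                adj[j].discard(i)
            else:
                adj[i].add(j)
                adj[j].add(i)
    return adj

def edge_count(adj):
    """Number of undirected ties in the simulated network."""
    return sum(len(nbrs) for nbrs in adj) // 2

# Arbitrary, hypothetical coefficient values.
simulated = simulate_ergm(n=10, theta_edges=-1.5, theta_triangles=0.2)
print(edge_count(simulated))
```

Because the acceptance step depends only on the change in the configuration counts, two networks with identical counts receive identical probability under the model, which is the assumption stated in the list above.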
ERGMs can include actor- and dyad-level factors just like the traditional logistic regression.
- node: democratic regimes
- dyad: joint-democracy
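As a rough sketch, the node-level and dyad-level examples above can be turned into network statistics by summing a covariate over the observed ties. The regime labels, ties, and function names below are hypothetical and serve only to show the arithmetic.

```python
# Hypothetical regime labels (1 = democracy) and ties, for illustration only.
democracy = {"A": 1, "B": 1, "C": 0, "D": 0}
ties = [("A", "B"), ("A", "C"), ("C", "D")]

def node_democracy_stat(ties, democracy):
    """Node-level term: number of democratic endpoints, summed over all ties."""
    return sum(democracy[i] + democracy[j] for i, j in ties)

def joint_democracy_stat(ties, democracy):
    """Dyad-level term: number of ties in which both actors are democracies."""
    return sum(democracy[i] * democracy[j] for i, j in ties)

print(node_democracy_stat(ties, democracy))   # 3
print(joint_democracy_stat(ties, democracy))  # 1
```

Neither statistic looks beyond the two actors on a tie, which is why these terms behave like ordinary logistic regression predictors.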
Endogenous effects are generative features that go beyond the dyad. They are commonly called “network effects.”
- network: more than one dyad (or directed dyad) involved in the social process.
- dyad: more than nodes i and j matter to what happens between i and j.
If j → i, then i → j.
Are actors likely to reciprocate ties? (Only works for directed networks.)
If j – k, then i – j.
Are actors with more ties more likely to get even more ties?
If i – k and j – k, then i – j.
Are actors with shared partners more likely to form a tie?
|Network Effect|Local Network Configuration|
|---|---|
|Preferential Attachment|k-star (usually 2)|
|Reciprocity|Reciprocal ties on the dyad|
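For a directed network, the reciprocity configuration can be counted as in the minimal sketch below. The arc list and function names are hypothetical, and each arc is written as a (sender, receiver) pair.

```python
def mutual_count(arcs):
    """Network level: number of dyads with a tie in both directions."""
    arc_set = set(arcs)
    # The i < j condition counts each reciprocated dyad once and skips self-loops.
    return sum(1 for (i, j) in arc_set if i < j and (j, i) in arc_set)

def mutual_change(arcs, i, j):
    """Dyad level: 1 if adding i -> j would complete a reciprocal dyad, else 0."""
    return int((j, i) in set(arcs))

# Hypothetical directed ties.
arcs = [("A", "B"), ("B", "A"), ("B", "C"), ("C", "A")]
print(mutual_count(arcs))             # 1
print(mutual_change(arcs, "C", "B"))  # 1
print(mutual_change(arcs, "A", "D"))  # 0
```

A positive coefficient on this statistic would mean that a tie from i to j is more likely when j already sends a tie to i.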
These are effectively interaction terms between local network configurations and exogenous covariates.
For example, we can also have conditional triadic closure.
- we only consider triangles (i.e. local network configurations) if they comprise democracies
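A conditional triangle count of this kind can be sketched by restricting the triangle search to the actors that carry the attribute. The network, the set of democracies, and the function name below are hypothetical.

```python
from itertools import combinations

def triangles_among(adj, members):
    """Count triangles whose three actors all belong to `members`."""
    return sum(
        1 for a, b, c in combinations(sorted(members), 3)
        if b in adj[a] and c in adj[a] and c in adj[b]
    )

# Hypothetical undirected network stored as neighbour sets.
adj = {
    "A": {"B", "C"},
    "B": {"A", "C", "D"},
    "C": {"A", "B", "D"},
    "D": {"B", "C"},
}
democracies = {"A", "B", "C"}              # hypothetical attribute

print(triangles_among(adj, adj.keys()))    # all triangles: 2
print(triangles_among(adj, democracies))   # triangles among democracies: 1
```

The triangle B-C-D is dropped from the conditional count because D is not a democracy, even though it is a triangle in the unconditional sense.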