# install.packages('network', .libPaths(), repos='http://cran.us.r-project.org')
# install.packages('ergm', .libPaths(), repos='http://cran.us.r-project.org')
# install.packages('devtools', .libPaths(), repos='http://cran.us.r-project.org')
devtools::install_github('tedhchen/ergmWorkshopTools')
Downloading GitHub repo tedhchen/ergmWorkshopTools@HEAD
✔  checking for file ‘/tmp/RtmpQbcSQg/remotes1ec57f9fac2/tedhchen-ergmWorkshopTools-d00ab89/DESCRIPTION’ (424ms)
─  preparing ‘ergmWorkshopTools’:
✔  checking DESCRIPTION meta-information
─  checking for LF line-endings in source and make files and shell scripts
─  checking for empty or unneeded directories
─  building ‘ergmWorkshopTools_0.0.1.tar.gz’
library(ergm)
library(ergmWorkshopTools)
Loading required package: network

‘network’ 1.17.1 (2021-06-12), part of the Statnet Project
* ‘news(package="network")’ for changes since last version
* ‘citation("network")’ for citation information
* ‘https://statnet.org’ for help, support, and other information

‘ergm’ 4.0.0 (2021-06-14), part of the Statnet Project
* ‘news(package="ergm")’ for changes since last version
* ‘citation("ergm")’ for citation information
* ‘https://statnet.org’ for help, support, and other information

‘ergm’ 4 is a major release that introduces a fair number of backwards-incompatible changes. Please type ‘news(package="ergm")’ for a list of major changes.

======================
Package: ergmWorkshopTools
Version: 0.0.1
Date: 2021-06-14
Author: Ted Hsuan Yun Chen (Aalto University and University of Helsinki)
This package contains data and functions needed for the 2021 PolNet ERGM workshop.
As you can see from the output, loading the `ergm` package also loaded the `network` package. You will also notice references to the `statnet` project; they are the main developers of ERGM functionality in R.
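If you want to confirm that your installed versions match the ones in the output above (ergm 4.0.0 and network 1.17.1), a quick check (not part of the original workshop code) is:

# Optional: check the versions of the loaded packages.
packageVersion('ergm')
packageVersion('network')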
data(mid_mat)
?mid_mat
`checkmat()` is a utility function that plots the supplied adjacency matrix.
Note about plotting: you can ignore the plotting options (the `options(repr.plot...)` calls) if you are not using a Jupyter notebook.
options(repr.plot.width=10, repr.plot.height=10)
checkmat(mid_mat)
Next, we use the `network` function to create a network object.

?network
nw <- network(mid_mat, directed = F)
nw
 Network attributes:
  vertices = 189
  directed = FALSE
  hyper = FALSE
  loops = FALSE
  multiple = FALSE
  bipartite = FALSE
  total edges= 134
    missing edges= 0
    non-missing edges= 134

 Vertex attribute names:
    vertex.names

No edge attributes
options(repr.plot.width=10, repr.plot.height=10)
par(mfrow = c(1, 1))
set.seed(210615); plot(nw)
The adjacency matrix format is generally the easiest to work with, but it can take up a lot of space. Sometimes we will have edgelists instead; in fact, unless you have pre-prepared network data, an edgelist is usually what you'll have.
data(mid_edgelist)
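Before converting it, it can help to peek at the edgelist itself; this extra line is not in the original workshop code and just assumes `mid_edgelist` is the two-column list of dyads used below.

# Quick look at the first few dyads in the edgelist.
head(mid_edgelist)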
nw.b <- network(mid_edgelist, directed = F)
nw.b
 Network attributes:
  vertices = 120
  directed = FALSE
  hyper = FALSE
  loops = FALSE
  multiple = FALSE
  bipartite = FALSE
  total edges= 134
    missing edges= 0
    non-missing edges= 134

 Vertex attribute names:
    vertex.names

No edge attributes

Notice that this network has only 120 vertices rather than 189: an edgelist only records nodes that appear in at least one edge, so isolates are dropped.
options(repr.plot.width=20, repr.plot.height=10)
par(mfrow = c(1, 2))
set.seed(210615); plot(nw); plot(nw.b)
data(mid_node_attr)
head(mid_node_attr)
?mid_node_attr
|    | name | intrastate | dem | cinc | trans |
|---|---|---|---|---|---|
|    | <chr> | <dbl> | <dbl> | <dbl> | <dbl> |
| 2  | USA | 0 | 1 | 0.149933 | 0 |
| 20 | CAN | 0 | 1 | 0.012238 | 0 |
| 31 | BHM | 0 | 1 | 0.000044 | 0 |
| 40 | CUB | 1 | 0 | 0.002783 | 0 |
| 41 | HAI | 1 | 0 | 0.000443 | 0 |
| 42 | DOM | 0 | 0 | 0.000748 | 0 |
mat <- matrix(0, ncol = 189, nrow = 189, dimnames = list(row.names(mid_node_attr), row.names(mid_node_attr)))
mat[1:10, 1:10]
|    | 2 | 20 | 31 | 40 | 41 | 42 | 51 | 52 | 53 | 54 |
|---|---|---|---|---|---|---|---|---|---|---|
| 2  | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 20 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 31 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 40 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 41 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 42 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 51 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 52 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 53 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 54 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
# Fill in the adjacency matrix from the edgelist; set both (i, j) and (j, i)
# because the network is undirected.
for(i in 1:nrow(mid_edgelist)){
    mat[mid_edgelist[i, 1], mid_edgelist[i, 2]] <- 1
    mat[mid_edgelist[i, 2], mid_edgelist[i, 1]] <- 1
}
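As an aside (not in the original workshop code), the same fill can be done without a loop. Indexing a matrix with a two-column matrix of (row, column) identifiers behaves just like the loop above, whether `mid_edgelist` holds the node labels used as the dimnames of `mat` or numeric positions.

# Vectorized alternative: assign both directions at once for the undirected network.
el <- as.matrix(mid_edgelist)
mat[el] <- 1
mat[el[, 2:1]] <- 1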
options(repr.plot.width=14, repr.plot.height=7)
par(mfrow = c(1, 2))
checkmat(mat);checkmat(mid_mat)
nw
 Network attributes:
  vertices = 189
  directed = FALSE
  hyper = FALSE
  loops = FALSE
  multiple = FALSE
  bipartite = FALSE
  total edges= 134
    missing edges= 0
    non-missing edges= 134

 Vertex attribute names:
    vertex.names

No edge attributes
The `%v%` operator is how to access vertex (node) variables from the network object.
nw%v%'vertex.names'
nw%v%'dem' <- mid_node_attr$'dem'
nw
 Network attributes:
  vertices = 189
  directed = FALSE
  hyper = FALSE
  loops = FALSE
  multiple = FALSE
  bipartite = FALSE
  total edges= 134
    missing edges= 0
    non-missing edges= 134

 Vertex attribute names:
    dem vertex.names

No edge attributes
nw%v%'dem'
options(repr.plot.width=10, repr.plot.height=10)
set.seed(210615); plot(nw, vertex.col = ifelse(nw%v%'dem', 2, 1)); legend('bottomright', legend = c('Democracy', 'Nondemocracy'), fill = c(2, 1), bty = 'n', cex = 1.1)
These are the geographical contiguity and joint-democracy networks.
data(contig); data(joint_dem)
options(repr.plot.width=16, repr.plot.height=8)
par(mfrow = c(1, 2))
set.seed(210615); plot(network(contig, directed = F), sub = 'Contiguity', cex.sub = 2); plot(network(joint_dem, directed = F), sub = 'Joint-Democracy', cex.sub = 2)
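Because `edgecov` matrices are matched to the network purely by position, it is worth checking that the covariate matrices are in the same node order as the network. This check is not in the original workshop code and assumes `contig` and `joint_dem` carry the same row names as `mid_mat`.

# Sanity check: edge covariates should follow the same node ordering as nw.
identical(rownames(contig), nw %v% 'vertex.names')
identical(rownames(joint_dem), nw %v% 'vertex.names')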
Use `?'ergm-terms'` to look at all the existing terms in the `ergm` package; the `search.ergmTerms` function also works well.

?`ergm-terms`
search.ergmTerms(keyword = 'transitive')
Found 15 matching ergm terms:

ddsp(d, type="OTP")
desp(d, type="OTP")
dgwdsp(decay, fixed=FALSE, cutoff=30, type="OTP")
dgwesp(decay, fixed=FALSE, cutoff=30, type="OTP")
dgwnsp(decay, fixed=FALSE, cutoff=30, type="OTP")
dnsp(d, type="OTP")
intransitive() Intransitive triads
intransitive() Intransitive triads
transitive() Transitive triads
transitiveties(attr=NULL, levels=NULL) Transitive ties
transitiveties(threshold=0) Transitive ties
transitiveweights(twopath="min",combine="max",affect="min") Transitive weights
triangle(attr=NULL, diff=FALSE, levels=NULL) Triangles
ttriple(attr=NULL, diff=FALSE, levels=NULL) Transitive triples
ttriad() Transitive triples
m0 <- ergm(nw ~ edges + nodefactor('dem') + edgecov(joint_dem))
summary(m0)
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Stopping at the initial estimate.
Evaluating log-likelihood at the estimate.
Call:
ergm(formula = nw ~ edges + nodefactor("dem") + edgecov(joint_dem))

Maximum Likelihood Results:

                  Estimate Std. Error MCMC % z value Pr(>|z|)
edges              -4.6678     0.1286      0 -36.287   <1e-04 ***
nodefactor.dem.1   -0.2632     0.1818      0  -1.447    0.148
edgecov.joint_dem  -0.2179     0.4078      0  -0.534    0.593
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

     Null Deviance: 24629  on 17766  degrees of freedom
 Residual Deviance:  1570  on 17763  degrees of freedom

AIC: 1576  BIC: 1599  (Smaller is better. MC Std. Err. = 0)
dyad_df <- ergmMPLE(nw ~ edges + nodefactor('dem') + edgecov(joint_dem))
dyad_df$predictor
| edges | nodefactor.dem.1 | edgecov.joint_dem |
|---|---|---|
| 1 | 2 | 1 |
| 1 | 2 | 1 |
| 1 | 1 | 0 |
| 1 | 0 | 0 |
| 1 | 0 | 0 |
| 1 | 1 | 0 |
summary(glm(dyad_df$response ~ dyad_df$predictor - 1, weights = dyad_df$weights, family = 'binomial'))
Call:
glm(formula = dyad_df$response ~ dyad_df$predictor - 1, family = "binomial", weights = dyad_df$weights)

Deviance Residuals:
      1        2        3        4        5        6
 11.402   -4.894   24.545   23.887  -11.020  -11.025

Coefficients:
                                   Estimate Std. Error z value Pr(>|z|)
dyad_df$predictoredges              -4.6678     0.1286 -36.287   <2e-16 ***
dyad_df$predictornodefactor.dem.1   -0.2632     0.1818  -1.447    0.148
dyad_df$predictoredgecov.joint_dem  -0.2179     0.4078  -0.534    0.593
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 24629  on 6  degrees of freedom
Residual deviance:  1570  on 3  degrees of freedom
AIC: 1576

Number of Fisher Scoring iterations: 7
`kstar(2)` (which is the 2-star term) is the basic term for this.

m1 <- ergm(nw ~ edges + nodefactor('dem') + edgecov(joint_dem) + kstar(2), control = control.ergm(seed = 210616))
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:
Error in ergm.MCMLE(init, nw, model, initialfit = (initialfit <- NULL), :
  Number of edges in a simulated network exceeds that in the observed by a factor of more than 20. This is a strong indicator of model degeneracy or a very poor starting parameter configuration. If you are reasonably certain that neither of these is the case, increase the MCMLE.density.guard control.ergm() parameter.

Traceback:

1. ergm(nw ~ edges + nodefactor("dem") + edgecov(joint_dem) + kstar(2),
 .     control = control.ergm(seed = 210616))
2. ergm.MCMLE(init, nw, model, initialfit = (initialfit <- NULL),
 .     control = control, proposal = proposal, proposal.obs = proposal.obs,
 .     verbose = verbose, ...)
3. stop("Number of edges in a simulated network exceeds that in the observed by a factor of more than ",
 .     floor(control$MCMLE.density.guard), ". This is a strong indicator of model degeneracy or a very poor starting parameter configuration. If you are reasonably certain that neither of these is the case, increase the MCMLE.density.guard control.ergm() parameter.")
A common fix is the geometrically weighted degree term (Snijders et al. 2006; Hunter et al. 2008); in the `ergm` package, it is called `gwdegree`.

Hunter, DR et al. 2008. "ergm: A package to fit, simulate and diagnose exponential-family models for networks." Journal of Statistical Software.
Snijders, T et al. 2006. "New specifications for exponential random graph models." Sociological Methodology.
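As a quick aside (not part of the original workshop code), you can compare the observed magnitude of the raw 2-star count with the geometrically weighted degree statistic before refitting:

# kstar(2) grows roughly with the square of node degrees, while the geometrically
# weighted degree statistic (for fixed decay) grows at most linearly in the number
# of nodes; this difference is part of why kstar terms are so prone to degeneracy.
summary(nw ~ edges + kstar(2) + gwdegree(decay = 0, fixed = TRUE) + gwdegree(decay = 0.5, fixed = TRUE))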
m1 <- ergm(nw ~ edges + nodefactor('dem') + edgecov(joint_dem) + gwdegree(decay = 0, fixed = T), control = control.ergm(seed = 210616))
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:
Optimizing with step length 0.5407.
The log-likelihood improved by 1.6896.
Estimating equations are not within tolerance region.
Iteration 2 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.4623.
Estimating equations are not within tolerance region.
Iteration 3 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.1125.
Estimating equations are not within tolerance region.
Iteration 4 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0849.
Convergence test p-value: 0.7687. Not converged with 99% confidence; increasing sample size.
Iteration 5 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0420.
Convergence test p-value: 0.3443. Not converged with 99% confidence; increasing sample size.
Iteration 6 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0100.
Convergence test p-value: 0.0007. Converged with 99% confidence.
Finished MCMLE.
Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
Bridging between the dyad-independent submodel and the full model...
Setting up bridge sampling...
Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
Bridging finished.
This model was fit using MCMC. To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.
summary(m1)
Call:
ergm(formula = nw ~ edges + nodefactor("dem") + edgecov(joint_dem) +
    gwdegree(decay = 0, fixed = T), control = control.ergm(seed = 210616))

Monte Carlo Maximum Likelihood Results:

                  Estimate Std. Error MCMC % z value Pr(>|z|)
edges              -4.1573     0.1528      0 -27.206   <1e-04 ***
nodefactor.dem.1   -0.1599     0.1738      0  -0.920    0.357
edgecov.joint_dem  -0.2259     0.3969      0  -0.569    0.569
gwdeg.fixed.0      -1.1605     0.2355      0  -4.928   <1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

     Null Deviance: 24629  on 17766  degrees of freedom
 Residual Deviance:  1544  on 17762  degrees of freedom

AIC: 1552  BIC: 1583  (Smaller is better. MC Std. Err. = 0.4484)
The `gwdeg.fixed.0` term has a coefficient of -1.16.

m2 <- ergm(nw ~ edges + nodefactor('dem') + edgecov(joint_dem) + gwdegree(decay = 0, fixed = T) + triangle, control = control.ergm(seed = 210616))
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:
Optimizing with step length 0.3570.
The log-likelihood improved by 1.7058.
Estimating equations are not within tolerance region.
Iteration 2 of at most 60:
Error in ergm.MCMLE(init, nw, model, initialfit = (initialfit <- NULL), :
  Number of edges in a simulated network exceeds that in the observed by a factor of more than 20. This is a strong indicator of model degeneracy or a very poor starting parameter configuration. If you are reasonably certain that neither of these is the case, increase the MCMLE.density.guard control.ergm() parameter.

Traceback:

1. ergm(nw ~ edges + nodefactor("dem") + edgecov(joint_dem) + gwdegree(decay = 0,
 .     fixed = T) + triangle, control = control.ergm(seed = 210616))
2. ergm.MCMLE(init, nw, model, initialfit = (initialfit <- NULL),
 .     control = control, proposal = proposal, proposal.obs = proposal.obs,
 .     verbose = verbose, ...)
3. stop("Number of edges in a simulated network exceeds that in the observed by a factor of more than ",
 .     floor(control$MCMLE.density.guard), ". This is a strong indicator of model degeneracy or a very poor starting parameter configuration. If you are reasonably certain that neither of these is the case, increase the MCMLE.density.guard control.ergm() parameter.")
The same kind of fix exists for triadic closure: in the `ergm` package, it is called `gwesp` (the geometrically weighted edgewise shared partner term).
m2 <- ergm(nw ~ edges + nodefactor('dem') + edgecov(joint_dem) + gwdegree(decay = 0, fixed = T) + gwesp(decay = 0, fixed = T), control = control.ergm(seed = 210616))
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:
Optimizing with step length 0.4429.
The log-likelihood improved by 1.9503.
Estimating equations are not within tolerance region.
Iteration 2 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.8813.
Estimating equations are not within tolerance region.
Iteration 3 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.3267.
Estimating equations are not within tolerance region.
Iteration 4 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.2839.
Estimating equations are not within tolerance region.
Iteration 5 of at most 60:
Optimizing with step length 0.9045.
The log-likelihood improved by 1.5003.
Estimating equations are not within tolerance region.
Iteration 6 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 1.5620.
Estimating equations are not within tolerance region.
Iteration 7 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.1756.
Estimating equations are not within tolerance region.
Iteration 8 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.1829.
Estimating equations are not within tolerance region.
Iteration 9 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.4080.
Estimating equations are not within tolerance region.
Iteration 10 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0691.
Convergence test p-value: 0.9569. Not converged with 99% confidence; increasing sample size.
Iteration 11 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.1327.
Estimating equations are not within tolerance region.
Estimating equations did not move closer to tolerance region more than 1 time(s) in 4 steps; increasing sample size.
Iteration 12 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0895.
Convergence test p-value: 0.5342. Not converged with 99% confidence; increasing sample size.
Iteration 13 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0194.
Convergence test p-value: 0.0808. Not converged with 99% confidence; increasing sample size.
Iteration 14 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0232.
Convergence test p-value: 0.0165. Not converged with 99% confidence; increasing sample size.
Iteration 15 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0152.
Convergence test p-value: < 0.0001. Converged with 99% confidence.
Finished MCMLE.
Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
Bridging between the dyad-independent submodel and the full model...
Setting up bridge sampling...
Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
Bridging finished.
This model was fit using MCMC. To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.
summary(m2)
Call:
ergm(formula = nw ~ edges + nodefactor("dem") + edgecov(joint_dem) +
    gwdegree(decay = 0, fixed = T) + gwesp(decay = 0, fixed = T),
    control = control.ergm(seed = 210616))

Monte Carlo Maximum Likelihood Results:

                  Estimate Std. Error MCMC % z value Pr(>|z|)
edges              -4.6965     0.1702      0 -27.594  < 1e-04 ***
nodefactor.dem.1   -0.1392     0.1595      0  -0.873  0.38291
edgecov.joint_dem  -0.2355     0.4113      0  -0.573  0.56692
gwdeg.fixed.0      -0.6418     0.2284      0  -2.810  0.00495 **
gwesp.fixed.0       1.1526     0.1415      0   8.148  < 1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

     Null Deviance: 24629  on 17766  degrees of freedom
 Residual Deviance:  1494  on 17761  degrees of freedom

AIC: 1504  BIC: 1543  (Smaller is better. MC Std. Err. = 0.4262)
The `gwesp.fixed.0` term is positive here. The `gwdegree` and `gwesp` terms are not straightforward to interpret (especially `gwesp`); for `gwdegree`, see:

Levy, M. 2016. "gwdegree: Improving interpretation of geometrically-weighted degree estimates in exponential random graph models." Journal of Open Source Software.

To build some intuition for `gwesp`, start with a fully connected ten-node network.

tenclique <- network(matrix(1, ncol = 10, nrow = 10), directed = T)
options(repr.plot.width=7, repr.plot.height=7)
set.seed(210615); plot(tenclique)
The `summary` function is very useful when provided with an ERGM formula; it returns the observed value of each term on the supplied network:

summary(network ~ ergm-terms)
summary(tenclique ~ edges + gwesp(0, fixed = T) + gwesp(0.5, fixed = T) + gwesp(1, fixed = T) + gwesp(2, fixed = T) + gwesp(5, fixed = T) + gwesp(10, fixed = T) + ttriple)
`gwesp` with decay set to 0 simply records whether each edge has "one or more" shared partners; in this complete network every edge does, so it takes the same value as the `edges` count term. Compare this to the `ttriple` term, which counts all edgewise shared partners of all edges (i.e. $10 \times 9 \times 8 = 720$). The decay parameter moves the statistic between these two extremes (at one end, `edges` and `gwesp` are the same), and at the other end you will run into degeneracy issues.

m2.b <- ergm(nw ~ edges + nodefactor('dem') + edgecov(joint_dem) + gwdegree(decay = 0.2, fixed = T) + gwesp(decay = 0, fixed = T), control = control.ergm(seed = 210616))
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:
Optimizing with step length 0.3570.
The log-likelihood improved by 1.3288.
Estimating equations are not within tolerance region.
Iteration 2 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 1.5190.
Estimating equations are not within tolerance region.
Iteration 3 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.4394.
Estimating equations are not within tolerance region.
Iteration 4 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0368.
Convergence test p-value: 0.9732. Not converged with 99% confidence; increasing sample size.
Iteration 5 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.4939.
Estimating equations are not within tolerance region.
Iteration 6 of at most 60:
Optimizing with step length 0.9045.
The log-likelihood improved by 1.8995.
Estimating equations are not within tolerance region.
Estimating equations did not move closer to tolerance region more than 1 time(s) in 4 steps; increasing sample size.
Iteration 7 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.3506.
Estimating equations are not within tolerance region.
Iteration 8 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0215.
Convergence test p-value: 0.4942. Not converged with 99% confidence; increasing sample size.
Iteration 9 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0223.
Convergence test p-value: 0.0003. Converged with 99% confidence.
Finished MCMLE.
Evaluating log-likelihood at the estimate. Fitting the dyad-independent submodel...
Bridging between the dyad-independent submodel and the full model...
Setting up bridge sampling...
Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
Bridging finished.
This model was fit using MCMC. To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.
summary(m2.b)
Call:
ergm(formula = nw ~ edges + nodefactor("dem") + edgecov(joint_dem) +
    gwdegree(decay = 0.2, fixed = T) + gwesp(decay = 0, fixed = T),
    control = control.ergm(seed = 210616))

Monte Carlo Maximum Likelihood Results:

                  Estimate Std. Error MCMC % z value Pr(>|z|)
edges              -4.4797     0.1931      0 -23.195  < 1e-04 ***
nodefactor.dem.1   -0.1274     0.1739      0  -0.732 0.463893
edgecov.joint_dem  -0.2045     0.4263      0  -0.480 0.631492
gwdeg.fixed.0.2    -0.8503     0.2271      0  -3.744 0.000181 ***
gwesp.fixed.0       1.0601     0.1468      0   7.221  < 1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

     Null Deviance: 24629  on 17766  degrees of freedom
 Residual Deviance:  1487  on 17761  degrees of freedom

AIC: 1497  BIC: 1536  (Smaller is better. MC Std. Err. = 0.5284)
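If you want a quick numeric comparison of the two specifications (not in the original workshop code; both log-likelihoods were estimated by bridge sampling above, so information criteria are available):

# Compare the decay = 0 and decay = 0.2 gwdegree specifications by AIC.
AIC(m2, m2.b)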
The `gof` function is what we use to assess goodness of fit.

m2fit <- gof(m2, control = control.gof.ergm(seed = 210616))
m2fit
Goodness-of-fit for degree

         obs min  mean max MC p-value
degree0   69  49 68.12  94       0.88
degree1   66  25 44.72  63       0.00
degree2   23  22 34.25  51       0.02
degree3   13  11 21.67  36       0.10
degree4    7   5 11.32  23       0.30
degree5    3   0  5.44  15       0.52
degree6    2   0  2.23   6       1.00
degree7    1   0  0.77   5       0.92
degree8    2   0  0.34   3       0.12
degree9    0   0  0.06   2       1.00
degree10   1   0  0.06   2       0.10
degree11   1   0  0.02   1       0.04
degree18   1   0  0.00   0       0.00

Goodness-of-fit for edgewise shared partner

     obs min  mean max MC p-value
esp0  95  68 95.38 122       0.98
esp1  32   3 38.62  73       0.60
esp2   5   0  2.68  11       0.40
esp3   2   0  0.10   2       0.02

Goodness-of-fit for minimum geodesic distance

      obs   min  mean   max MC p-value
1     134   134   134   134          1
2     419   419   419   419          1
3     658   658   658   658          1
4     765   765   765   765          1
5     737   737   737   737          1
6     489   489   489   489          1
7     359   359   359   359          1
8     203   203   203   203          1
9      90    90    90    90          1
10     28    28    28    28          1
11      6     6     6     6          1
12      1     1     1     1          1
Inf 13877 13877 13877 13877          1

Goodness-of-fit for model statistics

                  obs min   mean max MC p-value
edges             134  88 136.78 199       1.00
nodefactor.dem.1   85  54  85.74 137       0.98
edgecov.joint_dem  12   4  11.72  22       1.00
gwdeg.fixed.0     120  95 120.88 140       0.88
gwesp.fixed.0      39   3  41.40  82       0.94
options(repr.plot.width=32, repr.plot.height=8)
par(mfrow = c(1, 4))
plot(m2fit)
m1fit <- gof(m1, control = control.gof.ergm(seed = 210616))
options(repr.plot.width=32, repr.plot.height=8)
par(mfrow = c(1, 4))
plot(m1fit)
summary(m2)
Call:
ergm(formula = nw ~ edges + nodefactor("dem") + edgecov(joint_dem) +
    gwdegree(decay = 0, fixed = T) + gwesp(decay = 0, fixed = T),
    control = control.ergm(seed = 210616))

Monte Carlo Maximum Likelihood Results:

                  Estimate Std. Error MCMC % z value Pr(>|z|)
edges              -4.6965     0.1702      0 -27.594  < 1e-04 ***
nodefactor.dem.1   -0.1392     0.1595      0  -0.873  0.38291
edgecov.joint_dem  -0.2355     0.4113      0  -0.573  0.56692
gwdeg.fixed.0      -0.6418     0.2284      0  -2.810  0.00495 **
gwesp.fixed.0       1.1526     0.1415      0   8.148  < 1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

     Null Deviance: 24629  on 17766  degrees of freedom
 Residual Deviance:  1494  on 17761  degrees of freedom

AIC: 1504  BIC: 1543  (Smaller is better. MC Std. Err. = 0.4262)
A positive `gwesp` coefficient means triadic closure is a feature of the network. For interpreting individual dyads: `nodefactor.dem` will be 0 for two nondemocracies, 1 for a mixed-regime dyad, and 2 for joint democracies, and `gwesp.fixed.0` will be 1 if the dyad is part of at least one triangle.

$P(y_{ij} = 1 \mid Y, \boldsymbol{\theta}) = \mathrm{logit}^{-1}\left(\sum^k_{r=1}\theta_r \delta_r^{(ij)}(Y)\right)$

Each coefficient $\theta_r$ is then:
- conditional log odds of a tie given a one unit change in the statistic
- for the less complicated terms, it becomes the conditional log odds of a tie given that it is part of the local configuration one more time
Here is a simple three-node network with two ties, to show tie formation on the empty dyad.
tri <- matrix(c(0, 1, 1,
1, 0, 0,
1, 0, 0), byrow = T, ncol = 3)
nw.tri <- network(tri, directed = F)
nw.tri%v%'dem' <- c(0, 1, 1)
jd.tri <- matrix(c(0, 0, 0,
0, 0, 1,
0, 1, 0), byrow = T, ncol = 3)
options(repr.plot.width=10, repr.plot.height=10)
set.seed(210615); plot(nw.tri, vertex.col = ifelse(nw.tri%v%'dem', 2, 1), displaylabels = T)
The `ergmMPLE` function can help with this, especially with more complicated terms like the geometrically weighted ones.

ergmMPLE(nw.tri ~ edges + nodefactor('dem') + edgecov(jd.tri) + gwdegree(decay = 0, fixed = T) + gwesp(decay = 0, fixed = T))$predictor
| edges | nodefactor.dem.1 | edgecov.jd.tri | gwdeg.fixed.0 | gwesp.fixed.0 |
|---|---|---|---|---|
| 1 | 1 | 0 | 1 | 0 |
| 1 | 2 | 1 | 0 | 3 |
Note the `gwdeg.fixed.0` and `gwesp.fixed.0` terms for the (2,3) dyad. Why does `gwdegree` not change? Let's look at the degree distribution:

| nodes with degree: | 0 | 1 | 2 |
|---|---|---|---|
| w/o (2,3) edge | 0 | 2 | 1 |
| w/ (2,3) edge | 0 | 0 | 3 |

With decay set to 0, the statistic only registers whether a node has any ties at all; because the number of degree-0 nodes is the same with and without the (2,3) edge, `gwdeg.fixed.0` does not change.

round(coefficients(m2), 2)
plogis(-1.76)
Desmarais, BA & Cranmer, SJ. 2012. "Micro-level interpretations of exponential random graph models with applications to estuary networks." Policy Studies Journal.
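As a sketch of where a value like -1.76 comes from (this calculation is not in the original notebook): the conditional log odds of the (2,3) tie is the sum over terms of coefficient times change statistic, using the change statistics from the `ergmMPLE()` output above; small differences from -1.76 reflect rounding.

# Change statistics for the (2,3) dyad, read off the ergmMPLE() output above.
delta <- c(edges = 1, nodefactor.dem.1 = 2, edgecov = 1, gwdeg.fixed.0 = 0, gwesp.fixed.0 = 3)
logodds <- sum(coefficients(m2) * delta)  # roughly -1.75, close to the -1.76 used above
plogis(logodds)                           # conditional probability of the (2,3) tie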
Some other useful `ergm()` and `control.ergm()` settings:

- `eval.loglik = F` skips log-likelihood estimation if you are exploring large and complex models
- `MCMC.burnin`, `MCMC.samplesize`, and `MCMC.interval` for higher quality estimates
- `parallel` to speed up estimation when working with larger models

See all the settings with `?control.ergm`.
m2.c <- ergm(nw ~ edges + nodefactor('dem') + edgecov(joint_dem) + gwdegree(decay = 0, fixed = T) + gwesp(decay = 0, fixed = T),
eval.loglik = F,
control = control.ergm(seed = 210615,
MCMC.burnin = 50000,
MCMC.samplesize = 2500,
MCMC.interval = 2500,
parallel = 0))
Starting maximum pseudolikelihood estimation (MPLE):
Evaluating the predictor and response matrix.
Maximizing the pseudolikelihood.
Finished MPLE.
Starting Monte Carlo maximum likelihood estimation (MCMLE):
Iteration 1 of at most 60:
Optimizing with step length 0.5407.
The log-likelihood improved by 4.6038.
Estimating equations are not within tolerance region.
Iteration 2 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 1.9610.
Estimating equations are not within tolerance region.
Iteration 3 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.1193.
Estimating equations are not within tolerance region.
Iteration 4 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0281.
Convergence test p-value: 0.0482. Not converged with 99% confidence; increasing sample size.
Iteration 5 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0140.
Convergence test p-value: 0.2140. Not converged with 99% confidence; increasing sample size.
Iteration 6 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0189.
Convergence test p-value: 0.2765. Not converged with 99% confidence; increasing sample size.
Iteration 7 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0132.
Convergence test p-value: 0.0821. Not converged with 99% confidence; increasing sample size.
Iteration 8 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0145.
Convergence test p-value: 0.1721. Not converged with 99% confidence; increasing sample size.
Iteration 9 of at most 60:
Optimizing with step length 1.0000.
The log-likelihood improved by 0.0029.
Convergence test p-value: < 0.0001. Converged with 99% confidence.
Finished MCMLE.
This model was fit using MCMC. To examine model diagnostics and check for degeneracy, use the mcmc.diagnostics() function.
summary(m2.c)
Log-likelihood was not estimated for this fit. To get deviances, AIC, and/or BIC, use ‘*fit* <-logLik(*fit*, add=TRUE)’ to add it to the object or rerun this function with eval.loglik=TRUE.
Call:
ergm(formula = nw ~ edges + nodefactor("dem") + edgecov(joint_dem) +
    gwdegree(decay = 0, fixed = T) + gwesp(decay = 0, fixed = T),
    eval.loglik = F, control = control.ergm(seed = 210615, MCMC.burnin = 50000,
    MCMC.samplesize = 2500, MCMC.interval = 2500, parallel = 0))

Monte Carlo Maximum Likelihood Results:

                  Estimate Std. Error MCMC % z value Pr(>|z|)
edges              -4.6948     0.1771      0 -26.510   <1e-04 ***
nodefactor.dem.1   -0.1390     0.1664      0  -0.835   0.4036
edgecov.joint_dem  -0.2336     0.4088      0  -0.572   0.5676
gwdeg.fixed.0      -0.6391     0.2317      0  -2.759   0.0058 **
gwesp.fixed.0       1.1460     0.1468      0   7.809   <1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
m2.c <- logLik(m2.c, add = T)
Fitting the dyad-independent submodel...
Bridging between the dyad-independent submodel and the full model...
Setting up bridge sampling...
Using 16 bridges: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 .
Bridging finished.
summary(m2.c)
Call:
ergm(formula = nw ~ edges + nodefactor("dem") + edgecov(joint_dem) +
    gwdegree(decay = 0, fixed = T) + gwesp(decay = 0, fixed = T),
    eval.loglik = F, control = control.ergm(seed = 210615, MCMC.burnin = 50000,
    MCMC.samplesize = 2500, MCMC.interval = 2500, parallel = 0))

Monte Carlo Maximum Likelihood Results:

                  Estimate Std. Error MCMC % z value Pr(>|z|)
edges              -4.6948     0.1771      0 -26.510   <1e-04 ***
nodefactor.dem.1   -0.1390     0.1664      0  -0.835   0.4036
edgecov.joint_dem  -0.2336     0.4088      0  -0.572   0.5676
gwdeg.fixed.0      -0.6391     0.2317      0  -2.759   0.0058 **
gwesp.fixed.0       1.1460     0.1468      0   7.809   <1e-04 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

     Null Deviance: 24629  on 17766  degrees of freedom
 Residual Deviance:  1495  on 17761  degrees of freedom

AIC: 1505  BIC: 1544  (Smaller is better. MC Std. Err. = 0.3209)
options(repr.plot.width=20, repr.plot.height=10)
mcmc.diagnostics(m2)
Sample statistics summary: Iterations = 2803712:6039552 Thinning interval = 2048 Number of chains = 1 Sample size per chain = 1581 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE edges 2.8741 19.864 0.4996 0.9986 nodefactor.dem.1 1.4187 15.633 0.3932 0.6645 edgecov.joint_dem 0.1954 4.089 0.1028 0.1352 gwdeg.fixed.0 0.8299 9.662 0.2430 0.3857 gwesp.fixed.0 2.2815 13.649 0.3433 0.7790 2. Quantiles for each variable: 2.5% 25% 50% 75% 97.5% edges -33.5 -10 2 16 45 nodefactor.dem.1 -27.0 -10 1 12 35 edgecov.joint_dem -7.0 -3 0 3 9 gwdeg.fixed.0 -20.0 -5 1 8 18 gwesp.fixed.0 -21.5 -7 2 11 31 Are sample statistics significantly different from observed? edges nodefactor.dem.1 edgecov.joint_dem gwdeg.fixed.0 diff. 2.874130297 1.41872233 0.1954459 0.82985452 test stat. 2.878250816 2.13509251 1.4456122 2.15181337 P-val. 0.003998871 0.03275345 0.1482860 0.03141206 gwesp.fixed.0 Overall (Chi^2) diff. 2.281467426 NA test stat. 2.928565083 10.74797367 P-val. 0.003405305 0.05852964 Sample statistics cross-correlations: edges nodefactor.dem.1 edgecov.joint_dem gwdeg.fixed.0 edges 1.0000000 0.7989286 0.4602394 0.8505776 nodefactor.dem.1 0.7989286 1.0000000 0.7638296 0.7340096 edgecov.joint_dem 0.4602394 0.7638296 1.0000000 0.4590853 gwdeg.fixed.0 0.8505776 0.7340096 0.4590853 1.0000000 gwesp.fixed.0 0.8038153 0.6072574 0.3261639 0.5303066 gwesp.fixed.0 edges 0.8038153 nodefactor.dem.1 0.6072574 edgecov.joint_dem 0.3261639 gwdeg.fixed.0 0.5303066 gwesp.fixed.0 1.0000000 Sample statistics auto-correlation: Chain 1 edges nodefactor.dem.1 edgecov.joint_dem gwdeg.fixed.0 Lag 0 1.00000000 1.00000000 1.000000000 1.00000000 Lag 2048 0.48734776 0.32828093 0.184489694 0.24289768 Lag 4096 0.35325989 0.19724345 0.061385594 0.17426045 Lag 6144 0.25131376 0.16371725 0.071877375 0.15011261 Lag 8192 0.14326239 0.09728171 0.018212371 0.07119263 Lag 10240 0.08689039 0.04923721 -0.004524255 0.06458949 gwesp.fixed.0 Lag 0 1.0000000 Lag 2048 0.6746308 Lag 4096 0.4618212 Lag 6144 0.3014276 Lag 8192 0.2107175 Lag 10240 0.1333792 Sample statistics burn-in diagnostic (Geweke): Chain 1 Fraction in 1st window = 0.1 Fraction in 2nd window = 0.5 edges nodefactor.dem.1 edgecov.joint_dem gwdeg.fixed.0 -1.143 -2.195 -2.068 -1.622 gwesp.fixed.0 -1.211 Individual P-values (lower = worse): edges nodefactor.dem.1 edgecov.joint_dem gwdeg.fixed.0 0.25285063 0.02818016 0.03863393 0.10486900 gwesp.fixed.0 0.22606896 Joint P-value (lower = worse): 0.212401 .
MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).
options(repr.plot.width=20, repr.plot.height=10)
mcmc.diagnostics(m2.c)
Sample statistics summary: Iterations = 290784:3852576 Thinning interval = 312 Number of chains = 1 Sample size per chain = 11417 1. Empirical mean and standard deviation for each variable, plus standard error of the mean: Mean SD Naive SE Time-series SE edges -0.7379 19.435 0.18189 0.8781 nodefactor.dem.1 -0.2601 15.128 0.14158 0.5677 edgecov.joint_dem -0.1223 4.007 0.03750 0.1165 gwdeg.fixed.0 -0.4866 9.534 0.08922 0.2897 gwesp.fixed.0 -0.2658 13.134 0.12292 0.6814 2. Quantiles for each variable: 2.5% 25% 50% 75% 97.5% edges -37 -14 -1 12 40 nodefactor.dem.1 -28 -11 -1 9 31 edgecov.joint_dem -7 -3 0 2 8 gwdeg.fixed.0 -20 -7 0 6 17 gwesp.fixed.0 -22 -10 -1 8 29 Are sample statistics significantly different from observed? edges nodefactor.dem.1 edgecov.joint_dem gwdeg.fixed.0 diff. -0.7379347 -0.2600508 -0.1222738 -0.48664273 test stat. -0.8404137 -0.4580759 -1.0492226 -1.68004510 P-val. 0.4006765 0.6468979 0.2940757 0.09294854 gwesp.fixed.0 Overall (Chi^2) diff. -0.2658317 NA test stat. -0.3901499 18.586465455 P-val. 0.6964257 0.002400931 Sample statistics cross-correlations: edges nodefactor.dem.1 edgecov.joint_dem gwdeg.fixed.0 edges 1.0000000 0.8064376 0.4286277 0.8502422 nodefactor.dem.1 0.8064376 1.0000000 0.7393320 0.7305253 edgecov.joint_dem 0.4286277 0.7393320 1.0000000 0.4130171 gwdeg.fixed.0 0.8502422 0.7305253 0.4130171 1.0000000 gwesp.fixed.0 0.8001978 0.6119440 0.3105163 0.5223755 gwesp.fixed.0 edges 0.8001978 nodefactor.dem.1 0.6119440 edgecov.joint_dem 0.3105163 gwdeg.fixed.0 0.5223755 gwesp.fixed.0 1.0000000 Sample statistics auto-correlation: Chain 1 edges nodefactor.dem.1 edgecov.joint_dem gwdeg.fixed.0 Lag 0 1.0000000 1.0000000 1.0000000 1.0000000 Lag 312 0.7957683 0.7150144 0.5801672 0.5846007 Lag 624 0.6806121 0.5726600 0.3964272 0.4140633 Lag 936 0.6074834 0.4928086 0.3206351 0.3200706 Lag 1248 0.5605824 0.4460110 0.2760459 0.2849391 Lag 1560 0.5214302 0.4071122 0.2389041 0.2600111 gwesp.fixed.0 Lag 0 1.0000000 Lag 312 0.9331801 Lag 624 0.8747046 Lag 936 0.8199298 Lag 1248 0.7689267 Lag 1560 0.7218895 Sample statistics burn-in diagnostic (Geweke): Chain 1 Fraction in 1st window = 0.1 Fraction in 2nd window = 0.5 edges nodefactor.dem.1 edgecov.joint_dem gwdeg.fixed.0 -0.5416 -0.2782 0.7456 -0.4388 gwesp.fixed.0 -0.6521 Individual P-values (lower = worse): edges nodefactor.dem.1 edgecov.joint_dem gwdeg.fixed.0 0.5880912 0.7808294 0.4559219 0.6608426 gwesp.fixed.0 0.5143141 Joint P-value (lower = worse): 0.4488022 .
MCMC diagnostics shown here are from the last round of simulation, prior to computation of final parameter estimates. Because the final estimates are refinements of those used for this simulation run, these diagnostics may understate model performance. To directly assess the performance of the final model on in-model statistics, please use the GOF command: gof(ergmFitObject, GOF=~model).
Other resources:

- The `statnet` suite of packages has lots of extensions of the ERGM and other network methods
- The `xergm` package: lots of extensions to the ERGM
- The `gwdegree` package: helps with understanding gwdegree terms (https://github.com/michaellevy/gwdegree)
Desmarais, BA & Cranmer, SJ. 2012. "Micro-level interpretations of exponential random graph models with applications to estuary networks." Policy Studies Journal.
Hunter, DR et al. 2008. "ergm: A package to fit, simulate and diagnose exponential-family models for networks." Journal of Statistical Software.
Levy, M. 2016. "gwdegree: Improving interpretation of geometrically-weighted degree estimates in exponential random graph models." Journal of Open Source Software.
Snijders, T et al. 2006. "New specifications for exponential random graph models." Sociological Methodology.