Title: | Exploratory Graph Analysis – a Framework for Estimating the Number of Dimensions in Multivariate Data using Network Psychometrics |
---|---|
Description: | Implements the Exploratory Graph Analysis (EGA) framework for dimensionality and psychometric assessment. EGA estimates the number of dimensions in psychological data using network estimation methods and community detection algorithms. A bootstrap method is provided to assess the stability of dimensions and items. Fit is evaluated using the Entropy Fit family of indices. Unique Variable Analysis evaluates the extent to which items are locally dependent (or redundant). Network loadings provide similar information to factor loadings and can be used to compute network scores. A bootstrap and permutation approach are available to assess configural and metric invariance. Hierarchical structures can be detected using Hierarchical EGA. Time series and intensive longitudinal data can be analyzed using Dynamic EGA, supporting individual, group, and population level assessments. |
Authors: | Hudson Golino [aut, cre] , Alexander Christensen [aut] , Robert Moulder [ctb] , Luis E. Garrido [ctb] , Laura Jamison [ctb] , Dingjing Shi [ctb] |
Maintainer: | Hudson Golino <[email protected]> |
License: | GPL (>= 3.0) |
Version: | 2.1.0 |
Built: | 2024-11-09 19:24:52 UTC |
Source: | https://github.com/hfgolino/eganet |
Hudson Golino <[email protected]> and Alexander P. Christensen <[email protected]>
Christensen, A. P. (2023).
Unidimensional community detection: A Monte Carlo simulation, grid search, and comparison.
PsyArXiv.
# Related functions: community.unidimensional
Christensen, A. P., Garrido, L. E., & Golino, H. (2023).
Unique variable analysis: A network psychometrics method to detect local dependence.
Multivariate Behavioral Research.
# Related functions: UVA
Christensen, A. P., Garrido, L. E., Guerra-Pena, K., & Golino, H. (2023).
Comparing community detection algorithms in psychometric networks: A Monte Carlo simulation.
Behavior Research Methods.
# Related functions: EGA
Christensen, A. P., & Golino, H. (2021a).
Estimating the stability of the number of factors via Bootstrap Exploratory Graph Analysis: A tutorial.
Psych, 3(3), 479-500.
# Related functions: bootEGA, dimensionStability, and itemStability
Christensen, A. P., & Golino, H. (2021b).
Factor or network model? Predictions from neural networks.
Journal of Behavioral Data Science, 1(1), 85-126.
# Related functions: LCT
Christensen, A. P., & Golino, H. (2021c).
On the equivalency of factor and network loadings.
Behavior Research Methods, 53, 1563-1580.
# Related functions: LCT and net.loads
Christensen, A. P., Golino, H., & Silvia, P. J. (2020).
A psychometric network perspective on the validity and validation of personality trait questionnaires.
European Journal of Personality, 34, 1095-1108.
# Related functions: bootEGA, dimensionStability, EGA, itemStability, and UVA
Christensen, A. P., Gross, G. M., Golino, H., Silvia, P. J., & Kwapil, T. R. (2019).
Exploratory graph analysis of the Multidimensional Schizotypy Scale.
Schizophrenia Research, 206, 43-51.
# Related functions: CFA and EGA
Golino, H., Christensen, A. P., Moulder, R., Kim, S., & Boker, S. M. (2021).
Modeling latent topics in social media using Dynamic Exploratory Graph Analysis: The case of the right-wing and left-wing trolls in the 2016 US elections.
Psychometrika.
# Related functions: dynEGA and simDFM
Golino, H., & Demetriou, A. (2017).
Estimating the dimensionality of intelligence like data using Exploratory Graph Analysis.
Intelligence, 62, 54-70.
# Related functions: EGA
Golino, H., & Epskamp, S. (2017).
Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research.
PLoS ONE, 12, e0174035.
# Related functions: CFA, EGA, and bootEGA
Golino, H., Moulder, R., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Nesselroade, J., Sadana, R., Thiyagarajan, J. A., & Boker, S. M. (2020).
Entropy fit indices: New fit measures for assessing the structure and dimensionality of multiple latent variables.
Multivariate Behavioral Research.
# Related functions: entropyFit, tefi, and vn.entropy
Golino, H., Nesselroade, J. R., & Christensen, A. P. (2022).
Toward a psychology of individuals: The ergodicity information index and a bottom-up approach for finding generalizations.
PsyArXiv.
# Related functions: boot.ergoInfo, ergoInfo, jsd, and infoCluster
Golino, H., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Sadana, R., Thiyagarajan, J. A., & Martinez-Molina, A. (2020).
Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial.
Psychological Methods, 25, 292-320.
# Related functions: EGA
Golino, H., Thiyagarajan, J. A., Sadana, R., Teles, M., Christensen, A. P., & Boker, S. M. (2020).
Investigating the broad domains of intrinsic capacity, functional ability, and environment: An exploratory graph analysis approach for improving analytical methodologies for measuring healthy aging.
PsyArXiv.
# Related functions: EGA.fit and tefi
Jamison, L., Christensen, A. P., & Golino, H. (2021).
Optimizing Walktrap's community detection in networks using the Total Entropy Fit Index.
PsyArXiv.
# Related functions: EGA.fit and tefi
Jamison, L., Golino, H., & Christensen, A. P. (2023).
Metric invariance in exploratory graph analysis via permutation testing.
PsyArXiv.
# Related functions: invariance
Shi, D., Christensen, A. P., Day, E., Golino, H., & Garrido, L. E. (2023).
A Bayesian approach for dimensionality assessment in psychological networks.
PsyArXiv.
# Related functions: EGA
Useful links:
Report bugs at https://github.com/hfgolino/EGAnet/issues
This wrapper is similar to cor_auto. Some minor adjustments make this function simpler and allow it to work within EGAnet. NA values are not treated as categories (this behavior differs from cor_auto).
auto.correlate(
  data,
  corr = c("cosine", "kendall", "pearson", "spearman"),
  ordinal.categories = 7,
  forcePD = TRUE,
  na.data = c("pairwise", "listwise"),
  empty.method = c("none", "zero", "all"),
  empty.value = c("none", "point_five", "one_over"),
  verbose = FALSE,
  ...
)
data |
Matrix or data frame. Should consist only of variables to be used in the analysis |
corr |
Character (length = 1).
The standard correlation method to be used.
Defaults to |
ordinal.categories |
Numeric (length = 1).
Up to the number of categories before a variable is considered continuous.
Defaults to 7 |
forcePD |
Boolean (length = 1).
Whether positive definite matrix should be enforced.
Defaults to TRUE |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
empty.method |
Character (length = 1).
Method for empty cell correction in
|
empty.value |
Character (length = 1).
Value to add to the joint frequency table cells in
|
verbose |
Boolean (length = 1).
Whether messages should be printed.
Defaults to FALSE |
... |
Not actually used but makes it easier for general functionality in the package |
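The role of ordinal.categories can be illustrated with a short base R sketch. This is not the package's internal code: the helper name count_categories and the toy data are hypothetical. The sketch counts the distinct non-missing values of each variable, so that NA is not treated as a category, and flags variables with up to 7 categories as candidates for ordinal treatment.

```r
# Hypothetical helper (not part of EGAnet): count distinct observed values,
# ignoring NA so that missingness is not treated as a category
count_categories <- function(x) {
  length(unique(x[!is.na(x)]))
}

# Toy data: one 5-point item with missing values and one continuous variable
set.seed(1)
toy <- data.frame(
  likert = c(1, 2, 3, 4, 5, NA, 2, 3, NA, 4),
  continuous = rnorm(10)
)

# Number of categories per variable
categories <- vapply(toy, count_categories, numeric(1))

# Variables with up to 7 categories would be treated as ordinal
# (mirroring ordinal.categories = 7 in the usage above)
is_ordinal <- categories <= 7
print(is_ordinal)
```

Raising or lowering the threshold changes which variables fall on the ordinal side of this split.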
Alexander P. Christensen <[email protected]>
# Load data
wmt <- wmt2[,7:24]

# Obtain correlations
wmt_corr <- auto.correlate(wmt)
Tests the Ergodicity Information Index (EII) obtained in the empirical sample against a distribution of EII values obtained by a variant of bootstrap sampling (see Details for the procedure).
boot.ergoInfo(
  dynEGA.object,
  EII,
  use = c("edge.list", "unweighted", "weighted"),
  shuffles = 5000,
  iter = 100,
  ncores,
  verbose = TRUE
)
dynEGA.object |
A |
EII |
A |
use |
Character (length = 1).
A string indicating what network element will be used
to compute the algorithm complexity, the list of edges or the weights of the network.
Defaults to
|
shuffles |
Numeric.
Number of shuffles used to compute the Kolmogorov complexity.
Defaults to 5000 |
iter |
Numeric (length = 1).
Number of replica samples to generate from the bootstrap analysis.
Defaults to 100 |
ncores |
Numeric (length = 1).
Number of cores to use in computing results.
Defaults to If you're unsure how many cores your computer has,
then type: |
verbose |
Boolean (length = 1).
Should progress be displayed?
Defaults to TRUE |
In traditional bootstrap sampling, individual participants are resampled with replacement from the empirical sample. This process is time consuming when carried out across v number of variables, n number of participants, t number of time points, and i number of iterations. Instead, boot.ergoInfo uses the premise of an ergodic process to establish a more efficient test that works directly on the sample's networks.
With an ergodic process, the expectation is that all individuals will have a systematic relationship with the population. Destroying this relationship should result in a significant loss of information. Following this conjecture, boot.ergoInfo shuffles a random subset of the edges that exist in the population, equal in size to the number of edges the population shares with an individual. An individual's unique edges remain the same, controlling for their unique information. The result is a replicate individual that contains the same total number of edges as the actual individual but whose shared information with the population has been scrambled.
This process is repeated over each individual to create a replicate sample and is repeated for X iterations (e.g., 100). This approach creates a sampling distribution that represents the expected information between the population and individuals when a random process generates the shared information between them. If the shared information between the population and individuals in the empirical sample is sufficiently meaningful, then this process should result in significant information loss.
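The construction of a single replicate individual can be sketched in base R. This is only an illustration of the description above under simplifying assumptions (binary, symmetric adjacency matrices), not the package's implementation; the helper make_replicate and the toy networks are hypothetical.

```r
# Hypothetical helper (not part of EGAnet): build one replicate individual
# from binary, symmetric adjacency matrices of the population and an individual
make_replicate <- function(population, individual) {
  lower <- lower.tri(population)
  pop_edges <- which(population[lower] != 0)    # edges present in the population
  ind_edges <- which(individual[lower] != 0)    # edges present in the individual

  shared <- intersect(ind_edges, pop_edges)     # edges shared with the population
  unique_edges <- setdiff(ind_edges, pop_edges) # edges unique to the individual

  # Draw as many population edges at random as the individual shared
  scrambled <- pop_edges[sample.int(length(pop_edges), length(shared))]

  # Unique edges stay in place; shared edges are replaced by the random draw
  edge_vector <- numeric(sum(lower))
  edge_vector[c(unique_edges, scrambled)] <- 1

  replicate_network <- matrix(0, nrow(population), ncol(population))
  replicate_network[lower] <- edge_vector
  replicate_network + t(replicate_network)
}

set.seed(42)

# Toy 5-node population network (7 edges)
population <- matrix(0, 5, 5)
population[lower.tri(population)] <- c(1, 1, 0, 1, 1, 0, 1, 0, 1, 1)
population <- population + t(population)

# Toy individual network (5 edges: 3 shared with the population, 2 unique)
individual <- matrix(0, 5, 5)
individual[lower.tri(individual)] <- c(1, 0, 1, 1, 0, 0, 1, 1, 0, 0)
individual <- individual + t(individual)

replicate_individual <- make_replicate(population, individual)
```

Applying such a helper to every individual would give one replicate sample; repeating that over the iterations would give the sampling distribution described above.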
How to interpret the results: the result of boot.ergoInfo is a sampling distribution of EII values that would be expected if the process was random (null distribution). If the empirical EII value is greater than or not significantly different from the null distribution, then the empirical data can be expected to be generated from a nonergodic process and the population structure is not sufficient to describe all individuals. If the empirical EII value is significantly lower than the null distribution, then the empirical data can be described by the population structure; that is, the population structure is sufficient to describe all individuals.
Returns a list containing:
empirical.ergoInfo |
Empirical Ergodicity Information Index |
boot.ergoInfo |
The values of the Ergodicity Information Index obtained in the bootstrap |
p.value |
The two-sided p-value of the bootstrap test for the Ergodicity Information Index. The null hypothesis is that the empirical Ergodicity Information Index is equal to or greater than the expected value of the EII with small variation in the population structure. |
effect |
Indicates whether the empirical EII is greater or less than the bootstrap distribution of EII. |
interpretation |
How you can interpret the result of the test in plain English |
Hudson Golino <hfg9s at virginia.edu> & Alexander P. Christensen <alexander.christensen at Vanderbilt.Edu>
Original Implementation
Golino, H., Nesselroade, J. R., & Christensen, A. P. (2022).
Toward a psychology of individuals: The ergodicity information index and a bottom-up approach for finding generalizations.
PsyArXiv.
plot.EGAnet for plot usage in EGAnet
# Obtain simulated data
sim.data <- sim.dynEGA

## Not run: 
# Dynamic EGA individual and population structures
dyn1 <- dynEGA.ind.pop(
  data = sim.dynEGA[,-26], n.embed = 5, tau = 1,
  delta = 1, id = 25, use.derivatives = 1,
  model = "glasso", ncores = 2, corr = "pearson"
)

# Empirical Ergodicity Information Index
eii1 <- ergoInfo(dynEGA.object = dyn1, use = "unweighted")

# Bootstrap Test for Ergodicity Information Index
testing.ergoinfo <- boot.ergoInfo(
  dynEGA.object = dyn1, EII = eii1,
  ncores = 2, use = "unweighted"
)

# Plot result
plot(testing.ergoinfo)

# Example using `dynEGA`
dyn2 <- dynEGA(
  data = sim.dynEGA, n.embed = 5, tau = 1,
  delta = 1, use.derivatives = 1, ncores = 2,
  level = c("individual", "population")
)

# Empirical Ergodicity Information Index
eii2 <- ergoInfo(dynEGA.object = dyn2, use = "unweighted")

# Bootstrap Test for Ergodicity Information Index
testing.ergoinfo2 <- boot.ergoInfo(
  dynEGA.object = dyn2, EII = eii2,
  ncores = 2
)

# Plot result
plot(testing.ergoinfo2)

## End(Not run)
bootEGA Results of wmt2 Data
bootEGA results from boot.wmt <- bootEGA(wmt2[,7:24], seed = 1234)
data(boot.wmt)
A list with 12 objects (see Value in bootEGA)
data("boot.wmt")
bootEGA
Estimates the number of dimensions of iter bootstraps using the empirical zero-order correlation matrix ("parametric") or "resampling" from the empirical dataset (non-parametric). bootEGA estimates a typical median network structure, which is formed by the median or mean pairwise (partial) correlations over the iter bootstraps (see Details for information about the typical median network structure).
bootEGA(
  data,
  n = NULL,
  corr = c("auto", "cor_auto", "cosine", "pearson", "spearman"),
  na.data = c("pairwise", "listwise"),
  model = c("BGGM", "glasso", "TMFG"),
  algorithm = c("leiden", "louvain", "walktrap"),
  uni.method = c("expand", "LE", "louvain"),
  iter = 500,
  type = c("parametric", "resampling"),
  ncores,
  EGA.type = c("EGA", "EGA.fit", "hierEGA", "riEGA"),
  plot.itemStability = TRUE,
  typicalStructure = FALSE,
  plot.typicalStructure = FALSE,
  seed = NULL,
  verbose = TRUE,
  ...
)
data |
Matrix or data frame. Should consist only of variables to be used in the analysis |
n |
Numeric (length = 1).
Sample size if |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
model |
Character (length = 1).
Defaults to
|
algorithm |
Character or
|
uni.method |
Character (length = 1).
What unidimensionality method should be used?
Defaults to
|
iter |
Numeric (length = 1).
Number of replica samples to generate from the bootstrap analysis.
Defaults to 500 |
type |
Character (length = 1).
What type of bootstrap should be performed?
Defaults to
|
ncores |
Numeric (length = 1).
Number of cores to use in computing results.
Defaults to If you're unsure how many cores your computer has,
then type: |
EGA.type |
Character (length = 1).
Type of EGA model to use.
Defaults to
Arguments for |
plot.itemStability |
Boolean (length = 1).
Should the plot be produced for |
typicalStructure |
Boolean (length = 1).
If |
plot.typicalStructure |
Boolean (length = 1).
If |
seed |
Numeric (length = 1).
Defaults to NULL |
verbose |
Boolean (length = 1).
Should progress be displayed?
Defaults to TRUE |
... |
Additional arguments that can be passed on to
|
The typical network structure is derived from the median (or mean) value of each pairwise relationship. These values tend to reflect the "typical" value taken by an edge across the bootstrap networks. Afterward, the same community detection algorithm is applied to the typical network as the bootstrap networks.
Because the community detection algorithm is applied to the typical network structure, there is a possibility that the community algorithm determines a different number of dimensions than the median number derived from the bootstraps. The typical network structure (and number of dimensions) may not match the empirical EGA number of dimensions or the median number of dimensions from the bootstrap. This result is known and not a bug.
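The derivation of a typical network from bootstrap networks can be sketched in base R. The array of simulated networks below is a hypothetical stand-in for real bootstrap results, and the sketch covers only the edge-wise median; applying a community detection algorithm to the result would be a separate step.

```r
set.seed(123)
n_boot <- 5  # hypothetical number of bootstrap networks
p <- 4       # hypothetical number of variables

# Simulate symmetric "bootstrap networks" stored as a p x p x n_boot array
boot_networks <- array(0, dim = c(p, p, n_boot))
for (b in seq_len(n_boot)) {
  weights <- matrix(0, p, p)
  weights[lower.tri(weights)] <- round(runif(p * (p - 1) / 2, 0, 0.5), 2)
  boot_networks[, , b] <- weights + t(weights)
}

# Typical network: median of each pairwise relationship across bootstraps
typical_network <- apply(boot_networks, c(1, 2), median)
print(typical_network)
```

Because the median is taken edge by edge, the typical network need not equal any single bootstrap network, which is one way to see why its community structure can differ from the median number of dimensions.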
Returns a list containing:
iter |
Number of replica samples in bootstrap |
bootGraphs |
A list containing the networks of each replica sample |
boot.wc |
A matrix of membership assignments for each replica network with variables down the columns and replicas across the rows |
boot.ndim |
Number of dimensions identified in each replica sample |
summary.table |
A data frame containing number of replica samples, median, standard deviation, standard error, 95% confidence intervals, and quantiles (lower = 2.5% and upper = 97.5%) |
frequency |
A data frame containing the proportion of times the number of dimensions was identified (e.g., .85 of 1,000 = 850 times that specific number of dimensions was found) |
TEFI |
|
type |
Type of bootstrap used |
EGA |
Output of the empirical EGA results
(output will vary based on |
EGA.type |
Type of |
typicalGraph |
A list containing:
|
plot.typical.ega |
Plot output if |
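To make the summary.table and frequency entries more concrete, the following base R sketch summarizes a hypothetical vector of bootstrap dimension counts. The column names and values are illustrative assumptions and may not match the package's actual output.

```r
# Hypothetical number of dimensions found in each of 10 replica samples
boot_ndim <- c(3, 3, 2, 3, 4, 3, 3, 2, 3, 3)

# Descriptive summary of the bootstrap distribution
summary_table <- data.frame(
  n.Boots = length(boot_ndim),
  median.dim = median(boot_ndim),
  SD.dim = sd(boot_ndim),
  Lower.Quantile = unname(quantile(boot_ndim, 0.025)),
  Upper.Quantile = unname(quantile(boot_ndim, 0.975))
)

# Proportion of replica samples in which each number of dimensions was found
frequency <- as.data.frame(prop.table(table(boot_ndim)))
names(frequency) <- c("Dimensions", "Frequency")

print(summary_table)
print(frequency)
```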
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <[email protected]>
Original implementation of bootEGA
Christensen, A. P., & Golino, H. (2021).
Estimating the stability of the number of factors via Bootstrap Exploratory Graph Analysis: A tutorial.
Psych, 3(3), 479-500.
itemStability to estimate the stability of the variables in the empirical dimensions and dimensionStability to estimate the stability of the dimensions (structural consistency)
# Load data
wmt <- wmt2[,7:24]

## Not run: 
# Standard EGA parametric example
boot.wmt <- bootEGA(
  data = wmt, iter = 500,
  type = "parametric", ncores = 2
)

# Standard resampling example
boot.wmt <- bootEGA(
  data = wmt, iter = 500,
  type = "resampling", ncores = 2
)

# Example using {igraph} `cluster_*` function
boot.wmt.spinglass <- bootEGA(
  data = wmt, iter = 500,
  algorithm = igraph::cluster_spinglass, # use any function from {igraph}
  type = "parametric", ncores = 2
)

# EGA fit example
boot.wmt.fit <- bootEGA(
  data = wmt, iter = 500,
  EGA.type = "EGA.fit",
  type = "parametric", ncores = 2
)

# Hierarchical EGA example
boot.wmt.hier <- bootEGA(
  data = wmt, iter = 500,
  EGA.type = "hierEGA",
  type = "parametric", ncores = 2
)

# Random-intercept EGA example
boot.wmt.ri <- bootEGA(
  data = wmt, iter = 500,
  EGA.type = "riEGA",
  type = "parametric", ncores = 2
)

## End(Not run)
EGA or hierEGA Structure
Verifies the fit of the structure suggested by EGA or by hierEGA using confirmatory factor analysis
CFA(ega.obj, data, estimator, plot.CFA = TRUE, layout = "spring", ...)
ega.obj |
|
data |
Matrix or data frame. Should consist only of variables to be used in the analysis |
estimator |
The estimator used in the confirmatory factor analysis.
'WLSMV' is the estimator of choice for ordinal variables.
'ML' or 'WLS' for interval variables.
See |
plot.CFA |
Logical. Should the CFA structure with its standardized loadings be plotted? Defaults to TRUE |
layout |
Layout of plot (see |
... |
Arguments passed to |
Returns a list containing:
fit |
Output from |
summary |
Summary output from |
fit.measures |
Fit measures: chi-squared,
degrees of freedom, p-value, CFI, RMSEA, GFI, and NFI.
Additional fit measures can be applied using the
|
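How a set of community memberships could translate into a confirmatory model can be sketched in base R. The membership vector and the "Dim" labels below are hypothetical, and this is not necessarily how CFA builds its model internally; the sketch only assembles a lavaan-style measurement model string in which each dimension is measured (=~) by its items.

```r
# Hypothetical membership vector, named by item (as an EGA-style `wc` might be)
wc <- c(item1 = 1, item2 = 1, item3 = 1, item4 = 2, item5 = 2, item6 = 2)

# Group item names by community
items_by_dimension <- split(names(wc), wc)

# Write one measurement line per dimension, e.g., "Dim1 =~ item1 + item2"
model_lines <- vapply(
  names(items_by_dimension),
  function(dimension) {
    paste0(
      "Dim", dimension, " =~ ",
      paste(items_by_dimension[[dimension]], collapse = " + ")
    )
  },
  character(1)
)

# Combine the lines into a single model string
model_syntax <- paste(model_lines, collapse = "\n")
cat(model_syntax, "\n")
```

A string of this form could then be passed to a CFA routine such as lavaan::cfa together with the data and the chosen estimator.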
Hudson F. Golino <hfg9s at virginia.edu>
Demonstrative use
Christensen, A. P., Gross, G. M., Golino, H., Silvia, P. J., & Kwapil, T. R. (2019).
Exploratory graph analysis of the Multidimensional Schizotypy Scale.
Schizophrenia Research, 206, 43-51.
Initial implementation
Golino, H., & Epskamp, S. (2017).
Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research.
PLoS ONE, 12, e0174035.
# Load data
wmt <- wmt2[,7:24]

## Not run: 
# Estimate EGA
ega.wmt <- EGA(
  data = wmt,
  plot.EGA = FALSE # No plot for CRAN checks
)

# Fit CFA model to EGA results
cfa.wmt <- CFA(
  ega.obj = ega.wmt, estimator = "WLSMV",
  plot.CFA = FALSE, # No plot for CRAN checks
  data = wmt
)

# Additional fit measures
lavaan::fitMeasures(cfa.wmt$fit, fit.measures = "all")

## End(Not run)
EGA Color Palettes
Color palettes for plotting ggnet2 EGA network plots
color_palette_EGA(
  name = c("polychrome", "blue.ridge1", "blue.ridge2", "rainbow", "rio", "itacare", "grayscale"),
  wc,
  sorted = FALSE
)
name |
Character.
Name of color scheme (see
For custom colors, enter HEX codes for each dimension in a vector |
wc |
Numeric vector.
A vector representing the community (dimension) membership
of each node in the network. |
sorted |
Boolean.
Should colors be sorted by |
Vector of colors for community memberships
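The relationship between custom HEX codes and memberships can be illustrated with a minimal base R sketch. The membership vector is hypothetical, and the indexing shown is an assumption about the general idea (one color per dimension, repeated for each node), not the function's internal code.

```r
# Hypothetical memberships for six nodes in two communities
wc <- c(1, 1, 2, 2, 1, 2)

# Custom HEX codes, one per dimension
custom_colors <- c("#7FD1B9", "#24547e")

# Index the palette by membership to obtain one color per node
node_colors <- custom_colors[wc]
print(node_colors)
```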
Hudson Golino <hfg9s at virginia.edu>, Alexander P. Christensen <alexpaulchristensen at gmail.com>
plot.EGAnet for plot usage in EGAnet
# Default
color_palette_EGA(name = "polychrome", wc = ega.wmt$wc)

# Blue Ridge Mountains 1
color_palette_EGA(name = "blue.ridge1", wc = ega.wmt$wc)

# Custom
color_palette_EGA(name = c("#7FD1B9", "#24547e"), wc = ega.wmt$wc)
A permutation implementation to determine statistical significance of whether the community comparison measure is different from zero
community.compare(
  base,
  comparison,
  method = c("vi", "nmi", "split.join", "rand", "adjusted.rand"),
  iter = 1000,
  shuffle.base = TRUE,
  verbose = TRUE,
  seed = NULL
)
base |
Character or numeric vector. A vector of characters or numbers that are treated as the baseline communities |
comparison |
Character or numeric vector (length = |
method |
Character (length = 1).
Comparison metrics from
|
iter |
Numeric (length = 1).
Number of permutations to perform.
Defaults to 1000 |
shuffle.base |
Boolean (length = 1).
Whether the |
verbose |
Boolean (length = 1).
Should progress be displayed?
Defaults to TRUE |
seed |
Numeric (length = 1).
Defaults to NULL |
Returns data frame containing method used (Method), empirical or observed value (Empirical), and p-value based on the permutation test (p.value)
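The logic of a permutation test on a community comparison measure can be sketched in base R using the Rand index. The helper rand_index, the toy memberships, and the one-sided p-value definition are assumptions made for illustration; how community.compare computes its p-value for each method is not specified here, and distance-type measures would need the opposite direction.

```r
# Rand index: proportion of node pairs on which two partitions agree
# (hypothetical helper, not part of EGAnet)
rand_index <- function(a, b) {
  same_a <- outer(a, a, "==")  # pairs placed together in partition a
  same_b <- outer(b, b, "==")  # pairs placed together in partition b
  pairs <- upper.tri(same_a)   # count each pair once
  mean(same_a[pairs] == same_b[pairs])
}

# Hypothetical memberships for eight nodes
base <- c(1, 1, 1, 1, 2, 2, 2, 2)
comparison <- c(1, 1, 1, 2, 2, 2, 2, 2)

set.seed(1234)
iter <- 1000

# Observed value of the comparison measure
empirical <- rand_index(base, comparison)

# Null distribution: shuffle the base memberships and recompute the measure
permuted <- replicate(iter, rand_index(sample(base), comparison))

# Proportion of permuted values at least as large as the empirical value
p_value <- mean(permuted >= empirical)

result <- data.frame(Method = "rand", Empirical = empirical, p.value = p_value)
print(result)
```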
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <[email protected]>
Implementation of Permutation Test
Qannari, E. M., Courcoux, P., & Faye, P. (2014).
Significance test of the adjusted Rand index. Application to the free sorting task.
Food Quality and Preference, 32, 93–97.
Variation of Information
Meila, M. (2003, August).
Comparing clusterings by the variation of information.
In Learning Theory and Kernel Machines: 16th Annual Conference on Learning Theory and 7th Kernel Workshop,
COLT/Kernel 2003, Washington, DC, USA, August 24-27, 2003. Proceedings (pp. 173-187). Berlin, DE: Springer Berlin Heidelberg.
Normalized Mutual Information
Danon, L., Diaz-Guilera, A., Duch, J., & Arenas, A. (2005).
Comparing community structure identification.
Journal of Statistical Mechanics: Theory and Experiment, 2005(09), P09008.
Split-join Distance
Dongen, S. (2000).
Performance criteria for graph clustering and Markov cluster experiments.
CWI (Centre for Mathematics and Computer Science).
Rand Index
Rand, W. M. (1971).
Objective criteria for the evaluation of clustering methods.
Journal of the American Statistical Association, 66(336), 846-850.
Adjusted Rand Index
Hubert, L., & Arabie, P. (1985).
Comparing partitions.
Journal of Classification, 2, 193-218.
Steinley, D. (2004). Properties of the Hubert-Arabie adjusted rand index. Psychological Methods, 9(3), 386.
# Load data
wmt <- wmt2[,7:24]

# Estimate network
network <- EBICglasso.qgraph(data = wmt)

# Compute Edge Betweenness
edge_between <- community.detection(network, algorithm = "edge_betweenness")

# Compute Fast Greedy
fast_greedy <- community.detection(network, algorithm = "fast_greedy")

# Perform permutation test
community.compare(edge_between, fast_greedy)
Applies the consensus clustering method introduced by Lancichinetti and Fortunato (2012). The original implementation of this method applies a community detection algorithm repeatedly to the same network. With stochastic networks, the algorithm is likely to identify different community solutions with many repeated applications.
community.consensus(
  network,
  order = c("lower", "higher"),
  resolution = 1,
  consensus.method = c("highest_modularity", "iterative", "most_common", "lowest_tefi"),
  consensus.iter = 1000,
  correlation.matrix = NULL,
  allow.singleton = FALSE,
  membership.only = TRUE,
  ...
)
network |
Matrix or |
order |
Character (length = 1).
Defaults to |
resolution |
Numeric (length = 1).
A parameter that adjusts modularity to allow the algorithm to
prefer smaller ( |
consensus.method |
Character (length = 1).
Defaults to
|
consensus.iter |
Numeric (length = 1).
Number of algorithm applications to the network.
Defaults to 1000 |
correlation.matrix |
Symmetric matrix.
Used for computation of |
allow.singleton |
Boolean (length = 1).
Whether singleton or single node communities should be allowed.
Defaults to FALSE |
membership.only |
Boolean.
Whether the memberships only should be output.
Defaults to TRUE |
... |
Not actually used but makes it easier for general functionality in the package |
The goal of the consensus clustering method is to identify a stable solution across algorithm applications to derive a "consensus" clustering. The standard or "iterative" approach is to apply the community detection algorithm N times. Then, a co-occurrence matrix is created representing how often each pair of nodes co-occurred across the applications. Based on some cut-off value (e.g., 0.30), co-occurrences below this value are set to zero, forming a "new" sparse network. The procedure proceeds until all nodes co-occur with all other nodes in their community (or a proportion of 1.00).
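One pass of the co-occurrence step described above can be sketched in base R. The membership matrix is a hypothetical stand-in for repeated algorithm applications, and the sketch shows only the construction and thresholding of the co-occurrence matrix, not the full iterative procedure or the package's implementation.

```r
# Hypothetical memberships from 4 algorithm applications (rows) on 5 nodes (columns)
memberships <- rbind(
  c(1, 1, 1, 2, 2),
  c(1, 1, 2, 2, 2),
  c(1, 1, 1, 2, 2),
  c(1, 1, 1, 2, 2)
)

# Count how often each pair of nodes was placed in the same community
n_nodes <- ncol(memberships)
cooccurrence <- matrix(0, n_nodes, n_nodes)
for (i in seq_len(nrow(memberships))) {
  cooccurrence <- cooccurrence + outer(memberships[i, ], memberships[i, ], "==")
}

# Convert counts to proportions and ignore self-pairs
cooccurrence <- cooccurrence / nrow(memberships)
diag(cooccurrence) <- 0

# Set co-occurrences below the cut-off to zero to form the "new" sparse network
cutoff <- 0.30
sparse_network <- cooccurrence
sparse_network[sparse_network < cutoff] <- 0
print(sparse_network)
```

In the iterative approach, the community detection algorithm would be applied again to this sparse network, and the steps repeated until every retained co-occurrence equals 1.00.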
Variations of this procedure are also available in this package but are experimental. Use these experimental procedures with caution. More work is necessary before these experimental procedures are validated.
At this time, seed setting for consensus clustering is not supported.
Returns either a vector with the selected solution or a list when membership.only = FALSE:
selected_solution |
Resulting solution from the consensus method |
memberships |
Matrix of memberships across the consensus iterations |
proportion_table |
For methods that use frequency, a table that reports those frequencies alongside their corresponding memberships |
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <[email protected]>
Louvain algorithm
Blondel, V. D., Guillaume, J.-L., Lambiotte, R., & Lefebvre, E. (2008).
Fast unfolding of communities in large networks.
Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.
Consensus clustering
Lancichinetti, A., & Fortunato, S. (2012).
Consensus clustering in complex networks.
Scientific Reports, 2(1), 1–7.
Entropy fit indices
Golino, H., Moulder, R. G., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Nesselroade, J., Sadana, R., Thiyagarajan, J. A., & Boker, S. M. (2020).
Entropy fit indices: New fit measures for assessing the structure and dimensionality of multiple latent variables.
Multivariate Behavioral Research.
# Load data
wmt <- wmt2[,7:24]

# Estimate correlation matrix
correlation.matrix <- auto.correlate(wmt)

# Estimate network
network <- EBICglasso.qgraph(data = wmt)

# Compute standard Louvain with highest modularity approach
community.consensus(
  network,
  consensus.method = "highest_modularity"
)

# Compute standard Louvain with iterative (original) approach
community.consensus(
  network,
  consensus.method = "iterative"
)

# Compute standard Louvain with most common approach
community.consensus(
  network,
  consensus.method = "most_common"
)

# Compute standard Louvain with lowest TEFI approach
community.consensus(
  network,
  consensus.method = "lowest_tefi",
  correlation.matrix = correlation.matrix
)
General function to apply community detection algorithms available in igraph.
Follows the EGAnet approach of setting singleton and disconnected nodes to missing (NA)
community.detection( network, algorithm = c("edge_betweenness", "fast_greedy", "fluid", "infomap", "label_prop", "leading_eigen", "leiden", "louvain", "optimal", "spinglass", "walktrap"), allow.singleton = FALSE, membership.only = TRUE, ... )
network |
Matrix or |
algorithm |
Character or
|
allow.singleton |
Boolean (length = 1).
Whether singleton or single node communities should be allowed.
Defaults to FALSE |
membership.only |
Boolean (length = 1).
Whether the memberships only should be output.
Defaults to TRUE |
... |
Additional arguments to be passed on to
|
Returns memberships from a community detection algorithm
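The convention of setting singleton communities to missing can be sketched in base R as follows. The membership vector is made up, and the sketch only covers singleton communities (not disconnected nodes); it is an illustration of the idea rather than the package's internal code.

```r
# Hypothetical memberships for five nodes; node C sits alone in community 2
membership <- c(A = 1, B = 1, C = 2, D = 3, E = 3)

# Count how many nodes belong to each community
sizes <- table(membership)

# Communities that contain a single node
singletons <- as.numeric(names(sizes)[sizes == 1])

# Set nodes in singleton communities to missing
membership[membership %in% singletons] <- NA
```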
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen
Csardi, G., & Nepusz, T. (2006). The igraph software package for complex network research. InterJournal, Complex Systems, 1695.
# Load data
wmt <- wmt2[,7:24]

# Estimate network
network <- EBICglasso.qgraph(data = wmt)

# Compute Edge Betweenness
community.detection(network, algorithm = "edge_betweenness")

# Compute Fast Greedy
community.detection(network, algorithm = "fast_greedy")

# Compute Fluid
community.detection(
  network, algorithm = "fluid",
  no.of.communities = 2 # needs to be set
)

# Compute Infomap
community.detection(network, algorithm = "infomap")

# Compute Label Propagation
community.detection(network, algorithm = "label_prop")

# Compute Leading Eigenvector
community.detection(network, algorithm = "leading_eigen")

# Compute Leiden (with modularity)
community.detection(
  network, algorithm = "leiden",
  objective_function = "modularity"
)

# Compute Leiden (with CPM)
community.detection(
  network, algorithm = "leiden",
  objective_function = "CPM",
  resolution_parameter = 0.05 # "edge density"
)

# Compute Louvain
community.detection(network, algorithm = "louvain")

# Compute Optimal (identifies maximum modularity solution)
community.detection(network, algorithm = "optimal")

# Compute Spinglass
community.detection(network, algorithm = "spinglass")

# Compute Walktrap
community.detection(network, algorithm = "walktrap")

# Example with {igraph} network
community.detection(
  convert2igraph(network), algorithm = "walktrap"
)
Memberships from community detection algorithms do not always align numerically.
This function seeks to homogenize community memberships between a target membership
(the membership to homogenize toward) and one or more other memberships.
This function is the core of the dimensionStability and itemStability functions
community.homogenize(target.membership, convert.membership)
target.membership |
Vector, matrix, or data frame.
The target memberships that all other memberships input into
|
convert.membership |
Vector, matrix, or data frame.
Either a vector of memberships the same length as
|
Returns a vector or matrix the length or size of convert.membership with memberships homogenized toward target.membership
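To make the idea of homogenizing labels concrete, here is a minimal base R sketch that relabels one membership vector toward a target by greedily matching communities with the largest overlap. The function name `homogenize_sketch` and the greedy matching rule are assumptions made for illustration; the package's own procedure may differ.

```r
# Greedy relabeling sketch (hypothetical helper, not the package's algorithm)
homogenize_sketch <- function(target, convert) {
  # Cross-tabulate: rows = labels in `convert`, columns = labels in `target`
  overlap <- table(convert, target)
  new <- rep(NA_integer_, length(convert))
  used <- character(0)
  # Handle the communities with the largest overlaps first
  ord <- order(apply(overlap, 1, max), decreasing = TRUE)
  for (i in ord) {
    label <- rownames(overlap)[i]
    candidates <- setdiff(colnames(overlap), used)
    if (length(candidates) == 0) break # leftover communities stay NA
    best <- candidates[which.max(overlap[i, candidates])]
    new[convert == as.numeric(label)] <- as.integer(best)
    used <- c(used, best)
  }
  new
}

# Same partition, but the labels are switched
target  <- c(1, 1, 1, 2, 2, 2)
convert <- c(2, 2, 2, 1, 1, 1)
homogenized <- homogenize_sketch(target, convert)
```

In this toy case the two vectors describe the same partition, so the homogenized result matches the target exactly.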
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen
Original implementation of bootEGA
Christensen, A. P., & Golino, H. (2021).
Estimating the stability of the number of factors via Bootstrap Exploratory Graph Analysis: A tutorial.
Psych, 3(3), 479-500.
# Get network
network <- network.estimation(wmt2[,7:24])

# Apply Walktrap
network_walktrap <- community.detection(
  network, algorithm = "walktrap"
)

# Apply Louvain
network_louvain <- community.detection(
  network, algorithm = "louvain"
)

# Homogenize toward Walktrap
community.homogenize(network_walktrap, network_louvain)
A function to apply several approaches to detect a unidimensional community in
networks. There have been many different approaches recently, such as expanding
the correlation matrix to have orthogonal correlations ("expand"),
applying the Leading Eigenvector community detection algorithm cluster_leading_eigen
to the correlation matrix ("LE"), and applying the Louvain community detection algorithm
cluster_louvain to the correlation matrix ("louvain").
Not necessarily intended for individual use; it's better to use EGA
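The "expand" idea can be sketched in base R by appending a block of variables that correlate with each other but are orthogonal to the original variables. The function name `expand_sketch`, the number of added variables (4), and their intercorrelation (0.50) are assumptions chosen for illustration and are not confirmed by the text above.

```r
# Sketch: expand a correlation matrix with an orthogonal block of variables
expand_sketch <- function(R, n_extra = 4, r_extra = 0.50) {
  p <- ncol(R)
  # Start from an identity matrix so new-by-original correlations are 0
  expanded <- diag(p + n_extra)
  # Original correlations go in the upper-left block
  expanded[seq_len(p), seq_len(p)] <- R
  # Added variables correlate r_extra with each other
  extra <- p + seq_len(n_extra)
  expanded[extra, extra] <- r_extra
  diag(expanded) <- 1
  expanded
}

# Hypothetical 3-variable correlation matrix
R <- matrix(0.3, nrow = 3, ncol = 3)
diag(R) <- 1
expanded <- expand_sketch(R)
```

If community detection on the expanded matrix separates only the added block from the original variables, the original variables would be treated as a single dimension.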
community.unidimensional( data, n = NULL, corr = c("auto", "cor_auto", "pearson", "spearman"), na.data = c("pairwise", "listwise"), model = c("BGGM", "glasso", "TMFG"), uni.method = c("expand", "LE", "louvain"), verbose = FALSE, ... )
data |
Matrix or data frame. Should consist only of variables that are desired to be in analysis |
n |
Numeric (length = 1).
Sample size if |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
model |
Character (length = 1).
Defaults to
|
uni.method |
Character (length = 1).
What unidimensionality method should be used?
Defaults to
|
verbose |
Boolean.
Whether messages and (insignificant) warnings should be output.
Defaults to FALSE |
... |
Additional arguments to be passed on to
|
Returns the memberships of the community detection algorithm. The memberships will output regardless of whether the network is unidimensional
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen
Expand approach
Golino, H., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Sadana, R., Thiyagarajan, J. A., & Martinez-Molina, A. (2020).
Investigating the performance of exploratory graph analysis and traditional techniques to identify the number of latent factors:
A simulation and tutorial.
Psychological Methods, 25, 292-320.
Leading Eigenvector approach
Christensen, A. P., Garrido, L. E., Guerra-Pena, K., & Golino, H. (2023).
Comparing community detection algorithms in psychometric networks: A Monte Carlo simulation.
Behavior Research Methods.
Louvain approach
Christensen, A. P. (2023).
Unidimensional community detection: A Monte Carlo simulation, grid search, and comparison.
PsyArXiv.
# Load data
wmt <- wmt2[,7:24]

# Louvain with Consensus Clustering (default)
community.unidimensional(wmt)

# Leading Eigenvector
community.unidimensional(wmt, uni.method = "LE")

# Expand
community.unidimensional(wmt, uni.method = "expand")
EGAnet plots
Organizes EGA plots for comparison. Ensures that nodes are placed in the same layout to maximize comparison
compare.EGA.plots( ..., input.list = NULL, base = 1, labels = NULL, rows = NULL, columns = NULL, plot.all = TRUE )
... |
Handles multiple arguments:
|
input.list |
List.
Bypasses |
base |
Numeric (length = 1).
Plot to be used as the base for the configuration of the networks.
Uses the number of the order in which the plots are input.
Defaults to 1 |
labels |
Character (same length as input).
Labels for each |
rows |
Numeric (length = 1). Number of rows to spread plots across |
columns |
Numeric (length = 1). Number of columns to spread plots down |
plot.all |
Boolean (length = 1).
Whether plot should be produced or just output.
Defaults to TRUE |
Visual comparison of EGAnet objects
Alexander Christensen
plot.EGAnet for plot usage in EGAnet
# Obtain WMT-2 data
wmt <- wmt2[,7:24]

# Draw random samples of 300 cases
sample1 <- wmt[sample(1:nrow(wmt), 300),]
sample2 <- wmt[sample(1:nrow(wmt), 300),]

# Estimate EGAs
ega1 <- EGA(sample1)
ega2 <- EGA(sample2)

# Compare EGAs via plot
compare.EGA.plots(
  ega1, ega2,
  base = 1, # use "ega1" as base for comparison
  labels = c("Sample 1", "Sample 2"),
  rows = 1, columns = 2
)

# Change layout to circle plots
compare.EGA.plots(
  ega1, ega2,
  labels = c("Sample 1", "Sample 2"),
  mode = "circle"
)
igraph
Converts networks to igraph format
convert2igraph(A, diagonal = 0)
A |
Matrix or data frame. N x N matrix where N is the number of nodes |
diagonal |
Numeric.
Value to be placed on the diagonal of |
Returns a network in the igraph format
Hudson Golino <hfg9s at virginia.edu> & Alexander P. Christensen <alexander.christensen at Vanderbilt.Edu>
convert2igraph(ega.wmt$network)
tidygraph
Converts networks to tidygraph format
convert2tidygraph(EGA.object)
EGA.object |
A single |
Returns a network in the tidygraph format
Dominique Makowski, Hudson Golino <hfg9s at virginia.edu>, & Alexander P. Christensen <alexander.christensen at Vanderbilt.Edu>
convert2tidygraph(ega.wmt)
Computes cosine similarity
cosine(x, y = NULL, ...)
x |
Numeric vector, matrix, or data frame.
If |
y |
Numeric vector, matrix, or data frame.
Only used if |
... |
Not actually used but makes it easier for general functionality in the package |
On missing values: 0 will be used to replace missing values.
When using (matrix) multiplication, the 0 value cancels out the
product rendering the missing value as "not counting" in the sums
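The zero-replacement behavior described above can be sketched in base R. The helper name `cosine_sketch` and the toy data are assumptions for illustration; the sketch computes column-wise cosine similarity for a single matrix and is not the package's implementation.

```r
# Sketch: column-wise cosine similarity with missing values replaced by 0
cosine_sketch <- function(x) {
  x <- as.matrix(x)
  # A zero contributes nothing to any product, so missing values drop out of the sums
  x[is.na(x)] <- 0
  # Cross-products between all pairs of columns
  cross <- crossprod(x)
  # Column norms are the square roots of the diagonal
  norms <- sqrt(diag(cross))
  cross / tcrossprod(norms)
}

# Hypothetical data with one missing value in column "a"
x <- cbind(a = c(1, 2, NA), b = c(2, 4, 6))
cosines <- cosine_sketch(x)
```

Note that the zero also drops out of the norm for that column, which is one consequence of this replacement strategy.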
Alexander P. Christensen
# Load data
wmt <- wmt2[,7:24]

# Obtain cosines
wmt_cosine <- cosine(wmt)
A response matrix (n = 574) of the Beck Depression Inventory, Beck Anxiety Inventory, and the Athens Insomnia Scale.
data(depression)
A 574x78 response matrix
data("depression")
bootEGA
Based on the bootEGA results, this function computes the stability of dimensions.
Stability is computed by assessing the proportion of times the
original dimension is exactly replicated across bootstrap samples
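The proportion described above can be illustrated with a small base R sketch. The membership data are made up, the helper `replicated` is hypothetical, and the sketch assumes bootstrap memberships contain no missing values and have already been homogenized toward the original structure.

```r
# Hypothetical original structure and homogenized bootstrap memberships
original <- c(1, 1, 2, 2)
boot_memberships <- rbind(
  c(1, 1, 2, 2),
  c(1, 1, 2, 2),
  c(1, 2, 2, 2),
  c(1, 1, 2, 2)
)

# A dimension is exactly replicated when its items share one label
# and no item outside the dimension carries that label
replicated <- function(boot_row, items) {
  labels <- unique(boot_row[items])
  length(labels) == 1 && !any(boot_row[!items] %in% labels)
}

# Proportion of bootstrap samples replicating each original dimension
dimensions <- sort(unique(original))
stability <- vapply(dimensions, function(d) {
  items <- original == d
  mean(apply(boot_memberships, 1, replicated, items = items))
}, numeric(1))
names(stability) <- dimensions
```

In this toy example the third bootstrap sample moves one item, so neither dimension is exactly replicated in that sample.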
dimensionStability(bootega.obj, IS.plot = TRUE, structure = NULL, ...)
bootega.obj |
A |
IS.plot |
Boolean (length = 1).
Should the plot be produced for |
structure |
Numeric (length = number of variables).
A theoretical or pre-defined structure.
Defaults to |
... |
Additional arguments.
Used for deprecated arguments from previous versions of |
Returns a list containing:
dimension.stability |
A list containing: |
item.stability |
Results from |
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen
Original implementation of bootEGA
Christensen, A. P., & Golino, H. (2021).
Estimating the stability of the number of factors via Bootstrap Exploratory Graph Analysis: A tutorial.
Psych, 3(3), 479-500.
Conceptual introduction
Christensen, A. P., Golino, H., & Silvia, P. J. (2020).
A psychometric network perspective on the validity and validation of personality trait questionnaires.
European Journal of Personality, 34(6), 1095-1108.
# Load data
wmt <- wmt2[,7:24]

## Not run: 
# Estimate bootstrap EGA
boot.wmt <- bootEGA(
  data = wmt, iter = 500,
  type = "parametric", ncores = 2
)

## End(Not run)

# Estimate stability statistics
dimensionStability(boot.wmt)
A list of weights from four different neural network models:
random vs. non-random model (r_nr_weights),
low correlation factor vs. network model (lf_n_weights),
high correlation with variables less than or equal to factors vs. network model (hlf_n_weights), and
high correlation with variables greater than factors vs. network model (hgf_n_weights)
data(dnn.weights)
A list with a length of 4
data("dnn.weights")
Estimates dynamic communities in multivariate time series (e.g., panel data, longitudinal data, intensive longitudinal data) at multiple time scales and at different levels of analysis: individuals (intraindividual structure), groups, and population (interindividual structure)
dynEGA( data, id = NULL, group = NULL, n.embed = 5, tau = 1, delta = 1, use.derivatives = 1, level = c("individual", "group", "population"), corr = c("auto", "cor_auto", "pearson", "spearman"), na.data = c("pairwise", "listwise"), model = c("BGGM", "glasso", "TMFG"), algorithm = c("leiden", "louvain", "walktrap"), uni.method = c("expand", "LE", "louvain"), ncores, verbose = TRUE, ... )
data |
Matrix or data frame. Participants and variables should be in long format such that row t represents observations for all variables at time point t for a participant. The next row, t + 1, represents the next measurement occasion for that same participant. The next participant's data should immediately follow, in the same pattern, after the previous participant.
For groups, Arguments A measurement occasion variable is not necessary and should be removed from the data before proceeding with the analysis |
id |
Numeric or character (length = 1).
Number or name of the column identifying each individual.
Defaults to NULL |
group |
Numeric or character (length = 1).
Number of the column identifying group membership.
Defaults to NULL |
n.embed |
Numeric (length = 1).
Defaults to 5 |
tau |
Numeric (length = 1).
Defaults to 1 |
delta |
Numeric (length = 1).
Defaults to 1 |
use.derivatives |
Numeric (length = 1).
Defaults to 1.
Generally recommended to leave "as is" |
level |
Character vector (up to length of 3). A character vector indicating which level(s) to estimate: |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
model |
Character (length = 1).
Defaults to
|
algorithm |
Character or
|
uni.method |
Character (length = 1).
What unidimensionality method should be used?
Defaults to
|
ncores |
Numeric (length = 1).
Number of cores to use in computing results.
Defaults to If you're unsure how many cores your computer has,
then type: |
verbose |
Boolean (length = 1).
Should progress be displayed?
Defaults to TRUE |
... |
Additional arguments to be passed on to
|
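The long-format layout expected by the data argument can be sketched with a small, made-up data frame. The column names "ID" and "Group" and the number of occasions are assumptions for illustration; real time series would need more occasions than the embedding dimension.

```r
# Hypothetical long-format data: 2 participants, 4 measurement occasions each
set.seed(123)
n_time <- 4
long_data <- data.frame(
  V1 = rnorm(2 * n_time),
  V2 = rnorm(2 * n_time),
  V3 = rnorm(2 * n_time),
  ID = rep(1:2, each = n_time),          # participant 1's rows, then participant 2's
  Group = rep(c("A", "B"), each = n_time)
)

# Each participant's rows should be contiguous (one run per ID)
runs <- rle(long_data$ID)
contiguous <- length(runs$values) == length(unique(long_data$ID))

# Column positions that could be supplied to `id` and `group`
id_column <- which(names(long_data) == "ID")
group_column <- which(names(long_data) == "Group")
```

Note that no measurement occasion column is included; row order within each participant carries the time information.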
Derivatives for each variable's time series for each participant are
estimated using generalized local linear approximation (see glla).
EGA is then applied to these derivatives to model how variables
are changing together over time. Variables that change together over time are detected
as communities
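The derivative estimation step can be sketched in base R using time delay embedding followed by a local polynomial fit in the style of generalized local linear approximation (Boker et al., 2010). The helper names `embed_series` and `glla_sketch` are hypothetical, and this is a simplified illustration under those assumptions rather than the package's glla function.

```r
# Time delay embedding: each row holds n.embed observations spaced tau apart
embed_series <- function(x, n.embed = 5, tau = 1) {
  n_rows <- length(x) - (n.embed - 1) * tau
  sapply(seq_len(n.embed), function(j) x[seq_len(n_rows) + (j - 1) * tau])
}

# Estimate derivatives (orders 0 to `order`) at the center of each embedded row
glla_sketch <- function(x, n.embed = 5, tau = 1, delta = 1, order = 2) {
  X <- embed_series(x, n.embed, tau)
  # Time offsets of each embedded column relative to the window center
  offsets <- (seq_len(n.embed) - mean(seq_len(n.embed))) * tau * delta
  # Taylor-series design matrix: offsets^k / k! for k = 0, ..., order
  L <- sapply(0:order, function(k) offsets^k / factorial(k))
  # Least-squares weights; X %*% W gives the derivative estimates
  W <- L %*% solve(crossprod(L))
  derivatives <- X %*% W
  colnames(derivatives) <- paste0("d", 0:order)
  derivatives
}

# Quadratic series: first derivative is 2t, second derivative is 2
x <- (1:20)^2
derivatives <- glla_sketch(x)
```

With use.derivatives = 1, the first-order column (here "d1") for every variable would be the input passed on to network estimation.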
A list containing:
Derivatives |
A list containing:
|
dynEGA |
A list containing: |
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen
Generalized local linear approximation
Boker, S. M., Deboeck, P. R., Edler, C., & Keel, P. K. (2010).
Generalized local linear approximation of derivatives from time series. In S.-M. Chow, E. Ferrer, & F. Hsieh (Eds.),
The Notre Dame series on quantitative methodology. Statistical methods for modeling human dynamics: An interdisciplinary dialogue,
(p. 161-178). Routledge/Taylor & Francis Group.
Deboeck, P. R., Montpetit, M. A., Bergeman, C. S., & Boker, S. M. (2009). Using derivative estimates to describe intraindividual variability at multiple time scales. Psychological Methods, 14(4), 367-386.
Original dynamic EGA implementation
Golino, H., Christensen, A. P., Moulder, R. G., Kim, S., & Boker, S. M. (2021).
Modeling latent topics in social media using Dynamic Exploratory Graph Analysis: The case of the right-wing and left-wing trolls in the 2016 US elections.
Psychometrika.
Time delay embedding procedure
Savitzky, A., & Golay, M. J. (1964).
Smoothing and differentiation of data by simplified least squares procedures.
Analytical Chemistry, 36(8), 1627-1639.
plot.EGAnet for plot usage in EGAnet
# Population structure
simulated_population <- dynEGA(
  data = sim.dynEGA, level = "population"
  # uses simulated data in package
  # useful to understand how data should be structured
)

# Group structure
simulated_group <- dynEGA(
  data = sim.dynEGA, level = "group"
  # uses simulated data in package
  # useful to understand how data should be structured
)

## Not run: 
# Individual structure
simulated_individual <- dynEGA(
  data = sim.dynEGA, level = "individual",
  ncores = 2, # use more for quicker results
  verbose = TRUE # progress bar
)

# Population, group, and individual structure
simulated_all <- dynEGA(
  data = sim.dynEGA,
  level = c("individual", "group", "population"),
  ncores = 2, # use more for quicker results
  verbose = TRUE # progress bar
)

# Plot population
plot(simulated_all$dynEGA$population)

# Plot groups
plot(simulated_all$dynEGA$group)

# Plot individual
plot(simulated_all$dynEGA$individual, id = 1)

# Step through all plots
# Unless `id` is specified, 4 random IDs
# will be drawn from individuals
plot(simulated_all)

## End(Not run)
dynEGA
A wrapper function to estimate both intraindividual (level = "individual")
and interindividual (level = "population") structures using dynEGA
dynEGA.ind.pop( data, id = NULL, n.embed = 5, tau = 1, delta = 1, use.derivatives = 1, corr = c("auto", "cor_auto", "pearson", "spearman"), na.data = c("pairwise", "listwise"), model = c("BGGM", "glasso", "TMFG"), algorithm = c("leiden", "louvain", "walktrap"), uni.method = c("expand", "LE", "louvain"), ncores, verbose = TRUE, ... )
data |
Matrix or data frame. Participants and variables should be in long format such that row t represents observations for all variables at time point t for a participant. The next row, t + 1, represents the next measurement occasion for that same participant. The next participant's data should immediately follow, in the same pattern, after the previous participant.
For groups, Arguments A measurement occasion variable is not necessary and should be removed from the data before proceeding with the analysis |
id |
Numeric or character (length = 1).
Number or name of the column identifying each individual.
Defaults to NULL |
n.embed |
Numeric (length = 1).
Defaults to 5 |
tau |
Numeric (length = 1).
Defaults to 1 |
delta |
Numeric (length = 1).
Defaults to 1 |
use.derivatives |
Numeric (length = 1).
Defaults to 1.
Generally recommended to leave "as is" |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
model |
Character (length = 1).
Defaults to
|
algorithm |
Character or
|
uni.method |
Character (length = 1).
What unidimensionality method should be used?
Defaults to
|
ncores |
Numeric (length = 1).
Number of cores to use in computing results.
Defaults to If you're unsure how many cores your computer has,
then type: |
verbose |
Boolean (length = 1).
Should progress be displayed?
Defaults to TRUE |
... |
Additional arguments to be passed on to
|
Same output as EGAnet{dynEGA} returning list objects for level = "individual" and level = "population"
Hudson Golino <hfg9s at virginia.edu>
plot.EGAnet for plot usage in EGAnet
# Obtain data
sim.dynEGA <- sim.dynEGA # bypasses CRAN checks

## Not run: 
# Dynamic EGA individual and population structure
dyn.ega1 <- dynEGA.ind.pop(
  data = sim.dynEGA, n.embed = 5, tau = 1,
  delta = 1, id = 25, use.derivatives = 1,
  ncores = 2, corr = "pearson"
)

## End(Not run)
EBICglasso from qgraph 1.4.4
This function uses the glasso package
(Friedman, Hastie and Tibshirani, 2011) to compute a
sparse Gaussian graphical model with the graphical lasso
(Friedman, Hastie & Tibshirani, 2008).
The tuning parameter is chosen using the Extended Bayesian Information Criterion
(EBIC) described by Foygel & Drton (2010).
EBICglasso.qgraph( data, n = NULL, corr = c("auto", "cor_auto", "cosine", "pearson", "spearman"), na.data = c("pairwise", "listwise"), gamma = 0.5, penalize.diagonal = FALSE, nlambda = 100, lambda.min.ratio = 0.1, returnAllResults = FALSE, penalizeMatrix, countDiagonal = FALSE, refit = FALSE, model.selection = c("EBIC", "JSD"), verbose = FALSE, ... )
data |
Matrix or data frame. Should consist only of variables to be used in the analysis |
n |
Numeric (length = 1).
Sample size if |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
gamma |
Numeric (length = 1).
EBIC tuning parameter.
Defaults to 0.5 |
penalize.diagonal |
Boolean (length = 1).
Should the diagonal be penalized?
Defaults to FALSE |
nlambda |
Numeric (length = 1).
Number of lambda values to test.
Defaults to 100 |
lambda.min.ratio |
Numeric (length = 1).
Ratio of lowest lambda value compared to maximal lambda.
Defaults to 0.1 |
returnAllResults |
Boolean (length = 1).
Whether all results should be returned.
Defaults to FALSE |
penalizeMatrix |
Boolean matrix. Optional logical matrix to indicate which elements are penalized |
countDiagonal |
Boolean (length = 1).
Should diagonal be counted in EBIC computation?
Defaults to FALSE |
refit |
Boolean (length = 1).
Should the optimal graph be refitted without LASSO regularization?
Defaults to FALSE |
model.selection |
Character (length = 1).
How lambda should be selected within GLASSO.
Defaults to |
verbose |
Boolean (length = 1).
Whether messages and (insignificant) warnings should be output.
Defaults to FALSE |
... |
Arguments sent to |
The glasso is run for nlambda values of the tuning parameter (100 by default), logarithmically
spaced between the maximal value of the tuning parameter at which all edges are zero,
lambda_max, and lambda_max * lambda.min.ratio (lambda_max/10 with the default
lambda.min.ratio = 0.1). For each of these graphs the EBIC is computed and
the graph with the best EBIC is selected. The partial correlation matrix
is computed using wi2net and returned.
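The pieces of this procedure can be sketched in base R: the logarithmically spaced lambda path, the EBIC as given by Foygel & Drton (2010), and the conversion of a precision matrix to partial correlations. The helper names `lambda_path`, `ebic`, and `precision_to_pcor` are hypothetical, the example matrix is made up, and the graphical lasso fit itself is omitted; this is a sketch of the surrounding logic, not the package's code.

```r
# Log-spaced lambda values from lambda_max * lambda.min.ratio up to lambda_max
lambda_path <- function(S, nlambda = 100, lambda.min.ratio = 0.1) {
  # Largest absolute off-diagonal value: the penalty at which all edges are zero
  lambda_max <- max(abs(S[upper.tri(S)]))
  exp(seq(log(lambda_max * lambda.min.ratio), log(lambda_max), length.out = nlambda))
}

# EBIC for a graph with n_edges nonzero edges, n cases, and p variables
ebic <- function(loglik, n_edges, n, p, gamma = 0.5) {
  -2 * loglik + n_edges * log(n) + 4 * n_edges * gamma * log(p)
}

# Partial correlations from a precision matrix K: -K_ij / sqrt(K_ii * K_jj)
precision_to_pcor <- function(K) {
  P <- -cov2cor(K)
  diag(P) <- 0
  P
}

# Hypothetical correlation matrix for three variables
S <- matrix(c(1, 0.5, 0.2,
              0.5, 1, -0.3,
              0.2, -0.3, 1), nrow = 3)
lambdas <- lambda_path(S)
pcor <- precision_to_pcor(solve(S))
```

Setting gamma = 0 in `ebic` removes the extra penalty term, which reduces the criterion to the ordinary BIC.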
A partial correlation matrix
Sacha Epskamp; for maintenance, Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <alexpaulchristensen at gmail.com>
Instantiation of GLASSO
Friedman, J., Hastie, T., & Tibshirani, R. (2008).
Sparse inverse covariance estimation with the graphical lasso.
Biostatistics, 9, 432-441.
glasso + EBIC
Foygel, R., & Drton, M. (2010).
Extended Bayesian information criteria for Gaussian graphical models.
In Advances in neural information processing systems (pp. 604-612).
glasso package
Friedman, J., Hastie, T., & Tibshirani, R. (2011).
glasso: Graphical lasso-estimation of Gaussian graphical models.
R package version 1.7.
Tutorial on EBICglasso
Epskamp, S., & Fried, E. I. (2018).
A tutorial on regularized partial correlation networks.
Psychological Methods, 23(4), 617–634.
# Obtain data
wmt <- wmt2[,7:24]

# Compute graph with tuning = 0 (BIC)
BICgraph <- EBICglasso.qgraph(data = wmt, gamma = 0)

# Compute graph with tuning = 0.5 (EBIC)
EBICgraph <- EBICglasso.qgraph(data = wmt, gamma = 0.5)
Estimates the number of communities (dimensions) of a dataset or correlation matrix using a network estimation method (Golino & Epskamp, 2017; Golino et al., 2020). Afterward, a community detection algorithm is applied (Christensen et al., 2023) for multidimensional data. A unidimensional check is also applied based on findings from Golino et al. (2020) and Christensen (2023)
EGA( data, n = NULL, corr = c("auto", "cor_auto", "cosine", "pearson", "spearman"), na.data = c("pairwise", "listwise"), model = c("BGGM", "glasso", "TMFG"), algorithm = c("leiden", "louvain", "walktrap"), uni.method = c("expand", "LE", "louvain"), plot.EGA = TRUE, verbose = FALSE, ... )
data |
Matrix or data frame. Should consist only of variables to be used in the analysis. Can be raw data or a correlation matrix |
n |
Numeric (length = 1).
Sample size if |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
model |
Character (length = 1).
Defaults to
|
algorithm |
Character or
|
uni.method |
Character (length = 1).
What unidimensionality method should be used?
Defaults to
|
plot.EGA |
Boolean (length = 1).
Defaults to TRUE |
verbose |
Boolean (length = 1).
Whether messages and (insignificant) warnings should be output.
Defaults to FALSE |
... |
Additional arguments to be passed on to
|
Returns a list containing:
network |
A matrix containing a network estimated using
|
wc |
A vector representing the community (dimension) membership
of each node in the network. |
n.dim |
A scalar of how many total dimensions were identified in the network |
correlation |
The zero-order correlation matrix |
n |
Number of cases in |
dim.variables |
An ordered matrix of item allocation |
TEFI |
|
plot.EGA |
Plot output if |
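To show how some of the returned elements relate to one another, the following base R sketch derives a dimension count and an ordered item allocation from a made-up membership vector. The data and the data frame layout are assumptions for illustration; the actual structure of the output may differ.

```r
# Hypothetical community memberships (NA marks a node without a community)
wc <- c(V1 = 1, V2 = 2, V3 = 1, V4 = NA, V5 = 2)

# Number of dimensions: distinct non-missing communities
n.dim <- length(unique(wc[!is.na(wc)]))

# Ordered item allocation: items sorted by their dimension (missing last)
dim.variables <- data.frame(
  items = names(wc),
  dimension = unname(wc)
)
dim.variables <- dim.variables[order(dim.variables$dimension), ]
```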
Hudson Golino <hfg9s at virginia.edu>, Alexander P. Christensen <alexpaulchristensen at gmail.com>, Maria Dolores Nieto <acinodam at gmail.com> and Luis E. Garrido <garrido.luiseduardo at gmail.com>
Original simulation and implementation of EGA
Golino, H. F., & Epskamp, S. (2017).
Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research.
PLoS ONE, 12, e0174035.
Current implementation of EGA, introduced unidimensional checks, continuous and dichotomous data
Golino, H., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Sadana, R., & Thiyagarajan, J. A. (2020).
Investigating the performance of Exploratory Graph Analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial.
Psychological Methods, 25, 292-320.
Compared all igraph community detection algorithms, introduced Louvain algorithm, simulation with continuous and polytomous data
Also implements the Leading Eigenvector unidimensional method
Christensen, A. P., Garrido, L. E., Guerra-Pena, K., & Golino, H. (2023).
Comparing community detection algorithms in psychometric networks: A Monte Carlo simulation.
Behavior Research Methods.
Comprehensive unidimensionality simulation
Christensen, A. P. (2023).
Unidimensional community detection: A Monte Carlo simulation, grid search, and comparison.
PsyArXiv.
Compared all igraph
community detection algorithms, simulation with continuous and polytomous data
Christensen, A. P., Garrido, L. E., Guerra-Pena, K., & Golino, H. (2023).
Comparing community detection algorithms in psychometric networks: A Monte Carlo simulation.
Behavior Research Methods.
plot.EGAnet
for plot usage in EGAnet
# Obtain data
wmt <- wmt2[,7:24]

# Estimate EGA
ega.wmt <- EGA(
  data = wmt,
  plot.EGA = FALSE # No plot for CRAN checks
)

# Print results
print(ega.wmt)

# Estimate EGAtmfg
ega.wmt.tmfg <- EGA(
  data = wmt, model = "TMFG",
  plot.EGA = FALSE # No plot for CRAN checks
)

# Estimate EGA with Louvain algorithm
ega.wmt.louvain <- EGA(
  data = wmt, algorithm = "louvain",
  plot.EGA = FALSE # No plot for CRAN checks
)

# Estimate EGA with an {igraph} function (Fast-greedy)
ega.wmt.greedy <- EGA(
  data = wmt,
  algorithm = igraph::cluster_fast_greedy,
  plot.EGA = FALSE # No plot for CRAN checks
)
EGA for Multidimensional Structures
A basic function to estimate EGA for multidimensional structures. This function does not include the unidimensional check and it does not plot the results. This function can be used as a streamlined approach for quick EGA estimation when unidimensionality or visualization is not a priority.
EGA.estimate(
  data,
  n = NULL,
  corr = c("auto", "cor_auto", "cosine", "pearson", "spearman"),
  na.data = c("pairwise", "listwise"),
  model = c("BGGM", "glasso", "TMFG"),
  algorithm = c("leiden", "louvain", "walktrap"),
  verbose = FALSE,
  ...
)
data |
Matrix or data frame. Should consist only of variables to be used in the analysis |
n |
Numeric (length = 1).
Sample size if |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
model |
Character (length = 1).
Defaults to
|
algorithm |
Character or
|
verbose |
Boolean (length = 1).
Whether messages and (insignificant) warnings should be output.
Defaults to FALSE |
... |
Additional arguments to be passed on to
|
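The difference between the "pairwise" and "listwise" options of na.data can be illustrated with base R's cor(). This is a sketch under the assumption that the two options behave like cor()'s "pairwise.complete.obs" and "complete.obs"; the package may compute its correlations differently internally, and the data below are simulated for illustration only.

```r
# Simulated data with missing values in one variable (illustrative only)
set.seed(1)
x <- matrix(rnorm(60), ncol = 3, dimnames = list(NULL, c("a", "b", "c")))
x[1:5, "a"] <- NA

# Pairwise deletion: each correlation uses all cases complete for that pair
pairwise <- cor(x, use = "pairwise.complete.obs")

# Listwise deletion: only cases complete on every variable are used
listwise <- cor(x, use = "complete.obs")

# The b-c correlation uses 20 cases under pairwise deletion
# but only the 15 fully observed cases under listwise deletion
print(round(pairwise, 3))
print(round(listwise, 3))
```

Pairwise deletion retains more information per correlation, while listwise deletion keeps every correlation on the same set of cases.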
Returns a list containing:
network |
A matrix containing a network estimated using
|
wc |
A vector representing the community (dimension) membership
of each node in the network. |
n.dim |
A scalar of how many total dimensions were identified in the network |
cor.data |
The zero-order correlation matrix |
n |
Number of cases in |
Alexander P. Christensen <alexpaulchristensen at gmail.com> and Hudson Golino <hfg9s at virginia.edu>
Original simulation and implementation of EGA
Golino, H. F., & Epskamp, S. (2017).
Exploratory graph analysis: A new approach for estimating the number of dimensions in psychological research.
PLoS ONE, 12, e0174035.
Introduced unidimensional checks, simulation with continuous and dichotomous data
Golino, H., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Sadana, R., & Thiyagarajan, J. A. (2020).
Investigating the performance of Exploratory Graph Analysis and traditional techniques to identify the number of latent factors: A simulation and tutorial.
Psychological Methods, 25, 292-320.
Compared all igraph
community detection algorithms, simulation with continuous and polytomous data
Christensen, A. P., Garrido, L. E., Guerra-Pena, K., & Golino, H. (2023).
Comparing community detection algorithms in psychometric networks: A Monte Carlo simulation.
Behavior Research Methods.
plot.EGAnet
for plot usage in EGAnet
# Obtain data
wmt <- wmt2[,7:24]

# Estimate EGA
ega.wmt <- EGA.estimate(data = wmt)

# Estimate EGA with TMFG
ega.wmt.tmfg <- EGA.estimate(data = wmt, model = "TMFG")

# Estimate EGA with an {igraph} function (Fast-greedy)
ega.wmt.greedy <- EGA.estimate(
  data = wmt,
  algorithm = igraph::cluster_fast_greedy
)
EGA Optimal Model Fit using the Total Entropy Fit Index (tefi)
Estimates the best fitting model using EGA. The number of steps in the cluster_walktrap detection algorithm is varied and unique community solutions are compared using tefi.
EGA.fit(
  data,
  n = NULL,
  corr = c("auto", "cor_auto", "cosine", "pearson", "spearman"),
  na.data = c("pairwise", "listwise"),
  model = c("BGGM", "glasso", "TMFG"),
  algorithm = c("leiden", "louvain", "walktrap"),
  plot.EGA = TRUE,
  verbose = FALSE,
  ...
)
data |
Matrix or data frame. Should consist only of variables to be used in the analysis |
n |
Numeric (length = 1).
Sample size if |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
model |
Character (length = 1).
Defaults to
|
algorithm |
Character or
|
plot.EGA |
Boolean.
If |
verbose |
Boolean.
Whether messages and (insignificant) warnings should be output.
Defaults to FALSE |
... |
Additional arguments to be passed on to
|
Returns a list containing:
EGA |
|
EntropyFit |
|
Lowest.EntropyFit |
The best fitting solution based on |
parameter.space |
Parameter values used in search space |
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <[email protected]>
Entropy fit measures
Golino, H., Moulder, R. G., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Nesselroade, J., Sadana, R., Thiyagarajan, J. A., & Boker, S. M. (2020).
Entropy fit indices: New fit measures for assessing the structure and dimensionality of multiple latent variables.
Multivariate Behavioral Research.
Simulation for EGA.fit
Jamison, L., Christensen, A. P., & Golino, H. (under review).
Optimizing Walktrap's community detection in networks using the Total Entropy Fit Index.
PsyArXiv.
Leiden algorithm
Traag, V. A., Waltman, L., & Van Eck, N. J. (2019).
From Louvain to Leiden: guaranteeing well-connected communities.
Scientific Reports, 9(1), 1-12.
Louvain algorithm
Blondel, V. D., Guillaume, J. L., Lambiotte, R., & Lefebvre, E. (2008).
Fast unfolding of communities in large networks.
Journal of Statistical Mechanics: Theory and Experiment, 2008(10), P10008.
Walktrap algorithm
Pons, P., & Latapy, M. (2006).
Computing communities in large networks using random walks.
Journal of Graph Algorithms and Applications, 10, 191-218.
plot.EGAnet
for plot usage in EGAnet
# Load data
wmt <- wmt2[,7:24]

# Estimate optimal EGA with Walktrap
fit.walktrap <- EGA.fit(
  data = wmt, algorithm = "walktrap",
  steps = 3:8, # default
  plot.EGA = FALSE # no plot for CRAN checks
)

# Estimate optimal EGA with Leiden and CPM
fit.leiden <- EGA.fit(
  data = wmt, algorithm = "leiden",
  objective_function = "CPM", # default
  # resolution_parameter = seq.int(0, max(abs(network)), 0.01),
  # For CPM, the default max resolution parameter
  # is set to the largest absolute edge in the network
  plot.EGA = FALSE # no plot for CRAN checks
)

# Estimate optimal EGA with Leiden and modularity
fit.leiden <- EGA.fit(
  data = wmt, algorithm = "leiden",
  objective_function = "modularity",
  resolution_parameter = seq.int(0, 2, 0.05),
  # default for modularity
  plot.EGA = FALSE # no plot for CRAN checks
)

## Not run: 
# Estimate optimal EGA with Louvain
fit.louvain <- EGA.fit(
  data = wmt, algorithm = "louvain",
  resolution_parameter = seq.int(0, 2, 0.05), # default
  plot.EGA = FALSE # no plot for CRAN checks
)

## End(Not run)
EGA Network of wmt2 Data
EGA results from ega.wmt <- EGA(wmt2[,7:24]) for the Wiener Matrizen-Test (WMT-2)
data(ega.wmt)
A list with 8 objects (see Value in EGA)
data("ega.wmt")
EGAnet
General usage for plots created by EGAnet's S3 methods. Plots across the EGAnet package leverage GGally's ggnet2 and ggplot2's ggplot.
Most plots allow the full usage of the gg* series functionality and therefore plotting arguments should be referenced through those packages rather than here in EGAnet.
The sections below list the functions and their usage for the S3 plot methods. The plot methods are intended to be generic and without many arguments so that nearly all arguments are passed to ggnet2 and ggplot.
There are some constraints placed on certain plots to keep the EGAnet style throughout the (network) plots in the package, so be aware that if some settings are not changing your plot output, then these settings might be fixed to maintain the EGAnet style.
plot(x, ...)

plot.dynEGA(x, base = 1, id = NULL, ...)

plot.dynEGA.Group(x, base = 1, ...)

plot.dynEGA.Individual(x, base = 1, id = NULL, ...)

plot.hierEGA(
  x,
  plot.type = c("multilevel", "separate"),
  color.match = FALSE,
  ...
)

plot.invariance(x, p_type = c("p", "p_BH"), p_value = 0.05, ...)
x |
EGAnet object with available S3 plot method (see full list below) |
color.palette |
Character (vector). Either a character (length = 1) from the pre-defined palettes in color_palette_EGA or character (length = total number of communities) using HEX codes (see Color Palettes and Examples sections) |
layout |
Character (length = 1). Layouts can be set using gplot.layout and the ending layout name; for example, gplot.layout.circle can be set in these functions using layout = "circle" or mode = "circle" (see Examples) |
base |
Numeric (length = 1). Plot to be used as the base for the configuration of the networks. Uses the number of the order in which the plots are input. Defaults to 1 or the first plot |
id |
Numeric index(es) or character name(s). IDs to use when plotting dynEGA level = "individual". Defaults to NULL or 4 IDs drawn at random |
plot.type |
Character (length = 1). Whether hierEGA networks should be plotted in a stacked, "multilevel" fashion or as "separate" plots. Defaults to "multilevel" |
color.match |
Boolean (length = 1). Whether lower order community colors in the hierEGA plot should be "matched" and used as the border color for the higher order communities. Defaults to FALSE |
p_type |
Character (length = 1). Type of p-value when plotting invariance. Defaults to "p" or uncorrected p-value. Set to "p_BH" for the Benjamini-Hochberg corrected p-value |
p_value |
Numeric (length = 1). The p-value to use alongside p_type when plotting invariance. Defaults to 0.05 |
... |
Additional arguments to pass on to ggnet2 and gplot.layout (see Examples) |
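The distinction between the uncorrected ("p") and Benjamini-Hochberg corrected ("p_BH") p-values can be illustrated with base R's p.adjust(). This is a sketch with made-up p-values; it does not reproduce how invariance computes or stores its p-values, only how the two types can lead to different variables being flagged at the same p_value threshold.

```r
# Hypothetical uncorrected p-values for four variables
p <- c(0.01, 0.04, 0.03, 0.20)

# Benjamini-Hochberg correction (controls the false discovery rate)
p_BH <- p.adjust(p, method = "BH")

# Threshold analogous to the p_value argument
p_value <- 0.05

# Variables flagged under each type of p-value
flag_p <- p < p_value
flag_BH <- p_BH < p_value

print(data.frame(p = p, p_BH = round(p_BH, 4), flag_p, flag_BH))
```

With these values, three variables fall below the threshold before correction and one after, so the choice of p_type can change which differences a plot highlights.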
*EGA Plots
bootEGA, dynEGA, EGA, EGA.estimate, EGA.fit, hierEGA, invariance, riEGA
All Available S3 Plots
boot.ergoInfo, bootEGA, dynEGA, dynEGA.Group, dynEGA.Individual, dynEGA.Population, EGA, EGA.estimate, EGA.fit, hierEGA, infoCluster, invariance, itemStability, riEGA
Color Palettes
color_palette_EGA will implement some color palettes in EGAnet. The main EGAnet style palette is "polychrome". This palette currently has 40 colors but there will likely be a need to expand it further (e.g., hierEGA demands a lot of colors).
The color.palette argument will also accept HEX code colors that are the same length as the number of communities in the plot.
In any network plots, the color.palette argument can be used to select color palettes from color_palette_EGA as well as those in the color scheme of RColorBrewer.
For more worked examples than below, see Plots in {EGAnet}
# Using different arguments in {GGally}'s `ggnet2`
plot(ega.wmt, node.size = 6, edge.size = 4)

# Using a different layout in {sna}'s `gplot.layout`
plot(ega.wmt, layout = "circle") # 'layout' argument
plot(ega.wmt, mode = "circle") # 'mode' argument

# Using different color palettes with `color_palette_EGA`

## Pre-defined palette
plot(ega.wmt, color.palette = "blue.ridge2")

## University of Virginia colors
plot(ega.wmt, color.palette = c("#232D4B", "#F84C1E"))

## Vanderbilt University colors
## (with additional {GGally} `ggnet2` argument)
plot(
  ega.wmt, color.palette = c("#FFFFFF", "#866D4B"),
  label.color = "#000000"
)
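Because custom HEX palettes must supply one color per community, it can help to check a palette before plotting. The helper below is hypothetical (it is not part of EGAnet) and assumes six-digit HEX codes; the package itself may accept other color formats.

```r
# Hypothetical helper (not an EGAnet function): checks that every entry is a
# six-digit HEX code and that there is exactly one color per community
valid_hex_palette <- function(palette, communities) {
  is_hex <- grepl("^#[0-9A-Fa-f]{6}$", palette)
  all(is_hex) && length(palette) == communities
}

# Two communities, two valid HEX codes
ok <- valid_hex_palette(c("#232D4B", "#F84C1E"), communities = 2)

# Missing '#' in the second code
bad_code <- valid_hex_palette(c("#232D4B", "F84C1E"), communities = 2)

# Too few colors for three communities
bad_length <- valid_hex_palette(c("#232D4B", "#F84C1E"), communities = 3)

print(c(ok = ok, bad_code = bad_code, bad_length = bad_length))
```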
Function to fit the Exploratory Graph Model
EGM(
  data,
  EGM.model = c("standard", "EGA"),
  communities = NULL,
  structure = NULL,
  search = FALSE,
  p.in = NULL,
  p.out = NULL,
  opt = c("AIC", "BIC", "CFI", "chisq", "logLik", "RMSEA", "SRMR", "TEFI", "TEFI.adj", "TLI"),
  constrained = TRUE,
  verbose = TRUE,
  ...
)
data |
Matrix or data frame. Should consist only of variables to be used in the analysis. Can be raw data or a correlation matrix |
EGM.model |
Character vector (length = 1).
Sets the procedure to conduct
|
communities |
Numeric vector (length = 1).
Number of communities to use for the |
structure |
Numeric or character vector (length = |
search |
Boolean (length = 1).
Whether a search over parameters should be conducted.
Defaults to FALSE |
p.in |
Numeric vector (length = 1).
Probability that a node is randomly linked to other nodes in the same community.
Within community edges are set to zero based on |
p.out |
Numeric vector (length = 1).
Probability that a node is randomly linked to other nodes not in the same community.
Between community edges are set to zero based on |
opt |
Character vector (length = 1).
Fit index used to select from when searching over models
(only applies to
Defaults to |
constrained |
Boolean (length = 1).
Whether memberships of the communities should
be added as a constraint when optimizing the network loadings.
Defaults to TRUE |
verbose |
Boolean (length = 1).
Should progress be displayed?
Defaults to TRUE |
... |
Additional arguments to be passed on to
|
Hudson F. Golino <hfg9s at virginia.edu> and Alexander P. Christensen <[email protected]>
# Get depression data
data <- depression[,24:44]

# Estimate EGM (using EGA)
egm_ega <- EGM(data)

# Estimate EGM (using EGA) specifying communities
egm_ega_communities <- EGM(data, communities = 3)

# Estimate EGM (using EGA) specifying structure
egm_ega_structure <- EGM(
  data, structure = c(
    1, 1, 1, 2, 1, 1, 1,
    1, 1, 1, 3, 2, 2, 2,
    2, 3, 3, 3, 3, 3, 2
  )
)

# Estimate EGM (using standard)
egm_standard <- EGM(
  data, EGM.model = "standard",
  communities = 3, # specify number of communities
  p.in = 0.95, # probability of edges *in* each community
  p.out = 0.80 # probability of edges *between* each community
)

## Not run: 
# Estimate EGM (using EGA search)
egm_ega_search <- EGM(
  data, EGM.model = "EGA", search = TRUE
)

# Estimate EGM (using EGA search and AIC criterion)
egm_ega_search_AIC <- EGM(
  data, EGM.model = "EGA", search = TRUE, opt = "AIC"
)

# Estimate EGM (using search)
egm_search <- EGM(
  data, EGM.model = "standard", search = TRUE,
  communities = 3, # need communities or structure
  p.in = 0.95 # only need 'p.in'
)

## End(Not run)
EGM with EFA
Estimates an EGM based on EGA and uses the number of communities as the number of dimensions in exploratory factor analysis (EFA) using fa.
EGM.compare(data, constrained = FALSE, rotation = "geominQ", ...)
data |
Matrix or data frame. Should consist only of variables to be used in the analysis. Can be raw data or a correlation matrix |
constrained |
Boolean (length = 1).
Whether memberships of the communities should
be added as a constraint when optimizing the network loadings.
Defaults to FALSE. Note: This default differs from EGM |
rotation |
Character.
A rotation to use to obtain a simpler structure for EFA.
For a list of rotations, see |
... |
Additional arguments to be passed on to
|
Hudson F. Golino <hfg9s at virginia.edu> and Alexander P. Christensen <[email protected]>
# Get depression data
data <- depression[,24:44]

# Compare EGM (using EGA) with EFA
## Not run: 
results <- EGM.compare(data)

# Print summary
summary(results)

## End(Not run)
Reorganizes a single observed time series into an embedded matrix. The embedded matrix is constructed with replicates of an individual time series that are offset from each other in time. The function requires two parameters, one that specifies the number of observations to be used (i.e., the number of embedded dimensions) and the other that specifies the number of observations to offset successive embeddings
Embed(x, E, tau)
x |
Numeric vector. An observed time series to be reorganized into a time-delayed embedded matrix. |
E |
Numeric (length = 1).
Number of embedded dimensions or the number of observations to
be used. |
tau |
Numeric (length = 1).
Number of observations to offset successive embeddings.
A tau of one uses adjacent observations.
Default is |
Returns a numeric matrix
Pascal Deboeck <pascal.deboeck at psych.utah.edu> and Alexander P. Christensen <[email protected]>
Deboeck, P. R., Montpetit, M. A., Bergeman, C. S., & Boker, S. M. (2009) Using derivative estimates to describe intraindividual variability at multiple time scales. Psychological Methods, 14, 367-386.
# A time series with 8 time points
time_series <- 49:56

# Time series embedding
Embed(time_series, E = 5, tau = 1)
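The reorganization described above can be sketched in base R. The function embed_sketch below is a hypothetical helper written for illustration, not the package's Embed implementation; it assumes that each row holds E observations spaced tau apart and that successive rows start one observation later.

```r
# Hypothetical time-delay embedding helper (illustrative; not EGAnet's Embed)
embed_sketch <- function(x, E, tau = 1) {
  # Number of rows that fit once E observations spaced tau apart are needed
  n_rows <- length(x) - (E - 1) * tau
  stopifnot(n_rows >= 1)

  embedded <- matrix(NA_real_, nrow = n_rows, ncol = E)
  for (j in seq_len(E)) {
    # Column j is the series shifted by (j - 1) * tau observations
    embedded[, j] <- x[seq_len(n_rows) + (j - 1) * tau]
  }
  embedded
}

# A time series with 8 time points
time_series <- 49:56

# Four rows of five adjacent observations each
embedded <- embed_sketch(time_series, E = 5, tau = 1)
print(embedded)

# A larger offset skips observations within each row
embedded_tau2 <- embed_sketch(time_series, E = 3, tau = 2)
print(embedded_tau2)
```

Larger values of E or tau leave fewer rows, because each row needs (E - 1) * tau later observations to be available.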
Computes the fit of a dimensionality structure using empirical entropy. Lower values suggest better fit of a structure to the data.
entropyFit(data, structure)
data |
Matrix or data frame. Contains variables to be used in the analysis |
structure |
Numeric or character vector (length = |
Returns a list containing:
Total.Correlation |
The total correlation of the dataset |
Total.Correlation.MM |
Miller-Madow correction for the total correlation of the dataset |
Entropy.Fit |
The Entropy Fit Index |
Entropy.Fit.MM |
Miller-Madow correction for the Entropy Fit Index |
Average.Entropy |
The average entropy of the dataset |
Hudson F. Golino <hfg9s at virginia.edu>, Alexander P. Christensen <[email protected]> and Robert Moulder <[email protected]>
Initial formalization and simulation
Golino, H., Moulder, R. G., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Nesselroade, J., Sadana, R., Thiyagarajan, J. A., & Boker, S. M. (2020).
Entropy fit indices: New fit measures for assessing the structure and dimensionality of multiple latent variables.
Multivariate Behavioral Research.
# Load data
wmt <- wmt2[,7:24]

## Not run: 
# Estimate EGA model
ega.wmt <- EGA(data = wmt)

## End(Not run)

# Compute entropy indices
entropyFit(data = wmt, structure = ega.wmt$wc)
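The Miller-Madow correction mentioned in the return values can be illustrated on a single discrete variable. The sketch below shows only the standard plug-in entropy and its Miller-Madow adjustment, (m - 1) / (2N), where m is the number of observed categories and N the number of observations. How entropyFit bins the data, which logarithm base it uses, and how it combines entropies into the fit indices are not shown here; the function names and responses are made up for illustration.

```r
# Plug-in (maximum likelihood) entropy of a discrete variable, in nats
empirical_entropy <- function(x) {
  p <- as.numeric(table(x)) / length(x)
  -sum(p * log(p))
}

# Miller-Madow correction: add (m - 1) / (2 * N) to offset the downward
# bias of the plug-in estimate in finite samples
miller_madow_entropy <- function(x) {
  m <- length(unique(x))
  empirical_entropy(x) + (m - 1) / (2 * length(x))
}

# Hypothetical responses with three observed categories
responses <- c(1, 1, 2, 2, 3, 3, 3, 3)

h_plugin <- empirical_entropy(responses)
h_mm <- miller_madow_entropy(responses)

print(c(plugin = h_plugin, miller_madow = h_mm))
```

The correction shrinks as the sample grows, so the two estimates converge for large N.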
Computes the Ergodicity Information Index
ergoInfo(
  dynEGA.object,
  use = c("edge.list", "unweighted", "weighted"),
  shuffles = 5000
)
dynEGA.object |
A |
use |
Character (length = 1).
A string indicating what network element will be used
to compute the algorithm complexity, the list of edges or the weights of the network.
Defaults to
|
shuffles |
Numeric.
Number of shuffles used to compute the Kolmogorov complexity.
Defaults to 5000 |
Returns a list containing:
PrimeWeight |
The prime-weight encoding of the individual networks |
PrimeWeight.pop |
The prime-weight encoding of the population network |
Kcomp |
The Kolmogorov complexity of the prime-weight encoded individual networks |
Kcomp.pop |
The Kolmogorov complexity of the prime-weight encoded population network |
complexity |
The complexity metric proposed by Santora and Nicosia (2020) |
EII |
The Ergodicity Information Index |
Hudson Golino <hfg9s at virginia.edu> and Alexander Christensen <[email protected]>
Original Implementation
Golino, H., Nesselroade, J. R., & Christensen, A. P. (2022).
Toward a psychology of individuals: The ergodicity information index and a bottom-up approach for finding generalizations.
PsyArXiv.
# Obtain data
sim.dynEGA <- sim.dynEGA # bypasses CRAN checks

## Not run: 
# Dynamic EGA individual and population structure
dyn.ega1 <- dynEGA.ind.pop(
  data = sim.dynEGA[,-26], n.embed = 5, tau = 1,
  delta = 1, id = 25, use.derivatives = 1,
  ncores = 2, corr = "pearson"
)

# Compute empirical ergodicity information index
eii <- ergoInfo(dyn.ega1)

## End(Not run)
Computes the Frobenius Norm (Ulitzsch et al., 2023)
frobenius(network1, network2)
network1 |
Matrix or data frame. Network to be compared |
network2 |
Matrix or data frame. Second network to be compared |
Returns Frobenius Norm
Hudson Golino <hfg9s at virginia.edu> & Alexander P. Christensen <alexander.christensen at Vanderbilt.Edu>
Simulation Study
Ulitzsch, E., Khanna, S., Rhemtulla, M., & Domingue, B. W. (2023).
A graph theory based similarity metric enables comparison of subpopulation psychometric networks.
Psychological Methods.
# Obtain wmt2 data
wmt <- wmt2[,7:24]

# Set seed (for reproducibility)
set.seed(1234)

# Split data
split1 <- sample(
  1:nrow(wmt), floor(nrow(wmt) / 2)
)
split2 <- setdiff(1:nrow(wmt), split1)

# Obtain split data
data1 <- wmt[split1,]
data2 <- wmt[split2,]

# Perform EBICglasso
glas1 <- EBICglasso.qgraph(data1)
glas2 <- EBICglasso.qgraph(data2)

# Frobenius norm
frobenius(glas1, glas2)
# 0.7070395
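For reference, the plain Frobenius norm of the difference between two networks can be computed in base R. This is a sketch of the unscaled quantity only, with a hypothetical function name and toy networks; the package's frobenius function may rescale it (for example, into a similarity), so values from this sketch need not match the package output.

```r
# Unscaled Frobenius norm of the difference between two equally sized
# networks (illustrative; not necessarily what EGAnet's frobenius returns)
frobenius_distance <- function(network1, network2) {
  network1 <- as.matrix(network1)
  network2 <- as.matrix(network2)
  stopifnot(identical(dim(network1), dim(network2)))
  sqrt(sum((network1 - network2)^2))
}

# Two toy 2 x 2 networks whose single edge differs by 0.2
A <- matrix(c(0, 0.3, 0.3, 0), nrow = 2)
B <- matrix(c(0, 0.1, 0.1, 0), nrow = 2)

distance <- frobenius_distance(A, B)
print(distance)

# Same quantity via base R's norm()
print(norm(A - B, type = "F"))
```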
Computes the fit (Generalized TEFI) of a hierarchical or correlated bifactor dimensionality structure (or hierEGA objects) using Von Neumann's entropy when the input is a correlation matrix. Lower values suggest better fit of a structure to the data.
genTEFI(data, structure = NULL, verbose = TRUE)
data |
Matrix, data frame, or |
structure |
For high-order and correlated bifactor structures,
|
verbose |
Boolean (length = 1).
Whether messages and (insignificant) warnings should be output.
Defaults to TRUE |
Returns a three-column data frame: the Generalized Total Entropy Fit Index using Von Neumann's entropy (VN.Entropy.Fit; first column), Lower.Order.VN, the TEFI for the first-order factors (second column), and Higher.Order.VN, the equivalent for the second-order factors (third column).
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <[email protected]>
# Example using network scores
opt.hier <- hierEGA(
  data = optimism, scores = "network",
  plot.EGA = FALSE # No plot for CRAN checks
)

# Compute the Generalized Total Entropy Fit Index
genTEFI(opt.hier)
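The Von Neumann entropy that underlies this index can be sketched for a single correlation matrix. The code below assumes the common convention of rescaling the matrix to unit trace and taking the entropy of its eigenvalues; the function name is hypothetical, and how genTEFI combines such entropies across lower-order and higher-order communities is not shown.

```r
# Von Neumann entropy of a correlation matrix (illustrative sketch)
von_neumann_entropy <- function(correlation) {
  # Rescale to unit trace so that the eigenvalues sum to one
  density <- correlation / sum(diag(correlation))
  eigenvalues <- eigen(density, symmetric = TRUE, only.values = TRUE)$values
  # Drop zero (or numerically negative) eigenvalues, since 0 * log(0) = 0
  eigenvalues <- eigenvalues[eigenvalues > 0]
  -sum(eigenvalues * log(eigenvalues))
}

# Uncorrelated variables: entropy is at its maximum, log(number of variables)
h_independent <- von_neumann_entropy(diag(4))

# Perfectly correlated variables: entropy is (numerically) zero
h_redundant <- von_neumann_entropy(matrix(1, nrow = 4, ncol = 4))

print(c(independent = h_independent, redundant = h_redundant))
```

Stronger shared variance among variables concentrates the eigenvalues and lowers the entropy, which is why entropy can serve as an ingredient in a fit index for dimensional structures.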
Estimates the derivatives of a time series using generalized local linear approximation (GLLA). GLLA is a filtering method for estimating derivatives from data that uses time delay embedding and a variant of Savitzky-Golay filtering to accomplish the task.
glla(x, n.embed, tau, delta, order)
x |
Numeric vector. An observed time series |
n.embed |
Numeric (length = 1).
Number of embedded dimensions (the number of observations
to be used in the |
tau |
Numeric (length = 1).
Number of observations to offset successive embeddings in
the |
delta |
Numeric (length = 1).
The time between successive observations in the time series.
Default is |
order |
Numeric (length = 1).
The maximum order of the derivative to be estimated. For example,
|
Returns a matrix containing n columns in which n is one plus the maximum order of the derivatives to be estimated via generalized local linear approximation
Hudson Golino <hfg9s at virginia.edu>
GLLA implementation
Boker, S. M., Deboeck, P. R., Edler, C., & Keel, P. K. (2010)
Generalized local linear approximation of derivatives from time series. In S.-M. Chow, E. Ferrer, & F. Hsieh (Eds.),
The Notre Dame series on quantitative methodology. Statistical methods for modeling human dynamics: An interdisciplinary dialogue,
(p. 161-178). Routledge/Taylor & Francis Group.
Deboeck, P. R., Montpetit, M. A., Bergeman, C. S., & Boker, S. M. (2009) Using derivative estimates to describe intraindividual variability at multiple time scales. Psychological Methods, 14(4), 367-386.
Filtering procedure
Savitzky, A., & Golay, M. J. (1964).
Smoothing and differentiation of data by simplified least squares procedures.
Analytical Chemistry, 36(8), 1627-1639.
# A time series with 8 time points
tseries <- 49:56
deriv.tseries <- glla(tseries, n.embed = 4, tau = 1, delta = 1, order = 2)
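The GLLA procedure described above can be sketched in base R: embed the series, build a basis of time offsets raised to successive powers, and project each embedded row onto that basis by least squares. The functions embed_sketch and glla_sketch are hypothetical helpers following the general formulation in Boker et al. (2010); they are not the package's code, and the column labels are my own.

```r
# Time-delay embedding: each row holds n.embed observations spaced tau apart
# (hypothetical helper; assumes at least two rows result)
embed_sketch <- function(x, n.embed, tau) {
  n_rows <- length(x) - (n.embed - 1) * tau
  sapply(seq_len(n.embed), function(j) x[seq_len(n_rows) + (j - 1) * tau])
}

# GLLA sketch: least-squares fit of a local polynomial to every embedded row
glla_sketch <- function(x, n.embed, tau, delta, order) {
  X <- embed_sketch(x, n.embed, tau)
  # Time offsets of the embedded observations, centered on the window midpoint
  offsets <- (seq_len(n.embed) - mean(seq_len(n.embed))) * tau * delta
  # Column k + 1 holds offsets^k / k! (Taylor-series basis)
  L <- sapply(0:order, function(k) offsets^k / factorial(k))
  # Weights that map each embedded row onto derivative estimates
  W <- L %*% solve(t(L) %*% L)
  derivatives <- X %*% W
  colnames(derivatives) <- paste0("order", 0:order)
  derivatives
}

# A linear series: level rises by one per step, so the first derivative
# should be 1 and the second derivative 0
tseries <- 49:56
deriv_sketch <- glla_sketch(tseries, n.embed = 4, tau = 1, delta = 1, order = 2)
print(round(deriv_sketch, 6))
```

The result has one column per derivative order from zero up to order, matching the description of the returned matrix as having one plus the maximum order columns.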
EGA
Estimates EGA using the lower-order solution of the Louvain algorithm (cluster_louvain) to identify the lower-order dimensions and then uses factor or network loadings to estimate factor or network scores, which are used to estimate the higher-order dimensions (for more details, see Jiménez et al., 2023).
hierEGA(
  data,
  loading.method = c("original", "revised"),
  rotation = NULL,
  scores = c("factor", "network"),
  loading.structure = c("simple", "full"),
  impute = c("mean", "median", "none"),
  corr = c("auto", "cor_auto", "pearson", "spearman"),
  na.data = c("pairwise", "listwise"),
  model = c("BGGM", "glasso", "TMFG"),
  lower.algorithm = "louvain",
  higher.algorithm = c("leiden", "louvain", "walktrap"),
  uni.method = c("expand", "LE", "louvain"),
  plot.EGA = TRUE,
  verbose = FALSE,
  ...
)
data |
Matrix or data frame. Should consist only of variables to be used in the analysis (does not accept correlation matrices) |
loading.method |
Character (length = 1).
Sets network loading calculation based on implementation
described in |
rotation |
Character.
A rotation to use to obtain a simpler structure.
For a list of rotations, see |
scores |
Character (length = 1).
How should scores for the higher-order structure be estimated?
Defaults to Factor scores use the number of communities from
|
loading.structure |
Character (length = 1).
Whether simple structure or the saturated loading matrix
should be used when computing scores (
Simple structure is the more conservative (established) approach
and is therefore the default. Treat |
impute |
Character (length = 1). If there are any missing data, then imputation can be implemented. Available options:
|
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
model |
Character (length = 1).
Defaults to
|
lower.algorithm |
Character or
Louvain with consensus clustering is strongly recommended. Using any other algorithm is considered experimental as they have not been designed to capture lower order communities |
higher.algorithm |
Character or
Using |
uni.method |
Character (length = 1).
What unidimensionality method should be used?
Defaults to
|
plot.EGA |
Boolean.
If |
verbose |
Boolean (length = 1).
Whether messages and (insignificant) warnings should be output.
Defaults to |
... |
Additional arguments to be passed on to
|
Returns a list of lists containing:
lower_order |
|
higher_order |
|
parameters |
A list containing |
dim.variables |
A data frame with variable names and their lower and higher order assignments |
TEFI |
Generalized TEFI using |
plot.hierEGA |
Plot output if |
Marcos Jiménez <marcosjnezhquez@gmail.com>, Francisco J. Abad <[email protected]>, Eduardo Garcia-Garzon <[email protected]>, Hudson Golino <[email protected]>, Alexander P. Christensen <[email protected]>, and Luis Eduardo Garrido <[email protected]>
Hierarchical EGA simulation
Jiménez, M., Abad, F. J., Garcia-Garzon, E., Golino, H., Christensen, A. P., & Garrido, L. E. (2023).
Dimensionality assessment in bifactor structures with multiple general factors: A network psychometrics approach.
Psychological Methods.
3+ level hierarchical EGA
Samo, A., Christensen, A. P., Abad, F. J., Garrido, L. E., Garcia-Garzon, E., Golino, H. & McAbee, S. T. (2023). Building the structure of personality from the bottom-up using Hierarchical Exploratory Graph Analysis.
PsyArXiv.
Conceptual implementation
Golino, H., Thiyagarajan, J. A., Sadana, R., Teles, M., Christensen, A. P., & Boker, S. M. (2020).
Investigating the broad domains of intrinsic capacity, functional ability and
environment: An exploratory graph analysis approach for improving analytical
methodologies for measuring healthy aging.
PsyArXiv.
Revised network loadings
Christensen, A. P., Golino, H., Abad, F. J., & Garrido, L. E. (2024).
Revised network loadings.
PsyArXiv.
plot.EGAnet for plot usage in EGAnet
# Example using network scores
opt.hier <- hierEGA(
  data = optimism, scores = "network",
  plot.EGA = FALSE # No plot for CRAN checks
)

# Plot multilevel plot
plot(opt.hier, plot.type = "multilevel")

# Plot multilevel plot with higher order
# border color matching the corresponding
# lower order color
plot(opt.hier, color.match = TRUE)

# Plot levels separately
plot(opt.hier, plot.type = "separate")
Converts network to matrix
igraph2matrix(igraph_network, diagonal = 0)
igraph_network |
network object |
diagonal |
Numeric (length = 1).
Value to be placed on the diagonal of |
Returns a network in the format
Hudson Golino <hfg9s at virginia.edu> & Alexander P. Christensen <alexander.christensen at Vanderbilt.Edu>
# Convert network to {igraph}
igraph_network <- convert2igraph(ega.wmt$network)

# Convert network back to matrix
igraph2matrix(igraph_network)
dynEGA
Performs hierarchical clustering using Jensen-Shannon distance followed by the Louvain algorithm with consensus clustering. The method iteratively identifies smaller and smaller clusters until there is no change in the clusters identified
infoCluster(dynEGA.object, plot.cluster = TRUE, ...)
dynEGA.object |
A |
plot.cluster |
Boolean (length = 1).
Should plot of optimal and hierarchical clusters be output?
Defaults to |
... |
Additional arguments to be passed on to
|
Returns a list containing:
clusters |
A vector indicating the cluster to which each participant belongs |
clusterTree |
The dendrogram from |
clusterPlot |
Plot output from results |
JSD |
Jensen-Shannon Distance |
Hudson Golino <hfg9s at virginia.edu> & Alexander P. Christensen <alexander.christensen at Vanderbilt.Edu>
plot.EGAnet
for plot usage in EGAnet
# Obtain data
sim.dynEGA <- sim.dynEGA # bypasses CRAN checks

## Not run:
# Dynamic EGA individual and population structure
dyn.ega1 <- dynEGA.ind.pop(
  data = sim.dynEGA, n.embed = 5, tau = 1,
  delta = 1, id = 25, use.derivatives = 1,
  ncores = 2, corr = "pearson"
)

# Perform information-theoretic clustering
clust1 <- infoCluster(dynEGA.object = dyn.ega1)
## End(Not run)
A general function to compute several different information theory metrics
information( data, base = 2.718282, bins = floor(sqrt(nrow(data)/5)), statistic = c("entropy", "joint.entropy", "conditional.entropy", "total.correlation", "dual.total.correlation", "o.information") )
data |
Matrix or data frame. Should consist only of variables to be used in the analysis |
base |
Numeric (length = 1). Base of logarithm to use for entropy. Common options include:
Defaults to |
bins |
Numeric (length = 1).
Number of bins if data are not discrete.
Defaults to |
statistic |
Character. Information theory statistics to compute. Available options:
By default, all statistics are computed |
Returns list containing only requested statistic
Hudson F. Golino <hfg9s at virginia.edu> and Alexander P. Christensen <[email protected]>
Shannon's entropy
Shannon, C. E. (1948). A mathematical theory of communication.
The Bell System Technical Journal, 27(3), 379-423.
Formalization of total correlation
Watanabe, S. (1960).
Information theoretical analysis of multivariate correlation.
IBM Journal of Research and Development 4, 66-82.
Applied implementation of total correlation
Felix, L. M., Mansur-Alves, M., Teles, M., Jamison, L., & Golino, H. (2021).
Longitudinal impact and effects of booster sessions in a cognitive training program for healthy older adults.
Archives of Gerontology and Geriatrics, 94, 104337.
Formalization of dual total correlation
Han, T. S. (1978).
Nonnegative entropy measures of multivariate symmetric correlations.
Information and Control, 36, 133-156.
Formalization of O-information
Crutchfield, J. P. (1994). The calculi of emergence: Computation, dynamics and induction.
Physica D: Nonlinear Phenomena, 75(1-3), 11-54.
Applied implementation of O-information
Marinazzo, D., Van Roozendaal, J., Rosas, F. E., Stella, M., Comolatti, R., Colenbier, N., Stramaglia, S., & Rosseel, Y. (2024).
An information-theoretic approach to build hypergraphs in psychometrics.
Behavior Research Methods, 1-23.
# All measures
information(wmt2[,7:24])

# One measure
information(wmt2[,7:24], statistic = "joint.entropy")
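For discrete data, the entropy-based statistics listed above follow standard definitions that can be written in a few lines of base R. The sketch below is illustrative only and is not the package's implementation: the helper names `shannon_entropy`, `joint_entropy`, and `total_correlation` are hypothetical, the data are assumed to be already discrete (no binning), and total correlation is taken as the sum of marginal entropies minus the joint entropy.

```r
# Hypothetical helpers for illustration (not part of EGAnet)

# Shannon entropy of one discrete variable
shannon_entropy <- function(x, base = exp(1)) {
  p <- table(x) / length(x)  # empirical probabilities
  p <- p[p > 0]              # drop empty categories
  -sum(p * log(p, base = base))
}

# Joint entropy: treat each row as a single joint outcome
joint_entropy <- function(data, base = exp(1)) {
  key <- do.call(paste, c(as.data.frame(data), sep = "\r"))
  shannon_entropy(key, base = base)
}

# Total correlation: sum of marginal entropies minus joint entropy
total_correlation <- function(data, base = exp(1)) {
  marginals <- vapply(
    as.data.frame(data), shannon_entropy,
    numeric(1), base = base
  )
  sum(marginals) - joint_entropy(data, base = base)
}

# Toy data: a and b are identical, c is independent of both
toy <- data.frame(
  a = c(0, 0, 1, 1),
  b = c(0, 0, 1, 1),
  c = c(0, 1, 0, 1)
)

shannon_entropy(toy$a, base = 2)  # 1 bit
joint_entropy(toy, base = 2)      # 2 bits
total_correlation(toy, base = 2)  # 1 bit
```

With base 2 the results are in bits; the natural logarithm (the default shown in the usage above) gives nats.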
A response matrix (n = 1152) of the International Cognitive Ability Resource (ICAR) intelligence battery developed by Condon and Revelle (2016).
data(intelligenceBattery)
A 1185x125 response matrix
data("intelligenceBattery")
EGA
Structure
Estimates configural invariance using bootEGA on all data (across groups) first. After configural invariance is established, metric invariance is tested using the community structure that established configural invariance (see Details for more information on this process)
invariance( data, groups, structure = NULL, iter = 500, configural.threshold = 0.7, configural.type = c("parametric", "resampling"), corr = c("auto", "cor_auto", "pearson", "spearman"), na.data = c("pairwise", "listwise"), model = c("BGGM", "glasso", "TMFG"), algorithm = c("leiden", "louvain", "walktrap"), uni.method = c("expand", "LE", "louvain"), ncores, seed = NULL, verbose = TRUE, ... )
data |
Matrix or data frame. Should consist only of variables to be used in the analysis |
groups |
Numeric or character vector (length = |
structure |
Numeric or character vector (length = |
iter |
Numeric (length = 1).
Number of iterations to perform for the permutation.
Defaults to |
configural.threshold |
Numeric (length = 1).
Value to use as a threshold in |
configural.type |
Character (length = 1).
Type of bootstrap to use for configural invariance in |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
model |
Character (length = 1).
Defaults to
|
algorithm |
Character or
|
uni.method |
Character (length = 1).
What unidimensionality method should be used?
Defaults to
|
ncores |
Numeric (length = 1).
Number of cores to use in computing results.
Defaults to If you're unsure how many cores your computer has,
then type: |
seed |
Numeric (length = 1).
Defaults to |
verbose |
Boolean (length = 1).
Should progress be displayed?
Defaults to |
... |
Additional arguments that can be passed on to
|
In traditional psychometrics, measurement invariance is performed in sequential testing from more flexible (more free parameters) to more rigid (fewer free parameters) structures. Measurement invariance in network psychometrics is no different.
Configural Invariance
To establish configural invariance, the data are collapsed across groups and a common sample structure is identified using bootEGA and itemStability. If some variables have a replication less than 0.70 in their assigned dimension, then they are considered unstable and therefore not invariant. These variables are removed and this process is repeated until all items are considered stable (replication values greater than 0.70) or there are no variables left. If configural invariance cannot be established, then the last run of results is returned and metric invariance is not tested (because configural invariance is not met). Importantly, if any variables are removed, then configural invariance is not met for the original structure. Any removal would suggest only partial configural invariance is met.
Metric Invariance
The variables that remain after configural invariance are submitted to metric invariance. First, a network is estimated for each group and network loadings (net.loads) are computed using the assigned community memberships (determined during configural invariance). Then, the difference between the assigned loadings of the groups is computed. This difference represents the empirical values. Second, the group memberships are permuted and networks are estimated based on these permuted groups iter times. Then, network loadings are computed and the difference between the assigned loadings of the groups is computed, resulting in a null distribution. The empirical difference is then compared against the null distribution using a two-tailed p-value based on the number of null distribution differences that are greater and less than the empirical differences for each variable. Both uncorrected and false discovery rate corrected p-values are returned in the results. Uncorrected p-values are flagged for significance along with the direction of group differences.
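The permutation logic can be illustrated in base R without estimating any networks. The sketch below is not the package's implementation. It swaps in a stand-in statistic (the group difference in a single correlation) where invariance uses differences in network loadings, and the helper name `group_difference`, the toy data, and the +1 correction in the p-value are all assumptions made for illustration.

```r
set.seed(42)

# Toy data: x and y are related in group 1 only
n <- 200
groups <- rep(1:2, each = n / 2)
x <- rnorm(n)
y <- ifelse(groups == 1, 0.9, 0) * x + rnorm(n, sd = 0.5)

# Stand-in statistic (assumption): group difference in the x-y correlation
group_difference <- function(x, y, groups) {
  cor(x[groups == 1], y[groups == 1]) -
    cor(x[groups == 2], y[groups == 2])
}

# Empirical difference with the observed group labels
empirical <- group_difference(x, y, groups)

# Null distribution: recompute the difference after shuffling group labels
iter <- 500
null <- replicate(iter, group_difference(x, y, sample(groups)))

# Two-tailed p-value: share of null differences at least as extreme
# as the empirical difference (+1 counts the empirical value itself)
p_value <- (sum(abs(null) >= abs(empirical)) + 1) / (iter + 1)
p_value
```

Shuffling the labels breaks any real link between group and statistic, so a small p-value indicates that the observed group difference would be unusual if the groups were exchangeable.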
Three or More Groups
When there are 3 or more groups, the function performs metric invariance testing by comparing all possible pairs of groups. Specifically:
Pairwise Comparisons: The function generates all possible unique group pairings and computes the differences in network loadings for each pair. The same community structure, derived from configural invariance or provided by the user, is used for all groups.
Permutation Testing: For each group pair, permutation tests are conducted to assess the statistical significance of the observed differences in loadings. p-values are calculated based on the proportion of permuted differences that are greater than or equal to the observed difference.
Result Compilation: The function compiles the results for each pair including both uncorrected (p) and FDR-corrected (Benjamini-Hochberg; p_BH) p-values, and the direction of differences. It returns a summary of the findings for all pairwise comparisons.
This approach allows for a detailed examination of metric invariance across multiple groups, ensuring that all potential differences are thoroughly assessed while maintaining the ability to identify specific group differences.
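The pairwise bookkeeping described above can be sketched with base R's combn and p.adjust. This is an illustration rather than the package's code: the group labels and p-values are hypothetical, and applying the Benjamini-Hochberg correction across the variables within one pair is an assumption of the sketch.

```r
# Hypothetical group labels
group_labels <- c("A", "B", "C")

# All unique group pairings: A-B, A-C, B-C
pairs <- combn(group_labels, 2, simplify = FALSE)
length(pairs)  # 3

# Hypothetical uncorrected per-variable p-values for one pair
p <- c(V1 = 0.01, V2 = 0.04, V3 = 0.30)

# Benjamini-Hochberg (FDR) correction
p_BH <- p.adjust(p, method = "BH")

data.frame(p = p, p_BH = p_BH)
#       p p_BH
# V1 0.01 0.03
# V2 0.04 0.06
# V3 0.30 0.30
```

With k groups there are k(k - 1)/2 pairs, so the number of permutation tests grows quickly as groups are added.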
For more details, see Jamison, Christensen, and Golino (2024).
Returns a list containing:
configural.results |
|
memberships |
Original memberships provided in |
EGA |
Original |
groups |
A list containing: |
permutation |
A list containing:
|
results |
Data frame of the results (which are printed) |
Laura Jamison <[email protected]>, Hudson F. Golino <hfg9s at virginia.edu>, and Alexander P. Christensen <[email protected]>
Original implementation
Jamison, L., Christensen, A. P., & Golino, H. F. (2024).
Metric invariance in exploratory graph analysis via permutation testing.
Methodology, 20(2), 144-186.
plot.EGAnet for plot usage in EGAnet
# Load data
wmt <- wmt2[-1,7:24]

# Groups
groups <- rep(1:2, each = nrow(wmt) / 2)

## Not run:
# Measurement invariance
results <- invariance(wmt, groups, ncores = 2)

# Plot with uncorrected alpha = 0.05
plot(results, p_type = "p", p_value = 0.05)

# Plot with BH-corrected alpha = 0.10
plot(results, p_type = "p_BH", p_value = 0.10)
## End(Not run)
bootEGA
Based on the bootEGA results, this function computes and plots the number of times a variable is estimated in the same dimension as originally estimated by an empirical EGA structure or a theoretical/input structure. The output also contains each variable's replication frequency (i.e., proportion of bootstraps that a variable appeared in each dimension)
itemStability(bootega.obj, IS.plot = TRUE, structure = NULL, ...)
bootega.obj |
A |
IS.plot |
Boolean (length = 1).
Should the plot be produced for |
structure |
Numeric (length = number of variables).
A theoretical or pre-defined structure.
Defaults to |
... |
Deprecated arguments from previous versions of |
Returns a list containing:
membership |
A list containing:
|
item.stability |
A list containing:
|
plot |
Plot output if |
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <[email protected]>
Original implementation of bootEGA
Christensen, A. P., & Golino, H. (2021).
Estimating the stability of the number of factors via Bootstrap Exploratory Graph Analysis: A tutorial.
Psych, 3(3), 479-500.
Conceptual introduction
Christensen, A. P., Golino, H., & Silvia, P. J. (2020).
A psychometric network perspective on the validity and validation of personality trait questionnaires.
European Journal of Personality, 34(6), 1095-1108.
plot.EGAnet
for plot usage in EGAnet
# Load data
wmt <- wmt2[,7:24]

## Not run:
# Standard EGA example
boot.wmt <- bootEGA(
  data = wmt, iter = 500,
  type = "parametric", ncores = 2
)
## End(Not run)

# Standard item stability
wmt.is <- itemStability(boot.wmt)

## Not run:
# EGA fit example
boot.wmt.fit <- bootEGA(
  data = wmt, iter = 500,
  EGA.type = "EGA.fit",
  type = "parametric", ncores = 2
)

# EGA fit item stability
wmt.is.fit <- itemStability(boot.wmt.fit)

# Hierarchical EGA example
boot.wmt.hier <- bootEGA(
  data = wmt, iter = 500,
  EGA.type = "hierEGA",
  type = "parametric", ncores = 2
)

# Hierarchical EGA item stability
wmt.is.hier <- itemStability(boot.wmt.hier)

# Random-intercept EGA example
boot.wmt.ri <- bootEGA(
  data = wmt, iter = 500,
  EGA.type = "riEGA",
  type = "parametric", ncores = 2
)

# Random-intercept EGA item stability
wmt.is.ri <- itemStability(boot.wmt.ri)
## End(Not run)
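The replication proportion at the heart of item stability can be shown with a small base R sketch. It is not the package's implementation: the membership matrix is hypothetical, and the community labels are assumed to be already aligned across bootstrap replicates (aligning them is a separate step that this sketch does not attempt). The 0.70 cut-off is the threshold mentioned in the invariance documentation and is used here only as an example.

```r
# Empirical (or theoretical) structure: one label per variable
empirical <- c(V1 = 1, V2 = 1, V3 = 2, V4 = 2)

# Hypothetical bootstrap memberships:
# rows = bootstrap replicates, columns = variables
boot_memberships <- rbind(
  c(1, 1, 2, 2),
  c(1, 1, 2, 2),
  c(1, 2, 2, 2),
  c(1, 1, 2, 1),
  c(1, 1, 2, 1)
)
colnames(boot_memberships) <- names(empirical)

# TRUE where a variable landed in its empirical dimension
replicated <- sweep(boot_memberships, 2, empirical, FUN = "==")

# Proportion of replicates in which each variable replicated
stability <- setNames(colMeans(replicated), names(empirical))
stability
#  V1  V2  V3  V4
# 1.0 0.8 1.0 0.6

# Replication frequency of each variable in every dimension
dimensions <- sort(unique(empirical))
frequency <- sapply(dimensions, function(d) colMeans(boot_memberships == d))
colnames(frequency) <- dimensions
frequency

# Variables below an example threshold of 0.70
names(stability)[stability < 0.70]  # "V4"
```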
Computes the Jensen-Shannon Distance between two networks
jsd(network1, network2, method = c("kld", "spectral"), signed = TRUE)
network1 |
Matrix or data frame. Network to be compared |
network2 |
Matrix or data frame. Second network to be compared |
method |
Character (length = 1).
Method to compute Jensen-Shannon Distance.
Defaults to
|
signed |
Boolean (length = 1).
Should networks remain signed?
Defaults to |
Returns Jensen-Shannon Distance
Hudson Golino <hfg9s at virginia.edu> & Alexander P. Christensen <alexander.christensen at Vanderbilt.Edu>
# Obtain wmt2 data
wmt <- wmt2[,7:24]

# Set seed (for reproducibility)
set.seed(1234)

# Split data
split1 <- sample(
  1:nrow(wmt), floor(nrow(wmt) / 2)
)
split2 <- setdiff(1:nrow(wmt), split1)

# Obtain split data
data1 <- wmt[split1,]
data2 <- wmt[split2,]

# Perform EBICglasso
glas1 <- EBICglasso.qgraph(data1)
glas2 <- EBICglasso.qgraph(data2)

# Spectral JSD
jsd(glas1, glas2)
# 0.1595893

# Spectral JSS (similarity)
1 - jsd(glas1, glas2)
# 0.8404107

# Jensen-Shannon Divergence
jsd(glas1, glas2, method = "kld")
# 0.1393621
An algorithm to identify whether data were generated from a factor or network model using factor and network loadings. The algorithm uses heuristics based on theory and simulation. These heuristics were then submitted to several deep learning neural networks with 240,000 samples per model with varying parameters.
LCT( data, n = NULL, corr = c("auto", "cor_auto", "pearson", "spearman"), na.data = c("pairwise", "listwise"), model = c("BGGM", "glasso", "TMFG"), algorithm = c("leiden", "louvain", "walktrap"), uni.method = c("expand", "LE", "louvain"), iter = 100, seed = NULL, verbose = TRUE, ... )
data |
Matrix or data frame. Should consist only of variables to be used in the analysis. Can be raw data or a correlation matrix |
n |
Numeric (length = 1).
Sample size if |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
model |
Character (length = 1).
Defaults to
|
algorithm |
Character or
|
uni.method |
Character (length = 1).
What unidimensionality method should be used?
Defaults to
|
iter |
Numeric (length = 1).
Number of replicate samples to be drawn from a multivariate
normal distribution (uses |
seed |
Numeric (length = 1).
Defaults to |
verbose |
Boolean (length = 1).
Should progress be displayed?
Defaults to |
... |
Additional arguments that can be passed on to
|
Returns a list containing:
empirical |
Prediction of model based on empirical dataset only |
bootstrap |
Prediction of model based on means of the loadings across the bootstrap replicate samples |
proportion |
Proportions of models suggested across bootstraps |
Hudson F. Golino <hfg9s at virginia.edu> and Alexander P. Christensen <alexpaulchristensen at gmail.com>
Model training and validation
Christensen, A. P., & Golino, H. (2021).
Factor or network model? Predictions from neural networks.
Journal of Behavioral Data Science, 1(1), 85-126.
# Get data
data <- psych::bfi[,1:25]

## Not run:
# Compute LCT
## Factor model
LCT(data)
## End(Not run)
Computes (signed) modularity statistic given a network and community structure. Allows the resolution parameter to be set
modularity(network, memberships, resolution = 1, signed = FALSE)
network |
Matrix or data frame. A symmetric matrix representing a network |
memberships |
Numeric (length = |
resolution |
Numeric (length = 1).
A parameter that adjusts modularity to
prefer smaller ( |
signed |
Boolean (length = 1).
Whether signed or absolute modularity should be computed.
The most common modularity metric is defined by positive values only.
Gomez et al. (2009) introduced a signed version of modularity that
will discount modularity for edges with negative values. This property
isn't always desired for psychometric networks. If |
Returns the modularity statistic
Alexander P. Christensen <[email protected]> with assistance from GPT-4
Gomez, S., Jensen, P., & Arenas, A. (2009). Analysis of community structure in networks of correlated data. Physical Review E, 80(1), 016114.
# Load data
wmt <- wmt2[,7:24]

# Estimate EGA
ega.wmt <- EGA(wmt, model = "glasso")

# Compute standard (absolute values) modularity
modularity(
  network = ega.wmt$network,
  memberships = ega.wmt$wc,
  signed = FALSE
)
# 0.1697952

# Compute signed modularity
modularity(
  network = ega.wmt$network,
  memberships = ega.wmt$wc,
  signed = TRUE
)
# 0.1701946
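For readers who want to see what the statistic measures, the standard weighted modularity with a resolution parameter can be written directly in base R. The sketch below is not the package's implementation: `modularity_sketch` is a hypothetical helper, it covers only the absolute-value case (the signed variant of Gomez et al. is not shown), and the toy network is invented.

```r
# Hypothetical helper (not EGAnet's modularity())
modularity_sketch <- function(network, memberships, resolution = 1) {
  A <- abs(as.matrix(network))  # absolute edge weights
  diag(A) <- 0                  # ignore self-loops
  strength <- rowSums(A)        # node strength
  total <- sum(A)               # twice the total edge weight
  # TRUE where two nodes share a community
  same <- outer(memberships, memberships, "==")
  # Observed minus expected weight, summed within communities
  expected <- resolution * outer(strength, strength) / total
  sum((A - expected)[same]) / total
}

# Toy network: two disconnected triangles
block <- matrix(1, 3, 3) - diag(3)
empty <- matrix(0, 3, 3)
toy_network <- rbind(cbind(block, empty), cbind(empty, block))

# Each triangle as its own community
modularity_sketch(toy_network, rep(1:2, each = 3))  # 0.5

# All nodes in a single community
modularity_sketch(toy_network, rep(1, 6))  # 0
```

Raising `resolution` increases the expected-weight penalty, which is why the same partition scores lower as the parameter grows.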
Computes the between- and within-community strength of each variable for each community
net.loads( A, wc, loading.method = c("original", "revised"), scaling = 2, rotation = NULL, ... )
A |
Network matrix, data frame, or |
wc |
Numeric or character vector (length = |
loading.method |
Character (length = 1).
Sets network loading calculation based on implementation
described in |
scaling |
Numeric (length = 1).
Scaling factor for the magnitude of the |
rotation |
Character.
A rotation to use to obtain a simpler structure.
For a list of rotations, see |
... |
Additional arguments to pass on to |
Simulation studies have demonstrated that a node's strength centrality is roughly equivalent to factor loadings (Christensen & Golino, 2021; Hallquist, Wright, & Molenaar, 2019). Hallquist and colleagues (2019) found that node strength represented a combination of dominant and cross-factor loadings. This function computes each node's strength within each specified dimension, providing a rough equivalent to factor loadings (including cross-loadings; Christensen & Golino, 2021).
Returns a list containing:
unstd |
A matrix of the unstandardized within- and between-community strength values for each node |
std |
A matrix of the standardized within- and between-community strength values for each node |
rotated |
|
Alexander P. Christensen <[email protected]> and Hudson Golino <hfg9s at virginia.edu>
Original implementation and simulation
Christensen, A. P., & Golino, H. (2021).
On the equivalency of factor and network loadings.
Behavior Research Methods, 53, 1563-1580.
Demonstration of node strength similarity to CFA loadings
Hallquist, M., Wright, A. G. C., & Molenaar, P. C. M. (2019).
Problems with centrality measures in psychopathology symptom networks: Why network psychometrics cannot escape psychometric theory.
Multivariate Behavioral Research, 1-25.
Revised network loadings
Christensen, A. P., Golino, H., Abad, F. J., & Garrido, L. E. (2024).
Revised network loadings.
PsyArXiv.
# Load data
wmt <- wmt2[,7:24]

# Estimate EGA
ega.wmt <- EGA(
  data = wmt,
  plot.EGA = FALSE # No plot for CRAN checks
)

# Network loadings
net.loads(ega.wmt)
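The idea of a node's strength within each community can be illustrated on a small hand-made network. The base R sketch below computes only unstandardized strengths from absolute edge weights; it is not the package's implementation, and it omits the standardization, sign handling, and rotation that net.loads provides. The helper name `community_strength` and the toy weights are hypothetical.

```r
# Hypothetical helper (not EGAnet's net.loads())
community_strength <- function(A, wc) {
  A <- abs(as.matrix(A))  # absolute edge weights
  diag(A) <- 0            # ignore self-loops
  communities <- sort(unique(wc))
  # For each community, sum each node's edges to that community's nodes
  loadings <- vapply(communities, function(community) {
    rowSums(A[, wc == community, drop = FALSE])
  }, numeric(nrow(A)))
  dimnames(loadings) <- list(rownames(A), as.character(communities))
  loadings
}

# Toy network: four nodes, two communities
toy_network <- matrix(
  c(0.0,  0.5, 0.1,  0.0,
    0.5,  0.0, 0.0, -0.2,
    0.1,  0.0, 0.0,  0.4,
    0.0, -0.2, 0.4,  0.0),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("V", 1:4), paste0("V", 1:4))
)
wc <- c(1, 1, 2, 2)

community_strength(toy_network, wc)
#      1   2
# V1 0.5 0.1
# V2 0.5 0.2
# V3 0.1 0.4
# V4 0.2 0.4
```

Each row has one within-community value (the column matching the node's own community) and between-community values in the remaining columns, which play the role of cross-loadings.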
This function computes network scores based on each node's strength within each community in the network (see net.loads). These values are used as "network loadings" for the weights of each variable.
Network scores are computed as a formative composite rather than a reflective factor. This composite representation is consistent with psychometric network theory, which proposes no latent factors.
Scores can be computed as a "simple" structure, which is equivalent to a weighted sum score, or as a "full" structure, which is equivalent to an EFA approach. Conservatively, the "simple" structure approach is recommended until further validation.
net.scores( data, A, wc, loading.method = c("original", "revised"), rotation = NULL, scores = c("Anderson", "Bartlett", "components", "Harman", "network", "tenBerge", "Thurstone"), loading.structure = c("simple", "full"), impute = c("mean", "median", "none"), ... )
data |
Matrix or data frame. Should consist only of variables to be used in the analysis |
A |
Network matrix, data frame, or |
wc |
Numeric or character vector (length = |
loading.method |
Character (length = 1).
Sets network loading calculation based on implementation
described in |
rotation |
Character.
A rotation to use to obtain a simpler structure.
For a list of rotations, see |
scores |
Character (length = 1).
How should scores be estimated?
Defaults to |
loading.structure |
Character (length = 1).
Whether simple structure or the saturated loading matrix
should be used when computing scores.
Defaults to
Simple structure is the more "conservative" (established) approach
and is therefore the default. Treat |
impute |
Character (length = 1). If there are any missing data, then imputation can be implemented. Available options:
|
... |
Additional arguments to be passed on to
|
Returns a list containing:
scores |
A list containing the standardized ( |
loadings |
Output from |
Alexander P. Christensen <[email protected]> and Hudson F. Golino <hfg9s at virginia.edu>
Original implementation and simulation for loadings
Christensen, A. P., & Golino, H. (2021).
On the equivalency of factor and network loadings.
Behavior Research Methods, 53, 1563-1580.
Preliminary simulation for scores
Golino, H., Christensen, A. P., Moulder, R., Kim, S., & Boker, S. M. (2021).
Modeling latent topics in social media using Dynamic Exploratory Graph Analysis: The case of the right-wing and left-wing trolls in the 2016 US elections.
Psychometrika.
Revised network loadings
Christensen, A. P., Golino, H., Abad, F. J., & Garrido, L. E. (2024).
Revised network loadings.
PsyArXiv.
# Load data
wmt <- wmt2[,7:24]

# Estimate EGA
ega.wmt <- EGA(
  data = wmt,
  plot.EGA = FALSE # No plot for CRAN checks
)

# Network scores
net.scores(data = wmt, A = ega.wmt)
A permutation test to determine whether two network structures differ significantly from one another
network.compare( base, comparison, corr = c("auto", "cor_auto", "pearson", "spearman"), na.data = c("pairwise", "listwise"), model = c("BGGM", "glasso", "TMFG"), iter = 1000, ncores, verbose = TRUE, seed = NULL, ... )
base |
Matrix or data frame. Should consist only of variables to be used in the analysis. First dataset |
comparison |
Matrix or data frame. Should consist only of variables to be used in the analysis. Second dataset |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
model |
Character (length = 1).
Defaults to
|
iter |
Numeric (length = 1).
Number of permutations to perform.
Defaults to |
ncores |
Numeric (length = 1).
Number of cores to use in computing results.
Defaults to |
verbose |
Boolean (length = 1).
Should progress be displayed?
Defaults to |
seed |
Numeric (length = 1).
Defaults to |
... |
Additional arguments that can be passed on to
|
Returns a list:
network |
Data frame with row names of each measure, empirical value ( |
edges |
List containing matrices of values for empirical values ( |
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <[email protected]>
Frobenius Norm
Ulitzsch, E., Khanna, S., Rhemtulla, M., & Domingue, B. W. (2023).
A graph theory based similarity metric enables comparison of subpopulation psychometric networks.
Psychological Methods.
Jensen-Shannon Similarity (1 - Distance)
De Domenico, M., Nicosia, V., Arenas, A., & Latora, V. (2015).
Structural reducibility of multilayer networks.
Nature Communications, 6(1), 1–9.
Total Network Strength
van Borkulo, C. D., van Bork, R., Boschloo, L., Kossakowski, J. J., Tio, P., Schoevers, R. A., Borsboom, D., & Waldorp, L. J. (2023).
Comparing network structures on three aspects: A permutation test.
Psychological Methods, 28(6), 1273–1285.
# Load data
wmt <- wmt2[,7:24]

# Set groups (if necessary)
groups <- rep(1:2, each = nrow(wmt) / 2)

# Groups
group1 <- wmt[groups == 1,]
group2 <- wmt[groups == 2,]

## Not run:
# Perform comparison
results <- network.compare(group1, group2)

# Print results
print(results)

# Plot edge differences
plot(results)
## End(Not run)
General function to apply network estimation methods in EGAnet
network.estimation(
  data,
  n = NULL,
  corr = c("auto", "cor_auto", "cosine", "pearson", "spearman"),
  na.data = c("pairwise", "listwise"),
  model = c("BGGM", "glasso", "TMFG"),
  network.only = TRUE,
  verbose = FALSE,
  ...
)
data |
Matrix or data frame. Should consist only of variables to be used in the analysis |
n |
Numeric (length = 1).
Sample size if |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
model |
Character (length = 1).
Defaults to
|
network.only |
Boolean (length = 1).
Whether the network only should be output.
Defaults to |
verbose |
Boolean (length = 1).
Whether messages and (insignificant) warnings should be output.
Defaults to |
... |
Additional arguments to be passed on to
|
Returns a matrix populated with a network from the input data
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <[email protected]>
Graphical Least Absolute Shrinkage and Selection Operator (GLASSO)
Friedman, J., Hastie, T., & Tibshirani, R. (2008).
Sparse inverse covariance estimation with the graphical lasso.
Biostatistics, 9(3), 432–441.
GLASSO with Extended Bayesian Information Criterion (EBICglasso)
Epskamp, S., & Fried, E. I. (2018).
A tutorial on regularized partial correlation networks.
Psychological Methods, 23(4), 617–634.
Bayesian Gaussian Graphical Model (BGGM)
Williams, D. R. (2021).
Bayesian estimation for Gaussian graphical models: Structure learning, predictability, and network comparisons.
Multivariate Behavioral Research, 56(2), 336–352.
Triangulated Maximally Filtered Graph (TMFG)
Massara, G. P., Di Matteo, T., & Aste, T. (2016).
Network filtering for big data: Triangulated maximally filtered graph.
Journal of Complex Networks, 5, 161-178.
# Load data
wmt <- wmt2[,7:24]

# EBICglasso (default for EGA functions)
glasso_network <- network.estimation(
  data = wmt, model = "glasso"
)

# TMFG
tmfg_network <- network.estimation(
  data = wmt, model = "TMFG"
)
General function to compute a network's predictive power on new data, following Haslbeck and Waldorp (2018) and Williams and Rodriguez (2022).
This implementation is different from the predictability in the mgm package (Haslbeck), which is based on (regularized) regression. This implementation uses the network directly, converting the partial correlations into an implied precision (inverse covariance) matrix. See Details for more information
network.predictability(network, original.data, newdata, ordinal.categories = 7)
network |
Matrix or data frame. A partial correlation network |
original.data |
Matrix or data frame.
Must consist only of variables to be used to estimate the |
newdata |
Matrix or data frame.
Must consist of the same variables in the same order as |
ordinal.categories |
Numeric (length = 1).
Maximum number of categories a variable can have before it is considered continuous.
Defaults to |
This implementation of network predictability proceeds in several steps with important assumptions:
1. The network was estimated using (partial) correlations (not regression, as in the mgm package!)
2. The original data used to estimate the network in 1. are necessary to apply the original scaling to the new data
3. (Linear) regression-like coefficients are obtained by reverse engineering the inverse covariance matrix using the network's partial correlations (i.e., by setting the diagonal of the network to -1 and computing the inverse of the opposite signed partial correlation matrix; see EGAnet:::pcor2inv)
4. Predicted values are obtained by matrix multiplying the new data with these coefficients
5. Dichotomous and polytomous data are given categorical values based on the original data's thresholds, and these thresholds are used to convert the continuous predicted values into their corresponding categorical values
6. Evaluation metrics:
dichotomous: "Accuracy" (the percent correctly predicted for the 0s and 1s) and "Kappa" (Cohen's Kappa; Cohen, 1960)
polytomous: "Linear Kappa" (linearly weighted Kappa; Cohen, 1968) and "Krippendorff's alpha" (Krippendorff, 2013)
continuous: R-squared ("R2") and root mean square error ("RMSE")
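Steps 2 to 4 and the continuous metrics in step 6 can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's internal code: the helper `pcor_to_betas` is a hypothetical name, the data are simulated and continuous, and the network is an unregularized partial correlation matrix rather than an EBICglasso estimate.

```r
# Hypothetical sketch; helper names are not part of EGAnet
set.seed(42)

# Simulate correlated continuous data and split into train/test
n <- 300; p <- 4
mixing <- matrix(c(
  1.0, 0.5, 0.3, 0.0,
  0.0, 1.0, 0.4, 0.2,
  0.0, 0.0, 1.0, 0.5,
  0.0, 0.0, 0.0, 1.0
), p, p, byrow = TRUE)
X <- matrix(rnorm(n * p), n, p) %*% mixing
train <- X[1:200, ]
test <- X[201:300, ]

# Partial correlation network from the training correlations
K <- solve(cor(train))
network <- -cov2cor(K)
diag(network) <- 0

# Step 3: set diagonal to -1, flip signs, invert, rescale to a
# correlation matrix, and derive regression-like coefficients
pcor_to_betas <- function(network) {
  diag(network) <- -1
  implied_R <- cov2cor(solve(-network))
  precision <- solve(implied_R)
  betas <- -precision / diag(precision) # row i predicts variable i
  diag(betas) <- 0
  betas
}
betas <- pcor_to_betas(network)

# Steps 2 and 4: apply the training scaling to the new data, then multiply
test_z <- scale(test, center = colMeans(train), scale = apply(train, 2, sd))
predictions <- test_z %*% t(betas)

# Step 6 (continuous case): R-squared and RMSE for each variable
sse <- colSums((test_z - predictions)^2)
sst <- colSums(sweep(test_z, 2, colMeans(test_z))^2)
r2 <- 1 - sse / sst
rmse <- sqrt(colMeans((test_z - predictions)^2))
```

With an unregularized network, each row of `betas` matches the ordinary least squares coefficients obtained by regressing that standardized variable on all the others, which is a convenient sanity check for the conversion.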
Returns a list containing:
predictions |
Predicted values of |
betas |
Beta coefficients derived from the |
results |
Performance metrics for each variable in |
Hudson Golino <hfg9s at virginia.edu> and Alexander P. Christensen <[email protected]>
Original Implementation of Node Predictability
Haslbeck, J. M., & Waldorp, L. J. (2018).
How well do network models predict observations? On the importance of predictability in network models.
Behavior Research Methods, 50(2), 853–861.
Derivation of Regression Coefficients Used (Formula 3)
Williams, D. R., & Rodriguez, J. E. (2022).
Why overfitting is not (usually) a problem in partial correlation networks.
Psychological Methods, 27(5), 822–840.
Cohen's Kappa
Cohen, J. (1960). A coefficient of agreement for nominal scales.
Educational and Psychological Measurement, 20(1), 37-46.
Cohen, J. (1968). Weighted kappa: nominal scale agreement provision for scaled disagreement or partial credit. Psychological Bulletin, 70(4), 213-220.
Krippendorff's alpha
Krippendorff, K. (2013).
Content analysis: An introduction to its methodology (3rd ed.).
Thousand Oaks, CA: Sage.
# Load data
wmt <- wmt2[,7:24]

# Set seed (to reproduce results)
set.seed(42)

# Split data
training <- sample(
  1:nrow(wmt), round(nrow(wmt) * 0.80) # 80/20 split
)

# Set splits
wmt_train <- wmt[training,]
wmt_test <- wmt[-training,]

# EBICglasso (default for EGA functions)
glasso_network <- network.estimation(
  data = wmt_train, model = "glasso"
)

# Check predictability
network.predictability(
  network = glasso_network, original.data = wmt_train,
  newdata = wmt_test
)
A response matrix (n = 282) containing responses to 10 items of the Revised Life Orientation Test (LOT-R), developed by Scheier, Carver, & Bridges (1994).
data(optimism)
A 282x10 response matrix
Scheier, M. F., Carver, C. S., & Bridges, M. W. (1994). Distinguishing optimism from neuroticism (and trait anxiety, self-mastery, and self-esteem): a reevaluation of the Life Orientation Test. Journal of Personality and Social Psychology, 67, 1063-1078.
data("optimism")
A fast implementation of polychoric correlations in C. Uses the Beasley-Springer-Moro algorithm (Beasley & Springer, 1977; Moro, 1995) to estimate the inverse univariate normal CDF, the Drezner-Wesolowsky approximation (Drezner & Wesolowsky, 1990) to estimate the bivariate normal CDF, and Brent's method (Brent, 2013) for optimization of rho
polychoric.matrix(
  data,
  na.data = c("pairwise", "listwise"),
  empty.method = c("none", "zero", "all"),
  empty.value = c("none", "point_five", "one_over"),
  ...
)
data |
Matrix or data frame.
A dataset with all ordinal values
(rows = cases, columns = variables).
Data are required to be between |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
empty.method |
Character (length = 1). Method for empty cell correction. Available options:
|
empty.value |
Character (length = 1). Value to add to the joint frequency table cells. Accepts numeric values between 0 and 1 or specific methods:
|
... |
Not used but made available for easier argument passing |
Returns a polychoric correlation matrix
Alexander P. Christensen <[email protected]> with assistance from GPT-4
Beasley-Springer-Moro algorithm
Beasley, J. D., & Springer, S. G. (1977).
Algorithm AS 111: The percentage points of the normal distribution.
Journal of the Royal Statistical Society. Series C (Applied Statistics), 26(1), 118-121.
Moro, B. (1995). The full monte. Risk 8 (February), 57-58.
Brent optimization
Brent, R. P. (2013).
Algorithms for minimization without derivatives.
Mineola, NY: Dover Publications, Inc.
Drezner-Wesolowsky bivariate normal approximation
Drezner, Z., & Wesolowsky, G. O. (1990).
On the computation of the bivariate normal integral.
Journal of Statistical Computation and Simulation, 35(1-2), 101-107.
# Load data (ensure matrix for missing data example)
wmt <- as.matrix(wmt2[,7:24])

# Compute polychoric correlation matrix
correlations <- polychoric.matrix(wmt)

# Randomly assign missing data
wmt[sample(1:length(wmt), 1000)] <- NA

# Compute polychoric correlation matrix
# with pairwise missing
pairwise_correlations <- polychoric.matrix(
  wmt, na.data = "pairwise"
)

# Compute polychoric correlation matrix
# with listwise missing
listwise_correlations <- polychoric.matrix(
  wmt, na.data = "listwise"
)
Numeric vector of primes generated from the primes package. Used in the function ergoInfo. Not for general use
data(prime.num)
A numeric vector of prime numbers
data("prime.num")
EGA
Estimates the number of substantive dimensions after controlling for wording effects. EGA is applied to a residual correlation matrix after subtracting a random intercept factor with equal unstandardized loadings from all the regular and unrecoded reversed items in the dataset
riEGA(
  data,
  n = NULL,
  corr = c("auto", "cor_auto", "cosine", "pearson", "spearman"),
  na.data = c("pairwise", "listwise"),
  model = c("glasso", "TMFG"),
  algorithm = c("leiden", "louvain", "walktrap"),
  uni.method = c("expand", "LE", "louvain"),
  plot.EGA = TRUE,
  verbose = FALSE,
  ...
)
data |
Matrix or data frame. Should consist only of variables to be used in the analysis. Must be raw data and not a correlation matrix |
n |
Numeric (length = 1).
Sample size if |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
model |
Character (length = 1).
Defaults to
|
algorithm |
Character or
|
uni.method |
Character (length = 1).
What unidimensionality method should be used?
Defaults to
|
plot.EGA |
Boolean (length = 1).
If |
verbose |
Boolean (length = 1).
Whether messages and (insignificant) warnings should be output.
Defaults to |
... |
Additional arguments to be passed on to
|
Returns a list containing:
EGA |
Results from |
RI |
A list containing information about the random-intercept model (if the model converged): |
TEFI |
|
plot.EGA |
Plot output if |
Alejandro Garcia-Pardina <[email protected]>, Francisco J. Abad <[email protected]>, Alexander P. Christensen <[email protected]>, Hudson Golino <hfg9s at virginia.edu>, Luis Eduardo Garrido <[email protected]>, and Robert Moulder <[email protected]>
Selection of CFA Estimator
Rhemtulla, M., Brosseau-Liard, P. E., & Savalei, V. (2012).
When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions.
Psychological Methods, 17, 354-373.
See plot.EGAnet for plot usage in EGAnet
# Obtain example data
wmt <- wmt2[,7:24]

# riEGA example
riEGA(data = wmt, plot.EGA = FALSE) # no plot for CRAN checks
Simulated multivariate time series data with 24 variables, 100 individuals, 50 time points per individual, and 2 groups of individuals
data(sim.dynEGA)
A 5000 x 26 multivariate time series
Data were generated using the simDFM function with the following arguments:
Group 1
simDFM(
variab = 12, timep = 50,
nfact = 2, error = 0.125,
dfm = "DAFS",
loadings = EGAnet:::runif_xoshiro(
1, min = 0.50, max = 0.70
), autoreg = 0.80, crossreg = 0.00,
var.shock = 0.36, cov.shock = 0.18
)
Group 2
simDFM(
variab = 8, timep = 50,
nfact = 3, error = 0.125,
dfm = "DAFS",
loadings = EGAnet:::runif_xoshiro(
1, min = 0.50, max = 0.70
), autoreg = 0.80, crossreg = 0.00,
var.shock = 0.36, cov.shock = 0.18
)
data("sim.dynEGA")
Function to simulate data following a dynamic factor model (DFM). Two DFMs are currently available: the direct autoregressive factor score model (Engle & Watson, 1981; Nesselroade, McArdle, Aggen, and Meyers, 2002) and the dynamic factor model with random walk factor scores.
simDFM(
  variab,
  timep,
  nfact,
  error,
  dfm = c("DAFS", "RandomWalk"),
  loadings,
  autoreg,
  crossreg,
  var.shock,
  cov.shock,
  burnin = 1000
)
variab |
Number of variables per factor. |
timep |
Number of time points. |
nfact |
Number of factors. |
error |
Value used to construct the diagonal p x p covariance matrix Q, which generates random errors following a multivariate normal distribution with a mean of zero. The value provided is squared before constructing Q. |
dfm |
A string indicating the dynamical factor model to use. Current options are:
|
loadings |
Magnitude of the loadings. |
autoreg |
Magnitude of the autoregression coefficients. |
crossreg |
Magnitude of the cross-regression coefficients. |
var.shock |
Magnitude of the random shock variance. |
cov.shock |
Magnitude of the random shock covariance |
burnin |
Number of initial samples to discard when computing the factor scores. Defaults to 1000. |
Hudson F. Golino <hfg9s at virginia.edu>
Engle, R., & Watson, M. (1981). A one-factor multivariate time series model of metropolitan wage rates. Journal of the American Statistical Association, 76(376), 774-781.
Nesselroade, J. R., McArdle, J. J., Aggen, S. H., & Meyers, J. M. (2002). Dynamic factor analysis models for representing process in multivariate time-series. In D. S. Moskowitz & S. L. Hershberger (Eds.), Multivariate applications book series. Modeling intraindividual variability with repeated measures data: Methods and applications, 235-265.
## Not run: 
# Simulate data from a DAFS dynamic factor model
data1 <- simDFM(
  variab = 5, timep = 50, nfact = 3, error = 0.05,
  dfm = "DAFS", loadings = 0.7, autoreg = 0.8,
  crossreg = 0.1, var.shock = 0.36,
  cov.shock = 0.18, burnin = 1000
)

## End(Not run)
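To make the roles of the simDFM arguments more concrete, the following is a minimal base R sketch of a direct autoregressive factor score model: factor scores follow a first-order vector autoregression, and observed variables are loadings times factor scores plus measurement error. The function `sim_dafs_sketch` is a hypothetical name, and the way the arguments are mapped onto matrices here is an assumption for illustration, not necessarily how simDFM builds them.

```r
# Hypothetical sketch of a DAFS-style simulation; not EGAnet's simDFM
sim_dafs_sketch <- function(variab, timep, nfact, error, loadings,
                            autoreg, crossreg, var.shock, cov.shock,
                            burnin = 100) {
  p <- variab * nfact

  # Simple-structure loading matrix (p x nfact)
  Lambda <- matrix(0, p, nfact)
  for (k in seq_len(nfact)) {
    Lambda[((k - 1) * variab + 1):(k * variab), k] <- loadings
  }

  # Transition matrix: autoregression on the diagonal,
  # cross-regression off the diagonal
  A <- matrix(crossreg, nfact, nfact)
  diag(A) <- autoreg

  # Random shock covariance matrix and its Cholesky factor
  Psi <- matrix(cov.shock, nfact, nfact)
  diag(Psi) <- var.shock
  Psi_chol <- chol(Psi)

  # Factor scores: f[i] = A f[i - 1] + shock
  total <- timep + burnin
  scores <- matrix(0, total, nfact)
  for (i in 2:total) {
    shock <- drop(rnorm(nfact) %*% Psi_chol)
    scores[i, ] <- drop(A %*% scores[i - 1, ]) + shock
  }

  # Discard the burn-in samples
  scores <- scores[(burnin + 1):total, , drop = FALSE]

  # Measurement error with variance error^2 (diagonal Q)
  noise <- matrix(rnorm(timep * p, sd = error), timep, p)

  # Observed data (timep x p)
  scores %*% t(Lambda) + noise
}

set.seed(123)
sim <- sim_dafs_sketch(
  variab = 4, timep = 50, nfact = 2, error = 0.05,
  loadings = 0.7, autoreg = 0.8, crossreg = 0,
  var.shock = 0.36, cov.shock = 0.18
)
dim(sim)
```

In this sketch the factor process is stationary only when the eigenvalues of the transition matrix lie inside the unit circle, so large cross-regression values combined with high autoregression can produce drifting scores.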
Function to simulate data based on the Exploratory Graph Model (EGM)
simEGM(
  communities,
  variables,
  loadings,
  cross.loadings = 0.01,
  correlations,
  sample.size,
  p.in = 0.95,
  p.out = 0.8,
  max.iterations = 1000
)
communities |
Numeric (length = 1). Number of communities to generate |
variables |
Numeric vector (length = 1 or |
loadings |
Numeric (length = 1). Magnitude of the assigned network loadings. Uses the same magnitude as factor loadings. Uses |
cross.loadings |
Numeric (length = 1).
Standard deviation of a normal distribution with a mean of zero ( |
correlations |
Numeric (length = 1). Magnitude of the community correlations Uses |
sample.size |
Numeric (length = 1). Number of observations to generate |
p.in |
Numeric (length = 1).
Sets the probability of retaining an edge within communities.
Single values are applied to all communities.
Defaults to |
p.out |
Numeric (length = 1 or |
max.iterations |
Numeric (length = 1).
Number of iterations to attempt to get convergence before erroring out.
Defaults to |
Hudson F. Golino <hfg9s at virginia.edu> and Alexander P. Christensen <[email protected]>
simulated <- simEGM(
  communities = 2, variables = 6,
  loadings = 0.55, # use standard factor loading sizes
  correlations = 0.30,
  sample.size = 1000
)
Computes the fit (TEFI) of a dimensionality structure using Von Neumann's entropy when the input is a correlation matrix. Lower values suggest better fit of a structure to the data.
tefi(data, structure = NULL, verbose = TRUE)
data |
Matrix, data frame, or |
structure |
Numeric or character vector (length = |
verbose |
Boolean (length = 1).
Whether messages and (insignificant) warnings should be output.
Defaults to |
Returns a data frame with columns:
Non-hierarchical Structure
VN.Entropy.Fit |
The Total Entropy Fit Index using Von Neumann's entropy |
Total.Correlation |
The total correlation of the dataset |
Average.Entropy |
The average entropy of the dataset |
Hierarchical Structure
VN.Entropy.Fit |
The Generalized Total Entropy Fit Index using Von Neumann's entropy |
Lower.Order.VN |
Lower order (only) Total Entropy Fit Index |
Higher.Order.VN |
Higher order (only) Total Entropy Fit Index |
Hudson Golino <hfg9s at virginia.edu>, Alexander P. Christensen <[email protected]>, and Robert Moulder <[email protected]>
Initial formalization and simulation
Golino, H., Moulder, R. G., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Nesselroade, J., Sadana, R., Thiyagarajan, J. A., & Boker, S. M. (2020).
Entropy fit indices: New fit measures for assessing the structure and dimensionality of multiple latent variables.
Multivariate Behavioral Research.
# Load data
wmt <- wmt2[,7:24]

# Estimate EGA model
ega.wmt <- EGA(
  data = wmt, model = "glasso",
  plot.EGA = FALSE # no plot for CRAN checks
)

# Compute entropy indices for empirical EGA
tefi(ega.wmt)

# User-defined structure (with `EGA` object)
tefi(ega.wmt, structure = c(rep(1, 5), rep(2, 5), rep(3, 8)))
Applies the Triangulated Maximally Filtered Graph (TMFG) filtering method (see Massara et al., 2016). The TMFG method uses a structural constraint that limits the number of zero-order correlations included in the network (3n - 6; where n is the number of variables). The TMFG algorithm begins by identifying four variables which have the largest sum of correlations to all other variables. Then, it iteratively adds each variable with the largest sum of three correlations to nodes already in the network until all variables have been added to the network. This structure can be associated with the inverse correlation matrix (i.e., precision matrix) to be turned into a GGM (i.e., partial correlation network) by using Local-Global Inversion Method (LoGo; see Barfuss et al., 2016 for more details). See Details for more information
TMFG(
  data,
  n = NULL,
  corr = c("auto", "cor_auto", "cosine", "pearson", "spearman"),
  na.data = c("pairwise", "listwise"),
  partial = FALSE,
  returnAllResults = FALSE,
  verbose = FALSE,
  ...
)
data |
Matrix or data frame. Should consist only of variables to be used in the analysis. Can be raw data or correlation matrix |
n |
Numeric (length = 1).
Sample size for when a correlation matrix is input into |
corr |
Character (length = 1).
Method to compute correlations.
Defaults to
For other similarity measures, compute them first and input them
into |
na.data |
Character (length = 1).
How should missing data be handled?
Defaults to
|
partial |
Boolean (length = 1).
Whether partial correlations should be output.
Defaults to |
returnAllResults |
Boolean (length = 1).
Whether all results should be returned.
Defaults to |
verbose |
Boolean (length = 1).
Whether messages and (insignificant) warnings should be output.
Defaults to |
... |
Additional arguments to be passed on to
|
The TMFG method applies a structural constraint on the network, which restrains the network to retain a certain number of edges (3n-6, where n is the number of nodes; Massara et al., 2016). The network is also composed of 3- and 4-node cliques (i.e., sets of connected nodes; a triangle and tetrahedron, respectively). The TMFG method constructs a network using zero-order correlations and the resulting network can be associated with the inverse covariance matrix (yielding a GGM; Barfuss, Massara, Di Matteo, & Aste, 2016). Notably, the TMFG can use any association measure and thus does not assume the data is multivariate normal.
Construction begins by forming a tetrahedron of the four nodes that have the highest sum of correlations that are greater than the average correlation in the correlation matrix. Next, the algorithm iteratively identifies the node that maximizes its sum of correlations to a connected set of three nodes (triangles) already included in the network and then adds that node to the network. The process is completed once every node is connected in the network. In this process, the network automatically generates what's called a planar network. A planar network is a network that could be drawn on a sphere with no edges crossing (often, however, the networks are depicted with edges crossing; Tumminello, Aste, Di Matteo, & Mantegna, 2005).
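The construction described above can be sketched in a few lines of base R. This is an illustrative sketch under stated assumptions rather than the package's implementation: `tmfg_sketch` is a hypothetical name, absolute correlations are assumed as edge weights, and ties are broken by taking the first maximum.

```r
# Hypothetical sketch of TMFG construction; not EGAnet's TMFG()
tmfg_sketch <- function(R) {
  n <- ncol(R)
  W <- abs(R)
  diag(W) <- 0

  # Seed tetrahedron: four nodes with the largest sum of
  # correlations that exceed the average correlation
  above <- W * (W > mean(W[upper.tri(W)]))
  seed <- order(rowSums(above), decreasing = TRUE)[1:4]

  # Adjacency matrix and the four triangular faces of the tetrahedron
  A <- matrix(0, n, n)
  A[seed, seed] <- 1
  diag(A) <- 0
  faces <- t(combn(seed, 3))
  remaining <- setdiff(seq_len(n), seed)

  while (length(remaining) > 0) {
    # Gain of attaching each remaining node to each face
    gain <- sapply(seq_len(nrow(faces)), function(f) {
      rowSums(W[remaining, faces[f, ], drop = FALSE])
    })
    gain <- matrix(gain, nrow = length(remaining))

    # Pick the node/face pair with the largest gain
    best <- which(gain == max(gain), arr.ind = TRUE)[1, ]
    v <- remaining[best[1]]
    f <- faces[best[2], ]

    # Connect the node to the three nodes of the face
    A[v, f] <- 1
    A[f, v] <- 1

    # Replace the used face with the three new faces
    faces <- rbind(
      faces[-best[2], , drop = FALSE],
      c(f[1], f[2], v), c(f[1], f[3], v), c(f[2], f[3], v)
    )
    remaining <- setdiff(remaining, v)
  }

  # Keep the original (signed) correlations on retained edges
  A * R
}

set.seed(1)
R <- cor(matrix(rnorm(100 * 8), 100, 8))
net <- tmfg_sketch(R)
sum(net[upper.tri(net)] != 0) # 3 * 8 - 6 edges
```

Each added node contributes exactly three edges on top of the six in the seed tetrahedron, which is where the 3n - 6 edge count comes from.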
Returns a network or list containing:
network |
The filtered adjacency matrix |
separators |
The separators (3-cliques) in the network |
cliques |
The cliques (4-cliques) in the network |
Alexander Christensen <[email protected]>
Local-Global Inversion Method
Barfuss, W., Massara, G. P., Di Matteo, T., & Aste, T. (2016).
Parsimonious modeling with information filtering networks.
Physical Review E, 94, 062306.
Psychometric network introduction to TMFG
Christensen, A. P., Kenett, Y. N., Aste, T., Silvia, P. J., & Kwapil, T. R. (2018).
Network structure of the Wisconsin Schizotypy Scales-Short Forms: Examining psychometric network filtering approaches.
Behavior Research Methods, 50, 2531-2550.
Triangulated Maximally Filtered Graph
Massara, G. P., Di Matteo, T., & Aste, T. (2016).
Network filtering for big data: Triangulated maximally filtered graph.
Journal of Complex Networks, 5, 161-178.
# TMFG filtered network
TMFG(wmt2[,7:24])

# Partial correlations using the LoGo method
TMFG(wmt2[,7:24], partial = TRUE)
Computes the total correlation of a dataset
totalCor(data, base = 2.718282)
data |
Matrix or data frame. Should consist only of variables to be used in the analysis |
base |
Numeric (length = 1).
Base to use for entropy.
Defaults to |
Returns a list containing:
Ind.Entropies |
Individual entropies for each variable |
Joint.Entropy |
The joint entropy of the dataset |
Total.Cor |
The total correlation of the dataset |
Normalized |
Total correlation divided by the sum of the individual entropies minus the maximum of the individual entropies |
Hudson F. Golino <hfg9s at virginia.edu>
Formalization of total correlation
Watanabe, S. (1960).
Information theoretical analysis of multivariate correlation.
IBM Journal of Research and Development 4, 66-82.
Applied implementation
Felix, L. M., Mansur-Alves, M., Teles, M., Jamison, L., & Golino, H. (2021).
Longitudinal impact and effects of booster sessions in a cognitive training program for healthy older adults.
Archives of Gerontology and Geriatrics, 94, 104337.
# Compute total correlation
totalCor(wmt2[,7:24])
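The quantities returned by totalCor can be illustrated with a short base R sketch for discrete data: total correlation is the sum of the individual entropies minus the joint entropy. The function names below are hypothetical, and how the package handles continuous variables is not covered here.

```r
# Hypothetical sketch for discrete data; not EGAnet's totalCor()
entropy_sketch <- function(counts, base = exp(1)) {
  prob <- counts / sum(counts)
  prob <- prob[prob > 0]
  -sum(prob * log(prob, base = base))
}

total_cor_sketch <- function(data, base = exp(1)) {
  data <- as.data.frame(data)

  # Entropy of each variable
  individual <- vapply(
    data, function(x) entropy_sketch(table(x), base), numeric(1)
  )

  # Joint entropy from the frequencies of complete response patterns
  patterns <- do.call(paste, c(data, sep = "\r"))
  joint <- entropy_sketch(table(patterns), base)

  # Total correlation and its normalized version
  total <- sum(individual) - joint
  list(
    Ind.Entropies = individual,
    Joint.Entropy = joint,
    Total.Cor = total,
    Normalized = total / (sum(individual) - max(individual))
  )
}

# Two identical variables share all of their information
dependent <- total_cor_sketch(
  data.frame(a = c(0, 1, 0, 1), b = c(0, 1, 0, 1))
)

# Two fully crossed variables share none
independent <- total_cor_sketch(
  data.frame(a = c(0, 0, 1, 1), b = c(0, 1, 0, 1))
)
```

In these two toy cases the normalized total correlation is 1 for the duplicated variables and the total correlation is 0 for the crossed ones, which brackets the range one would expect for two variables.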
Computes the pairwise total correlation (totalCor) for a dataset
totalCorMat(data, base = 2.718282, normalized = FALSE)
data |
Matrix or data frame. Should consist only of variables to be used in the analysis |
base |
Numeric (length = 1).
Base to use for entropy.
Defaults to |
normalized |
Boolean (length = 1).
Should the normalized total correlation be computed?
Defaults to |
Returns a symmetric matrix with pairwise total correlations
Hudson F. Golino <hfg9s at virginia.edu>
Formalization of total correlation
Watanabe, S. (1960).
Information theoretical analysis of multivariate correlation.
IBM Journal of Research and Development 4, 66-82.
Applied implementation
Felix, L. M., Mansur-Alves, M., Teles, M., Jamison, L., & Golino, H. (2021).
Longitudinal impact and effects of booster sessions in a cognitive training program for healthy older adults.
Archives of Gerontology and Geriatrics, 94, 104337.
# Compute total correlation matrix
totalCorMat(wmt2[,7:24])
Identifies locally dependent (redundant) variables in a
multivariate dataset using the EBICglasso.qgraph
network estimation method and weighted topological overlap
(see Christensen, Garrido, & Golino, 2023 for more details)
UVA(
  data = NULL,
  network = NULL,
  n = NULL,
  key = NULL,
  uva.method = c("MBR", "EJP"),
  cut.off = 0.25,
  reduce = TRUE,
  reduce.method = c("latent", "mean", "remove", "sum"),
  auto = TRUE,
  verbose = FALSE,
  ...
)
data |
Matrix or data frame.
Should consist only of variables to be used in the analysis.
Can be raw data or a correlation matrix.
Defaults to |
network |
Symmetric matrix or data frame.
A symmetric network.
Defaults to If both |
n |
Numeric (length = 1).
Sample size if |
key |
Character vector (length = |
uva.method |
Character (length = 1).
Whether the method described in Christensen, Garrido, and
Golino (2023) publication in Multivariate Behavioral Research
( Based on simulation and accumulating empirical evidence, the methods described in Christensen, Golino, and Silvia (2020) such as adaptive alpha are outdated. Evidence supports using a single cut-off value (regardless of continuous, polytomous, or dichotomous data; Christensen, Garrido, & Golino, 2023) |
cut.off |
Numeric (length = 1).
Cut-off used to determine when pairwise This cut-off value is recommended and based on extensive simulation
(Christensen, Garrido, & Golino, 2023). Printing the result will
provide a gradient of pairwise redundancies in increments of 0.20,
0.25, and 0.30. Use |
reduce |
Logical (length = 1).
Whether redundancies should be reduced in data.
Defaults to |
reduce.method |
Character (length = 1). Method to reduce redundancies. Available options:
|
auto |
Logical (length = 1).
Whether
|
verbose |
Boolean (length = 1).
Whether messages and (insignificant) warnings should be output.
Defaults to |
... |
Additional arguments that should be passed on to
old versions of |
Most recent simulation and implementation
Christensen, A. P., Garrido, L. E., & Golino, H. (2023).
Unique variable analysis: A network psychometrics method to detect local dependence.
Multivariate Behavioral Research.
Conceptual foundation and outdated methods
Christensen, A. P., Golino, H., & Silvia, P. J. (2020).
A psychometric network perspective on the validity and validation of personality trait questionnaires.
European Journal of Personality, 34(6), 1095-1108.
Weighted topological overlap
Nowick, K., Gernat, T., Almaas, E., & Stubbs, L. (2009).
Differences in human and chimpanzee gene expression patterns define an evolving network of transcription factors in brain.
Proceedings of the National Academy of Sciences, 106, 22358-22363.
Selection of CFA Estimator
Rhemtulla, M., Brosseau-Liard, P. E., & Savalei, V. (2012).
When can categorical variables be treated as continuous? A comparison of robust continuous and categorical SEM estimation methods under suboptimal conditions.
Psychological Methods, 17(3), 354-373.
# Perform UVA
uva.wmt <- UVA(wmt2[,7:24])

# Show summary
summary(uva.wmt)
Computes the fit of a dimensionality structure using Von Neumann's entropy when the input is a correlation matrix. Lower values suggest better fit of a structure to the data
vn.entropy(data, structure)
data |
Matrix or data frame. Contains variables to be used in the analysis |
structure |
Numeric or character vector (length = |
Returns a list containing:
VN.Entropy.Fit |
The Entropy Fit Index using Von Neumann's entropy |
Total.Correlation |
The total correlation of the dataset |
Average.Entropy |
The average entropy of the dataset |
Hudson Golino <hfg9s at virginia.edu>, Alexander P. Christensen <[email protected]>, and Robert Moulder <[email protected]>
Initial formalization and simulation
Golino, H., Moulder, R. G., Shi, D., Christensen, A. P., Garrido, L. E., Nieto, M. D., Nesselroade, J., Sadana, R., Thiyagarajan, J. A., & Boker, S. M. (2020).
Entropy fit indices: New fit measures for assessing the structure and dimensionality of multiple latent variables.
Multivariate Behavioral Research.
# Get EGA result
ega.wmt <- EGA(
  data = wmt2[,7:24], model = "glasso",
  plot.EGA = FALSE # no plot for CRAN checks
)

# Compute Von Neumann entropy
vn.entropy(ega.wmt$correlation, ega.wmt$wc)
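As background for the entropy fit indices, the Von Neumann entropy of a correlation matrix can be sketched as follows: the matrix is scaled to unit trace so it behaves like a density matrix, and the entropy is computed from its eigenvalues. The function `vn_entropy_sketch` is a hypothetical name; this shows only the entropy of a single matrix, not the full fit index that vn.entropy or tefi returns.

```r
# Hypothetical sketch; not EGAnet's vn.entropy()
vn_entropy_sketch <- function(R) {
  # Scale to unit trace (density matrix)
  rho <- R / ncol(R)

  # Eigenvalues of the density matrix
  lambda <- eigen(rho, symmetric = TRUE, only.values = TRUE)$values

  # Drop non-positive eigenvalues (0 * log(0) is treated as 0)
  lambda <- lambda[lambda > 0]

  -sum(lambda * log(lambda))
}

# Uncorrelated variables give the maximum entropy, log(p)
uncorrelated <- vn_entropy_sketch(diag(4))

# Perfectly correlated variables give (numerically) zero entropy
redundant <- vn_entropy_sketch(matrix(1, 4, 4))
```

Entropy shrinks as variables become more strongly correlated, which is the property the fit indices exploit when comparing how well different structures partition the correlation matrix.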
A response matrix (n = 1185) of the Wiener Matrizen-Test 2 (WMT-2).
data(wmt2)
A 1185x24 response matrix
data("wmt2")
Computes weighted topological overlap following the Nowick et al. (2009) definition
wto(network, signed = TRUE, diagonal.zero = TRUE)
network |
Symmetric matrix or data frame. A symmetric network |
signed |
Boolean (length = 1).
Whether the signed version should be used.
Defaults to |
diagonal.zero |
Boolean (length = 1).
Whether diagonal of overlap matrix should be set to zero.
Defaults to |
A symmetric matrix of weighted topological overlap values between each pair of variables
Original formalization
Nowick, K., Gernat, T., Almaas, E., & Stubbs, L. (2009).
Differences in human and chimpanzee gene expression patterns define an evolving network of transcription factors in brain.
Proceedings of the National Academy of Sciences, 106, 22358-22363.
# Obtain network
network <- network.estimation(wmt2[,7:24], model = "glasso")

# Compute wTO
wto(network)
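For intuition, the signed weighted topological overlap can be sketched directly from a commonly cited form of the Nowick et al. (2009) definition: for nodes i and j, the numerator sums the products of their weights to shared neighbors plus their direct weight, and the denominator is the smaller of their absolute connectivities plus one minus the absolute direct weight. The function `wto_sketch` is a hypothetical name, and the package's handling of the `signed` and `diagonal.zero` options may differ from this sketch.

```r
# Hypothetical sketch of signed wTO; not EGAnet's wto()
wto_sketch <- function(network) {
  A <- as.matrix(network)
  diag(A) <- 0

  # Absolute connectivity of each node
  k <- rowSums(abs(A))

  # Shared-neighbor products plus the direct edge
  numerator <- A %*% A + A

  # Smaller connectivity of each pair, adjusted for the direct edge
  denominator <- outer(k, k, pmin) + 1 - abs(A)

  overlap <- numerator / denominator
  diag(overlap) <- 0
  overlap
}

# Fully connected triangle with unit weights
triangle <- matrix(1, 3, 3)
diag(triangle) <- 0
overlap <- wto_sketch(triangle)
```

In the triangle example every pair shares its only other neighbor, so the overlap between each pair is 1.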