Confirmatory Factor Analysis

Relation to EFA

  • You have a bunch of questions
  • You have an idea (or sometimes not!) of how many factors to expect
  • You let the questions load wherever they want
  • You remove the bad questions until you get a good fit

CFA models

  • You set up the model in advance: specific questions load onto specific factors
  • The cross-loadings are forced to be zero
  • You test to see if that model fits
  • So, you can think of confirmatory factor analysis as step two after exploring (exploratory factor analysis)

CFA Models

  • Reflective – the latent variable causes the manifest variables’ scores
  • The purpose is to understand the relationships between the measured variables
  • Same theoretical concept as EFA

CFA Models Reflective Example

# load the packages
library(lavaan)   # cfa() and the HolzingerSwineford1939 data
library(semPlot)  # semPaths() for diagrams

# a famous example, build the model
HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

# fit the model 
HS.fit <- cfa(HS.model, data = HolzingerSwineford1939)

# diagram the model
semPaths(HS.fit, 
         whatLabels = "std", 
         layout = "tree",
         edge.label.cex = 1)

CFA Models

  • Formative – the latent variable is the result of the manifest variables
  • Similar to the theoretical concept behind principal components analysis
  • Potentially a use for demographics?

CFA Models Formative Example

# a famous example, build the model
HS.model <- ' visual  <~ x1 + x2 + x3'

# fit the model 
HS.fit <- cfa(HS.model, data = HolzingerSwineford1939)
#> Warning: lavaan->lav_data_full():  
#>    all observed variables are exogenous; model may not be identified
#> Warning: lavaan->lav_model_vcov():  
#>    Could not compute standard errors! The information matrix could not be 
#>    inverted. This may be a symptom that the model is not identified.

# diagram the model
semPaths(HS.fit, 
         whatLabels = "std", 
         layout = "tree",
         edge.label.cex = 1)

CFA Models

  • The manifest variables in a CFA are sometimes called indicator variables
  • They indicate what the latent variable should be, since we do not directly measure the latent variable

General Set Up

  • The latent variables will be correlated (because they are exogenous only)
  • Similar to an oblique rotation
  • Each factor section has to be identified
    • You should have at least three measured variables per latent
    • If you only have two, you need to constrain their coefficients to be equal (estimated, but equal)
  • Arrows go from latent to measured (reflective)
    • We think the latent variable caused the measured answers
  • Error terms go on the measured variables; variances go on the latent variables
    • Keep this in mind when counting degrees of freedom for identification

Correlated Error

  • Generally, you leave the error terms uncorrelated, because you think of the items as separate
  • However:
  • These questions all measure the same factor, right?
  • Often they are pretty similar
  • Some answers will be related to other items
  • So it’s not too big of a leap to say that some items’ errors are related
  • We can use modification indices to see if any error terms should be correlated
    • Make sure these make sense!
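In lavaan syntax, a correlated error is added with the ~~ operator between two manifest variables. A minimal sketch using the classic Holzinger-Swineford model (the specific pair here is only illustrative, not a recommendation):

```r
library(lavaan)

# classic model plus one hypothetical correlated error
HS.model.ce <- ' visual  =~ x1 + x2 + x3
                 textual =~ x4 + x5 + x6
                 speed   =~ x7 + x8 + x9
                 x7 ~~ x8 '  # let the error terms of x7 and x8 covary

HS.fit.ce <- cfa(HS.model.ce, data = HolzingerSwineford1939)
```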

Interpretation

  • The latent variable section includes the factor loadings or coefficients
  • These are the same idea as in EFA - you want the relationship between the latent variable and the manifest variable to be strong
    • We used a rule of thumb of .300 before; for this rule, you should examine the standardized loadings
    • Otherwise, why would we think this item measures the latent variable?

Interpretation

  • These coefficients are often called:
  • Pattern coefficients (unstandardized): for every one-unit increase in the latent variable, the manifest variable increases by b units
  • Structure coefficients (standardized): the correlation between the latent variable and the manifest variable
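Both versions can be read off a fitted lavaan object; a sketch using the Holzinger-Swineford example:

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '
HS.fit <- cfa(HS.model, data = HolzingerSwineford1939)

# unstandardized (pattern) coefficients
parameterEstimates(HS.fit)

# standardized (structure) coefficients
standardizedSolution(HS.fit)
```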

Identification Rules of Thumb:

  • Latent variables should have four indicators, OR
  • Latent variables have three indicators AND the error variances do not covary, OR
  • Latent variables have two indicators AND the error variances do not covary AND the loadings are set equal to each other
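You can check these rules by hand: a model is identified only if the number of unique (co)variances is at least the number of free parameters. A sketch of that counting for a one-factor model with five indicators under marker-variable scaling:

```r
# degrees of freedom for a one-factor model with p indicators
p     <- 5                  # manifest variables
known <- p * (p + 1) / 2    # unique variances/covariances: 15
free  <- (p - 1) + p + 1    # 4 free loadings (one fixed to 1) +
                            # 5 error variances + 1 latent variance = 10
known - free                # df = 5, so the model is over-identified
```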

Scaling

  • Remember that scaling is the way we “set the scale” for the latent variable
  • We usually do this by setting one of the pattern coefficients to 1 - the marker variable approach
  • Another option is to set the variance of the latent variable to 1 - the std.lv column in the standardized output
    • What does that do?
    • Sets the scale to a z-score
    • Makes the double-headed arrows between latents correlations
    • Make sure you are using unstandardized data!

Scaling

  • So what is std.all, the “completely standardized solution”?
  • Both the latent variable variances and the manifest variable variances are set to 1
  • If you are going to report the standardized solution, this version is the most common, as it matches EFA and regression
  • All of these options give you different loadings, but they should not change model fit
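One way to see that last point for yourself: fit the same model under both scalings and compare. The loadings differ, but the chi-square should be identical (a sketch using the lavaan example data):

```r
library(lavaan)

HS.model <- ' visual  =~ x1 + x2 + x3
              textual =~ x4 + x5 + x6
              speed   =~ x7 + x8 + x9 '

fit.marker <- cfa(HS.model, data = HolzingerSwineford1939)                # first loading fixed to 1
fit.stdlv  <- cfa(HS.model, data = HolzingerSwineford1939, std.lv = TRUE) # latent variances fixed to 1

# same fit, different parameterization
fitmeasures(fit.marker, "chisq")
fitmeasures(fit.stdlv,  "chisq")
```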

Examples

  • Reminder:
  • When you use a correlation matrix as your input, the solution is already standardized!
  • When you use a covariance matrix as your input, both the unstandardized and standardized solutions can be viewed

One-Factor CFA Example

  • IQ is often thought of as “g” or this overall cognitive ability
  • Let’s look at an example of the WISC, which is an IQ test for children
  • We have five of the subtest scores including information, similarities, word reasoning, matrix reasoning, and picture concepts

Convert Correlations to Covariance

# load the package (lav_matrix_lower2full() and cor2cov() are in lavaan)
library(lavaan)

# enter the lower triangle of the correlation matrix
wisc4.cor <- lav_matrix_lower2full(c(1,
                                     0.72,1,
                                     0.64,0.63,1,
                                     0.51,0.48,0.37,1,
                                     0.37,0.38,0.38,0.38,1))
# enter the SDs
wisc4.sd <- c(3.01 , 3.03 , 2.99 , 2.89 , 2.98)

# give everything names
colnames(wisc4.cor) <- 
  rownames(wisc4.cor) <-
  names(wisc4.sd) <- 
  c("Information", "Similarities", 
    "Word.Reasoning", "Matrix.Reasoning", "Picture.Concepts")

# convert
wisc4.cov <- cor2cov(wisc4.cor, wisc4.sd)

WISC One-Factor Model

  • The =~ operator is used to define a reflective latent variable
  • ~ can be interpreted as “Y is predicted by X”
  • =~ can be interpreted as “X is indicated by the Ys”
wisc4.model <- '
g =~ Information + Similarities + Word.Reasoning + Matrix.Reasoning + Picture.Concepts
'

Analyze the Model

  • Notice we changed to the cfa() function
  • It has the same basic arguments as the other lavaan model-fitting functions
  • The std.lv argument controls the scaling: TRUE fixes each latent variance to 1, while FALSE (the default, and usually what you want here) uses the marker-variable approach
wisc4.fit <- cfa(model = wisc4.model, 
                sample.cov = wisc4.cov, 
                sample.nobs = 550,  
                std.lv = FALSE)

Summarize the Model

  • Logical solution:
    • Positive variances
    • SMCs + Correlations < 1
    • No error messages
    • SEs are not “huge”
  • Estimates:
    • Do our questions load appropriately?
  • Model fit:
    • What do the fit indices indicate?
    • Can we improve model fit without overfitting?

Summarize the Model

summary(wisc4.fit,
        standardized=TRUE, 
        rsquare = TRUE,
        fit.measures=TRUE)
#> lavaan 0.6-19 ended normally after 30 iterations
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                        10
#> 
#>   Number of observations                           550
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                                26.775
#>   Degrees of freedom                                 5
#>   P-value (Chi-square)                           0.000
#> 
#> Model Test Baseline Model:
#> 
#>   Test statistic                              1073.427
#>   Degrees of freedom                                10
#>   P-value                                        0.000
#> 
#> User Model versus Baseline Model:
#> 
#>   Comparative Fit Index (CFI)                    0.980
#>   Tucker-Lewis Index (TLI)                       0.959
#> 
#> Loglikelihood and Information Criteria:
#> 
#>   Loglikelihood user model (H0)              -6378.678
#>   Loglikelihood unrestricted model (H1)      -6365.291
#>                                                       
#>   Akaike (AIC)                               12777.357
#>   Bayesian (BIC)                             12820.456
#>   Sample-size adjusted Bayesian (SABIC)      12788.712
#> 
#> Root Mean Square Error of Approximation:
#> 
#>   RMSEA                                          0.089
#>   90 Percent confidence interval - lower         0.058
#>   90 Percent confidence interval - upper         0.123
#>   P-value H_0: RMSEA <= 0.050                    0.022
#>   P-value H_0: RMSEA >= 0.080                    0.708
#> 
#> Standardized Root Mean Square Residual:
#> 
#>   SRMR                                           0.034
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Latent Variables:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>   g =~                                                                  
#>     Information       1.000                               2.578    0.857
#>     Similarities      0.985    0.045   21.708    0.000    2.541    0.839
#>     Word.Reasoning    0.860    0.045   18.952    0.000    2.217    0.742
#>     Matrix.Reasnng    0.647    0.047   13.896    0.000    1.669    0.578
#>     Picture.Cncpts    0.542    0.050   10.937    0.000    1.398    0.470
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>    .Information       2.395    0.250    9.587    0.000    2.395    0.265
#>    .Similarities      2.709    0.258   10.482    0.000    2.709    0.296
#>    .Word.Reasoning    4.009    0.295   13.600    0.000    4.009    0.449
#>    .Matrix.Reasnng    5.551    0.360   15.400    0.000    5.551    0.666
#>    .Picture.Cncpts    6.909    0.434   15.922    0.000    6.909    0.779
#>     g                 6.648    0.564   11.788    0.000    1.000    1.000
#> 
#> R-Square:
#>                    Estimate
#>     Information       0.735
#>     Similarities      0.704
#>     Word.Reasoning    0.551
#>     Matrix.Reasnng    0.334
#>     Picture.Cncpts    0.221

New Functions

  • std.nox: the standardized estimates are based on the variances of both the (continuous) observed and latent variables, but not the variances of exogenous covariates
  • This output is the best way to get the confidence intervals for each parameter
parameterestimates(wisc4.fit,
                   standardized=TRUE)
#>                 lhs op              rhs   est    se      z pvalue ci.lower
#> 1                 g =~      Information 1.000 0.000     NA     NA    1.000
#> 2                 g =~     Similarities 0.985 0.045 21.708      0    0.896
#> 3                 g =~   Word.Reasoning 0.860 0.045 18.952      0    0.771
#> 4                 g =~ Matrix.Reasoning 0.647 0.047 13.896      0    0.556
#> 5                 g =~ Picture.Concepts 0.542 0.050 10.937      0    0.445
#> 6       Information ~~      Information 2.395 0.250  9.587      0    1.906
#> 7      Similarities ~~     Similarities 2.709 0.258 10.482      0    2.202
#> 8    Word.Reasoning ~~   Word.Reasoning 4.009 0.295 13.600      0    3.431
#> 9  Matrix.Reasoning ~~ Matrix.Reasoning 5.551 0.360 15.400      0    4.845
#> 10 Picture.Concepts ~~ Picture.Concepts 6.909 0.434 15.922      0    6.058
#> 11                g ~~                g 6.648 0.564 11.788      0    5.543
#>    ci.upper std.lv std.all
#> 1     1.000  2.578   0.857
#> 2     1.074  2.541   0.839
#> 3     0.949  2.217   0.742
#> 4     0.739  1.669   0.578
#> 5     0.640  1.398   0.470
#> 6     2.885  2.395   0.265
#> 7     3.215  2.709   0.296
#> 8     4.587  4.009   0.449
#> 9     6.258  5.551   0.666
#> 10    7.759  6.909   0.779
#> 11    7.754  1.000   1.000

New Functions

fitted(wisc4.fit) ## estimated covariances
#> $cov
#>                  Infrmt Smlrts Wrd.Rs Mtrx.R Pctr.C
#> Information       9.044                            
#> Similarities      6.551  9.164                     
#> Word.Reasoning    5.716  5.633  8.924              
#> Matrix.Reasoning  4.303  4.241  3.700  8.337       
#> Picture.Concepts  3.606  3.553  3.100  2.334  8.864
wisc4.cov ## actual covariances
#>                  Information Similarities Word.Reasoning Matrix.Reasoning
#> Information         9.060100     6.566616       5.759936         4.436439
#> Similarities        6.566616     9.180900       5.707611         4.203216
#> Word.Reasoning      5.759936     5.707611       8.940100         3.197207
#> Matrix.Reasoning    4.436439     4.203216       3.197207         8.352100
#> Picture.Concepts    3.318826     3.431172       3.385876         3.272636
#>                  Picture.Concepts
#> Information              3.318826
#> Similarities             3.431172
#> Word.Reasoning           3.385876
#> Matrix.Reasoning         3.272636
#> Picture.Concepts         8.880400

All Fit Indices

fitmeasures(wisc4.fit)
#>                  npar                  fmin                 chisq 
#>                10.000                 0.024                26.775 
#>                    df                pvalue        baseline.chisq 
#>                 5.000                 0.000              1073.427 
#>           baseline.df       baseline.pvalue                   cfi 
#>                10.000                 0.000                 0.980 
#>                   tli                  nnfi                   rfi 
#>                 0.959                 0.959                 0.950 
#>                   nfi                  pnfi                   ifi 
#>                 0.975                 0.488                 0.980 
#>                   rni                  logl     unrestricted.logl 
#>                 0.980             -6378.678             -6365.291 
#>                   aic                   bic                ntotal 
#>             12777.357             12820.456               550.000 
#>                  bic2                 rmsea        rmsea.ci.lower 
#>             12788.712                 0.089                 0.058 
#>        rmsea.ci.upper        rmsea.ci.level          rmsea.pvalue 
#>                 0.123                 0.900                 0.022 
#>        rmsea.close.h0 rmsea.notclose.pvalue     rmsea.notclose.h0 
#>                 0.050                 0.708                 0.080 
#>                   rmr            rmr_nomean                  srmr 
#>                 0.298                 0.298                 0.034 
#>          srmr_bentler   srmr_bentler_nomean                  crmr 
#>                 0.034                 0.034                 0.042 
#>           crmr_nomean            srmr_mplus     srmr_mplus_nomean 
#>                 0.042                 0.034                 0.034 
#>                 cn_05                 cn_01                   gfi 
#>               228.408               310.899                 0.982 
#>                  agfi                  pgfi                   mfi 
#>                 0.947                 0.327                 0.980 
#>                  ecvi 
#>                 0.085

Modification Indices

modificationindices(wisc4.fit, sort = T)
#>                 lhs op              rhs     mi    epc sepc.lv sepc.all sepc.nox
#> 21 Matrix.Reasoning ~~ Picture.Concepts 14.157  1.058   1.058    0.171    0.171
#> 19   Word.Reasoning ~~ Matrix.Reasoning  8.931 -0.710  -0.710   -0.151   -0.151
#> 15      Information ~~ Picture.Concepts  5.493 -0.565  -0.565   -0.139   -0.139
#> 20   Word.Reasoning ~~ Picture.Concepts  2.029  0.365   0.365    0.069    0.069
#> 14      Information ~~ Matrix.Reasoning  1.447  0.280   0.280    0.077    0.077
#> 18     Similarities ~~ Picture.Concepts  0.838 -0.223  -0.223   -0.051   -0.051
#> 16     Similarities ~~   Word.Reasoning  0.791  0.242   0.242    0.073    0.073
#> 13      Information ~~   Word.Reasoning  0.279  0.147   0.147    0.047    0.047
#> 17     Similarities ~~ Matrix.Reasoning  0.147 -0.089  -0.089   -0.023   -0.023
#> 12      Information ~~     Similarities  0.010  0.034   0.034    0.013    0.013

Diagram the Model

semPaths(wisc4.fit, 
         whatLabels="std", 
         what = "std",
         layout ="tree",
         edge.color = "blue",
         edge.label.cex = 1)

WISC Two-Factor Model

  • Note that we only have two indicators on the second latent variable
  • If we see an identification error, we can set their loadings to be equal (homework hint!)
wisc4.model2 <- '
V =~ Information + Similarities + Word.Reasoning 
F =~ Matrix.Reasoning + Picture.Concepts
'

# wisc4.model2 <- '
# V =~ Information + Similarities + Word.Reasoning 
# F =~ a*Matrix.Reasoning + a*Picture.Concepts
# '

Analyze the Model

wisc4.fit2 <- cfa(wisc4.model2, 
                  sample.cov=wisc4.cov, 
                  sample.nobs=550,
                  std.lv = F)

Summarize the Model

summary(wisc4.fit2,
        standardized=TRUE, 
        rsquare = TRUE,
        fit.measures=TRUE)
#> lavaan 0.6-19 ended normally after 44 iterations
#> 
#>   Estimator                                         ML
#>   Optimization method                           NLMINB
#>   Number of model parameters                        11
#> 
#>   Number of observations                           550
#> 
#> Model Test User Model:
#>                                                       
#>   Test statistic                                12.687
#>   Degrees of freedom                                 4
#>   P-value (Chi-square)                           0.013
#> 
#> Model Test Baseline Model:
#> 
#>   Test statistic                              1073.427
#>   Degrees of freedom                                10
#>   P-value                                        0.000
#> 
#> User Model versus Baseline Model:
#> 
#>   Comparative Fit Index (CFI)                    0.992
#>   Tucker-Lewis Index (TLI)                       0.980
#> 
#> Loglikelihood and Information Criteria:
#> 
#>   Loglikelihood user model (H0)              -6371.634
#>   Loglikelihood unrestricted model (H1)      -6365.291
#>                                                       
#>   Akaike (AIC)                               12765.269
#>   Bayesian (BIC)                             12812.678
#>   Sample-size adjusted Bayesian (SABIC)      12777.759
#> 
#> Root Mean Square Error of Approximation:
#> 
#>   RMSEA                                          0.063
#>   90 Percent confidence interval - lower         0.026
#>   90 Percent confidence interval - upper         0.103
#>   P-value H_0: RMSEA <= 0.050                    0.244
#>   P-value H_0: RMSEA >= 0.080                    0.272
#> 
#> Standardized Root Mean Square Residual:
#> 
#>   SRMR                                           0.019
#> 
#> Parameter Estimates:
#> 
#>   Standard errors                             Standard
#>   Information                                 Expected
#>   Information saturated (h1) model          Structured
#> 
#> Latent Variables:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>   V =~                                                                  
#>     Information       1.000                               2.587    0.860
#>     Similarities      0.984    0.046   21.625    0.000    2.545    0.841
#>     Word.Reasoning    0.858    0.045   18.958    0.000    2.219    0.743
#>   F =~                                                                  
#>     Matrix.Reasnng    1.000                               1.989    0.689
#>     Picture.Cncpts    0.825    0.085    9.747    0.000    1.642    0.552
#> 
#> Covariances:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>   V ~~                                                                  
#>     F                 4.233    0.399   10.604    0.000    0.823    0.823
#> 
#> Variances:
#>                    Estimate  Std.Err  z-value  P(>|z|)   Std.lv  Std.all
#>    .Information       2.352    0.253    9.295    0.000    2.352    0.260
#>    .Similarities      2.685    0.261   10.282    0.000    2.685    0.293
#>    .Word.Reasoning    4.000    0.295   13.555    0.000    4.000    0.448
#>    .Matrix.Reasnng    4.380    0.458    9.557    0.000    4.380    0.525
#>    .Picture.Cncpts    6.168    0.451   13.673    0.000    6.168    0.696
#>     V                 6.692    0.567   11.807    0.000    1.000    1.000
#>     F                 3.957    0.569    6.960    0.000    1.000    1.000
#> 
#> R-Square:
#>                    Estimate
#>     Information       0.740
#>     Similarities      0.707
#>     Word.Reasoning    0.552
#>     Matrix.Reasnng    0.475
#>     Picture.Cncpts    0.304

Diagram the Model

semPaths(wisc4.fit2, 
         whatLabels="std", 
         what = "std",
         edge.color = "pink",
         edge.label.cex = 1,
         layout="tree")

Compare the Models

anova(wisc4.fit, wisc4.fit2)
#> 
#> Chi-Squared Difference Test
#> 
#>            Df   AIC   BIC  Chisq Chisq diff   RMSEA Df diff Pr(>Chisq)    
#> wisc4.fit2  4 12765 12813 12.687                                          
#> wisc4.fit   5 12777 12820 26.775     14.088 0.15426       1  0.0001745 ***
#> ---
#> Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
fitmeasures(wisc4.fit, c("aic", "ecvi"))
#>       aic      ecvi 
#> 12777.357     0.085
fitmeasures(wisc4.fit2, c("aic", "ecvi"))
#>       aic      ecvi 
#> 12765.269     0.063

How to Tidy lavaan Output

#install.packages("parameters")
library(parameters)
model_parameters(wisc4.fit, standardize = TRUE)
#> # Loading
#> 
#> Link                  | Coefficient |   SE |       95% CI |     z |      p
#> --------------------------------------------------------------------------
#> g =~ Information      |        0.86 | 0.02 | [0.82, 0.89] | 49.47 | < .001
#> g =~ Similarities     |        0.84 | 0.02 | [0.80, 0.87] | 46.32 | < .001
#> g =~ Word.Reasoning   |        0.74 | 0.02 | [0.70, 0.79] | 32.26 | < .001
#> g =~ Matrix.Reasoning |        0.58 | 0.03 | [0.52, 0.64] | 18.29 | < .001
#> g =~ Picture.Concepts |        0.47 | 0.04 | [0.40, 0.54] | 12.94 | < .001

How to Tidy lavaan Output

library(broom)
tidy(wisc4.fit)
#> # A tibble: 11 × 8
#>    term                op    estimate std.error statistic p.value std.lv std.all
#>    <chr>               <chr>    <dbl>     <dbl>     <dbl>   <dbl>  <dbl>   <dbl>
#>  1 g =~ Information    =~       1        0          NA         NA   2.58   0.857
#>  2 g =~ Similarities   =~       0.985    0.0454     21.7        0   2.54   0.839
#>  3 g =~ Word.Reasoning =~       0.860    0.0454     19.0        0   2.22   0.742
#>  4 g =~ Matrix.Reason… =~       0.647    0.0466     13.9        0   1.67   0.578
#>  5 g =~ Picture.Conce… =~       0.542    0.0496     10.9        0   1.40   0.470
#>  6 Information ~~ Inf… ~~       2.40     0.250       9.59       0   2.40   0.265
#>  7 Similarities ~~ Si… ~~       2.71     0.258      10.5        0   2.71   0.296
#>  8 Word.Reasoning ~~ … ~~       4.01     0.295      13.6        0   4.01   0.449
#>  9 Matrix.Reasoning ~… ~~       5.55     0.360      15.4        0   5.55   0.666
#> 10 Picture.Concepts ~… ~~       6.91     0.434      15.9        0   6.91   0.779
#> 11 g ~~ g              ~~       6.65     0.564      11.8        0   1      1
glance(wisc4.fit)
#> # A tibble: 1 × 17
#>    agfi    AIC    BIC   cfi chisq  npar  rmsea rmsea.conf.high   srmr   tli
#>   <dbl>  <dbl>  <dbl> <dbl> <dbl> <dbl>  <dbl>           <dbl>  <dbl> <dbl>
#> 1 0.947 12777. 12820. 0.980  26.8    10 0.0890           0.123 0.0345 0.959
#> # ℹ 7 more variables: converged <lgl>, estimator <chr>, ngroups <int>,
#> #   missing_method <chr>, nobs <dbl>, norig <dbl>, nexcluded <dbl>

Summary

  • In this lecture you’ve learned:

    • How to create a simple confirmatory factor analysis (measurement model)
    • How to view parameter estimates, modification indices, and more
    • How to build these models and compare them