
Item Response Theory

  • What do you do if you have dichotomous (or categorical) manifest variables?
    • Many researchers agree that items with more than four response options can be treated as continuous without a meaningful loss of power or interpretability.
    • Do you treat these values as categorical?
  • Do you assume the underlying latent variable is continuous?

Categorical Options

  • There are two approaches that allow us to analyze data with categorical manifest (indicator) variables:
    • Item Factor Analysis
      • More traditional factor analysis approach using ordered responses
      • You can talk about item loading, eliminate bad questions, etc.
      • In the lavaan framework, you update your cfa() call to include the ordered argument (a sketch follows this list)
    • Item Response Theory
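
  • A minimal sketch of an item factor analysis in lavaan, assuming a hypothetical data frame mydata with dichotomous items item1 through item4 (the model and item names are made up for illustration):
library(lavaan)

# hypothetical one-factor model with four dichotomous items
ifa.model <- "trait =~ item1 + item2 + item3 + item4"

# declaring the items as ordered switches lavaan to a categorical
# estimator with thresholds instead of intercepts
ifa.fit <- cfa(ifa.model,
               data = mydata,
               ordered = c("item1", "item2", "item3", "item4"))

summary(ifa.fit, standardized = TRUE)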

Item Response Theory

  • Classical test theory is considered “true score theory”
    • Any differences in responses are differences in ability or underlying trait
    • CTT focuses on reliability and item correlation type analysis
    • Cannot separate the test and person characteristics
  • IRT is considered more modern test theory focusing on the latent trait
    • Focuses on item-level properties: where along the latent trait an item measures best, how well it discriminates, and how easy it is to guess
    • Additionally, with more than two outcomes, we can examine ordering, response choice options, and more

Issues

  • Unidimensionality: the assumption that there is one underlying trait or dimension being measured
    • You can run separate models for each dimension
    • There are multitrait options for IRT
  • Local Independence
    • After you control for the latent variable, the items are uncorrelated

Item Response Theory

  • A simple example of test versus person
    • 3 item questionnaire
    • Yes/no scaling
    • 8 possible response patterns (2³ yes/no combinations; a quick check follows this list)
    • Four total scores (0, 1, 2, 3)
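
  • A quick base-R check of the counts above (nothing package-specific):
# enumerate every yes/no pattern for a 3-item questionnaire
patterns <- expand.grid(item1 = 0:1, item2 = 0:1, item3 = 0:1)
patterns$total <- rowSums(patterns)

nrow(patterns)                 # 8 response patterns
sort(unique(patterns$total))   # 4 total scores: 0 1 2 3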

Item Response Theory

  • Item characteristic curves (ICCs)
    • A curve that plots theta (the latent trait) against the probability of a correct response, typically using a logistic function

Item Response Theory

  • Theta – ability or the underlying latent variable score
  • b – Item location – where the probability of getting an item correct is 50/50
    • Also considered where the item performs best
    • Can be thought of as item difficulty

Item Response Theory

  • a – item discrimination
    • Tells you how steeply the probability of a correct response changes with the latent variable
    • Larger a values indicate items that discriminate more sharply between people below and above the item location

Item Response Theory

  • c – guessing parameter
    • The lower asymptote of the curve: the probability that a respondent with very low ability still gets the item correct (e.g., by guessing)

Item Response Theory

  • 1 Parameter Logistic (1PL)
    • Also known as the Rasch Model
    • Only uses b
  • 2 Parameter Logistic (2PL)
    • Uses b and a
  • 3 Parameter Logistic (3PL)
    • Uses b, a, and c (the full response function is sketched after this list)
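
  • A minimal sketch of the 3PL item characteristic curve as a hand-rolled R function (icc_3pl is not part of ltm or mirt); the 2PL fixes c = 0, and the 1PL additionally constrains a to be equal across items:
# P(correct | theta) = c + (1 - c) / (1 + exp(-a * (theta - b)))
icc_3pl <- function(theta, a = 1, b = 0, c = 0) {
  c + (1 - c) / (1 + exp(-a * (theta - b)))
}

# example curve: moderately discriminating item with some guessing
theta <- seq(-4, 4, by = 0.1)
plot(theta, icc_3pl(theta, a = 1.5, b = 0, c = 0.2),
     type = "l", xlab = "Theta", ylab = "P(correct)")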

Polytomous IRT

  • A large portion of IRT focuses on dichotomous data (yes/no, correct/incorrect)
  • Scoring is easier because you have “right” and “wrong” answers
  • Separately, polytomous IRT focuses on data with multiple answers, with no “right” answer
    • Focus on ordering, meaning that low scores represent lower levels of the trait, while high scores represent higher levels
    • Likert type scales

Polytomous IRT

  • Couple of types of models:
    • Graded Response Model
    • Generalized Partial Credit Model
    • Partial Credit Model

Polytomous IRT

  • A graded response model is conceptually the simplest but can be hard to fit.
  • It takes the number of categories minus 1 and creates mini 2PLs for each of those cumulative boundaries (1 vs. the rest, 1–2 vs. the rest, etc.).
  • You get the probability of scoring at a given level OR higher (a fitting sketch follows this list).
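
  • A minimal sketch of fitting a graded response model in mirt, assuming a hypothetical data frame of ordered Likert responses called poly.items:
library(mirt)

# one latent factor, graded response model for ordered categories
grm.fit <- mirt(data = poly.items,
                model = 1,
                itemtype = "graded",
                verbose = FALSE)  # suppress the iteration log

coef(grm.fit, IRTpars = TRUE)  # a and cumulative b parameters per item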

Polytomous IRT

  • The generalized partial credit and partial credit models account for the fact that the categories may not be used equally often
  • Therefore, you get the mini 2PLs for adjacent categories (1-2, 2-3, 3-4)
  • If your categories are ordered (which you often want), these two estimations can be very similar (a comparison sketch follows this list).
  • Another concern with the partial credit models is making sure that every category has a range of theta over which it is the most likely answer (thresholds)
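
  • A minimal sketch of comparing the two partial credit fits in mirt, again using the hypothetical poly.items data frame; itemtype = "gpcm" requests the generalized partial credit model, and itemtype = "Rasch" should give the partial credit model for polytomous items (see ?mirt for the itemtype options):
# generalized partial credit model: each item gets its own slope (a)
gpcm.fit <- mirt(poly.items, model = 1, itemtype = "gpcm", verbose = FALSE)

# partial credit model: slopes constrained (Rasch-type), only thresholds vary
pcm.fit <- mirt(poly.items, model = 1, itemtype = "Rasch", verbose = FALSE)

# the PCM is nested in the GPCM, so compare with a likelihood ratio test
# (anova also reports AIC/BIC)
anova(pcm.fit, gpcm.fit)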

Polytomous IRT

  • Install the mirt package (multidimensional item response theory); a minimal install-and-load sketch follows.
  • We are not covering multidimensional or multigroup IRT, but this package can fit those models as well as the polytomous models shown next.
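
  • Installing and loading (a standard R workflow; install.packages only needs to run once per machine):
install.packages("mirt")
library(mirt)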

IRT Examples

  • Let’s start with DIRT: Dichotomous IRT
  • Dataset is LSAT (bundled with the ltm package), where each of five items is scored correct (1) or incorrect (0)
library(ltm)
#> Loading required package: MASS
#> Loading required package: msm
#> Loading required package: polycor
library(mirt)
#> Loading required package: stats4
#> Loading required package: lattice
#> 
#> Attaching package: 'mirt'
#> The following object is masked from 'package:ltm':
#> 
#>     Science
data(LSAT)
head(LSAT)
#>   Item 1 Item 2 Item 3 Item 4 Item 5
#> 1      0      0      0      0      0
#> 2      0      0      0      0      0
#> 3      0      0      0      0      0
#> 4      0      0      0      0      1
#> 5      0      0      0      0      1
#> 6      0      0      0      0      1

Two Parameter Logistic

# data frame name ~ z1 fits one latent variable
# IRT.param = TRUE returns difficulty/discrimination (IRT parameterization)
LSAT.model <- ltm(LSAT ~ z1,
                  IRT.param = TRUE)
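
  • For comparison, a minimal sketch of the 1PL fit with the same package: ltm::rasch() estimates one difficulty per item and a single shared discrimination (output not shown here):
# 1PL / Rasch model on the same data
LSAT.rasch <- rasch(LSAT, IRT.param = TRUE)
coef(LSAT.rasch)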

2PL Output

  • Difficulty (Dffclt) = b: the point on the theta scale where the probability of a correct answer is .50
  • Discrimination (Dscrmn) = a: how well the question separates people below versus above that point
coef(LSAT.model)
#>            Dffclt    Dscrmn
#> Item 1 -3.3597341 0.8253715
#> Item 2 -1.3696497 0.7229499
#> Item 3 -0.2798983 0.8904748
#> Item 4 -1.8659189 0.6885502
#> Item 5 -3.1235725 0.6574516

2PL Plots

plot(LSAT.model, type = "ICC") ## all items at once

2PL Plots

plot(LSAT.model, type = "IIC", items = 0) ## Test Information Function

2PL Other Options

factor.scores(LSAT.model)
#> 
#> Call:
#> ltm(formula = LSAT ~ z1, IRT.param = TRUE)
#> 
#> Scoring Method: Empirical Bayes
#> 
#> Factor-Scores for observed response patterns:
#>    Item 1 Item 2 Item 3 Item 4 Item 5 Obs     Exp     z1 se.z1
#> 1       0      0      0      0      0   3   2.277 -1.895 0.795
#> 2       0      0      0      0      1   6   5.861 -1.479 0.796
#> 3       0      0      0      1      0   2   2.596 -1.460 0.796
#> 4       0      0      0      1      1  11   8.942 -1.041 0.800
#> 5       0      0      1      0      0   1   0.696 -1.331 0.797
#> 6       0      0      1      0      1   1   2.614 -0.911 0.802
#> 7       0      0      1      1      0   3   1.179 -0.891 0.803
#> 8       0      0      1      1      1   4   5.955 -0.463 0.812
#> 9       0      1      0      0      0   1   1.840 -1.438 0.796
#> 10      0      1      0      0      1   8   6.431 -1.019 0.801
#> 11      0      1      0      1      1  16  13.577 -0.573 0.809
#> 12      0      1      1      0      1   3   4.370 -0.441 0.813
#> 13      0      1      1      1      0   2   2.000 -0.420 0.813
#> 14      0      1      1      1      1  15  13.920  0.023 0.828
#> 15      1      0      0      0      0  10   9.480 -1.373 0.797
#> 16      1      0      0      0      1  29  34.616 -0.953 0.802
#> 17      1      0      0      1      0  14  15.590 -0.933 0.802
#> 18      1      0      0      1      1  81  76.562 -0.506 0.811
#> 19      1      0      1      0      0   3   4.659 -0.803 0.804
#> 20      1      0      1      0      1  28  24.989 -0.373 0.815
#> 21      1      0      1      1      0  15  11.463 -0.352 0.815
#> 22      1      0      1      1      1  80  83.541  0.093 0.831
#> 23      1      1      0      0      0  16  11.254 -0.911 0.802
#> 24      1      1      0      0      1  56  56.105 -0.483 0.812
#> 25      1      1      0      1      0  21  25.646 -0.463 0.812
#> 26      1      1      0      1      1 173 173.310 -0.022 0.827
#> 27      1      1      1      0      0  11   8.445 -0.329 0.816
#> 28      1      1      1      0      1  61  62.520  0.117 0.832
#> 29      1      1      1      1      0  28  29.127  0.139 0.833
#> 30      1      1      1      1      1 298 296.693  0.606 0.855

Three Parameter Logistic

LSAT.model2 <- tpm(LSAT, #dataset
                   type = "latent.trait",
                   IRT.param = TRUE)
#> Warning in tpm(LSAT, type = "latent.trait", IRT.param = TRUE): Hessian matrix at convergence is not positive definite; unstable solution.

3PL Output

  • Difficulty (Dffclt) = b: the item location on the theta scale
  • Discrimination (Dscrmn) = a: how well the question separates people below versus above that location
  • Guessing (Gussng) = c: the lower asymptote, i.e., the probability that a very low-ability respondent still answers correctly
coef(LSAT.model2)
#>            Gussng     Dffclt     Dscrmn
#> Item 1 0.06389395 -3.3423509  0.8048523
#> Item 2 0.01567005 -1.5530954  0.6070241
#> Item 3 0.30088256  0.3301527 26.4150208
#> Item 4 0.06521055 -2.0342571  0.5700252
#> Item 5 0.02908352 -3.5826451  0.5523586

3PL Plots

plot(LSAT.model2, type = "ICC") ## all items at once

3PL Plots

plot(LSAT.model2, type = "IIC", items = 0) ## Test Information Function

3PL Other Options

factor.scores(LSAT.model2)
#> 
#> Call:
#> tpm(data = LSAT, type = "latent.trait", IRT.param = TRUE)
#> 
#> Scoring Method: Empirical Bayes
#> 
#> Factor-Scores for observed response patterns:
#>    Item 1 Item 2 Item 3 Item 4 Item 5 Obs     Exp     z1 se.z1
#> 1       0      0      0      0      0   3   1.538 -1.659 0.865
#> 2       0      0      0      0      1   6   5.113 -1.245 0.876
#> 3       0      0      0      1      0   2   2.375 -1.245 0.879
#> 4       0      0      0      1      1  11   9.552 -0.815 0.891
#> 5       0      0      1      0      0   1   0.686 -1.659 0.865
#> 6       0      0      1      0      1   1   2.472 -1.245 0.876
#> 7       0      0      1      1      0   3   1.149 -1.245 0.879
#> 8       0      0      1      1      1   4   5.597 -0.815 0.891
#> 9       0      1      0      0      0   1   1.678 -1.205 0.878
#> 10      0      1      0      0      1   8   6.870 -0.777 0.890
#> 11      0      1      0      1      1  16  15.339 -0.330 0.906
#> 12      0      1      1      0      1   3   4.118 -0.777 0.890
#> 13      0      1      1      1      0   2   1.917 -0.774 0.893
#> 14      0      1      1      1      1  15  13.227 -0.330 0.906
#> 15      1      0      0      0      0  10   7.980 -1.053 0.883
#> 16      1      0      0      0      1  29  34.733 -0.619 0.895
#> 17      1      0      0      1      0  14  16.116 -0.616 0.898
#> 18      1      0      0      1      1  81  81.511 -0.166 0.910
#> 19      1      0      1      0      0   3   4.134 -1.053 0.883
#> 20      1      0      1      0      1  28  23.238 -0.619 0.895
#> 21      1      0      1      1      0  15  10.828 -0.616 0.898
#> 22      1      0      1      1      1  80  83.177 -0.166 0.912
#> 23      1      1      0      0      0  16  11.668 -0.577 0.897
#> 24      1      1      0      0      1  56  59.731 -0.129 0.908
#> 25      1      1      0      1      0  21  27.722 -0.122 0.910
#> 26      1      1      0      1      1 173 158.900  0.150 0.376
#> 27      1      1      1      0      0  11   8.068 -0.577 0.897
#> 28      1      1      1      0      1  61  63.489 -0.128 0.913
#> 29      1      1      1      1      0  28  29.695 -0.122 0.916
#> 30      1      1      1      1      1 298 303.369  0.503 0.407

Compare Models

anova(LSAT.model, LSAT.model2) ## LRT plus AIC/BIC; the lower AIC and BIC favor the 2PL here
#> Warning in anova.ltm(LSAT.model, LSAT.model2): either the two models are not nested or the model represented by 'object2' fell on a local maxima.
#> 
#>  Likelihood Ratio Table
#>                 AIC     BIC  log.Lik   LRT df p.value
#> LSAT.model  4953.31 5002.38 -2466.65                 
#> LSAT.model2 4967.02 5040.64 -2468.51 -3.71  5       1

Polytomous IRT

  • Dataset includes the Meaning in Life Questionnaire
library(rio)
poly.data <- import("data/lecture_irt.csv")
poly.data <- na.omit(poly.data)

# reverse code Q99_9 (7-point scale, so 8 - x flips it)
poly.data$Q99_9 = 8 - poly.data$Q99_9

# separate the items for the two factors
poly.data1 = poly.data[ , c(1, 4, 5, 6, 9)]
poly.data2 = poly.data[ , c(2, 3, 7, 8, 10)]

Generalized Partial Credit Model

gpcm.model1 <- mirt(data = poly.data1, #data
                    model = 1, #number of factors
                    itemtype = "gpcm") #poly model type
#> Iteration: 1, Log-Lik: -11632.167, Max-Change: 4.82538
#> Iteration: 2, Log-Lik: -10643.196, Max-Change: 2.80519
#> ...
#> Iteration: 63, Log-Lik: -10357.104, Max-Change: 0.00012
#> Iteration: 64, Log-Lik: -10357.104, Max-Change: 0.00010

GPCM Output

  • summary() also gives standardized factor loadings, which help us judge how strongly each item relates to the latent trait
summary(gpcm.model1) ##standardized coefficients 
#>          F1    h2
#> Q99_1 0.964 0.929
#> Q99_4 0.984 0.968
#> Q99_5 0.976 0.953
#> Q99_6 0.977 0.955
#> Q99_9 0.720 0.519
#> 
#> SS loadings:  4.324 
#> Proportion Var:  0.865 
#> 
#> Factor correlations: 
#> 
#>    F1
#> F1  1

GPCM Output

coef(gpcm.model1, IRTpars = T) ## discrimination (a) and threshold (b) parameters in the IRT metric
#> $Q99_1
#>         a     b1     b2     b3     b4    b5    b6
#> par 1.927 -1.905 -1.344 -1.107 -0.607 0.225 1.236
#> 
#> $Q99_4
#>         a     b1    b2     b3     b4    b5    b6
#> par 2.941 -1.952 -1.67 -1.082 -0.592 0.121 0.972
#> 
#> $Q99_5
#>         a     b1     b2     b3     b4     b5    b6
#> par 2.395 -2.052 -1.601 -1.255 -0.825 -0.093 0.979
#> 
#> $Q99_6
#>         a     b1    b2     b3     b4    b5    b6
#> par 2.448 -2.014 -1.43 -1.168 -0.531 0.118 1.085
#> 
#> $Q99_9
#>         a     b1     b2     b3     b4     b5     b6
#> par 0.553 -1.671 -2.488 -1.202 -0.113 -1.115 -0.296
#> 
#> $GroupPars
#>     MEAN_1 COV_11
#> par      0      1

head(fscores(gpcm.model1)) ##factor scores
#>              F1
#> [1,] -0.6805579
#> [2,] -2.7481783
#> [3,] -1.2486173
#> [4,] -1.4226850
#> [5,] -2.7481783
#> [6,] -2.7481783

GPCM Plots

plot(gpcm.model1, type = "trace") ##curves for all items at once

itemplot(gpcm.model1, 5, type = "trace")

GPCM Plots

itemplot(gpcm.model1, 4, type = "info") ##IIC for each item

plot(gpcm.model1, type = "info") ##test information curve

GPCM Plots

plot(gpcm.model1) ##expected score curve

Summary

  • In this lecture you’ve learned:

    • Item response theory compared to classical test theory
    • How to run a dichotomous (traditional) IRT with 2PL and 3PL models
    • How to run a polytomous IRT using the generalized partial credit model
    • How to compare models and interpret their output