Building a Computerized Adaptive Version of Psychological Scales


Computerized adaptive testing (CAT) is a sophisticated methodology to create measurement instruments that are highly accurate and efficient. In this post, I explain how to evaluate the feasibility of creating a computerized adaptive version of a psychological instrument.


Okan Bulut (http://www.okanbulut.com/), University of Alberta (https://www.ualberta.ca)
02-20-2021
Photo by Glenn Carstens-Peters on Unsplash

Introduction

Psychologists, counselors, educators, and other practitioners often rely on educational and psychological instruments, such as tests, questionnaires, and inventories, to make informed decisions about individuals. Most educational and psychological instruments in use today follow the traditional approach of administering as many items as possible, typically in a paper-and-pencil format. Although this approach is likely to increase the internal consistency of the instrument, it may have unintended consequences, such as low-quality responses caused by test fatigue and test-taking disengagement. Therefore, many researchers have proposed systematic ways to shorten educational and psychological instruments (Sandy et al., 2014; Yang et al., 2010; Yarkoni, 2010).
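The link between test length and internal consistency is usually quantified with the Spearman-Brown prophecy formula. This is a standard psychometric result added here for illustration, and it assumes that the items added or removed are parallel to the existing ones. If a test with reliability \(\rho\) is lengthened or shortened by a factor \(k\), the predicted reliability is

\[ \rho^{*} = \frac{k\rho}{1 + (k - 1)\rho} \]

For example, with \(\rho = 0.70\), halving the test (\(k = 0.5\)) gives \(\rho^{*} = 0.35 / 0.65 \approx 0.54\), whereas doubling it (\(k = 2\)) gives \(\rho^{*} = 1.40 / 1.70 \approx 0.82\). This is the trade-off that any shortening method has to manage.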

In my previous posts, I demonstrated how to shorten measurement instruments using psychometric methods such as automated test assembly and data science methods such as ant colony optimization. These methods can help researchers and practitioners build a shorter version of an instrument and thereby increase measurement efficiency. However, as Weiss (2004) pointed out, conventional assessments with fixed items (i.e., the same items administered to everyone) tend to yield accurate results for individuals whose latent trait levels are near the mean of the target population, but poor results for individuals whose latent trait levels deviate from the mean.
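To see why a fixed form measures best near the mean, the sketch below computes the conditional standard error of measurement for a hypothetical 20-item form. It assumes a two-parameter logistic (2PL) IRT model, which the post does not specify, and uses made-up item parameters whose difficulties are spread evenly between -1 and 1, i.e., clustered around the population mean.

```r
# Hypothetical fixed form: 20 items under a 2PL model with equal
# discrimination and difficulties clustered near the mean (theta = 0).
# These parameters are illustrative, not estimates from a real instrument.
a <- rep(1.2, 20)
b <- seq(-1, 1, length.out = 20)

# Fisher information of each 2PL item at trait level theta
item_info <- function(theta, a, b) {
  p <- 1 / (1 + exp(-a * (theta - b)))
  a^2 * p * (1 - p)
}

# Conditional standard error of measurement = 1 / sqrt(test information)
cond_se <- function(theta) {
  sapply(theta, function(th) 1 / sqrt(sum(item_info(th, a, b))))
}

theta_grid <- -3:3
round(cond_se(theta_grid), 2)  # SE grows as theta moves away from 0
```

Because everyone answers the same items, the test information is concentrated where the item difficulties are, so under these assumed parameters the standard error at \(\theta = 0\) is much smaller than at \(\theta = \pm 3\).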

A promising solution to creating assessments with high measurement accuracy for all individuals is the use of adaptive testing. Adaptive testing follows the idea of adapting an assessment to each individual’s latent trait level by administering a customized set of items, instead of administering the same set of items to all individuals. In this post, I will briefly explain how adaptive testing works and then demonstrate how to create a computerized adaptive version of an existing psychological instrument.

Computerized Adaptive Testing

Computerized adaptive testing (CAT) is a sophisticated method of delivering computerized assessments with high measurement precision and efficiency (Thompson & Weiss, 2011; Weiss, 2004). The primary goal of CAT is to customize the assessment for each individual by selecting the most suitable items based on the individual's responses to previously administered items. To design and implement a CAT, the following five components are necessary (Thompson & Weiss, 2011):

1. A calibrated item bank
2. A starting point
3. An item selection algorithm
4. A scoring algorithm
5. A termination criterion

In a typical CAT, each individual starts at a predetermined trait level (e.g., \(\theta = 0\)) and receives an item matched to that level. Depending on the response, the next item is more or less difficult. For example, if the individual answers an item of moderate difficulty correctly, the CAT infers that this individual's latent trait level is above the difficulty of that item and therefore presents a more difficult item next. If, however, the individual does not answer the item correctly, she/he receives an easier item next. This iterative process continues until a termination criterion is met (e.g., the maximum number of items has been administered).
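The adaptive loop can be sketched in base R. This is a minimal illustration under assumptions the post does not make: a 2PL IRT model, item selection by maximum Fisher information, EAP scoring with a standard normal prior, and stopping after 10 items or once the standard error drops below 0.30. The item bank and the `answer_fn` callback are hypothetical. For an anxiety scale, a "correct" response simply means endorsing the keyed answer.

```r
# Probability of a keyed (correct/endorsed) response under the 2PL model
p_2pl <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))

# Fisher information of 2PL items at theta
info_2pl <- function(theta, a, b) {
  p <- p_2pl(theta, a, b)
  a^2 * p * (1 - p)
}

# Scoring: EAP estimate and posterior SD with a N(0, 1) prior on a grid
eap <- function(resp, a, b, grid = seq(-4, 4, by = 0.05)) {
  like <- sapply(grid, function(th) {
    p <- p_2pl(th, a, b)
    prod(p^resp * (1 - p)^(1 - resp))
  })
  post <- dnorm(grid) * like
  post <- post / sum(post)
  est <- sum(grid * post)
  list(theta = est, se = sqrt(sum((grid - est)^2 * post)))
}

# bank: hypothetical data frame with item parameters in columns a and b
# answer_fn: hypothetical callback returning 1 or 0 for an item index
run_cat <- function(bank, answer_fn, start = 0, max_items = 10, se_stop = 0.30) {
  theta <- start                  # starting point
  se <- Inf
  given <- integer(0)             # administered item indices
  resp <- integer(0)              # observed responses
  # Termination: stop at max_items (or bank size) or when SE is small enough
  while (length(given) < min(max_items, nrow(bank)) && se > se_stop) {
    avail <- setdiff(seq_len(nrow(bank)), given)
    # Item selection: most informative remaining item at the current theta
    item <- avail[which.max(info_2pl(theta, bank$a[avail], bank$b[avail]))]
    given <- c(given, item)
    resp <- c(resp, answer_fn(item))
    # Scoring: update theta and its standard error from all responses so far
    est <- eap(resp, bank$a[given], bank$b[given])
    theta <- est$theta
    se <- est$se
  }
  list(items = given, responses = resp, theta = theta, se = se)
}

# Usage: a hypothetical 30-item bank and a simulated respondent (theta = 1.5)
set.seed(1)
bank <- data.frame(a = runif(30, 0.8, 2.0), b = seq(-2.5, 2.5, length.out = 30))
answer_fn <- function(item) rbinom(1, 1, p_2pl(1.5, bank$a[item], bank$b[item]))
result <- run_cat(bank, answer_fn)
str(result)
```

With equal discriminations, maximum information picks the remaining item whose difficulty is closest to the current estimate, so a correct (endorsed) response moves the estimate up and the next item is harder, matching the description above. Other selection or scoring rules could be swapped in without changing the loop.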

Before implementing a CAT, it is important to obtain enough evidence to support its use in operational settings. For example, if researchers aim to redesign a conventional, non-adaptive instrument as a CAT, then a series of post-hoc simulations can be conducted using previously collected data. A post-hoc simulation is a psychometric procedure to evaluate the effect of different CAT algorithms under specific conditions (Seo & Choi, 2018). Using real data from a conventional instrument taken by a large group of individuals, one can create a hypothetical CAT scenario and evaluate the impact of various CAT elements (e.g., different algorithms for item selection or scoring) on measurement accuracy and efficiency. The results of these simulations can help researchers and practitioners determine the feasibility and applicability of the CAT approach for an existing instrument.
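The key idea of a post-hoc simulation is that the algorithm never asks a new question: for each selected item, it looks up the response the person already gave in the full-length data. The sketch below assumes that 2PL item parameters have already been calibrated; here both the parameters and the response matrix are simulated placeholders, not the TMA data, and `posthoc_cat` is a hypothetical helper name. It reports two common summaries: agreement between CAT and full-length estimates, and the average number of items administered.

```r
p_2pl <- function(theta, a, b) 1 / (1 + exp(-a * (theta - b)))
info_2pl <- function(theta, a, b) { p <- p_2pl(theta, a, b); a^2 * p * (1 - p) }

# EAP estimate and posterior SD with a N(0, 1) prior
eap <- function(resp, a, b, grid = seq(-4, 4, by = 0.05)) {
  like <- sapply(grid, function(th) {
    p <- p_2pl(th, a, b)
    prod(p^resp * (1 - p)^(1 - resp))
  })
  post <- dnorm(grid) * like
  post <- post / sum(post)
  est <- sum(grid * post)
  c(theta = est, se = sqrt(sum((grid - est)^2 * post)))
}

# Adaptive replay for one respondent: x is that person's recorded 0/1 vector
posthoc_cat <- function(x, a, b, max_items = 15, se_stop = 0.35) {
  given <- integer(0)
  est <- c(theta = 0, se = Inf)
  while (length(given) < min(max_items, length(a)) && est["se"] > se_stop) {
    avail <- setdiff(seq_along(a), given)
    given <- c(given, avail[which.max(info_2pl(est["theta"], a[avail], b[avail]))])
    est <- eap(x[given], a[given], b[given])   # recorded answers, not new ones
  }
  c(est, n_items = length(given))
}

# Placeholder data: 200 simulated respondents x 50 items
set.seed(2021)
n_items <- 50
n_people <- 200
a <- runif(n_items, 0.7, 2.0)
b <- rnorm(n_items)
true_theta <- rnorm(n_people)
resp_matrix <- t(sapply(true_theta, function(th) rbinom(n_items, 1, p_2pl(th, a, b))))

full <- t(apply(resp_matrix, 1, eap, a = a, b = b))             # all 50 items
cat_res <- t(apply(resp_matrix, 1, posthoc_cat, a = a, b = b))  # adaptive

cor(full[, "theta"], cat_res[, "theta"])  # agreement with full-length scores
mean(cat_res[, "n_items"])                # average CAT length
```

In a real study, the same replay would be repeated under different starting points, selection rules, scoring methods, and stopping rules to compare their accuracy and efficiency.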

Example

In this example, we will use real data from a sample of respondents (\(n = 4474\)) who responded to the items in the Taylor Manifest Anxiety Scale (Taylor, 1953) and build a CAT version of this instrument through post-hoc simulations. The Taylor Manifest Anxiety Scale is typically used for measuring anxiety as a personality trait. The scale consists of fifty statements about different indicators of anxiety. For each item, individuals select either true or false. The higher the score, the higher the anxiety level. Because some items on the scale are negatively phrased, they must be reverse-coded (see footnote 1). A clean version of the data (including reverse-coding) is available here. Now let's import the data and check out its content.
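The posted data are already reverse-coded, but for readers starting from raw responses, reverse-coding a 0/1 item amounts to computing 1 - x. A minimal sketch, assuming responses are coded 1 = true and 0 = false; the indices in `neg_items` are hypothetical, since the post does not list which TMA items are negatively phrased. Summing the recoded items then gives the total anxiety score.

```r
# Flip 0/1 responses for the negatively phrased items
reverse_code <- function(responses, neg_items) {
  responses[, neg_items] <- 1 - responses[, neg_items]
  responses
}

# Toy data: 3 respondents x 5 items (not TMA data)
toy <- matrix(c(1, 0, 1, 1, 0,
                0, 0, 1, 0, 1,
                1, 1, 1, 1, 1), nrow = 3, byrow = TRUE)

# Hypothetical: items 2 and 5 are negatively phrased
recoded <- reverse_code(toy, neg_items = c(2, 5))
rowSums(recoded)  # total scores: 5 2 3 (higher = more anxiety)
```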

# Import the TMA data into R
data <- read.csv("tma_data.csv", header = TRUE)

# Preview the data
head(data)