August 4, 2023
The assessment of learners is a critical part of modern surgical training. Educators and researchers tasked with this responsibility must understand validity theory to appropriately assess learners, interpret outcomes, and make and justify decisions about competence, selection, and advancement. Unfortunately, outdated concepts surrounding validity persist in and muddle the medical education literature.1,2 This article introduces contemporary validity theory to surgical educators and education researchers who design assessments or interpret and act on their results.
Validity theory has evolved over several decades. Classical frameworks from the 1950s viewed validity as a property of the assessment tool and described multiple types, including content, construct, and face validity. These frameworks were replaced by Samuel Messick’s contemporary conceptualization, first described in the 1980s and later incorporated into the Standards for Educational and Psychological Testing, a consensus standard of professional societies in education, psychology, and measurement. Since then, scholars such as Michael Kane have proposed additional contemporary frameworks.3
All these contemporary frameworks view validity as a quality of the evidence that educators and researchers provide to support how they interpreted and used the results of an assessment. When educators and researchers interpret or use the results of an assessment in a particular way, they essentially make an argument that the interpretation or use is appropriate. To support that argument, they can gather and present evidence in a process known as validation.3 Thus, validity is the degree to which that evidence supports their argument.
In this way, validity is no longer a property specific to an assessment tool. Instead, the validity of an interpretation or action is specific to the context in which the assessment was used. For example, the Fundamentals of Laparoscopic Surgery—an assessment tool that measures the construct of laparoscopic skill—can be administered to different subjects (for example, medical students versus interns) and for different purposes (for example, feedback versus certification).4 An assessment tool given in a unique context (i.e., to specific subjects and for specific purposes) thus produces unique results. The arguments (interpretations and uses) based on those results, and the evidence gathered to support those arguments, are therefore specific to that context rather than to the tool itself.
With this background in mind, Table 1 provides some questions to consider along with example answers from a surgical validation study.5 Asking and answering these questions helps uncover the assumptions and intentions that inform the validation process.
Table 1.

| Questions to Consider | Ensuring Competency in Open Aortic Aneurysm Repair—Development and Validation of a New Assessment Tool5 |
| --- | --- |
| What is the construct being measured? | Basic technical skills to perform an open AAA repair |
| What is the assessment tool? | Using OPERATE, an OSATS-based tool, the subjects will perform a simulation-based open AAA repair |
| Who are the subjects being assessed? | Novice and experienced vascular surgeons |
| What is the intended purpose of the assessment tool? | To assess if the subjects have sufficient technical skills for performing an open AAA repair before starting supervised training on patients |
| What will be the interpretation of the results? | If a subject scores at or above the pass/fail standard on the OPERATE assessment tool, then they are qualified for supervised training on patients |
Despite the evolution of validity theory in the broader education sphere, outdated concepts persist in the medical education literature.1,2 Below, we contrast some commonly used but outdated concepts (in bold) with a contemporary understanding.
**“I am using a validated assessment tool from the literature.”**
This example highlights two outdated concepts. First, describing the tool itself as “validated” treats validity as a property of the tool rather than of the interpretations and uses of its results. Second, pointing to prior literature implies that validation is a completed, one-time event, when validity evidence must instead be gathered for each new context (i.e., specific subjects and purposes) in which the tool is used.
**“Is this interpretation valid or not?”**
This question views validity as a binary outcome: present or absent. Contemporary frameworks instead view validity as a continuum, so the better question is: “Is there sufficient validity evidence to support this interpretation?”
**“We demonstrated face, content, and construct validity for this assessment tool.”**
While classical frameworks describe distinct types of validity, contemporary frameworks view validity as a unified quality. Furthermore, the use of face validity has been discouraged by most education scholars.2,3
While contemporary frameworks share key principles, they differ on the evidence required. We will focus on Messick’s framework, as it is endorsed by the Standards for Educational and Psychological Testing.6 Table 2 defines the five sources of validity evidence in Messick’s framework and provides examples collected in the surgical validation study referenced above.
Table 2.

| Sources | Definition | Description of the Evidence from Nayahangan et al. 20205 |
| --- | --- | --- |
| Content | Evidence used to show that the assessment tool measures the construct and that it is both representative (includes everything it’s supposed to) and relevant (excludes anything irrelevant) | The OPERATE assessment tool was developed by subject-matter experts (experienced surgeons and an education scientist) who uniformly agreed on what skills to assess and how to assess them |
| Response process | Evidence used to show that how the subjects respond to the assessment items (i.e., the thoughts and actions behind their responses) and how raters evaluate those responses are consistent and fit with what is intended | For consistency, the same authors administered the simulation to all subjects, and all raters received rater training to understand the assessment |
| Internal structure | Evidence used to show that scores on assessment items that measure the same construct have the intended relationship to each other (a worked sketch follows this table) | The internal consistency of the OPERATE assessment tool was high (Cronbach’s alpha = .92) |
| Relationship to other variables | Evidence used to show that scores on the construct measured by the assessment tool have the intended relationship to other constructs | Scores from the OPERATE assessment tool correlated with the subjects’ experience level (novice vs. experienced) |
| Consequences | Evidence used to show the consequences of the assessment’s results, e.g., (1) does the interpretation of the results maximize benefits and minimize harm, and (2) what are the intended and unintended consequences of that interpretation? | Based on the pass/fail standard created for the OPERATE tool, 6% of the novices and 33% of the experts passed. A low pass rate among novices fits the intended use (a credible pass/fail assessment), while the low pass rate among the experienced group could be an unintended consequence |
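To make two of these evidence sources more concrete, consider how internal-structure and relationship-to-other-variables evidence are commonly computed. Cronbach’s alpha summarizes internal consistency as alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The minimal Python sketch below applies both calculations to invented item ratings; the data, scale, and code are illustrative assumptions, not the analysis from Nayahangan et al.5

```python
import numpy as np

def cronbach_alpha(item_scores: np.ndarray) -> float:
    """Cronbach's alpha = k/(k-1) * (1 - sum(item variances) / variance(totals))."""
    k = item_scores.shape[1]                         # number of items
    item_variances = item_scores.var(axis=0, ddof=1) # variance of each item
    total_variance = item_scores.sum(axis=1).var(ddof=1)  # variance of total scores
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Invented ratings: 6 subjects x 4 items on a 1-5 OSATS-style scale.
scores = np.array([
    [2, 1, 2, 2],   # novice
    [1, 2, 1, 2],   # novice
    [2, 2, 3, 2],   # novice
    [4, 5, 4, 4],   # experienced
    [5, 4, 5, 5],   # experienced
    [4, 4, 5, 4],   # experienced
])
experience = np.array([0, 0, 0, 1, 1, 1])  # 0 = novice, 1 = experienced

# Internal structure: do items measuring the same construct hang together?
print(f"Cronbach's alpha = {cronbach_alpha(scores):.2f}")

# Relationship to other variables: do total scores track experience level?
# (Pearson r with a binary grouping variable is the point-biserial correlation.)
totals = scores.sum(axis=1)
r = np.corrcoef(totals, experience)[0, 1]
print(f"Correlation between total score and experience: r = {r:.2f}")
```

In a real validation study, these statistics would be computed on the full rating data and then interpreted against the intended use of the scores, not reported as stand-alone numbers.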
For a deeper exploration of validity evidence, contemporary frameworks, and other information about validity, refer to Cook and Hatala’s 2016 article or chapter 2 in the Assessment in Health Professions Education textbook.3,7 Finally, given the growing diversity of surgical trainees, surgical educators and researchers should consider issues of diversity, equity, and inclusion and ask whether the assessment of learners and the arguments made from its results account for these issues. For more information on diversity and the validity of assessments, we refer readers to additional resources.8,9
Surgical educators and researchers should have a clear understanding of contemporary principles of validity theory. Such an understanding is vital not only to the accurate communication of assessment research but also to the thoughtful evaluation and education of surgical trainees.
Research Fellow, Department of Surgery, Massachusetts General Hospital
Resident, Department of Surgery, Massachusetts General Hospital
Health Professions Education Researcher, Massachusetts General Hospital
Director of Health Professions Education Research, Massachusetts General Hospital
Professor of Surgery, Department of Surgery, Massachusetts General Hospital
Massachusetts General Hospital
Corresponding author: grashid@mgh.harvard.edu