Importance: Occupational therapy education and practice have changed over time; however, items on the American Occupational Therapy Association’s (AOTA’s) Fieldwork Performance Evaluations (FWPEs) for the Occupational Therapy Student (OTS) and Occupational Therapy Assistant Student (OTAS) have not been updated in more than two decades.
Objective: To explore evidence of validity in relation to test content of the revised FWPEs for the OTS and OTAS.
Design: A qualitative study using cognitive interviews was conducted to gather perspectives on the revised FWPEs, including updated items and a proposed rating scale. A content analysis approach was used to link patterns in responses to stable, meaningful constructs to further align and refine content of the tool before further validation.
Setting: Fieldwork sites and academic settings.
Participants: Eighteen fieldwork educators (FWEs) and academic fieldwork coordinators.
Results: Seven areas for refinement were identified: (1) relevance to a variety of practice settings, (2) overlapping and redundant items, (3) long item statements, (4) double- and triple-barreled item statements, (5) alignment between OTS and OTAS items, (6) further alignment with AOTA and Accreditation Council for Occupational Therapy Education documents and contemporary practice, and (7) wording and features of the proposed rating scale. The findings guided revisions of FWPE content.
Conclusions and Relevance: Cognitive interviewing was a critical step in refining the FWPE items to support content validity. The issues identified may not have been detected using traditional approaches to survey development and therefore were critical in maximizing the validity and usefulness of the final items, which will ultimately benefit fieldwork educators and students.
What This Article Adds: This study resulted in refinements to FWPE items before the next step in the validation process, ultimately improving the final FWPE items. In addition, this article outlines a process that other researchers can use to validate similar tools.
Educational standards set by the Accreditation Council for Occupational Therapy Education (ACOTE®; 2012, 2018) state that a formal evaluation of student performance during Level II fieldwork must be documented. Occupational therapy and occupational therapy assistant programs may meet this requirement by using the American Occupational Therapy Association (AOTA) Fieldwork Performance Evaluations (FWPEs). The current versions of these tools—the FWPE for the Occupational Therapy Student (OTS; AOTA, 2002b) and the FWPE for the Occupational Therapy Assistant Student (OTAS; AOTA, 2002a)—were adopted in 2002 by AOTA’s Commission on Education (see also Atler, 2003).
As a result of changes in occupational therapy education (e.g., updated education standards) and in practice (e.g., increased productivity expectations, emerging areas of practice), AOTA appointed a series of task forces to, over time, revise the 2002 versions of the FWPEs for the OTS and the OTAS. In 2015, the final task force submitted its recommended items, including examples of the items, for both versions. Within this process, there was also an objective to address known issues with the 2002 FWPEs (e.g., how to score the OTS item about assigning responsibilities to the occupational therapy assistant and occupational therapy aide in settings where no such personnel exist).
To further validate these updated items, we engaged in a rigorous two-stage research process with the goal of producing updated tools with strong psychometric properties that reflect current occupational therapy education and practice. The first phase of this process, which involved the use of cognitive interviewing, was aimed at establishing initial validity evidence for item content. This first phase is the focus of the present article. The second phase of the study used a quantitative approach—Rasch analysis—to evaluate evidence of internal structure, response processes, fairness in testing, and precision. That phase is the focus of a second article.
Cognitive interviewing has become a prevalent method for pretesting draft versions of surveys to identify and address problems with survey items (Beatty & Willis, 2007; Collins, 2015; Conrad & Blair, 1996). With roots in cognitive psychology, cognitive interviewing is a systematic approach for gathering information about individuals’ cognitive processes when considering and responding to survey questions (Beatty & Willis, 2007; Miller et al., 2014). Providing a “window” into a “usually hidden process” (Collins, 2015, p. 13), in-depth, semistructured interviews enable researchers to identify potential sources of error and assess the validity of survey items (Beatty & Willis, 2007; Castillo-Díaz & Padilla, 2013).
Cognitive interviews often use a “think-aloud” protocol, with interviewers prompting respondents to verbalize their thought processes as they consider and respond to each item (Fonteyn et al., 1993). This process serves to gather respondent perspectives and reveal any problems within the measure so that improvements can be made before testing with a sample. The think-aloud method is often used in conjunction with interviewer probes, another major technique of cognitive interviewing; these probes allow interviewers to clarify responses, further explore issues as they emerge, and gather respondents’ overall thoughts about the instrument (Beatty & Willis, 2007; Collins, 2015; Drennan, 2003). Given the challenges many respondents face in verbalizing their thought processes, concurrent and follow-up probes enable interviewers to capture a fuller picture of respondents’ thoughts (Collins, 2015).
After the final AOTA task force submitted its proposed items, it was critical for the research team to first investigate how the end users (fieldwork educators [FWEs] and academic fieldwork coordinators [AFWCs]) would understand and interpret how these items should be scored before implementing a larger scale validation study. Our aim was to detect and refine potential problems in item formulations and rating scales early in the process so that the instruments would have stronger baseline content validity, thereby promoting optimal outcomes of subsequent validity and reliability testing procedures (Castillo-Díaz & Padilla, 2013).
The purpose of this study, therefore, was to explore FWEs’ and AFWCs’ perceptions of the items and rating scale to further inform the refinement and alignment of items in the revised Fieldwork Performance Evaluation for the Occupational Therapy Student (AOTA, 2020b) and Fieldwork Performance Evaluation for the Occupational Therapy Assistant Student (AOTA, 2020a).
The epistemological–ontological approach for the project was based on postpositivism and critical realism (Cruickshank, 2012). We believe that an objective system can be developed to systematically evaluate and measure students’ fieldwork performance. We argue that both qualitative and quantitative methods are valid approaches within this process.
Preparation of Forms and Ethics Approval
After receiving the proposed items from the final AOTA task force, the first step for our research team was to present the items in a format that would be familiar to and realistic for respondents. We created two separate forms, one for the OTS and the other for the OTAS, using a format similar to the 2002 versions of the FWPE. The research team also considered recommendations from the task force as well as best practices in rating scale design and subsequently created two new test versions of the FWPE with a new rating scale. Before finalizing the forms, we consulted an author of the third edition of the Occupational Therapy Practice Framework: Domain and Process (OTPF–3; AOTA, 2014) to review the revised items for accuracy of terminology, and we also consulted an AFWC for OTASs to ensure that the items comprehensively represented the expectations for these students. After completion of these steps, the team sought and received ethics approval from the University of Illinois Chicago’s institutional review board to conduct the first phase of the validation study.
The 2002 FWPE rating scale was a mixture of norm-referenced and criterion-referenced ratings. An example of a problem with this rating scale was the use of the designation “top 5%” for “exceeds standards.” This term was originally intended to designate only the very highest level of performance among students. AFWCs, however, have consistently reported that the category has been overused, with more than 5% of students rated as “exceeds standards.”
Our team was charged with redesigning the rating scale on the basis of recommendations from the overall task force. After reviewing scales of other existing fieldwork evaluations (e.g., Student Practice Evaluation Form–Revised; Turpin et al., 2008), we created a rating scale that included only criterion-referenced items; it was this new scale that was used during the cognitive interview process. The new rating scale, which was further evaluated during the quantitative validation process, includes the categories exemplary, proficient, emerging, and unsatisfactory (see Table 1), and clarifications were made to the definitions of each category (see Table 2).
Recruitment and Participants
To gather a wide range of perspectives and identify as many problems as possible, it is important to interview a diverse sample of respondents (Miller et al., 2014). Although much has been investigated and published on the topic of cognitive interview sampling, there remains a lack of consensus about appropriate sample size and selection (Beatty & Willis, 2007). In general, such studies have used relatively small, purposive samples of respondents who were specifically selected to achieve study objectives (Miller et al., 2014).
In the current study, cognitive interviews were conducted with a purposeful sample of FWEs and AFWCs to identify potentially problematic FWPE items so that those items could be improved before piloting the new versions of the tools. We used a purposeful sampling method (Patton, 2001), specifically maximum variation sampling, to recruit a diverse range of FWEs, including occupational therapists and occupational therapy assistants, practitioners with varying levels of experience as FWEs, and practitioners from a variety of settings and geographic locations. The FWEs worked in a variety of practice settings, including, but not limited to, rehabilitation, community mental health, early intervention, outpatient pediatrics, and school systems. Their ages ranged from the 30s to the 60s, and they had a wide range of clinical and FWE experience. All of the FWEs were female, and a variety of regions in the United States were represented, with a higher concentration within the geographic area of the university.
We also recruited AFWCs from both occupational therapy and occupational therapy assistant programs, using expert sampling to ensure high levels of experience and expertise. To gather new perspectives, we specifically sought AFWCs who had not been part of the overall task force charged with updating the FWPE items. Our sample included six FWEs who were occupational therapists, four FWEs who were occupational therapy assistants, four AFWCs from occupational therapy programs, and four AFWCs from occupational therapy assistant programs. Ten participants reviewed the OTS items, and eight reviewed the OTAS items. All participants were invited by email and provided consent before being interviewed.
Data Collection: Cognitive Interviews
Cognitive interviews were used to gain an understanding of the participants’ perspectives on the revised FWPE items, identify potential concerns, and gather suggestions to improve the items. For example, if several participants experienced concerns with a specific item formulation or expressed that a certain rating criterion was unclear, such responses prompted the team to reconsider and refine the statement formulations to improve clarity and potentially lead to stronger agreement or consensus in the future across FWEs scoring this item or using the criteria. This process served to support the validity and reliability of the tools when used in practice.
The cognitive interviews were conducted in person, via videoconferencing (Skype; Microsoft Corporation, Redmond, WA), or by phone depending on each participant’s preference. Interviewers used an interview guide to assist in ensuring consistency across the interviews. At the start of each interview, the interviewer introduced the participant to the purpose of the study and the interview process. The interviewer then provided the revised FWPE to the participant (OTS or OTAS version, depending on the participant). Participants who were FWEs were asked to read each item of the revised FWPE with a particular student in mind.
The participants were asked to “think aloud” so that the interviewer could gain insight into the participants’ reactions and thoughts as they considered each item. The interviewer took concurrent notes not only to capture each participant’s verbalizations but also to document observations of behavior such as hesitations or lengthy pauses. Using the interview guide, the interviewer then asked questions (i.e., probes) based on the participant’s responses—for example, questions to probe the participant’s interpretation of terms or item content. The FWEs were also asked follow-up questions (e.g., “Are there any items that are not relevant or applicable to your setting?” and “How could this form be improved?”) to gather additional input and overall impressions. In addition to FWEs, the team interviewed AFWCs, who reviewed the items and provided feedback based on their experience and expertise.
The interviews were audio recorded with the participant’s permission when feasible (i.e., when the participant agreed to be recorded and recording equipment was available). The recordings were used only as backup sources when the interviewers’ notes were not clear or understandable. Therefore, the recordings were not transcribed.
The cognitive interviews were performed by the first, third, fourth, and fifth authors, all of whom were either faculty or research assistants. In some cases, the interviewer was familiar with the participant, but in most cases the interview was their first interaction. The level of experience with cognitive interviewing varied within the team, so before initiation of data collection, the team was trained in this approach, discussed and aligned the interview probing procedures, and discussed how to deal with unforeseen circumstances during the interviews.
The research team met several times during the data collection process to discuss how the interview process was going, what information was being obtained, and whether new information was continuing to be collected. The team also reflected on the interview processes, learned from each other’s experiences, and further aligned the data collection procedures to improve consistency among interviewers. Additional interviews were conducted until data saturation, which was reached after 18 interviews (14 of which were recorded).
The data obtained from the interviewers’ notes on the cognitive interviews were compiled into two summary documents, one for the FWPE for the OTS and one for the FWPE for the OTAS, to provide the research team with an effective and efficient means of analyzing the data. A constant comparison approach was used to identify patterns in participant responses. The analysis was initially based on a deductive process by sorting the comments in relation to specific items to reveal systematic patterns in perceptions from the participants. The team held consensus meetings and used a group decision-making process to come to agreement about common themes and findings from the data.
On the basis of findings from the cognitive interviews, the FWPE item content, format, and proposed rating scale were refined. The research team organized the findings into seven categories based on common themes, each of which is described in this section, and included examples formulated from the participants’ responses. Because of the twofold exploratory nature of the study, this section presents the findings with a summary of participants’ perceptions and, when appropriate, examples of refinements or adjustments made in item formulations.
Relevance to a Variety of Practice Settings
Some participants reported that specific items or examples were not relevant to their practice settings, making it challenging to respond to the items in relation to a student’s performance. If the practice setting examples given were not relevant to the participants, it became unclear to them how to score the specific item in question.
Several items were revised in response to this feedback. One major way in which the feedback was incorporated was by creating additional examples for each of the items. For example, an occupational therapist who worked in a community-based setting suggested the addition of “community safety” as an example for one of the safety items. Table 3 shows the evolution of this item.
Overlapping and Redundant Items
Several participants reported that certain items overlapped, which could cause confusion as to where to rate specific student behaviors and could also give exaggerated weight to specific behaviors. Having overlapping item descriptions could also result in a higher degree of redundancy and a longer time to administer.
On the basis of this feedback, the team reviewed each revised item for redundancy and minimized overlapping concepts for both the OTS and the OTAS versions. For example, OTS Item 13, “Administers standardized/nonstandardized assessments and surveys accurately and efficiently to ensure findings are valid and reliable. Adheres to standardized assessment guidelines, when indicated, to achieve accurate results,” and OTS Item 15, “Adheres to standardized testing guidelines, when indicated, to achieve accurate results,” were highly similar. After reviewing these two items, the team decided to eliminate Item 15 because the concepts were encompassed in Item 13.
Long Item Statements
Participants reported that some items were excessively long, making these items confusing to rate. In some cases, the item itself included embedded examples, whereas other items had examples in a separate examples section. The item statements with examples led to confusion about which specific aspects to score within the item and which examples to focus on among those provided.
In response, the team shortened items by eliminating extraneous phrases and ensured that a similar structure was used across items, with examples following each item statement to facilitate easier rating. In the updated format, each performance item is stated more concisely, in boldface type, followed by relevant examples, if applicable. One example of an overly long item was OTS Item 31: “Produces clear and accurate documentation according to site requirements, which includes, but is not limited to, legibility (when indicated), adherence to electronic health documentation requirements (when indicated), accurate spelling, punctuation, and grammar. Limits documentation content to relevant information only, eliminating extraneous details.” After reviewing this item, the team decided to shorten it as follows: “Produces clear and accurate documentation. Examples: legibility, spelling, punctuation, grammar, adherence to electronic health documentation requirements.” An additional example is provided in Table 3.
Double- and Triple-Barreled Item Statements
In relation to the long statements described and addressed in the preceding section, some items also included multiple behaviors, which could result in unclear judgments of which aspects a student would need to meet and could therefore jeopardize fair, specific, and accurate evaluation.
To resolve this concern, the team minimized multiple-barreled items whenever possible. If the behaviors were deemed to be very closely related, the item remained the same. An example is OTS Item 23: “Modifies task approach, occupations, and the environment to maximize client performance.” After reviewing this item, the team modified it to “Modifies task and/or environment to maximize client performance.”
Alignment Between OTS Items and OTAS Items
Although similar professional behaviors are important for all occupational therapy practitioners, the 2002 versions of the FWPEs contained several items that were different for OTS and OTAS. The 2015 task force aligned many of the professional behavior items when they made their revisions. This issue was not always addressed by the cognitive interview participants, because they typically reflected on only one of the two tools. The participants who did reflect on both tools in their cognitive interviews, however, highlighted this concern.
It was beneficial for our team to review the OTS and OTAS items simultaneously. This concurrent review process revealed additional opportunities to align the professional behavior items for OTS and OTAS. To specifically ensure that the items comprehensively represented the expectations for OTASs, an occupational therapy assistant AFWC reviewed and agreed with the OTAS items after our team’s revisions.
Further Alignment With AOTA and ACOTE Documents and Contemporary Practice
The 2015 task force that finalized the FWPE items used key AOTA documents to justify their revisions. The cognitive interview participants in this study identified additional opportunities to link items to the ACOTE standards and the OTPF–3. The participants viewed alignment of the FWPE items and content with ACOTE standards as a strength.
In response to the suggestion to further align items with AOTA and ACOTE documents, the team reviewed the ACOTE standards and OTPF–3 and implemented some of the revisions suggested by the participants. As examples of this process, we added the OTPF–3 definitions of occupational profile and occupational performance, as well as client factors and contexts, to clarify the intended meanings of these terms (AOTA, 2014). We also requested that an author of the OTPF–3 review the revised items for accuracy of terminology. On the basis of her feedback, the team incorporated the term targeted outcomes in two items to be consistent with OTPF–3 language.
Wording and Features of the Proposed Rating Scale
The cognitive interviews yielded overall positive feedback about the proposed rating scale, especially in relation to the top performance category. The participants reported that the examples and terms helped them understand the rating criteria, and they provided valuable feedback about the proposed rating scale and the specific terms used to represent the scale steps. The scale was further revised on the basis of their suggestions (the final version appears in Table 2).
The findings from this study and the research design revealed several issues about the tools under investigation that we believe would not have been detected using a solely quantitative approach. On the basis of the results of this study and the cognitive interviews, we revised and validated the FWPEs for the OTS (Appendix A) and the OTAS (Appendix B) to make them potentially more directly applicable to a larger variety of practice settings and better aligned with current AOTA and ACOTE documents and contemporary practice. The findings and changes made to the tools will likely reveal clearer outcomes in evaluating aspects of construct validity, because the risks of misinterpretation and limited understanding of the items and rating scale have been minimized. The new, criterion-based rating scale was also initially validated through this process.
The cognitive interview process expanded the 2015 task force’s proposed items by using a wider audience of critical stakeholders to make the tools more relevant to their needs. In addition, the cognitive interviews resulted in improved alignment between the items in the two FWPE versions, resulting in clearer expectations for behaviors that should be the same for OTS and OTAS (e.g., professional behaviors, communication; ACOTE, 2012, 2018).
A remaining challenge with the revised FWPEs is that some items are still relatively long, and some include more than one behavior. An alternative approach to the double- or triple-barreled items would be to divide them into two or three items addressing distinct features. Such an approach, however, would dramatically increase the number of items in the assessment and would artificially remove the complex interactions among the various features included in the competencies and skills addressed. The research team therefore decided not to completely change the complex item formulations from the task force, with the awareness that these formulations may support diverse interpretations of the items’ meaning and therefore increase the risk of such items being interpreted and scored differently by different raters. The next step of the validation process of the FWPEs will reveal preliminary evidence of whether such problems exist within these items. Therefore, any suggestions to reformulate the more complex items should await such findings.
The use of purposeful sampling and the iterative process of data gathering throughout the cognitive interview process were time consuming but worthwhile, because these procedures afforded the inclusion of a variety of perspectives of different stakeholders. Many of the stakeholders suggested changes to item content and format and provided feedback on the rating scale that would not have been gathered using traditional quantitative test development approaches. By spending time and effort on this step of the process, we potentially avoided a situation in which problems were identified only after quantitative validation of the tool. This process also helped identify crucial changes that were needed to yield more sound and better aligned items and rating scale categories, indicating stronger evidence of validity based on test content. All steps in this process were important for a more optimal quantitative analysis of the FWPE, which is planned as the next step of the recommended test development process (Wilson, 2005).
Although we aimed for wide diversity among FWEs and AFWCs, an even greater diversity, including in relation to geographic location, may have further supported the generalizability of the findings. In addition, although cognitive interviewing is viewed as a qualitative method, the analytical structure and process in our study could be viewed as more deductive than inductive, given that the departure point for analysis was the content of the items included in the FWPE. Still, the findings indicated several areas of the tools in need of revision that might otherwise have been overlooked or neglected in a validation process using solely traditional quantitative methods.
Implications for Occupational Therapy Research and Education
The findings of this study have the following implications for occupational therapy research and education:
Any instrument development process should include methods such as cognitive interviews that incorporate the viewpoints of potential users—in this case, stakeholders such as FWEs and AFWCs—who serve to identify and address concerns with the proposed tools before further quantitative validation processes are undertaken.
We recommend aligning occupational therapy educational tools with ACOTE standards, because these standards represent critical knowledge and skills for occupational therapy practice. This alignment also promotes consistency in competence across all OTSs and OTASs in the United States. Although this study was initiated before the most recent ACOTE (2018) standards were published, we strongly argue for ongoing alignment between the FWPEs and both current and future standards on a regular basis.
Finally, we suggest that the process used in this study be used in other countries that do not yet have standardized and validated fieldwork evaluation tools.
This study explored evidence of validity in relation to test content of the revised AOTA FWPEs for the OTS and OTAS. The process of conducting and analyzing cognitive interviews was critical to refining the FWPE items before further quantitative validity analysis. The findings allowed the team to make user-informed refinements that resulted in improved and better aligned FWPE tools. The process also resulted in clearer item formulations from a content validity perspective to facilitate further validation processes.
The research for this article was supported by the American Occupational Therapy Association. All work was conducted at the University of Illinois Chicago.