Reporting guidelines, such as the Consolidated Standards of Reporting Trials (CONSORT) Statement, improve the reporting of research in the medical literature (Turner et al., 2012). Many such guidelines exist, and the CONSORT Extension to Nonpharmacological Trials (Boutron et al., 2008) provides suitable guidance for reporting between-groups intervention studies in the behavioral sciences. The CONSORT Extension for N-of-1 Trials (CENT 2015) was developed for multiple crossover trials with single individuals in the medical sciences (Shamseer et al., 2015; Vohra et al., 2015), but there is no reporting guideline in the CONSORT tradition for single-case research used in the behavioral sciences. We developed the Single-Case Reporting guideline In Behavioral interventions (SCRIBE) 2016 to meet this need. This Statement article describes the methodology of the development of the SCRIBE 2016, along with the outcome of 2 Delphi surveys and a consensus meeting of experts. We present the resulting 26-item SCRIBE 2016 checklist. The article complements the more detailed SCRIBE 2016 Explanation and Elaboration article (Tate et al., 2016) that provides a rationale for each of the items and examples of adequate reporting from the literature. Both these resources will assist authors to prepare reports of single-case research with clarity, completeness, accuracy, and transparency. They will also provide journal reviewers and editors with a practical checklist against which such reports may be critically evaluated.
University courses generally prepare students of the behavioral sciences very well for research using parallel, between-groups designs. By contrast, single-case methodology is “rarely taught in undergraduate, graduate and postdoctoral training” (Kazdin, 2011, p. vii). Consequently, there is a risk that researchers conducting and publishing studies using single-case experimental designs (and journal reviewers of such studies) are not necessarily knowledgeable about single-case methodology nor well trained in using such designs in applied settings. This circumstance, in turn, impacts the conduct and report of single-case research. Even though single-case experimental intervention research has comparable frequency to between-groups research in the aphasiology, education, psychology, and neurorehabilitation literature (Beeson & Robey, 2006; Perdices & Tate, 2009; Shadish & Sullivan, 2011), evidence of inadequate and incomplete reporting is documented in multiple surveys of this literature in different populations (Barker, Mellalieu, McCarthy, Jones, & Moran, 2013; Didden, Korzilius, van Oorsouw, & Sturmey, 2006; Maggin, Chafouleas, Goddard, & Johnson, 2011; Smith, 2012; Tate, Perdices, McDonald, Togher, & Rosenkoetter, 2014).
To address these issues, we developed a reporting guideline, entitled the Single-Case Reporting guideline In BEhavioral interventions (SCRIBE) 2016, to assist authors, journal reviewers, and editors to improve the reporting of single-case research. This Statement provides the methodology and development of the SCRIBE 2016. The companion SCRIBE 2016 Explanation and Elaboration (E&E) article (Tate et al., 2016) provides detailed background to and rationale for each of the 26 items in the SCRIBE checklist, along with examples of adequate reporting in the published literature.
The SCRIBE 2016 Statement is intended for use with the family of single-case experimental designs1 used in the behavioral sciences. It applies to four prototypical designs (withdrawal/reversal, multiple baseline, alternating-treatments, and changing-criterion designs), including combinations and variants of these designs, as well as adaptive designs. Figure 1 presents the common designs using a single case based on surveys in the literature (see, e.g., Perdices & Tate, 2009; Shadish & Sullivan, 2011). (Color versions of this and all other figures are provided as supplemental materials at http://otjournal.net; navigate to this article, and click on “Supplemental.”)
The figure mainly draws on the behavioral sciences literature, which includes a broad range of designs using a single participant. Only those designs above the solid horizontal line use single-case methodology (i.e., an intervention is systematically manipulated across multiple phases during each of which the dependent variable is measured repeatedly and, ideally, frequently). None of the designs below the solid horizontal line meets these criteria, and they are not considered single-case experiments: The B-phase training study comprises only a single (intervention) phase; the so-called “pre–post” study does not take repeated measurements during the intervention phase; and the case description is a report, usually compiled retrospectively, that is purely descriptive without systematic manipulation of an intervention.
The A-B design, also labeled “phase change without reversal” (Shadish & Sullivan, 2011), is widely regarded as the basic single-case design. It differs from the “pre–post” study in that measurement of the dependent variable occurs during the intervention (B) phase. In Figure 1, we place the A-B design in an intermediate position between the nonexperimental single-case designs (below the solid horizontal line) and the four experimental designs above the dotted horizontal line because it has weak internal validity, there being no control for history or maturation, among other variables. As a result, it is regarded as a quasiexperimental design (Barlow, Nock, & Hersen, 2009).
Designs above the dotted horizontal line are experimental in that the control of threats to internal validity is stronger than in the A-B design. Nonetheless, within each class of design the adequacy of such controls and whether the degree of experimental control meets design standards (see Horner et al., 2005; Kratochwill et al., 2013) vary considerably (cf. A-B-A versus A-B-A-B; multiple-baseline designs with two versus three baselines/tiers). Consequently, reports of these designs in the literature have variable scientific quality and features of internal and external validity can be evaluated with scales measuring scientific robustness in single-case designs, such as described in Maggin, Briesch, Chafouleas, Ferguson, and Clark (2014) and Tate et al. (2013b).
The structure of the four prototypical experimental designs in Figure 1 differs significantly: The withdrawal/reversal design systematically applies and withdraws an intervention in a sequential manner, the multiple-baseline design systematically applies an intervention in a sequential manner that also has a staggered introduction across a particular parameter (e.g., participants, behaviors), the alternating/simultaneous-treatments design compares multiple interventions in a concurrent manner by rapidly alternating the application of the interventions, and the changing-criterion design establishes a number of hierarchically based criterion levels that are implemented in a sequential manner. Each of the single-case experimental designs has the capacity to introduce randomization into the design (cf. the small gray rectangle within each of the designs in Figure 1), although in practice randomization in single-case research is not common.
The medical N-of-1 trial is depicted within the withdrawal/reversal paradigm of Figure 1. The analogous reporting guide for the medical sciences, CONSORT Extension for N-of-1 Trials (CENT; Shamseer et al., 2015; Vohra et al., 2015), is available for the reporting of medical N-of-1 trials. These trials consist of multiple crossovers (described as challenge–withdrawal–challenge–withdrawal in Vohra et al., 2015) in a single participant who serves as his or her own control, often incorporating randomization and blinding.
As with other reporting guidelines in the CONSORT tradition, the SCRIBE 2016 does not make recommendations about how to design, conduct, or analyze data from single-case experiments. Rather, its primary purpose is to provide authors with a checklist of items that a consensus from experts identified as the minimum standard for facilitating comprehensive and transparent reporting. This checklist includes the specific aspects of the methodology to be reported and suggestions about how to report. Consequently, readers are provided with a clear, complete, accurate, and transparent account of the context, plan, implementation, and outcomes of a study. Readers will then be in a position to critically evaluate the adequacy of the study, as well as to replicate and validate the research.
Clinicians and researchers who want guidance on how to design, conduct, and analyze data for single-case experiments should consult any of the many current textbooks and reports (e.g., Barker, McCarthy, Jones, & Moran, 2011; Barlow et al., 2009; Gast & Ledford, 2014; Horner et al., 2005; Kazdin, 2011; Kennedy, 2005; Kratochwill et al., 2013; Kratochwill & Levin, 2014; Morgan & Morgan, 2009; Riley-Tillman & Burns, 2009; Vannest, Davis, & Parker, 2013), as well as recent special issues of journals (e.g., Journal of Behavioral Education in 2012, Remedial and Special Education in 2013, the Journal of School Psychology and Neuropsychological Rehabilitation in 2014, Aphasiology in 2015) and methodological quality recommendations (Horner et al., 2005; Kratochwill et al., 2013; Maggin et al., 2014; Smith, 2012; Tate et al., 2013b).
The impetus to develop the SCRIBE 2016 arose during the course of discussion at the CENT consensus meeting in May 2009 in Alberta, Canada (see Shamseer et al., 2015; Vohra et al., 2015). The CENT initiative was devoted to developing a reporting guideline for a specific design and a specific discipline: N-of-1 trials in the medical sciences. At that meeting the need was identified for development of a separate reporting guideline for the broader family of single-case experimental designs as used in the behavioral sciences (see Figure 1).
A 13-member steering committee for the SCRIBE project was formed comprising a Sydney, Australia, executive (authors RLT, convenor, and SM, MP, and LT, with UR appointed as project manager). An additional three members who had spearheaded the CENT initiative (CENT convenor SV, along with MS and LS) were invited because of their experience and expertise in developing a CONSORT-type reporting guideline in a closely related field (N-of-1 trials). To ensure representation from experts in areas of single-case investigations in clinical psychology, special education, and single-case methodology and data analysis, another five experts were invited to the steering committee (authors DHB, RH, AK, TK, and WS). Of course, other content experts exist who would have been eligible for the steering committee, but a guiding consideration was to keep the number of members to a reasonable size so that the project was manageable. In the early stages of the project, steering committee members were instrumental in item development and refinement for the Delphi survey.
The methodology used to develop the SCRIBE 2016 followed the procedures outlined by Moher, Schulz, Simera, and Altman (2010). At the time of project commencement, the literature on evidence of bias in reporting single-case research was very limited, and it has only recently started to emerge. Members of the steering committee, however, were already knowledgeable about the quality of the existing single-case literature, which had prompted independent work in the United States (specifically in compiling competency standards of design and evidence; Hitchcock et al., 2014; Horner et al., 2005; Kratochwill et al., 2010, 2013) and Australia (in developing an instrument to evaluate the scientific quality of single-case experiments; Tate et al., 2008, 2013b). No reporting guideline, in the CONSORT tradition, emerged from literature review.
Since commencement of the SCRIBE project, a reporting guide for single-case experimental designs was published by Wolery, Dunlap, and Ledford (2011). That guide was not developed following the same series of steps as in previously developed reporting guidelines such as those of the CONSORT family (see Moher et al., 2011) and is not as comprehensive as the CONSORT-type guidelines on which the current project is based, covering about half of the items in the SCRIBE 2016. Nevertheless, the convergence between the recommendations of Wolery and colleagues regarding the need to report on features such as inclusion and exclusion criteria for participants, design rationale, and operational definitions of the target behavior versus the corresponding items presented in the SCRIBE 2016 is noteworthy and adds validity to the SCRIBE 2016. Funding for the SCRIBE project was obtained from the Lifetime Care and Support Authority of New South Wales, Australia. The funds were used to employ the project manager, set up and develop a web-based survey, hold a consensus meeting, and sponsor participants to attend the consensus meeting.
Methodology of the Delphi Process
The Delphi technique is a group decision-making tool and consensus procedure that is well suited to establishing expert consensus on a given set of items (Brewer, 2007). The nature of the process allows for it to be conducted online, and responses can be given anonymously. The Delphi procedure consists of several steps, beginning with the identification, selection, and invitation of a panel of experts in the pertinent field to participate in the consensus process. Subsequently, the items are distributed to experts who rate the importance of each topic contained in the items. As we did for the present project, a Likert scale is often used, ranging from 1 to 10, whereby 1 indicates very low importance and 10 very high importance. All expert feedback is then collated and reported back to the panel, including the mean, standard deviation, and median for each item; a graph indicating the distribution of responses; as well as any comments made by other experts to inform further decision making. When high consensus is achieved, which may take several rounds, the Delphi exercise is completed. von der Gracht (2012) reviews a number of methods to determine consensus for the Delphi procedure. Methods include using the interquartile range (IQR), with consensus operationalized as no more than 2 units on a 10-unit scale.
The SCRIBE Delphi Procedure
A set of potential items was drawn up by the SCRIBE steering committee for the Delphi survey. The items initially came from two sources available at the time: (a) those identified in a systematic review previously conducted by the CENT group (Punja et al., in press), and subsequently refined during the CENT consensus meeting process, and (b) items used to develop the Single-Case Experimental Design Scale published by the Sydney-based members as part of an independent project (Tate et al., 2008). Steering committee members suggested additional items, as well as rephrasing of existing items. We formatted the resulting 44 initial items for distribution in the Delphi exercise using an online survey tool, Survey Monkey.
Two rounds of a Delphi survey were conducted in April and September 2011. Figure 2 provides a flow diagram of the Delphi survey participants. In total, we identified 131 experts worldwide as potential Delphi panel members (128 for the initial round, and an additional 3 participants were added at Round 2) based on their track record of published work in the field of single-case research (either methodologically or empirically based) and/or reporting guideline development.
We used several strategies to identify suitable respondents. The Sydney executive drew up lists of authors who published single-case experimental designs in the behavioral sciences by consulting reference lists of books and journal articles and our PsycBITE database (http://www.psycbite.com). We examined the quality of authors’ work, as described in their reports, using our methodological quality scale (Tate et al., 2008) and invited authors of scientifically sound reports. In addition, we conducted Google searches of editorial board members of journals that were known to publish single-case reports, as well as the authors publishing in such journals, and evaluated the quality of their work. Finally, steering committee members made recommendations of suitable authors. This group of 131 invitees represents a sample of all world experts. We distributed invitations by email for ease of communication and speed of contact. An “opt-in” consent arrangement was used, and thus consent to participate required the invitee’s active response. Of the pool of 128 invitations for Round 1, 54 did not respond to the invitation (we sent one reminder email), 8 did respond but declined (mainly on the grounds of not having sufficient time), and 4 email addresses were undeliverable. The remaining 62 responders who consented to participate in Round 1 were sent the survey link.
In Round 1, 53 of 62 consenting experts responded within the 2-wk time frame of the survey, with 50 providing a complete data set of responses to the original set of 44 items. Results were entered into a database. Importance ratings of the items were uniformly high, with no item receiving a group median rating <7/10. The items thus remained unrevised for Round 2, which was conducted to elicit additional comment on the items. These decision-making criteria are compatible with those used in the development of the CENT 2015, which excluded items with mean importance ratings <5/10 (Vohra et al., 2015).
For Round 2, the survey link was sent to 59 of the original 62 consenting participants to Round 1 (the 3 participants who consented but did not complete Round 1 did not provide reasons for their early discontinuance and were not recontacted) and an additional 3 experts recommended by steering committee members. Graphed results were provided to respondents, along with anonymous comments on the items from the other panel members. A complete data set of responses for Round 2 was collected from 45 participants. Again, the ratings of importance for each item were mostly very high, all items having median importance ratings of at least 8/10, but the range of responses decreased. According to the criteria of von der Gracht (2012), consensus was achieved for 82% of items (36/44) which had IQRs of 2 or less on the 10-point scale. The remaining 8 items had IQRs from 2.25 to 4 and were discussed in detail at the consensus meeting.
As depicted in Figure 2, across the two rounds of the Delphi exercise, 65/131 invited experts consented to participate (62 participants in Round 1 and an additional 3 participants in Round 2). Forty participants provided a complete data set of responses to both Round 1 and Round 2, representing a 62% response rate (40/65). The 40 responders represented 31% of the total of 131 experts invited to participate in the survey.
Sixteen world experts in single-case methodology and reporting guideline development attended a 2-day consensus meeting, along with the Sydney executive and two research staff. Representation included clinical research content experts in clinical and neuropsychology, educational psychology and special education, medicine, occupational therapy, and speech pathology, as well as single-case methodologists and statisticians; journal editors and a medical librarian; and guideline developers. Delegates met in Sydney on December 8 and 9, 2011. Each participant received a folder that contained reference material pertinent to the SCRIBE project and results from both rounds of the Delphi survey. Each of the Delphi items contained a graph of the distribution of scores, the mean and median scores of each round of the survey, and the delegate’s own scores when he or she completed the Delphi surveys.
The meeting commenced with a series of brief presentations from steering committee members on the topics of reporting guideline development, single-case methods and terminology, evolution of the SCRIBE project, and description of the CENT. Results of the Delphi survey were then presented. Delegates had their folder of materials to consult and a PowerPoint presentation projected onto a screen to facilitate discussion. A primary aim of the consensus meeting was to develop the final set of items for the SCRIBE checklist. The final stages of the meeting discussed the documents to be published, authorship, and knowledge dissemination strategy.
During the meeting, the 44 Delphi items were discussed, item by item, over the course of four sessions, each led by two facilitators. The guiding principles for discussion were twofold. First, item content was scrutinized to ensure that (a) it captured the essence of the intended issue under consideration and (b) the scope of the item covered the necessary and sufficient information to be reported. Second, the relevance of the item was examined in terms of its capacity to ensure clarity and accuracy of reporting.
Three delegates at the consensus meeting (authors RLT and SM and a research staff member, DW) took notes about the amalgamation and merging of items where applicable and refinements to wording of items. Final wording of items was typed, live-time, into a computer that projected onto a screen so that delegates could see the changes, engage in further discussion, give approval, and commit to the group decision. In addition, the meeting was audiotaped for the purpose of later transcription to have a record of the discussion of the items and inform the direction and points to describe in the E&E document.
Figure 3 illustrates the discussion process that occurred during the consensus meeting. The figure presents a screen shot of the PowerPoint presentation of one of the items (Item 31 of the Delphi survey, treatment fidelity, which was broadened to encompass procedural fidelity as a result of discussion at the consensus meeting and became Item 17 of the SCRIBE). The figure shows the results of each round of the Delphi survey (the results for Round 1 and Round 2 appear in the figure as the left- and right-sided graphs, respectively), along with discussion points. These points comprised comments made by the Delphi survey participants when completing the online surveys, as well as suggestions prepared by the Sydney executive that emerged from the consolidated comments. The points were used to stimulate discussion among the conference delegates, but discussion was not restricted to the prepared points.
By the end of the meeting, delegates reached consensus on endorsing 26 items that thus constitute the minimum set of reporting items comprising the SCRIBE 2016 checklist. The SCRIBE 2016 checklist consists of six sections in which the 26 aspects of report writing pertinent to single-case methodology are addressed. The first two sections focus on the title/abstract and introduction, each section containing two items. Section 3, method, consists of 14 items addressing various aspects of study methodology and procedure. Items include description of the design (e.g., randomization, blinding, planned replication), participants, setting, ethics approval, measures and materials (including the types of measures, their frequency of measurement, and demonstration of their reliability), interventions, and proposed analyses. The results (Section 4) and discussion (Section 5) sections each contain 3 items. Section 6 (documentation) contains 2 items pertaining to protocol availability and funding for the investigation.
In total, 24 Delphi items were merged into 7 SCRIBE items because they referred to the same topics: (a) SCRIBE Item 5 (design) contained 3 Delphi items (design structure, number of sequences, and decision rules for phase change); (b) Item 8 (randomization), 2 Delphi items (sequence and onset of randomization); (c) Item 11 (participant characteristics), 2 Delphi items (demographics and etiology); (d) Item 13 (approvals), 2 Delphi items (ethics approval and participant consent); (e) Item 14 (measures), 9 Delphi items (operational definitions of the target behavior, who selected it, how it was measured, independent assessor blind to phase, interrater agreement, follow-up measures, measures of generalization and social validity, and methods to enhance quality of measurement); (f) Item 19 (results), 2 Delphi items (sequence completed and early stopping); and (g) Item 20 (raw data), 4 Delphi items (results, raw data record, access to raw data, and stability of baseline). One of the Delphi items relating to meta-analysis was considered not to represent a minimum standard of reporting for single-case experimental designs and accordingly was deleted.
The audio recording of the 2-day consensus meeting was transcribed. The final guideline items were confirmed after close examination of the conference transcript, and the SCRIBE 2016 checklist was developed (Table 1). The meeting report was prepared and distributed to the steering committee members in June 2012. The Sydney executive then began the process of drafting background information sections for each item and integrating these with the broader literature for the E&E article. Multiple versions of the E&E article were distributed over the next 2 yr to the steering committee members for their comment, and subsequent versions incorporated the feedback.
Authors can use the checklist to help with writing a research report, and readers (including journal editors and reviewers) can use the checklist to evaluate whether the report meets the points outlined in the guideline. Users will find the detailed SCRIBE 2016 E&E document (Tate et al., 2016) helpful for providing rationale for the items, with examples of adequate reporting from the literature.
Following publication of this SCRIBE 2016 Statement and the E&E article (Tate et al., 2016), the next stage of activity focuses on further dissemination. Obtaining journal endorsement for the SCRIBE 2016 is a vital task because it has been demonstrated that journals that endorse specific reporting guidelines are associated with better reporting than journals where such endorsement does not exist (Turner et al., 2012). The SCRIBE project is indexed on the EQUATOR network (http://www.equator-network.org/), and a SCRIBE Web site (http://www.sydney.edu.au/medicine/research/scribe) provides information and links to the SCRIBE 2016 publications. SCRIBE users are encouraged to access the Web site and provide feedback on their experiences using the SCRIBE and suggestions for future revisions of the guideline. Future research will evaluate the uptake and impact of the SCRIBE 2016.
We expect that the publication rate of single-case experiments and the research into single-case methodology will expand over the years, given the evidence of such a trend (e.g., Hammond & Gast, 2010) and also considering the recent interest shown in journal publication of special issues dedicated to single-case design research referred to earlier in this article. As is common for guidelines, the SCRIBE 2016 will likely require updates and revisions to remain current and aligned with the best evidence available on methodological standards. We developed the SCRIBE 2016 to provide authors, journal reviewers, and editors with a recommended minimum set of items that should be addressed in reports describing single-case research. Adherence to the SCRIBE 2016 should improve the clarity, completeness, transparency, and accuracy of reporting single-case research in the behavioral sciences. In turn, this will facilitate (a) replication, which is of critical importance for establishing generality; (b) the coding of different aspects of the studies as potential moderators in meta-analysis; and (c) evaluation of the scientific quality of the research. All of these factors are relevant to the development of evidence-based practices.
The SCRIBE Group wishes to pay special tribute to our esteemed colleague Professor William Shadish (1949–2016), who died on the eve of publication of this article. His contribution at all stages of the SCRIBE project was seminal.
Funding for the SCRIBE project was provided by the Lifetime Care and Support Authority of New South Wales, Australia. The funding body was not involved in the conduct, interpretation, or writing of this work. We acknowledge the contribution of the responders to the Delphi surveys, as well as administrative assistance provided by Kali Godbee and Donna Wakim at the SCRIBE consensus meeting. Lyndsey Nickels was funded by an Australian Research Council Future Fellowship (FT120100102) and Australian Research Council Centre of Excellence in Cognition and Its Disorders (CE110001021). For further discussion on this topic, please visit the Archives of Scientific Psychology online public forum at http://arcblog.apa.org.
To encourage dissemination of the SCRIBE Statement, this article is freely accessible through Archives of Scientific Psychology and will also be published in the American Journal of Occupational Therapy, Aphasiology, Canadian Journal of Occupational Therapy, Evidence-Based Communication Assessment and Intervention, Journal of Clinical Epidemiology, Journal of School Psychology, Neuropsychological Rehabilitation, Physical Therapy, and Remedial and Special Education. The authors jointly hold the copyright for this article.
This work is licensed under a Creative Commons Attribution 3.0 Unported License (http://creativecommons.org/licenses/by/3.0/), which allows anyone to download, reuse, reprint, modify, distribute, and/or copy this content, so long as the original authors and source are cited and the article’s integrity is maintained. Copyright for this article is retained by the authors. Authors grant the American Psychological Association a license to publish the article and identify itself as the original publisher. No permission is required from the authors or the publisher.
This article was originally published in Archives of Scientific Psychology, 4, 1–9, http://dx.doi.org/10.1037/arc0000026. Supplemental materials are available at http://dx.doi.org/10.1037/arc0000026.supp.
1Single-case methodology is defined as the intensive and prospective study of the individual in which (a) the intervention is manipulated in an experimentally controlled manner across a series of discrete phases and (b) measurement of the behavior targeted by the intervention is made repeatedly (and, ideally, frequently) throughout all phases. Professional guidelines call for the experimental effect to be demonstrated on at least three occasions by systematically manipulating the independent variable (Horner et al., 2005; Kratochwill et al., 2010, 2013). This criterion helps control for the confounding effect of extraneous variables that may adversely affect internal validity (e.g., history, maturation) and allows a functional cause-and-effect relationship to be established between the independent and dependent variables.