Importance: Although occupational balance (OB) is a construct of importance to occupational therapy, existing OB assessments have not been validated in clinical populations.
Objective: To examine the validity and reliability of the 11-item version of the Occupational Balance Questionnaire (OBQ11) in U.S. adults with Type 1 diabetes.
Design: Data were analyzed from adults with Type 1 diabetes enrolled in a larger longitudinal study examining the relationships among blood glucose, emotion, and functioning. Dimensionality of the OBQ11 was assessed with item response theory (IRT); convergent validity was tested by examining whether associations between the OBQ11 and other constructs were consistent with a priori hypotheses.
Setting: Three outpatient clinical sites in the United States.
Participants: Data from 208 U.S. adults with Type 1 diabetes were included in the analyses (42% Latino, 29% White, 14% African American, 7% multiethnic, and 8% other).
Outcomes and Measures: Assessments administered include the OBQ11, Patient Health Questionnaire (depression), and Diabetes Self-Management Questionnaire.
Results: Overall, results from IRT models and correlational tests supported the reliability and validity of the OBQ11. For instance, higher scores on the OBQ11 were significantly associated with better self-ratings of diabetes management behaviors (r = .28, p < .001), lower depression symptoms (r = −.53, p < .001), and greater positive affect (r = .32, p < .001). A single-factor generalized partial credit model fit the OBQ11 acceptably well, supporting its unidimensionality.
Conclusions and Relevance: The OBQ11 may be a reliable and valid measure of OB appropriate for use in clinical populations such as adults with diabetes.
What This Article Adds: OB is not often formally assessed by occupational therapists in the United States, even though the contributions of OB to health and well-being are core components of the philosophy of occupational therapy. The current evidence supports the validity of the OBQ11 in a clinical population of adults with Type 1 diabetes and demonstrates significant associations between OB and health management behaviors. Study results may encourage greater consideration and assessment of OB in occupational therapy clinical practice in the United States.
Shifting treatment from an ailment focus to a health promotion focus has often been advocated as a means to address the substantial cost of chronic conditions, both fiscally and in terms of people’s quality of life (Levine et al., 2019). As of 2018, the economic costs of chronic conditions (including loss of labor productivity) in the United States were estimated to be $3.7 trillion, nearly one-fifth of the U.S. gross domestic product (Waters & Graf, 2018). Among people with chronic conditions, decreased quality of life can be attributed to several pathways, including greater incidence of mental health issues (Lotfaliany et al., 2018) and functional limitations (Dunlop et al., 2002). People with multiple chronic conditions often experience even greater decrements in quality of life (Hajat & Stein, 2018).
Occupational therapy may be particularly suited to health promotion in people with chronic conditions, given the centrality of habits and routines to its scope of practice (Epley et al., 2021), and efforts have been made to capitalize on this potential. Occupational therapy practice guidelines have been developed for working with adults with chronic conditions (Fields & Smallfield, 2022), occupational performance coaching has been advocated as an effective means to promote health (Alcorn & Broome, 2014), and occupational therapy–based self-management interventions have been tested in the primary care setting (Garvey et al., 2015; Pyatak et al., 2019). Prior research has provided evidence supporting the efficacy of Lifestyle Redesign®, an occupational therapy intervention framework emphasizing client autonomy, narrative reasoning, and establishing health-promoting habits and routines (Pyatak et al., 2022), in a variety of populations with chronic conditions, including people with diabetes (Pyatak et al., 2018) and those with chronic pain (Uyeshiro Simon & Collins, 2017).
Occupational balance (OB) is a popular concept among occupational therapists and has been found to be associated with health promotion in several populations with chronic conditions, including people with inflammatory arthritis (To-Miles et al., 2022), fibromyalgia (Ortiz-Rubio et al., 2022), and acquired brain injury (Nyman et al., 2021). OB has several definitions, but in this article, we used the conceptualization presented in the 11-item version of the Occupational Balance Questionnaire (OBQ11; Håkansson et al., 2020): the extent to which people perceive themselves as experiencing the right amount of and variation in occupations (Wagman et al., 2012). Other definitions of OB include “having high levels of meaningful occupation with concomitantly low levels of a perceived need for meaning in occupation” (Eakman, 2015, p. 3) and “a satisfying pattern of daily activity that is healthful, meaningful, and sustainable to an individual within the context of his or her current life circumstances” (Matuska & Christiansen, 2008, p. 11). Some occupational therapy interventions addressing health promotion through OB have been conducted, with preliminary evidence supporting their efficacy (Bazyk & Bazyk, 2009; Edgelow & Krupa, 2011; Eklund et al., 2017; Erlandsson, 2013; Gunnarsson et al., 2022; Olsson et al., 2020).
An assessment of OB may be a useful tool for occupational therapists in conducting health promotion interventions for populations with chronic conditions, because of the potential relevance of OB to health management behaviors. In chronic disease management, self-management behaviors that are considered particularly vital, such as checking blood glucose levels for diabetes (Banerjee et al., 2020) and exercise for obesity (Semlitsch et al., 2019), are often the primary targets of intervention. However, lifestyle factors can have a pronounced impact on the practice of these self-management behaviors and may also be vital to address (Pyatak et al., 2018). For instance, a client with diabetes can have low OB because of excessive work demands, which, in turn, results in the client’s neglecting to take insulin, particularly during busier times of the day (Hansen et al., 2018). Thus, addressing the underlying challenge of excessive work demands may facilitate better health management and provide ancillary benefits related to the improvement in OB.
For an OB assessment to be useful in interventions or research for clients with chronic conditions, it requires evidence supporting its validity and reliability with this population. Without such evidence, researchers and clinicians are likely to have less confidence that the OB measure yields trustworthy results (Jerosch-Herold, 2005). In a general Swedish population, researchers found evidence that a unidimensional Rasch model fit the OBQ11 well, suggesting that the items loaded onto a single underlying OB factor (Håkansson et al., 2020). Furthermore, the OBQ11 was found to have good reliability and to exhibit measurement invariance across age and gender (Håkansson et al., 2020). With a convenience sample of adults at a Turkish university, a single-factor model was again found to fit the OBQ11 well, and internal consistency was found to be acceptable (Günal et al., 2020). Validation work on the OBQ11 has also been performed on its Spanish (Peral-Gómez et al., 2021), Arabic (Dhas et al., 2022), and Norwegian (Uhrmann et al., 2019) versions. The validity and reliability of the OBQ11 have not yet been tested in a U.S. sample or in people with chronic conditions.
The purpose of this paper was to investigate the validity and reliability of the OBQ11 in a U.S. sample of adults with Type 1 diabetes (T1D). Because the OBQ11 is intended to assess OB as a single construct, one aspect of validity was addressed by examining whether the scale was unidimensional. We used item response theory (IRT) models for this purpose. Reliability was assessed by calculating Cronbach’s α (internal consistency; Taber, 2018) and by examining the amount of information captured across different levels of OB. Differential item functioning (DIF) analyses were used to examine whether the OBQ11 items performed differently by age or gender. Convergent validity of the OBQ11 was investigated by examining its associations with theoretically related constructs of emotional well-being (depression, anxiety, affect, mental health, perceived stress, and life satisfaction) and health management (diabetes self-management, illness intrusiveness, and blood glucose). A higher OBQ11 score was expected to be associated with greater emotional well-being (Wagman & Håkansson, 2014; Wagman et al., 2020; Yu et al., 2018) and better health management. Table 1 lists the hypothesized relationships between the OBQ11 and constructs that served as the basis for convergent validity testing.
Participants were recruited from three outpatient clinical sites in the greater Los Angeles and New York City metropolitan areas (Pyatak et al., 2021). Clinic-provided patient lists and provider referrals were used to conduct recruitment through mailings, calls, and e-mail. Participants were required to meet the following criteria: They had to be at least 18 yr old; have written or oral proficiency in English or Spanish; have a T1D diagnosis for more than a year; have no significant disruption to their routine during the 2-wk data collection period; and demonstrate sufficient visual, fine motor, and cognitive capacity to use a smartphone to complete daily surveys.
We performed an analysis of the data collected in a larger 2-wk longitudinal study that focused on investigating the relationships among momentary blood glucose, functioning, and emotion in adults with T1D (Pyatak et al., 2021). During the 2-wk period, participants were asked to wear a continuous glucose monitor and complete five to six surveys per day on smartphones that asked questions relevant to functioning, emotional state, and diabetes management. The OBQ11 was completed as part of the online baseline assessment battery. Other measures used in this study were also completed at baseline or as part of the follow-up assessment battery. Typically, all study procedures were completed remotely, although after the easing of coronavirus disease 2019 (COVID-19) pandemic restrictions, participants were given the option to complete some study steps in person. Data collection began in July 2020, when social distancing requirements were being enforced in the United States, and continued until March 2022, by which time those requirements had been eased. Informed consent was provided before participation in any study procedures, which were approved by the University of Southern California Institutional Review Board.
OB was measured with the OBQ11 (Håkansson et al., 2020) at baseline. We chose the OBQ11 because of its emphasis on the subjective experience of OB, which gave participants more leeway to judge their OB by their own criteria. Responses were made on a 4-point Likert scale ranging from 0 (strongly disagree) to 3 (strongly agree). Overall OB was calculated as the sum of all the item scores (possible range = 0–33), with a higher score indicative of greater satisfaction with one’s amount and variation of occupations. We anticipated that many participants would require study materials in Spanish, so a certified translation service was used to translate the OBQ11 to Latin American Spanish.
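The scoring rule described above (11 items, each scored 0 to 3 and summed to a 0–33 total) can be sketched in a few lines of Python. This is an illustrative sketch only: the function name `score_obq11` is hypothetical, and the decision to reject incomplete or out-of-range responses is an assumption, because handling of missing items is not described here.

```python
from typing import Sequence

N_ITEMS = 11                 # the OBQ11 has 11 items
MIN_RESP, MAX_RESP = 0, 3    # 0 = strongly disagree, 3 = strongly agree


def score_obq11(responses: Sequence[int]) -> int:
    """Return the OBQ11 sum score (0-33) for one respondent.

    Assumption: incomplete or out-of-range responses raise an error;
    the source text does not say how missing items were handled.
    """
    if len(responses) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} item responses, got {len(responses)}")
    for r in responses:
        if not isinstance(r, int) or not MIN_RESP <= r <= MAX_RESP:
            raise ValueError(f"item response {r!r} is outside {MIN_RESP}-{MAX_RESP}")
    return sum(responses)
```

A respondent who answers "strongly agree" to every item would receive the maximum score of 33 under this rule.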
The other measures used in the study broadly assessed the categories of emotional well-being and health management (Table 1). These were chosen because of their theorized relationships with OB and prior evidence supporting their validities (S. Cohen et al., 1983; Devins, 2010; Diener et al., 1985; Kroenke et al., 2009; McGuire et al., 2010; Schmitt et al., 2013; Scott et al., 2018; Spitzer et al., 2006; Ware, 2000; Weinger et al., 2005). As shown in Table 1, some questionnaires were completed at baseline (i.e., depression, anxiety, diabetes self-management, and diabetes self-care), and others were completed at follow-up (i.e., general mental health, stress, life satisfaction, affect, diabetes distress, and illness intrusiveness). Blood glucose was continuously measured over the 2-wk period between baseline and follow-up assessments.
Dimensionality of the OBQ11
The OBQ11 theoretically assesses only the single construct of OB, so one validity test was whether it was, indeed, sufficiently unidimensional. The dimensionality of the OBQ11 was assessed by examining the fits of two IRT models: the partial credit (or polytomous Rasch) model and the generalized partial credit model (GPCM; Nguyen et al., 2014). IRT is a test theory developed for binary and ordinal scales. The use of an IRT model acknowledges that some OBQ11 items may more reliably identify patients with high OB, whereas other items may more reliably distinguish those at lower OB ranges. Unlike the Rasch partial credit model, the GPCM further acknowledges that different items in a scale provide different amounts of information. The R package mirt (Chalmers, 2012) was used to test the IRT models, and model fit was assessed by examining model fit indices, item fit, and person–item plots. Consistent with prior research, item fit for the Rasch model was assessed with infit and outfit statistics (Wright et al., 1994), whereas for the GPCM, methods based on χ² analysis were used (Kang & Chen, 2008).
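To make the difference between the two models concrete, the following Python sketch computes response-category probabilities for a single item under the textbook form of the GPCM. It is not the code used in the study (which relied on the R package mirt), and estimation software may use a different but equivalent parameterization. The function name `gpcm_probs` and the parameter names are hypothetical. Fixing the discrimination `a` at 1 for every item gives the partial credit (Rasch) model.

```python
import math
from typing import List, Sequence


def gpcm_probs(theta: float, a: float, thresholds: Sequence[float]) -> List[float]:
    """Category probabilities for one item under the GPCM.

    theta: latent trait level (here, OB).
    a: item discrimination (a = 1 for all items gives the partial credit model).
    thresholds: step parameters b_1..b_m (m = 3 for a 4-category item).
    """
    # Cumulative sums of a * (theta - b_v); category 0 has an empty sum of 0.
    logits = [0.0]
    for b in thresholds:
        logits.append(logits[-1] + a * (theta - b))
    peak = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(z - peak) for z in logits]
    total = sum(weights)
    return [w / total for w in weights]
```

Under this form, an item with a larger `a` separates respondents more sharply around its thresholds, which is the sense in which different items provide different amounts of information.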
Reliability of the OBQ11
We examined the reliability of the OBQ11 by using Cronbach’s α and by inspecting the test information function derived from the IRT model. With IRT, the reliability of a measure can vary across the levels of the underlying construct (in contrast to classical test theory, which assumes that reliability is the same across all levels of OB; Nguyen et al., 2014; Thissen, 2000). Combining the information that each OBQ11 item provides yields a total (test) information function that indicates the reliability at each level of OB.
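For readers less familiar with internal consistency, Cronbach’s α can be computed from a respondents-by-items score matrix with the standard formula α = [k / (k − 1)] × [1 − (sum of item variances / variance of sum scores)], where k is the number of items. The Python sketch below is a generic illustration of that formula, not the study’s analysis code; the function name `cronbach_alpha` is hypothetical.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a respondents-by-items score matrix.

    Each inner sequence holds one respondent's item scores.
    Sample variances (n - 1 denominator) are used throughout.
    """
    k = len(rows[0])  # number of items
    if k < 2 or len(rows) < 2:
        raise ValueError("need at least 2 items and 2 respondents")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(col) for col in items)
    total_var = variance([sum(r) for r in rows])
    return (k / (k - 1)) * (1 - item_var_sum / total_var)
```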
Tests of Differential Item Functioning
We emulated the DIF analyses performed in the original OBQ11 validation paper (Håkansson et al., 2020) and investigated whether OBQ11 items were answered differently according to gender or age in ways that were not accounted for by the latent OB factor. As in the prior paper (Håkansson et al., 2020), we created two age categories on the basis of the median age in the sample, resulting in a younger adult group (ages 18–38 yr; n = 106) and an older adult group (ages 39 yr and older; n = 102), with sufficient numbers of observations for DIF analyses. We used the R package lordif (Choi et al., 2011) to assess for possible DIF by gender and age. On the basis of prior research on patient-reported outcome measurement, we applied the following criteria for negligible DIF: pseudo-R² < .13 and change in β < 0.05 (Cook et al., 2012).
Convergent Validity of the OBQ11
We also analyzed the convergent validity of the OBQ11, or the extent to which the measure was associated with other constructs in the expected manner (Abma et al., 2016). Spearman’s correlation was used to measure the associations between the OBQ11 sum score and each of the measures shown in Table 1. The more correlations that are consistent with our hypotheses, the greater the evidence of convergent validity (Abma et al., 2016). Following conventions, effect sizes of r = .50 were considered large; r = .30, medium; and r = .10, small (J. Cohen, 2013).
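Spearman’s correlation, as used above, is the Pearson correlation computed on ranks, with tied values assigned their average rank. The following Python sketch illustrates the calculation with the standard library only. The function names are hypothetical, and the sketch does not compute p values.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the current run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1..j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    return pearson(average_ranks(x), average_ranks(y))
```

Because only ranks enter the calculation, any monotonic relationship (linear or not) between two measures yields a coefficient of 1 or −1.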
An association between the OBQ11 and categorical level of depression (e.g., in a depressed state or not) was also hypothesized, given prior evidence of an association between the two (Wagman et al., 2021). To examine this, we used receiver operating characteristic (ROC) analysis to test the extent to which OBQ11 scores could predict moderate or more severe levels of depression, as indicated by the Patient Health Questionnaire (PHQ; Kroenke et al., 2009) and its associated cutoffs. On the PHQ, scores of 10 to 14 represented moderate, 15 to 19 represented moderately severe, and 20 to 24 represented severe depressive symptoms. With ROC analyses, we examined the accuracy with which the OBQ11 could predict PHQ scores of 10 (moderate depression) or higher. The R package cutpointr (Thiele & Hirschfeld, 2020) was used to perform ROC analysis. An additional advantage of ROC analysis was that it also allowed us to calculate the range of OBQ11 scores that were most predictive of moderate or worse depression. This range could potentially aid in the clinical interpretation of OBQ11 scores.
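The logic of the ROC analysis can be sketched as follows. Because lower OBQ11 scores are expected to accompany depression, the AUC here is the probability that a randomly chosen person with moderate or worse depression has a lower OBQ11 score than a randomly chosen person without it (ties count as one-half). The cutoff search maximizes Youden’s J (sensitivity + specificity − 1); that choice of criterion is an assumption made for illustration, because the optimization metric used with cutpointr is not stated here. All function names are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def auc_lower_is_positive(scores: Sequence[float], positive: Sequence[bool]) -> float:
    """AUC when LOWER scores indicate the positive (e.g., depressed) group."""
    pos = [s for s, p in zip(scores, positive) if p]
    neg = [s for s, p in zip(scores, positive) if not p]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp < sn:
                wins += 1.0   # positive case scored lower, as expected
            elif sp == sn:
                wins += 0.5   # ties count as half
    return wins / (len(pos) * len(neg))


def best_cutoff(scores: Sequence[float], positive: Sequence[bool]) -> Tuple[float, float]:
    """Return (cutoff, J) for the rule 'score <= cutoff' that maximizes Youden's J."""
    n_pos = sum(1 for p in positive if p)
    n_neg = len(positive) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative case")
    best: Optional[Tuple[float, float]] = None
    for c in sorted(set(scores)):
        tp = sum(1 for s, p in zip(scores, positive) if p and s <= c)
        tn = sum(1 for s, p in zip(scores, positive) if not p and s > c)
        j = tp / n_pos + tn / n_neg - 1  # sensitivity + specificity - 1
        if best is None or j > best[1]:
            best = (c, j)
    return best
```

When several cutoffs tie on J, this sketch keeps the lowest one; other tie-breaking rules are equally defensible.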
A total of 208 adults with T1D participated in the study and were included in analyses; full demographic information is outlined in Table 2. They had an average age of 40.2 yr (SD = 14.6; minimum = 18, maximum = 75), 55% were female, 88% preferred to have study materials in English, and 12% preferred to have study materials in Spanish. The mean score on the OBQ11 (raw sum) was 19.5 (SD = 5.7; scale = 0–33); median = 20, minimum = 0, and maximum = 33. OBQ11 scores were normally distributed (see Figure A.1 in the Supplemental Appendix, available online with this article at https://research.aota.org/ajot), and the 25th, 50th, and 75th percentiles were 16, 20, and 22, respectively. One participant had a minimum score of 0 (0.5% of participants), and 5 had a maximum score of 33 (2.4%), suggesting minimal floor and ceiling effects.
Dimensionality of the OBQ11
The GPCM fit the OBQ11 acceptably, whereas the partial credit (Rasch) model showed marginally acceptable fit. Unidimensionality of the OBQ11 was supported, because at least one of these models fit it well. For the Rasch model, root-mean-square error of approximation (RMSEA) = .05 (values <.08 indicate acceptable fit), comparative fit index (CFI) = .91 (>.95 indicates good fit), and Tucker–Lewis index (TLI) = .91 (>.95 indicates good fit; Hu & Bentler, 1999; see Table A.1). For the GPCM, RMSEA = .04, CFI = .97, and TLI = .98. The results of a likelihood ratio test using the M₂ statistic (Maydeu-Olivares & Joe, 2006) indicated that the GPCM had statistically better fit compared with the Rasch model (p < .001; Table A.2). In terms of item fit, four of the OBQ11 items had poor fit under the Rasch model (Table A.3), as evidenced by standardized infit–outfit values outside of the range of −1.96 to 1.96 (Wright et al., 1994). In the GPCM, none of the items had poor fit (Table A.4), as indicated by p values > .05 for S-χ² (Kang & Chen, 2008). For both the Rasch model and the GPCM, the range of response category thresholds covered most of the distribution of OB levels (Figures A.2 and A.3). Response category thresholds appeared to have a gap at OB θ values between 0 and 1 for both models, but inspection of their test information functions (Figures 1 and A.4) suggested that the gap was not problematic, because a high amount of test information was still present in this range.
OB scores derived from the GPCM factor scores were found to have a correlation of r = .99 (p < .001) with the simple sum of OBQ11 items. The two appeared to be nearly interchangeable, so tests of convergent validity were conducted with the OBQ11 sum scores.
The OBQ11 had acceptable reliability, as indicated by a Cronbach’s α of .90, which is above the .70 threshold for acceptable reliability (Gliem & Gliem, 2003). Figure 1 shows the GPCM IRT total information function of the OBQ11 across the range of OB levels. An OB value of 0 represents an average OB level, whereas levels of 3 and −3 denote very high and very low OB levels (i.e., 3 SDs above or below the mean of 0), respectively. The formula “reliability = 1 − (1/test information)” was used to translate test information values to reliability, for which a value of .70 is considered acceptable (Thissen, 2000) and >.90 is considered excellent (Gliem & Gliem, 2003). In Figure 1, wherever the test information line is above the horizontal line that indicates a reliability of .70, reliability is considered acceptable. Reliability was acceptable for most OB values except for extremely high levels (i.e., OB levels 2.5 SDs or more above the mean of 0).
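The conversion between test information and reliability quoted above can be worked through directly. In the Python sketch below (function names are hypothetical), inverting the formula shows that a reliability of .70 corresponds to test information of about 3.33 and a reliability of .90 corresponds to test information of 10.

```python
def reliability_from_information(info: float) -> float:
    """Reliability = 1 - (1 / test information), as stated in the text."""
    if info <= 0:
        raise ValueError("test information must be positive")
    return 1.0 - 1.0 / info


def information_for_reliability(reliability: float) -> float:
    """Invert the formula: information = 1 / (1 - reliability)."""
    if not 0.0 <= reliability < 1.0:
        raise ValueError("reliability must be in [0, 1)")
    return 1.0 / (1.0 - reliability)
```

Read this way, the horizontal reference line for .70 reliability in a test information plot sits at an information value of roughly 3.33.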
Differential Item Functioning
Results of DIF testing with the lordif package indicated that none of the OBQ11 items exhibited DIF by gender or age (Tables A.5 and A.6).
Convergent Validity of the OBQ11
As shown in Table 3, most of the correlations were consistent with our hypotheses, which provides evidence supporting the convergent validity of the OBQ11 scale score. The correlation between the OBQ11 and depression (r = −.53, p < .001) had a large effect size. Medium effect sizes were evident for OBQ11 scores in relation to the other emotional well-being measures and the self-reported health management measures. The exception was that time in the healthy blood glucose range over the 2-wk study period was only marginally associated with the OBQ11 (r = .14, p = .09).
As hypothesized, the results of ROC analysis suggested that the OBQ11 raw score successfully distinguished people with low depression from those with moderate or more severe depression. The area under the curve (AUC) for distinguishing these groups was .83. AUC values ranging from .8 to .9 are considered “excellent” (Mandrekar, 2010). Furthermore, OBQ11 raw score values of ≤17 were found to be most predictive of a higher probability of moderate or more severe levels of depression.
Validity and Reliability of the OBQ11
Overall, the results supported the reliability and validity of the OBQ11 in a U.S. sample of adults with T1D. The GPCM was found to fit the OBQ11 acceptably well, which is consistent with its theorized unidimensional structure. With regard to reliability, the OBQ11 had a Cronbach’s α of .90. From the test information function, the OBQ11 was found to have at least .70 reliability (acceptable) for assessing most levels of OB, except for very high levels. A similar coverage gap at very high levels of OB was found in the original OBQ11 validation paper, but the authors of that paper argued that less precision in measuring high OB may not be problematic (Håkansson et al., 2020). Those with high levels of OB, regardless of exactly how high, may not need to improve their OB, making exact measurement of high OB less vital (Håkansson et al., 2020). In terms of convergent validity, most of the correlations found were consistent with hypothesized directions. Time in the healthy blood glucose range was not significantly associated with the OBQ11, although the correlation was in the expected direction and approached significance. Consistent with our hypothesis, the OBQ11 was found to be a good predictor of concurrent moderate to severe depression, as measured by the PHQ (Kroenke et al., 2009).
Although there are good reasons to suspect a connection between health management and OB (as measured by the OBQ11), to our knowledge, this is the first study that empirically demonstrated an association between the two. The empirical relationships between the OBQ11 and the health management behaviors examined here suggest that OB may be one aspect of lifestyle that is closely linked with health management behaviors. OB may, therefore, deserve greater attention in both research and practice as a potential intervention target for clients with chronic conditions, to encourage more comprehensively the uptake of healthy self-management behaviors.
Practicalities of OBQ11 Use
The simple sum of OBQ11 items was highly correlated with factor scores for OB obtained from the IRT model (r = .99). This suggests that calculating an overall OBQ11 score does not require IRT software: sum scores are sufficient and are more feasible for most practitioners to compute.
In terms of the interpretation of OBQ11 scores, our results suggest that OBQ11 values of 17 or lower (on a scale ranging from 0 to 33) best describe patients with moderate to severe depression levels, as measured by the PHQ (Kroenke et al., 2009). If a practitioner administers the OBQ11 to a person from a population that is similar to the sample studied here, then a score of 17 or less could indicate a problematic level of OB that may warrant further attention.
Additional evidence is needed to support the generalizability of results to adults with chronic conditions more broadly. Participants in our sample were adults with T1D experiencing various stages of the COVID-19 pandemic. The majority of participants had insurance coverage. Although these characteristics may not be shared with many groups, people with chronic conditions often have many common experiences, including the challenges of disease management, such as following complex treatment regimens under various situations and navigating health care systems for treatment supplies such as medications (Fernandez-Lazaro et al., 2019). These shared experiences may increase the generalizability of our results. Aside from the commonality of T1D, our sample was diverse in several characteristics, including ethnicity, income level, age, and job status.
The study used a cross-sectional design in which the study measures were not administered at the exact same time point; the OBQ11 was administered at baseline, whereas several measures used for validity testing were administered at the 2-wk follow-up assessment. Study appointment scheduling logistics further resulted in follow-up assessments being administered up to 1 mo after baseline. The resulting time lags between assessments may have attenuated some relationships with the OBQ11, even though the constructs assessed may be assumed to be relatively stable over time and are unlikely to change significantly within a month.
Correlations were found between the OBQ11 and both emotional well-being and health management variables, but the observational nature of our study precludes causal interpretations of findings. One core idea of the philosophy of occupational therapy is that time use is key to health and well-being (Meyer & Haworth Continuing Features Submission, 1983). Therefore, occupational therapists often theorize that better time use, as can be partly indicated by higher OB, can contribute to improved health outcomes (e.g., emotional well-being and health management). At the same time, however, higher perception of OB may be caused by greater emotional well-being and more effective health management. We speculate that the relationship between OB and both emotional well-being and health management is bidirectional. For greater evidence that OB acts as a causal agent, a randomized controlled trial (RCT) in which participants are randomly assigned to either a control group or a treatment group receiving an intervention focused on improving OB, and the effects on health and well-being are measured, may be helpful.
Implications for Occupational Therapy Practice
The results of this study have the following implications for occupational therapy practice:
▪ Higher perceived OB was found to be associated with better health self-management, including higher reports of adherence to diabetes management behaviors (i.e., glucose monitoring, insulin administration, exercise). It was also associated with better emotional well-being. Thus, indirectly addressing self-management behaviors and emotional well-being through OB may be an approach worth greater consideration (e.g., in a future RCT).
▪ Occupational therapists who work with clients with chronic conditions (e.g., in practice areas of lifestyle management, chronic condition care, home health, and mental health) may find the OBQ11 to be a useful assessment tool or a starting point for discussion about OB.
▪ Our results suggest that if a client’s OBQ11 score is found to be 17 or lower, there is a stronger possibility that OB may be at a level low enough to affect mental health (i.e., moderate or more severe depression). Such information can be considered in combination with other assessment results.
In a U.S. sample of adults with T1D, we found evidence that supports the validity and reliability of the OBQ11. Consistent with prior research and theory, the OBQ11 appeared unidimensional, as evidenced by the acceptable fit of an IRT model (the GPCM). From the OBQ11’s test information function, it was found to have acceptable reliability across most values of OB. Furthermore, the OBQ11 had theoretically expected relationships with validated measures of emotional well-being and health management. Comparing methods of scoring the OBQ11, we found that sum scores were nearly identical to scores derived from IRT models. This suggests that computing an overall OBQ11 score as the sum of items is perhaps preferable, because it is the simpler method. In terms of interpretation of OBQ11 scores, values of 17 or less were found to predict a greater probability of moderate or more severe depression. Overall, our results suggested that the OBQ11 is a reliable and valid measurement tool.
This work was supported by the National Institutes of Health, National Institute of Diabetes and Digestive and Kidney Diseases (1-R01-DK121298-01). Elizabeth A. Pyatak received a donation of continuous glucose monitors from Abbott Pharmaceuticals for use in the research study that collected the data analyzed in this article. The remaining authors declare no known conflict of interest.