Introduction
Clinical supervision is defined as “the formal provision of an intensive relationship-based education and training by qualified health practitioners that is case-focused and which supports, directs, and guides the work of supervisees” (Milne, 2007, p. 440). Training is a more general concept that refers to the acquisition of work-related competencies, whereas clinical supervision focuses on the therapists' work with their own cases and the provision of supervision guidelines (Milne, 2007). Training, and in particular clinical supervision, in managing the techniques and forging an adequate therapeutic relationship has been highlighted as one of the factors that can explain the success of psychological therapy (Erekson et al., 2017; Snowdon et al., 2017). Several monographs have been published, including the Wiley International Handbook of Clinical Supervision (Watkins & Milne, 2014), which provides a comprehensive overview of the research and clinical work to date in this area. In parallel, the American Psychological Association (APA) has released a series of guidelines for the adequate practice of clinical supervision (APA, 2014). These publications highlight the growing commitment to extending clinical supervision to therapists and to identifying the best ways to do so.
Different methods of supervision have been documented in the literature from multiple theoretical orientations. Within the cognitive-behavioral approach (CBT), discussion, role play, and role play modeling are among the most widely implemented. Discussion is the most widespread method of supervision regardless of its usefulness: it seems to be preferred for its ease of application, yet it shows lower compliance with the guidelines (Bearman et al., 2013). Research on variables associated with supervisees' compliance with the guidelines in therapeutic sessions is scarce. Confidence with the supervisors is among the most frequently described variables that could influence the perceived impact of the guidelines (Bernard & Goodyear, 2014). Research on these variables has focused on the relationship between supervisors and supervisees, in many cases relying on simple measures such as a single item (Kavanagh et al., 2003).
Given this scenario, clinical supervision is an emerging area of research with important work still ahead. Basic gaps remain to be filled in the literature in order to understand clinical supervision and its usefulness, such as finding out whether therapists actually comply with the guidelines they receive and which mechanisms favor compliance (Bearman et al., 2013). Understanding this phenomenon will make it possible to test, in future approaches, other effects of clinical supervision, such as favoring the success of psychological therapy (Callahan et al., 2009). For this purpose, we use observational methodology. Its advantages have been pointed out in the behavioural sciences (Pérez, 1996). Despite this, its use is not widespread because of its implementation difficulties, such as the greater time required for observer training and data recording, and the economic cost of acquiring coding software (Heyman et al., 2014). Natural observation, correctly used, has the advantage of maintaining adequate external validity (Bakeman & Quera, 2011). Using observational methodology, we can explore compliance with the supervision guidelines while avoiding the limitations of other studies, in which experimental manipulation has yielded results alien to clinical reality (Chambless & Ollendick, 2001).
The objectives of this study are: (1) to contribute to the knowledge of clinical supervision using observational methodology, exploring the compliance with the guidelines of novice therapists, as well as variables that could contribute to this; and (2) to explore the characteristics of clinical supervision at the psychology clinic of the Universidad Autónoma de Madrid (CPA-UAM). Observational designs with similar objectives have been carried out with N = 1 designs (e.g., Milne et al., 2003), but our approach is innovative as it considers the supervision guidelines as the unit of analysis, allowing a more sophisticated treatment of the data. According to the aims described, we developed several hypotheses, as follows:
Hypothesis 1: The most frequently used supervision method is discussion. Given the predominance of this method, as reported in the scientific literature (e.g., Bearman et al., 2013), it is expected that a greater number of supervision guidelines will be provided by discussion compared to role play and role play modeling, as assessed by a self-developed rating scale.
Hypothesis 2: The higher the therapists' confidence with the technical supervision provided by the supervisors, the greater their compliance with the guidelines. It is expected that guidelines categorized with a rating scale as “compliance” correspond to higher scores in confidence with supervision on a self-report.
Hypothesis 3: The higher the therapists' rating of the usefulness of supervision methods, the greater their compliance with the guidelines. It is expected that guidelines categorized with a rating scale as “compliance” correspond to higher scores in confidence with the supervision methods on a self-report.
Hypothesis 4: The higher the therapists' confidence to ask questions to their supervisors, the greater their compliance with the guidelines. It is expected that guidelines categorized with a rating scale as “compliance” correspond to higher scores in confidence to ask questions on a self-report.
Method
Sample
Six novice CBT therapists (supervisees) in their first year of residency (R1) at the CPA-UAM and seven supervisors, who were teachers at the Faculty of Psychology or therapists at the CPA-UAM with more clinical experience than the R1s, participated. Recordings of 40 supervision sessions and 80 therapy sessions (the two sessions that followed each supervision session) were collected, which ranged in length from 12 min to 1 hour and 11 min (M = 35.4 min). In addition, 31 clients of the CPA-UAM, both adults and children, with different diagnoses participated in the study. The training program consisted of a three-year residency. Between four and six novice therapists enter the first year of residency (R1), and only two of them continue as second-year (R2) and third-year (R3) residents. Residents participate in periodic supervision sessions, with R1s receiving 8 hours per week. Table 1 includes some characteristics of the supervisors and supervisees.
Supervisees, supervisors, and clients signed informed consent specific to the current research project in which they authorized the researchers to collect, on video tape, the sessions in which they participated. Data collected were encrypted by reversible dissociation through a code, known only by the researchers, thus guaranteeing confidentiality. The research project in which this study is included was approved by the ethics committee of the Autonomous University of Madrid.
Instruments
Rating Scale for Assessing Compliance with Supervision Guidelines (ad hoc Development)
The observers recorded the guidelines verbalized by supervisees and supervisors in the supervision sessions and assessed their compliance in the following two therapeutic sessions. Different groups of categories were elaborated, as listed below:
- Time frame of the guidelines: time interval in which it is foreseeable that the supervisees will be able to comply with the guidelines: (1) short-term (in the next two therapeutic sessions); (2) long-term (unforeseeable in the next two therapeutic sessions); and (3) general guideline (compliance dependent on the occurrence of particular conditions).
- Guideline issuer: (1) guidelines given by a supervisor; (2) guidelines given by a peer (an R1 therapist who is not being supervised); and (3) guidelines given by oneself (the supervised therapist verbalizes a guideline).
- Compliance with the guidelines: the assessed degree to which the therapists have implemented the supervision guidelines in the therapy sessions: (1) total compliance (guidelines are implemented completely, with enough time dedicated to them); (2) partial compliance (guidelines are not implemented completely or not enough time is dedicated to them); (3) non-compliance (guidelines are not followed); and (4) future implementation (guidelines are not implemented but the therapists verbalize the possibility of doing so in the future).
Self-report of Supervision Sessions (ad hoc Elaboration)
After each supervision session, supervisees indicated who their supervisors were and which peers (R1 therapists not being supervised) were present during the session. They assessed, using a Likert-type scale (0-10): 1) confidence with the technical supervision of each participant, e.g., “Confidence that this person's clinical supervision gives me”; 2) perceived usefulness of the method of supervision implemented, e.g., “Usefulness that each supervision method(s) have had for you”; and 3) confidence to ask questions, e.g., “Confidence I feel with this person to express my doubts openly”.
Other Materials
Supervision sessions were taped using a video camera provided by the Faculty of Psychology of the Autonomous University of Madrid. The therapeutic sessions were taped using a closed-circuit camera installed at the CPA-UAM. Statistical analyses were performed using the statistical package IBM SPSS Statistics 23.0 (IBM Corporation, 2015) and the programming language R (R Core Team, 2013).
Procedure
Sample Collection
Data collection was conducted between November 2019 and March 2020. Each supervisee participated in weekly supervision sessions, completed their self-reports, and e-mailed them to the researchers up to 48 hours after the supervision session. The researchers themselves were responsible for compiling the supervision and therapeutic sessions.
Development of ad hoc Instruments
The rating scale for assessing compliance with the supervision guidelines was developed following the Bakeman & Quera (2011) instructions for constructing observation instruments:
Formulation of a research question: how can we assess the level of compliance with the supervision guidelines?
Establishment of a level of analysis: the system followed a social criterion, that is, it allowed the assessment of behaviours that are not distinguished from one another by purely physical features.
Establishment of the observation conditions: a) we defined a window of two therapy sessions to assess compliance with the guidelines; b) we used a continuous register; c) we calculated the inter- and intra-observer agreement only after the observation of the therapy sessions; d) we observed recorded rather than live sessions; and e) we used a paper register.
Informal observation of the sessions, after which we made the first proposal of the categories. They underwent several changes in the process.
Formal proposal of the final categories. As we described before, the final groups of categories were: a) time frame of the guidelines; b) guideline issuer; c) compliance with the guidelines.
Changes in the system based on experience. Two independent observers with clinical knowledge rated eight supervision sessions and their corresponding 16 therapeutic sessions. An expert in clinical psychology and observational methodology coordinated the construction of the scale. The development of the scale was concluded when three consecutive observations obtained intraclass correlation coefficients (ICC) greater than .81 (Landis & Koch, 1977). The ICCs obtained ranged from .925 to .995. Once the rating scale was developed, one researcher coded the 40 supervision and 80 therapeutic sessions and a second researcher coded 10% of them to guarantee adequate inter-rater agreement, obtaining ICCs above .81 (.87-.953). Intra-rater agreement was also computed in 10% of the sessions, obtaining ICCs above .81 (all 1.000).
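As an illustration of how an agreement coefficient of this kind can be computed, the following is a minimal sketch of a two-way random-effects, absolute-agreement, single-rater ICC. The ICC form is an assumption, since the text does not state which form was used; the function name `icc_2_1` and the ratings are hypothetical and are not the study's data.

```python
def icc_2_1(ratings):
    """Two-way random-effects, absolute-agreement, single-rater ICC.

    `ratings` is a list of rows, one row per rated unit (e.g., a guideline)
    and one column per observer. The ICC form is an assumption of this sketch.
    """
    n = len(ratings)       # number of rated units
    k = len(ratings[0])    # number of observers
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA decomposition without replication
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Hypothetical compliance codes (1-4) given by two observers to six guidelines
example_ratings = [[1, 1], [2, 2], [3, 3], [1, 2], [4, 4], [3, 3]]
example_icc = icc_2_1(example_ratings)
print(f"ICC = {example_icc:.3f}")
```

With identical codes from both observers the coefficient equals 1, which is the pattern reported for the intra-rater checks.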
The self-report of supervision sessions was developed following the clinical criteria of the members of the research team and the scientific literature on confidence and clinical supervision (Bernard & Goodyear, 2014; Kavanagh et al., 2003).
Data Analysis
To test hypothesis 1, Pearson's chi-square goodness-of-fit test was performed using the rcompanion package (Mangiafico, 2020) in RStudio. To test hypotheses 2, 3, and 4, one-way analyses of variance (ANOVA) were performed using SPSS.
Results
A total of 603 supervision guidelines were collected. The results are presented below, organized according to the hypotheses described.
Hypothesis 1: The Most Frequently Used Supervision Method Is Discussion
As we hypothesized, a significantly higher proportion of supervision guidelines was provided by discussion (n = 581, 96.4%) compared to role play (n = 18, 3.0%) and role play modeling (n = 4, 0.6%), χ2(2) = 1074.7 (p < .001, V = .944), showing a large effect size.
From hypothesis 2 onward, only the supervision guidelines whose time frame was coded as “short-term” were included in the analyses, as we theoretically predicted that long-term and general guidelines would always be coded as “non-compliance”. We also excluded the supervision guidelines that were coded as “provided by oneself”, as it was impossible to rate the confidence with the supervisors for them and we expected an overwhelming compliance with those guidelines that could mask some effects. The total number of guidelines used was 366.
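The goodness-of-fit computation can be sketched from the reported counts. This is a minimal illustration that assumes equal expected frequencies across the three methods and the effect size V = sqrt(χ2 / (n(k − 1))); neither assumption is stated in the text, and the function name `chi_square_gof` is hypothetical.

```python
import math


def chi_square_gof(observed):
    """Pearson chi-square goodness-of-fit against equal expected frequencies.

    Equal expected frequencies are an assumption of this sketch.
    """
    n = sum(observed)
    k = len(observed)
    expected = n / k
    chi2 = sum((o - expected) ** 2 / expected for o in observed)
    df = k - 1
    cramers_v = math.sqrt(chi2 / (n * df))  # effect size for goodness of fit
    return chi2, df, cramers_v


# Guideline counts per supervision method, as reported in the text
counts = {"discussion": 581, "role play": 18, "role play modeling": 4}
chi2, df, v = chi_square_gof(list(counts.values()))
proportions = {m: c / sum(counts.values()) for m, c in counts.items()}
print(f"chi2({df}) = {chi2:.1f}, V = {v:.3f}")
```

Because the null proportions and any adjustments applied in the original analysis are not reported, the statistic produced by this sketch need not coincide exactly with the published value.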
Hypothesis 2: The Higher the Therapists' Confidence with the Technical Supervision Provided by Supervisors, the Greater the Compliance with Guidelines
Homoscedasticity could not be assumed, F(3, 362) = 12.057 (p < .001), so we used Welch's F, a robust alternative to the F statistic when group variances are unequal. A significant difference was found between at least two compliance groups, F(3, 49.324) = 6.304 (p = .001, est. ω2 = .042), with a small effect size. Post hoc comparisons, performed using the Games-Howell test, showed that the guidelines rated “total compliance” received a significantly higher score in confidence with the supervision provided by supervisors than the ones rated “non-compliance”, the mean difference being 0.512 (p < .001, 95% CI [0.20, 0.82]). Table 2 shows the confidence means in the compliance groups, the pairwise comparisons and the significance of their differences.
Hypothesis 3: The Higher the Therapists' Rating of Usefulness of Supervision Methods, the Greater the Compliance with Guidelines
Homoscedasticity could not be assumed, F(3, 362) = 3.288 (p = .021). A significant difference was found between at least two compliance groups, F(3, 49.763) = 2.977 (p = .04, est. ω2 = .016), with a small effect size. Post hoc comparisons, performed using the Games-Howell test, showed that the guidelines rated “partial compliance” received a significantly higher score in confidence with the supervision methods than guidelines rated “non-compliance”, the mean difference being 0.57 (p = .036, 95% CI [0.26, 1.11]). Table 3 shows the confidence means in the compliance groups, the pairwise comparisons and the significance of their differences.
Note. TC = total compliance; PC = partial compliance; NC = non-compliance; FI = future implementation. * p < .05.
Differences in sample sizes between the compliance groups in which a significant difference was found may have led to biased results (Delacre et al., 2019), so we collapsed the compliance groups into two: (1) compliance (total compliance and partial compliance) and (2) non-compliance (non-compliance and future implementation). Guidelines rated “compliance” received a significantly higher score in confidence with the supervision methods (n = 176, M = 8.119, SD = 1.076) than “non-compliance” (n = 190, M = 7.842, SD = 1.320), t(359.273) = 2.203 (p = .028, η2 = .01), with a small effect size.
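The two-group comparison can be approximately reproduced from the reported summary statistics. The sketch below assumes an unequal-variance (Welch) t-test, which the fractional degrees of freedom suggest, and uses η2 = t2 / (t2 + df) as one common conversion; both are assumptions, and the function name `welch_t_from_stats` is hypothetical.

```python
import math


def welch_t_from_stats(m1, s1, n1, m2, s2, n2):
    """Welch's t and Welch-Satterthwaite df from group means, SDs, and sizes."""
    v1, v2 = s1 ** 2 / n1, s2 ** 2 / n2          # squared standard errors
    t = (m1 - m2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Summary statistics as reported: compliance vs. non-compliance
t, df = welch_t_from_stats(8.119, 1.076, 176, 7.842, 1.320, 190)
eta_sq = t ** 2 / (t ** 2 + df)  # assumed conversion to an eta-squared effect size
print(f"t({df:.1f}) = {t:.2f}, eta squared = {eta_sq:.3f}")
```

Since the inputs are rounded to three decimals, small deviations from the published t and degrees of freedom are to be expected.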
Hypothesis 4: The Higher the Therapists' Confidence to Ask Questions to Their Supervisors, the Greater the Compliance with Guidelines
Homoscedasticity could not be assumed, F(3, 362) = 3.121 (p = .026). There is no evidence of differences in confidence to ask questions between the compliance groups F(3, 51.037) = 0.580 (p = .631, est. ω2 = -.003). Mean confidence to ask questions scores for each compliance group were: (1) total compliance (n = 145, M = 8.69, SD = 0.829), (2) partial compliance (n = 32, M = 8.72, SD = 0.851), (3) non-compliance (n = 175, M = 8.80, SD = 0.945), and (4) future implementation (n = 14, M = 8.86, SD = 0.663).
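Welch's F can also be computed directly from group sizes, means, and standard deviations. The following minimal sketch applies the standard Welch formula to the four group summaries reported for confidence to ask questions; the function name `welch_anova_from_stats` is hypothetical, and the inputs are the rounded values given in the text rather than the raw data.

```python
def welch_anova_from_stats(ns, means, sds):
    """Welch's one-way ANOVA from per-group sizes, means, and SDs."""
    k = len(ns)
    weights = [n / sd ** 2 for n, sd in zip(ns, sds)]   # precision weights
    w_total = sum(weights)
    grand = sum(w * m for w, m in zip(weights, means)) / w_total
    between = sum(w * (m - grand) ** 2 for w, m in zip(weights, means)) / (k - 1)
    lam = sum((1 - w / w_total) ** 2 / (n - 1) for w, n in zip(weights, ns))
    f_stat = between / (1 + 2 * (k - 2) / (k ** 2 - 1) * lam)
    df1 = k - 1
    df2 = (k ** 2 - 1) / (3 * lam)
    return f_stat, df1, df2


# Order: total compliance, partial compliance, non-compliance, future implementation
ns = [145, 32, 175, 14]
means = [8.69, 8.72, 8.80, 8.86]
sds = [0.829, 0.851, 0.945, 0.663]
f_stat, df1, df2 = welch_anova_from_stats(ns, means, sds)
print(f"Welch F({df1}, {df2:.3f}) = {f_stat:.3f}")
```

The denominator degrees of freedom come out close to the reported value; the F value can differ somewhat because the means are rounded to two decimals.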
Discussion
In light of the results, the predominant method of supervision at the CPA-UAM is discussion, and the supervision guidelines that the supervisees complied with show higher ratings of confidence with technical supervision and of usefulness of the methods. No relation was found between a more relationship-related type of confidence (confidence to ask questions) and compliance with the guidelines. These results are discussed below.
As we predicted in hypothesis 1, the most used supervision method was discussion, a phenomenon reported in the scientific literature, even though other methods, such as role play and role play modeling, are more advisable (Bearman et al., 2013). This could be due to the ease of implementation of this method and time constraints that may exist in psychology clinics. Eight hours of supervision are carried out weekly at CPA-UAM with the six R1 therapists in which they supervise their cases. The predominance of discussion may be due to the need to supervise several cases in a short period of time: based on this data, a reorganization of the schedule could be advisable, accompanied by training for the supervisors on the benefits of implementing other methods.
Hypotheses 2, 3, and 4 were proposed to explore the relations between compliance with the guidelines and some variables of supervisees' confidence with supervision, a phenomenon scarcely analyzed in the literature. In general terms, we found that guidelines assessed as “compliance” had higher scores in terms of confidence with technical supervision (hypothesis 2) and perceived usefulness of the methods (hypothesis 3), both technical aspects of supervision. In comparison, a more relationship-related type of confidence (confidence to ask questions, hypothesis 4) was not related to compliance with the guidelines. The scientific literature is unclear as to what confidence with supervision entails, so the differences found can guide the approach of future studies. In studies planned by our research team, a coding system is being used to analyze the verbal interactions between supervisors and supervisees, allowing us to analyze aspects such as the supervisors' styles of supervision. This will help us understand the present results in greater depth.
The main contribution of this study is the proposal of an innovative approach to the study of compliance with the supervision guidelines using observational methodology. The guidelines were used as the unit of analysis, as opposed to previous observational approaches, which used the therapists as the unit of analysis in N = 1 designs (Milne et al., 2003). It has therefore been possible to perform more sophisticated analyses, helping the generalizability of the results, a limitation of some N = 1 designs. Moreover, this study has allowed for an in-depth study of the training method of the CPA-UAM. The final aim of this research line is the elaboration of a guide for the training of supervisors, given that nowadays they face this task guided only by their own clinical experience. Future approaches could focus on exploring the level of compliance among therapists and supervisors with different levels of experience.
Limitations
The main limitation of this study is the small sample size: guidelines assessed as “partial compliance” and “future implementation” were less numerous than those assessed as “total compliance” and “non-compliance”, which could have masked some effects. Future approaches will require a larger sample of supervision sessions and, therefore, of supervision guidelines. Moreover, the sample collection period (November 2019-March 2020) was not as extensive as it should have been. Future studies should consider all the sessions held in an academic year, in case the time of year affects the results. Another limitation of this study relates to the small number of supervisors and supervisees, as well as the fact that all participants were women. The scientific literature consulted so far has not addressed whether characteristics such as gender influence the success of clinical supervision. In future studies, it will be necessary to expand the sample size and study possible new mediating variables.
Conclusions
The objectives of this study have been successfully addressed: (1) we have explored the phenomenon of compliance with the guidelines and potentially related variables and (2) we have contributed to the knowledge of these practices in the CPA-UAM. These results will lead us, in the future, to the elaboration of a guide for action to help supervisors in their task.