Introduction
Therapist performance in psychological interventions is a key factor in both research and clinical practice (Beutler et al., 2004; Dinger et al., 2008; Norcross & Lambert, 2011; Ricks, 1974). In fact, from a common factors perspective, meta-analyses conducted over the years have found larger effect sizes for the therapist than for the therapy, reaching up to 21% of the effect size in natural clinical settings (Baldwin & Imel, 2013; Crits-Christoph et al., 1991; Johns et al., 2019; Wampold & Imel, 2015). Although the accuracy of common factors studies for this kind of outcome comparison is disputed (e.g., Siev & Chambless, 2007), what studies on the therapist effect as a specific common factor show is that, regardless of the therapeutic model or the client’s presenting problem, some therapists are systematically better than others (Johns et al., 2019; Miller et al., 2008).
However, knowing that some therapists perform better does not settle the question of what exactly they do. Studying outcome and connecting it with the therapist effect is useful for assessing and detecting this type of therapist, but it contributes little to improving psychotherapies or to explaining which behaviors characterize highly effective therapists (Miller et al., 2008; Nissen-Lie et al., 2010; Saxon et al., 2017).
With that in mind, many authors have correlated psychological, chronological, and demographic variables with the therapist effect to understand why these therapists are so effective (e.g., Anderson et al., 2009; Barkham et al., 2017; Goldberg et al., 2016b; Saxon et al., 2017). Among the results found, the therapist's therapeutic framework (e.g., psychodynamic versus behavioral) is often dismissed as a significant variable (Anderson et al., 2009; Chow, 2014; Nissen-Lie et al., 2013). Likewise, therapeutic experience does not seem to show a significant correlation either (Cologon et al., 2017; Delgadillo et al., 2020). For example, Lutz and colleagues (2007) found that experience measured in years did not predict outcome across different health interventions (e.g., medical care) and that this type of measurement (years) was not an accurate way to operationalize experience. Variables such as age, gender, personality features, or academic titles were not found to be significant either. If none of these personal variables correlate with being a highly effective therapist, which variables do? Are non-personal variables, such as type of intervention (e.g., psychotherapy versus psychiatric), modality (group versus individual), type of clinical center (public versus private), or therapy length, relevant to the therapist effect? In sum, the correlations found in this field are still unclear, and the therapist effect, despite being a topic of great interest, does not seem to be fully explained (Speers et al., 2022).
To understand why these questions remain inconclusive, some authors point out that the theoretical framework and methodology used in these studies are mainly responsible for their limitations (Barker & McFall, 2014; Berglar et al., 2016; Dinger et al., 2008; Fonagy & Clark, 2015). In order to understand behaviors, not only must the rejection of the null hypothesis be considered, but also the theoretical approach on which it is based and the methodology behind the findings. For example, different operationalizations of outcome may lead to different results, so it is necessary to define comprehensively what counts as outcome and to establish the reliability and validity of the instruments used to measure it (Green et al., 2014; Weinberger, 2014). Likewise, the theoretical approach behind the research could also affect its conclusions (Froxán-Parga et al., 2006; González-Blanch & Carral-Fernández, 2017). Isolating traits from their context can lead to different results and interpretations of the same phenomenon (Zilcha-Mano & Fisher, 2022), just as using descriptive labels as predictive variables creates tautological reasoning (Núñez de Prado-Gordillo et al., 2020). In short, the conclusions from a methodology such as the one discussed above could be normative (Sellars, 1956), and the findings, although significant, could have little or no practical utility.
In summary, there is a lack of information and precision regarding the variables that explain differences in effect between therapists. Although some authors study predictors of the therapist effect, the findings are not organized, and current systematic reviews focus exclusively on measuring statistical indices of effect size. There is no consensus in the current findings on the variables underlying highly effective therapists, on whether replicable behavioral measures are used, or on whether construct validity is adequate. Therefore, the aim of the present work is to systematically identify and bring together the available information on the variables that explain the therapist effect, considering the main methodological issues underlying these explanations.
Method
Study selection criteria
Following the PRISMA protocol (Preferred Reporting Items for Systematic Reviews and Meta-Analyses), the research question for this systematic review (SR) is: ‘what are the variables associated with highly effective therapists in outpatient psychological therapy?’ (outpatient interventions being understood as those carried out in both public and private facilities, but never in hospital or inpatient settings). Therefore, the aim is to classify the predictive variables and to identify how they have been operationalized and the type of instruments used. Because the research question has a broad rather than a narrow scope, it is important to highlight that our interest cuts across specific interventions: what is sought is precisely to identify characteristics of different types of intervention, so the research question does not necessarily make a specific prediction about any given variable.
Regarding PICO (participants, interventions, comparisons, and SR outcome measures), the participants belong to private and public non-hospital centers; the interventions are of multiple types (as the therapy could be a predictor variable); the comparisons are made between therapist effects; the results are the significant predictors of outcome variables; and the design is exclusively quantitative (the study focuses on outcome measures). The inclusion and exclusion criteria for the articles are presented in Table 1.
Identification of studies
The studies were collected from Scopus (Elsevier), MEDLINE/PubMed, Social Sciences Citation Index (Web of Science), PsycInfo, Google Scholar, and ProQuest Research Library. The time interval was 2000 to 2020. Guided by previous systematic reviews (Baldwin & Imel, 2013; Johns et al., 2019), the search terms were "therapist AND effects," "highly AND effective AND therapist," "supershrink," "therapist AND expert," "therapist AND highly AND effective," and "effect AND of AND therapist." The document-type criteria were "scientific article" AND "peer-reviewed publications". The languages were "English" and "Spanish", and the filter for the participants’ age was "over 18 years of age". The specific data can be found in Figure 1.
A total of 2,784 articles were first identified. Eight articles were added after searching the reference lists of the previous meta-analyses. The first screening was based on the titles and abstracts of the filtered documents. This entire process was carried out by the main reviewer, and 52 articles were included for full reading. In this new phase, two reviewers made decisions regarding the inclusion and exclusion criteria. The main reviewer read all 52 articles in full, while the second reviewer read 40 randomly selected articles, the minimum number necessary to ensure adequate reliability according to Sánchez-Meca and Botella (2010). In order to avoid possible bias when selecting the articles (Heckman, 1990), the additional reviewer was unaware of the research question. Finally, to measure the degree of agreement, a simple kappa statistic was calculated (Higgins & Deeks, 2011). Considering that kappa values between .4 and .59 reflect moderate agreement, values between .6 and .74 fair to good agreement, and values of .75 or more excellent agreement (Orwin, 1994, as cited in Higgins & Deeks, 2011), the kappa index for eligibility in the present SR is "excellent" (.9).
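The agreement check described above can be illustrated with a minimal sketch. It assumes that the "simple kappa statistic" is Cohen's kappa for two reviewers making include/exclude decisions on the same articles. The function names and the two decision lists are hypothetical and do not reproduce the data of this review; the interpretive bands are the ones stated in the text.

```python
from collections import Counter


def cohen_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters' categorical decisions on the same items."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("Both raters must rate the same non-empty set of items.")
    n = len(ratings_a)
    # Observed agreement: proportion of items on which the raters agree.
    p_observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Chance agreement: product of the raters' marginal proportions per category.
    counts_a, counts_b = Counter(ratings_a), Counter(ratings_b)
    categories = set(counts_a) | set(counts_b)
    p_chance = sum(counts_a[c] * counts_b[c] for c in categories) / (n * n)
    if p_chance == 1.0:
        raise ValueError("Kappa is undefined when chance agreement equals 1.")
    return (p_observed - p_chance) / (1 - p_chance)


def agreement_label(kappa):
    """Map kappa to the bands quoted in the text (values below .4 are not labeled there)."""
    if kappa >= 0.75:
        return "excellent"
    if kappa >= 0.6:
        return "fair to good"
    if kappa >= 0.4:
        return "moderate"
    return "below the reported bands"


# Hypothetical decisions for 40 double-read articles (illustrative only).
reviewer_1 = ["include"] * 20 + ["exclude"] * 20
reviewer_2 = ["include"] * 19 + ["exclude"] * 21

kappa = cohen_kappa(reviewer_1, reviewer_2)
print(f"kappa = {kappa:.2f} ({agreement_label(kappa)})")
```

In this made-up example the reviewers disagree on one article out of 40, so observed agreement is .975, chance agreement is .5, and kappa is .95. Kappa is used instead of raw percentage agreement because it discounts the agreement expected by chance alone.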
Data extraction
The data extraction was performed by two reviewers. After training on 10 pilot articles, an extraction guide and an extraction form were developed. Both the guide and the form can be found in the Open Science Framework (access link https://osf.io/c87zd/?view_only=52ec7f341d00486498c703f9cf08aa49). Table 2 shows the classification of the extracted variables based on the specific categories proposed by Sánchez-Meca and Botella (2010). In order to obtain the kappa index, the eligibility assessment and the extraction of variables from each report were performed separately and in duplicate for 40 articles.
Note. a Substantive variables are those related to the aim of the review; b Extrinsic variables refer to characteristics that should not be related to the scientific process a priori but could affect the results (Sánchez-Meca & Botella, 2010).
Results
Extrinsic, treatment, and participant variables
Regarding the extrinsic characteristics (variables that are not related to the main purpose of the study but may influence it, according to Sánchez-Meca and Botella, 2010), 84% of the therapists, 76% of the clients, and 22.5% of the first authors are women. The authors and the goals of each article are listed in Table 3.
Regarding the characteristics of the participants, anxiety is the most prevalent behavioral problem among clients (24 studies) and psychodynamic is the most prevalent therapeutic framework (14). In addition, although psychotherapy is the most prevalent treatment (20), interventions for mental health problems were also delivered by non-psychologists, such as physical therapists and social workers, or through computer devices. The complete characteristics of all participants can be found in the Open Science Framework (access link https://osf.io/c87zd/?view_only=52ec7f341d00486498c703f9cf08aa49).
Methodological and substantive variables
Of the articles, 90.3% define outcome as reduction of symptoms, 16.1% as therapeutic alliance, 12.9% as psychological well-being, 6.4% as vital functioning, and 3.2% as therapeutic adherence (note that some articles used more than one definition). The most used type of outcome instrument is the self-report, specifically the Symptom Checklist-90 (SCL-90; Derogatis & Melisaratos, 1983) (13 articles). Although self-report was used in every article included, some combined it with other data collection methods (e.g., observation or interview).
Most studies focused exclusively on psychological variables as predictors. However, 12 articles (out of 31) also combined them with demographic, structural, and chronological variables, such as age, gender, years of experience, ethnicity, treatment length, and/or academic titles. Of these studies, only three found significant correlations for these variables, specifically age, years of experience, and academic titles (Anderson et al., 2009; Berglar et al., 2016; Hersoug et al., 2009).
Combining all the variables studied, a total of 46 predictive variables of the therapist effect were compiled, of which 41 were significant (Figure 2). Figure 2 also shows which variables were the most replicated and which were significant. For those authors who operationalized their variables of interest, Table 4 compiles verbatim the definitions used for each predictive variable. It should be noted that, due to the characteristics of this SR, effect sizes are not estimated for these significance levels; it is simply highlighted whether or not the results were found to be significant in the study in question.
Note. AAI = Adult Attachment Interview; CD-RISC = The Connor and Davidson Resilience Scale; CPPS = Comparative Psychotherapy Process Scale; CTS-R = Revised Conflict Tactic Scale; DPCCQ = Development of Psychotherapists' Common Core Questionnaire; ECR = Experiences in Close Relationships Scale; ESQ = Empathy and Sociability Questionnaire; FIS = Facilitative Interpersonal Skills; GAS = Global Assessment Scale; HEXACO-H = Honesty-Humility, Emotionality, Extraversion, Agreeableness, Conscientiousness and Openness to Experience; HSQ = Humor Style Questionnaire; MAAS = Mindfulness Attention Awareness Scale; MITI 2.0 = Motivational Interview Treatment Integrity 2.0; NEO PI-R = Personality Inventory; N.A. = Not applicable; PBI = Parental Bonding Instrument; PIES = The Psychosocial Inventory of Ego Strengths; PSA = Playfulness Scale for Adults; RAPIDpractice = Retrospective Analysis of Psychotherapists’ Involvement in Deliberate Practice; REI = The Rational-Experiential Inventory measures Intuition; RFS = Reflective Functioning Scale; SSI = Social Inventory; THCLVT = Traditional high contact low volume therapists; TRIB-G = Therapy-Related Interpersonal Behaviors; TRIB-I = Therapy-Related Interpersonal Interview; WAI-C/T = Working Alliance Inventory – Client/Therapist.
a Compass Assessment is an instrument that groups patients with similar clinical and demographic characteristics within the same case group of a given therapist, in order to assess whether the second patient treated within a given grouping shows better results than the first.
Discussion
The aim of this study was to systematically identify and organize the currently available information on the variables that explain the therapist effect. Based on the data found, we can draw different conclusions.
First, after a detailed analysis of the variables, the overall conclusion concerns the 41 predictor variables extracted: although they were found to be significant in explaining the therapist effect, we still do not know the exact behaviors of highly effective therapists or how to teach the least effective therapists to be better. From our perspective, and after analyzing the data, this could be happening for several reasons.
The methodology used is questionable, especially with regard to construct validity. Numerous variables pose an underlying problem: the construct is defined by another construct which is, in turn, defined by yet another construct that is never fully operationalized (e.g., “work engagement”, which is defined by “therapeutic connection”, which is defined by “therapeutic alliance”, which is never defined). Also, several variables refer to the therapist's past (e.g., type of attachment, maternal care), which means that they are immutable and therefore unteachable. In addition, some variables are treated as independent and dependent at the same time (e.g., therapeutic alliance as both independent and dependent variable). Considering these results, we can conclude that the practical applications of the conclusions of the studies reviewed here are very limited, since the variables that have been shown to be relevant are not defined as behaviors that can be trained.
On the other hand, approximately 90% of the variables collected turned out to be different from each other, exemplifying the current problem of hypothesis confirmation and replicability in psychology (Pérez-Álvarez, 2018; Spellman, 2015). Each author has his or her own impermeable theory and seeks to confirm his or her prediction, regardless of whether it is tautological, trivial, practical, or contributes to the fragmentation of psychology. Furthermore, the fact that all studies included in this SR reported at least one significant result points to a possible publication bias (Dickersin et al., 1994).
The methodology used also does not allow the client's role to be studied moment by moment during therapy. The logic of correlation in aggregate studies prevents the identification of the interaction in session (Stiles, 1999). That is, even if the labels were correctly operationalized, correlating the therapist's data without taking into account what the client did before or after the therapist's actions would make the therapist's behavior appear erratic.
Regarding the characteristics of the studies, several aspects are worth highlighting.
On the one hand, most first authors are identified as male. Although in recent years gender equity among first authors of publications seems to be increasing (González-Sala & Osca-Lluch, 2018), the data found in this SR point in the opposite direction.
Regarding the characteristics of the participants and the sample, it is striking that physical therapists work with a population diagnosed with a mental disorder (and also show systematically high efficacy). This could support the finding that, although variables such as therapeutic model were the most replicated across studies (perhaps because variables such as age, gender, and years providing therapy are easy to access and to measure accurately), they were precisely the only ones that were non-significant for the therapist effect. In addition, it is particularly interesting that experience does not correlate with outcome. This makes sense if we consider that time in itself does not necessarily explain skill; rather, what matters are the behaviors performed while time passes (Leon et al., 2005; Santacreu & Hernández, 2019). This brings back the ambiguity in the philosophical assumptions of the investigations. Do they consider time as an independent variable? Or as the condition that allows a certain type of learning to develop (which undoubtedly should be considered in a study) but that should not be the predictor variable?
Regarding the instruments used to measure the variables, all 31 studies used questionnaires and/or scales of symptom reduction. This fact exemplifies the current predominance of self-reports in psychology (Santacreu & García-Leal, 2000). Although using the same self-report measures of efficacy contributes to homogeneity, generalization of data, and comparison between therapies, relying on them as the sole measure also carries a series of recurring limitations (Nissen-Lie et al., 2013).
Finally, it is important to emphasize that the aim of this study is ambitious, so it would be crucial to carry out other SRs like the present one to reach more solid conclusions. Among the main limitations, we must first highlight the complexity of comparing variables with different measurement parameters (e.g., "years of experience" is operationalized in some studies by the mean and in others by the number of cases carried by each therapist or by a range of years per sample grouping). Second, not all the information was available: empirical studies usually devote little space to clarifying the construct validity of the variables and the procedures used to specify the behaviors measured. Third, the research question of this study is formulated with a broad scope. That is, the variables extracted seek to summarize the evidence in a global manner, focusing the findings on generalizable conclusions. Thus, its main strength is also its weakness, as it makes it difficult to interpret the results and assess the data and hinders the work of the review team (Higgins & Deeks, 2011).
In sum, the data found invite us to reflect on the present and future of therapist effect research: now that we have organized and identified relevant predictors of the therapist effect, what is their practical use? Could we create an "exceptional therapist" protocol and start teaching "regular therapists" how to behave? Does it make sense to continue confirming hypotheses and finding more predictor variables under the same methodology? Based on the data found in this study, and from our perspective, the answer to these questions is no. Although there are nuances in the degree of operationalization of the different variables, none is sufficiently clear to be defined in replicable parameters and, most importantly, none is defined in terms of the therapeutic interaction (the therapist's actions as reactions to the client's actions). All this points, once again, to the methodological obstacles of studying change processes in therapy (Callaghan & Follette, 2020), and suggests that the research designs of these studies may not be detecting the behavior patterns that explain why some therapists achieve better outcomes than others.