Decades of scientific research have established three important findings concerning personnel selection interviews. First, according to a number of surveys carried out in different countries and with all types of organizations, the employment interview is the most frequently used selection procedure and the one that practitioners weight most heavily in their decision-making (Alonso, Moscoso, & Cuadrado, 2015; Salgado & Moscoso, 2011). Second, structured interviews have proven to be a valid procedure for predicting job performance (Huffcutt, Culbertson, & Weyhrauch, 2014; McDaniel, Whetzel, Schmidt, & Maurer, 1994; Salgado & Moscoso, 1995, 2006). Third, research across the world has demonstrated that interviews are, overall, the instrument most positively regarded by candidates (Anderson, Salgado, & Hülsheger, 2010; Liu, Potočnik, & Anderson, 2016; Steiner & Gilliland, 1996).
A scarcely researched issue concerning the selection interview is the degree to which interviewers feel confident about their decisions when they use a specific type of interview (e.g., unstructured vs. structured). A second issue is to identify which structured interview content (e.g., conventional vs. behavioral) better identifies candidates' suitability for a job. Two further, less investigated, issues concern biases that can affect interview assessments: (a) the degree to which sex similarity between candidate and interviewer affects interview decisions and (b) the effect of having additional information about the candidate (e.g., test results, resume, and recommendation letters).
The objective of this research is to shed further light on these four neglected issues concerning the usefulness of the interview as a procedure for making hiring decisions.
Employment Interviews: Types and Psychometric Properties
There are three main interview types depending on their content and degree of structure (Salgado & Moscoso, 2002): (1) the Conventional Unstructured Interview (CUI), the most widely used personnel interview, which consists of an informal conversation between the candidate and the interviewer, who formulates the questions according to the course of the conversation and without following any previous script (Dipboye, 1992; Goodale, 1982); (2) the Structured Conventional Interview (SCI), in which the interviewer works from a script or a series of guidelines about the information that must be obtained from each interviewee, typically including questions about credentials, technical skills, experience, and self-evaluations (Janz, Hellervik, & Gilmore, 1986); and (3) the Structured Behavioral Interview (SBI), which is based on the evaluation of past behaviors (Janz, 1982, 1989; Moscoso & Salgado, 2001; Motowidlo et al., 1992; Salgado & Moscoso, 2002, 2011). Meta-analyses have shown the reliability and the construct and criterion validity of the different types of interviews (e.g., Huffcutt & Arthur, 1994; Huffcutt, Culbertson, & Weyhrauch, 2013, 2014; McDaniel et al., 1994; Salgado & Moscoso, 1995, 2006). Other studies have also reported on content validity (e.g., Choragwicka & Moscoso, 2007; Moscoso & Salgado, 2001).
With respect to reliability, Huffcutt et al. (2013) carried out a new meta-analysis to update the results of Conway, Jako, and Goodman (1995). For low-structure interviews (CUI), the coefficients were .40 when candidates were evaluated by separate interviewers and .55 in panel interviews. For interviews with a medium level of structure (SCI), the values increased to .48 (serial interviews) and .73 (panel of evaluators). Finally, in the "high structure" category (SBI), they found a reliability of .61 for serial interviews and .78 when the evaluation was performed by a panel of evaluators. In their meta-analysis, Salgado, Moscoso, and Gorriti (2004) found a coefficient of .83 for the SBI. These results are consistent with those of Conway et al. (1995): the higher the degree of structure, the greater the reliability among interviewers.
Several studies have found that structure is also an important moderator of validity: as the level of structure increases, so does interview validity. Huffcutt et al. (2014) recently reported a coefficient of .20 for unstructured interviews (CUI), .46 for conventional structured interviews (SCI), and .70 for interviews with a higher level of structure (SBI). This last result is very similar to the value of .68 found in the meta-analyses of Salgado and Moscoso (1995, 2006), in which they concluded that the SBI was valid for all occupations, with validity ranging from .52 for managers to .80 for clerical occupations.
Other relevant studies have found that the SBI is more resistant to adverse impact (Alonso, 2011; Alonso, Moscoso, & Salgado, 2017; Levashina, Hartwell, Morgeson, & Campion, 2014; Rodríguez, 2016). There is also evidence of the economic utility of the SBI (Salgado, 2007). As a whole, the results of these meta-analytical reviews support the use of SBIs for hiring decisions.
Research vs. Practice Gap
Despite the empirical evidence on the psychometric properties of the SBI, there is still a gap between research findings and professional practice (Alonso et al., 2015; Anderson, Herriot, & Hodgkinson, 2001). Nowadays, most small and medium-sized companies continue to use unstructured interviews rather than structured behavioral ones.
In this regard, some issues related to professional practice have been insufficiently researched. For instance, research is scarce concerning the degree to which interviewers feel confident about decisions based on the SBI or the SCI. Two small-sample studies by Salgado and Moscoso (1997, 1998) found that interviewers have more confidence in their assessments with the SBI than with the SCI. However, additional studies are necessary.
Research has also shown that access to previous information about candidates (e.g., resume, recommendation letters, academic record, and test scores) can bias interview appraisals (Campion, 1978; Paunonen, Jackson, & Oberman, 1987). For example, Macan and Dipboye (1990) found that interviewers' prior impressions of candidates correlated .35 with the ratings given to interviewees. This kind of bias appears to be more frequent in unstructured interviews than in structured ones (Dipboye, 1997). In fact, research on highly structured interviews recommends against giving interviewers access to the candidate's prior information (Campion, Palmer, & Campion, 1997; Latham, Saari, Pursell, & Campion, 1980). This recommendation has been supported by the meta-analytical studies of McDaniel et al. (1994) and Searcy, Woods, Gatewood, and Lace (1993), who found higher criterion validity when interviewers did not have access to cognitive test scores.
Another scarcely researched issue is the degree to which sex similarity between candidate and interviewer can bias interview decisions. Elliott (1981) found that female candidates were assessed slightly higher by male interviewers (d = 0.28) and that male candidates were rated similarly by female and male interviewers in an SCI. Using a campus recruitment interview, Graves and Powell's (1996) findings showed that sex similarity of interviewer and candidate correlated .08 with the overall appraisal. In a third study, Sacco, Scheu, Ryan, and Schmitt (2003) found that ratings were higher when interviewer and candidate sex matched (d = 0.09). More recently, McCarthy, Van Iddekinge, and Campion (2010) examined the effects of sex similarity on the evaluations for three types of highly structured interviews (experience-based, situational, and behavioral) and concluded that the effects were non-significant. Therefore, as a whole, the findings of these four studies are inconclusive, although they suggest that SBIs may be more robust against sex-similarity bias than SCIs and CUIs.
Aims of the Study
The first objective of this study is to compare the effectiveness of each interview type in identifying a candidate's suitability for a job. Considering that the SBI has greater validity than the SCI, the following hypotheses are proposed:
Hypothesis 1: The SBI identifies candidates' capacities more accurately, which implies that it discriminates better between qualified and unqualified candidates than the SCI.
Hypothesis 1a: Accuracy for identifying qualified and unqualified candidates will be greater for the SBI than for the SCI.
Hypothesis 1b: Qualified candidates will receive higher scores in the SBI than in the SCI, while unqualified candidates will receive lower scores in the SBI than in the SCI.
The second objective is to study the degree to which interviewers feel confident about their decisions depending on the type of interview used, SBI or SCI. Considering that the SBI allows for a more precise evaluation and that the information acquired through the SCI is more open to different interpretations, the following hypothesis is proposed:
Hypothesis 2: Interviewers will be more confident about their decisions when using the SBI than when using the SCI.
The third aim of this research is to analyze and compare SCI and SBI resistance to two biases that may influence the interview decision: (a) the effect of having additional information about the candidate and (b) the effect of the similarity of sex between evaluator and candidate. With regard to these two biases, we make the following two hypotheses:
Hypothesis 3: Additional information affects assessments made with both SCIs and SBIs.
Hypothesis 4: SCIs are more affected by interviewer-interviewee sex similarity than SBIs.
Method
Sample
The sample consisted of 241 university students aged between 18 and 59 (M = 24.53, SD = 6.7); 78.4% were studying a subject related to personnel selection and, therefore, had theoretical knowledge of the different types of interview used in selection contexts; 57.4% of the sample was female. The study was presented to the students as an academic exercise in which they had to evaluate different candidates.
Design
We used a 2 x 2 x 2 design. The independent variables were: (a) type of interview, i.e., SCI or SBI; (b) candidate qualification level, i.e., qualified or unqualified; and (c) interviewee sex. The dependent variable was the raters’ assessment of the candidates.
Experiment Preparation
Script creation. Before video-recording the interviews, four scripts were developed for an HR technician job. These scripts detailed both the questions that the interviewer should ask and the exact answers that the interviewee should give. Scripts 1 and 2 were for an SCI and the other two for an SBI.
For the two interview types, we created two different scenarios with exactly the same questions. Nevertheless, the candidate's responses varied substantially: the answers corresponded to a qualified candidate in scripts 1 and 3 and to an unqualified candidate in scripts 2 and 4. The participants rated the candidates on four dimensions: organization and planning, teamwork, problem-solving, and an overall score. Figure 1 shows the four experimental combinations.
To verify that the scripts fulfilled the purpose of this research, seven personnel selection experts were asked to evaluate the candidates represented in the four designed scripts. A written copy of each of the scripts was given to them along with an evaluation sheet with the interview dimensions. A 5-point Likert scale was used. Table 1 summarizes the assessments made by the expert group.
As can be seen in Table 1, in both the SBI and the SCI, the qualified candidate obtained higher scores than the unqualified candidate on all dimensions; in all cases these differences were significant. Scores were more extreme in the case of the SBI; that is, the differences between the qualified and the unqualified candidate were much more pronounced in the SBI than in the SCI.
Video recordings. A man and a woman were selected to play the role of interviewees. To prevent the interviewees' appearance from making an impression on the raters that could affect the ratings, care was taken to ensure that the two candidates had a similar image and dressed the same way during the interview: a black jacket and a white shirt. The role of interviewers was played by two personnel selection experts.
Finally, eight interviews were recorded: four in which the man played the role of interviewee and four in which the woman did. The scripts were the same for both actors in each condition. The SCI videos lasted approximately 10 minutes each; for the SBI, script 3 (qualified candidate) lasted 35 minutes and script 4 (unqualified candidate) 22 minutes.
Measures
Interviewee assessment. Participants assessed the candidate on the dimensions of organization and planning, teamwork, problem-solving, and overall assessment on a scale of 1 to 5 (1 = insufficient, 5 = excellent). Although one of the characteristics of the SBI is the use of behaviorally anchored rating scales (BARS) for the assessment of candidates, in this study we decided to use the same rating scale in both interviews. In this way, the potential effect of the rating system was neutralized.
Internal consistency reliability, estimated with Cronbach's alpha, ranged from .77 (n = 143) for the qualified interviewee to .86 (n = 154) for the unqualified candidate in the SBI. In the SCI, the values were .80 (n = 121) and .78 (n = 153) for the qualified and the unqualified applicant, respectively.
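As an illustration (not the study's actual computation, and using hypothetical ratings), Cronbach's alpha is the ratio-based index α = k/(k − 1) × (1 − Σ item variances / variance of the summed scores), applied here to a matrix of raters × rating dimensions:

```python
import numpy as np

def cronbach_alpha(scores: np.ndarray) -> float:
    """Cronbach's alpha for an (n_raters x k_dimensions) matrix of ratings."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_vars = scores.var(axis=0, ddof=1)      # sample variance of each dimension
    total_var = scores.sum(axis=1).var(ddof=1)  # sample variance of summed ratings
    return (k / (k - 1)) * (1 - item_vars.sum() / total_var)

# Hypothetical 1-5 ratings from five raters on the four rating dimensions
# (organization and planning, teamwork, problem-solving, overall)
ratings = np.array([
    [4, 5, 4, 4],
    [3, 4, 4, 4],
    [5, 5, 4, 5],
    [4, 4, 3, 4],
    [3, 3, 3, 3],
])
alpha = cronbach_alpha(ratings)
```

Higher inter-dimension consistency pushes alpha toward 1; values in the .77-.86 range, as reported above, indicate acceptable internal consistency.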
Confidence about the assessment. Raters indicated the degree of confidence about the assessment they had made. A 5-point Likert scale was used (1 = not confident and 5 = totally confident).
Procedure
Participants adopted the role of raters. They were led to believe that the interviews were part of a real selection process for an HR technician position. After watching each interview, raters assessed the candidates using the scale described in the previous section.
A sub-sample (n = 175) was divided into four groups, in which candidate sex and candidate type (qualified vs. unqualified) were alternated, as can be seen in Figure 2. Raters watched one SCI and one SBI. In the first group, the female candidate was interviewed with an SCI and the male candidate with an SBI; both candidates were qualified. In the second group, the female candidate was interviewed with the SBI and the male candidate with an SCI; again, both candidates were qualified. In the third group, the female candidate was interviewed with an SCI and the male candidate with an SBI. In the fourth group, the female candidate was interviewed with an SBI and the male candidate with an SCI. In these last two groups, the candidates were unqualified. Furthermore, within each group, raters were divided randomly: one half first watched the SBI and then the SCI, and the other half watched them in the reverse order.
Another sub-sample (n = 66) watched both interview types with the same candidate; that is, they watched two different interviews of the same candidate. As shown in Figure 3, these raters were divided into two groups, in which the sex and candidate type were alternated. Thus, in one group, the qualified candidate (in both the SCI and the SBI) was the woman and the unqualified candidate the man, and in the other group, the qualified candidate was the man and the unqualified candidate the woman.
Results
Table 2 shows the results for the four interview ratings. First, we report the differences between qualified and unqualified candidates interviewed with the SCI; second, the differences with the SBI; and finally, the differences according to the type of interview.
Regarding the SCI, the mean for the qualified candidate ranged from 3.27 for the teamwork dimension to 3.85 for organization and planning; the overall score was 3.55. For the unqualified candidate, the mean ranged from 2.04 for the problem-solving dimension to 3.20 for teamwork; the overall score was 2.64. The differences between the two candidates were significant for the dimensions of organization and planning, problem-solving, and overall score (p < .001). The effect sizes (ES) ranged from d = 1.13 for the overall score to d = 1.62 for the problem-solving dimension.
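The effect sizes reported here are standardized mean differences (Cohen's d): the gap between the two candidates' mean ratings divided by the pooled standard deviation. As a sketch only (the standard deviations of about 0.80 below are assumed for illustration, not taken from the study; the group sizes are those of the SCI rater sub-samples), the overall-score means of 3.55 and 2.64 yield an effect of roughly d = 1.13:

```python
import math

def cohens_d(mean1: float, sd1: float, n1: int,
             mean2: float, sd2: float, n2: int) -> float:
    """Standardized mean difference using the pooled standard deviation."""
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    return (mean1 - mean2) / math.sqrt(pooled_var)

# Overall-score means for the qualified vs. unqualified SCI candidate;
# SDs are hypothetical, chosen only to illustrate the computation
d = cohens_d(3.55, 0.80, 121, 2.64, 0.81, 153)
```

By conventional benchmarks, d values above 0.8 are large, so all the effects reported in this section are substantial.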
With respect to the SBI, the differences between the candidates' scores were more extreme. The mean for the qualified candidate ranged from 4.19 for teamwork to 4.50 for organization and planning, while the mean for the unqualified candidate was considerably smaller, ranging from 1.88 for organization and planning to 2.15 for teamwork. The differences between the interviewees were statistically significant for the four dimensions (p < .001), which supports Hypothesis 1. The ES ranged from d = 2.64 for teamwork to d = 3.40 for the overall score. This result suggests that the differentiation between qualified and unqualified candidates is more accurate with the SBI.
The last two columns of Table 2 show the comparison between the two candidates according to the type of interview. The qualified candidate obtained higher scores when evaluated with the SBI; the differences were statistically significant (p < .001) for the four dimensions, ranging from d = 0.85 for organization and planning to d = 1.24 for the teamwork dimension. The unqualified candidate obtained lower scores when interviewed with the SBI, and the differences were statistically significant for organization and planning, teamwork, and overall score (p < .001); the ES ranged from d = -0.80 for the overall score to d = -1.15 for teamwork. These results support Hypotheses 1a and 1b, as the SBI allows for better discrimination between qualified and unqualified candidates than the SCI.
With regard to raters' degree of confidence, in the case of the qualified candidate the means were 4.13 and 2.97 for the SBI and the SCI, respectively; the difference was statistically significant (p < .001, d = 1.22). For the unqualified candidate, the degree of confidence was 4.12 and 3.47 for the SBI and the SCI, respectively (p < .001, d = 0.80). Therefore, SBI raters reported a similar degree of confidence for qualified and unqualified candidates, whereas SCI raters reported being more confident when the candidate was unqualified (p < .001, d = 0.51); in other words, SCI raters were less confident about their ratings when the candidate was qualified. These results support Hypothesis 2.
Table 3 shows the comparison between the candidates' scores when the raters watched only one interview type or both. Positive signs indicate that candidates received a higher score when the raters watched one interview only. For the SCI, in the case of the qualified candidate, the ratings were higher on all dimensions when the raters had information from the other interview available. However, for the unqualified candidate, the ratings were higher only for the problem-solving dimension.
As far as the SBI is concerned, in the case of the qualified candidate, the differences were non-significant for some of the dimensions assessed. For the unqualified candidate, the scores were lower in all dimensions, and the differences were statistically significant. These results partially support Hypothesis 3.
Tables 4 and 5 report the effects of interviewer-interviewee sex similarity. Table 4 shows that there were no differences when the qualified candidate was interviewed with an SCI. However, for the unqualified candidate, sex similarity had significant effects in the male rater-female candidate combination: the female candidate obtained higher scores than the male candidate in three of the rated dimensions.
Table 5 shows that, for the qualified candidate, there were differences in overall scores in the female rater-male candidate combination: the male candidate obtained higher scores than the female candidate. For the unqualified candidate, sex similarity had significant effects in the male rater-female candidate combination for the organization and planning dimension.
Discussion
This study contributes to the research and practice of personnel interviews by shedding further light on four neglected issues concerning the usefulness of selection interviews as a procedure for making hiring decisions. The first issue was whether the SBI is a more accurate method than the SCI for distinguishing between qualified and unqualified candidates. The second issue was to examine which interview type produces more self-confidence in the ratings and decisions made by the interviewers. The third issue refers to the potential bias of having prior information about candidates. Finally, the fourth issue refers to the effects of interviewer-interviewee sex similarity on interviewer appraisals and decisions.
The first contribution of this study is that, in accordance with Hypothesis 1, our findings showed that the SBI allows for a clearer and more accurate differentiation between qualified and unqualified candidates. In other words, the findings supported the hypothesis that SCIs are a weaker method than SBIs for identifying which candidate better fits the job conditions and requirements.
The second contribution of this study is that it supports the previous findings of Salgado and Moscoso (1997, 1998), which showed that interviewers feel more self-confident about their appraisals and decisions when they use an SBI than when they use an SCI. This finding confirmed Hypothesis 2.
The third contribution was to show how prior information about the candidate can bias interview decisions. We found that this occurred only in the evaluation of one of the candidates, which partially confirms Hypothesis 3. However, these results may be due to sampling error; moreover, for experimental design reasons, we did not use behaviorally anchored rating scales (BARS), even though the SBI is characterized by their use and they facilitate accurate evaluations (Motowidlo et al., 1992). Thus, this bias could be reduced in the SBI when BARS are used (Blackman, 2017).
With regard to the fourth aim of this research, the results showed that interviewer-interviewee sex similarity produced differences for some dimensions, for the two interview types, and for different rater-candidate sex combinations. Therefore, the results are inconclusive, and additional studies are needed.
Implications for Practice and Future Research
The findings of this study have implications for the practice of personnel selection interviews. From an applied point of view, the results suggest that the SBI is a more robust method than the SCI for identifying a candidate's suitability for a job; the SCI shows some limitations in differentiating between qualified and unqualified candidates. These results converge with the empirical evidence of the superior operational validity of the SBI (Huffcutt & Arthur, 1994; McDaniel et al., 1994; Salgado & Moscoso, 2006). Moreover, it is likely that in practice the differences between candidates are less obvious than in this study, which could exacerbate the limitations of the SCI in differentiating between qualified and unqualified candidates. Consequently, we recommend that practitioners use the SBI rather than the SCI whenever possible.
From a practical point of view, it is also relevant to know that interviewers feel more confident when they use an SBI than an SCI. This point is relevant in connection with the finding that SBIs are less frequently used than SCIs (Alonso et al., 2015). Taking into account that SBIs are more valid and accurate for making personnel decisions, researchers can also recommend the SBI to practitioners because it produces greater self-confidence in their appraisals and decisions.
Our third recommendation to interviewers is to avoid using information previously collected with other methods (e.g., cognitive tests, personality inventories, and letters of recommendation) during the interview process and decision-making. We found that prior information biases interviewer evaluations. This recommendation concurs with meta-analytic findings showing larger criterion validity for both types of interviews when interviewers do not have access to prior information about the candidate (McDaniel et al., 1994; Searcy et al., 1993).
With regard to future research, the significant growth in the use of new information technologies (IT) in selection and assessment processes (e.g., e-recruitment, phone-based interviews, online interviews) suggests that new studies should be conducted to verify how the findings of the present research transfer to these new assessment methods (Aguado, Rico, Rubio, & Fernández, 2016; Bruk-Lee et al., 2016; García-Izquierdo, Ramos-Villagrasa, & Castaño, 2015; Schinkel, van Vianen, & Ryan, 2016). This is especially relevant for the interview, given that more and more companies are conducting online interviews, in which the interviewer and the interviewee communicate only through a computer and there is no face-to-face interaction (Grieve & Hayes, 2016; Silvester & Anderson, 2003).
Research Limitations
This study has some limitations. One limitation is that the sample sizes in some of the experimental conditions were relatively small: although the overall sample is large (n = 241), in some conditions the sample size was thirty individuals. A second limitation is that the interview scripts were acted out by the same two people (one man and one woman). Additional interviewees would be desirable, but this would require increasing the experimental sample size accordingly.
In summary, the objective of this research was to shed further light on four neglected issues concerning the usefulness of the personnel interview as a procedure for making hiring decisions. Our findings suggest that the SBI makes raters feel more confident and yields more accurate appraisals. We also found that prior information about the candidate negatively affects interview outcomes.