Introduction
Over the past decades, research in medical education has highlighted the assessment of competencies with effective feedback as a key mechanism to promote the shift from 'assessment of learning' to 'assessment for learning' [1,2]. This premise, based on Miller's [3] and Mager's [4] principles, changed the dogmas of assessment methodologies, placing more emphasis on the processes that underlie learning and the consolidation of knowledge. The latter is also critically promoted by a shift in the focus of assessment from the acquisition of knowledge ('knows') to the acquisition of competencies ('does') [1,5,6].
However, the assessment of competencies poses relevant challenges, especially due to the lack of appropriate tools and processes. As an example, it has been shown that the majority of first-year trainees in an internal medicine residency program were not observed, and did not receive feedback, more than once by a faculty member during a clinical encounter [7]. In addition, real-life scenarios may be hard to reproduce in controlled environments and may not fit all the needs of competence training and assessment. Part of this problem is addressed by the introduction of simulation scenarios, both for the learning/training of skills in simulation labs and for assessment, namely through objective structured clinical exams (OSCEs). In parallel, the focus on workplace-based assessment (WBA) grew and new tools for assessment in the workplace context appeared. WBA scales were created to foster opportunities for the assessment of clinical skills and to provide direct feedback, with the final goal of improving global performance. Many tools have been developed with this premise in mind; some of the best known are i) the Mini-Clinical Evaluation Exercise (Mini-CEX) [8], ii) the Direct Observation of Procedural Skills (DOPS) and iii) the case-based discussion. Each of these tools has different specific assessment domains and focuses, but the goal of providing direct and clear feedback is maintained and considered essential for the good use of these methods [2,6,9].
In the Portuguese context, the case-based discussion is by far the most used method. While it has some advantages (e.g. it is very comprehensive), it also presents several limitations: 1) it does not assess the ability to perform a clinical encounter; 2) it is time-consuming and is therefore performed only a few times during the medical degree; and 3) it is artificial when compared with the daily clinical encounter.
The Mini-CEX scale was originally created for post-graduate assessment by the American Board of Internal Medicine, to encourage the observation of performance in short daily clinical encounters by qualified faculty. The scale focuses primarily on providing useful feedback to the person being evaluated, with the main goal of inducing behavioral change and, ultimately, improving the student's clinical performance [8].
The Mini-CEX has been validated in several languages and in different populations [10-13]. Nevertheless, its validity in the Portuguese language and population has yet to be studied. In this work, we aim to translate, adapt and validate the Mini-CEX for the Portuguese language. To this end, we applied the Mini-CEX in an Internal Medicine Department and in an OSCE.
Subjects and methods
This study took place at the School of Medicine-University of Minho (EM-UM) and at the Internal Medicine Department at the Hospital of Braga. The experimental protocol was approved by the Ethics Committee of the University of Minho (CEICVS 072/2019). The Declaration of Helsinki and the Council of Europe's Convention on Human Rights were strictly followed [14,15].
An extensive literature review regarding WBA scales, mainly the Mini-CEX, as well as their variations, design, validation process and impact evaluation, was conducted through available databases [16-20].
Mini-CEX Scale
In its original version, the Mini-CEX, developed by the American Board of Internal Medicine (Appendix 1), is an essentially formative assessment scale composed of seven items that assess the competencies inherent to the clinical interview, the physical examination, patient counseling and clinical judgement. The last item assesses global clinical performance.
Each item is quantitatively rated on a 9-point Likert scale and qualitatively rated in three classes: 'unsatisfactory' (scores 1 to 3), 'satisfactory' (scores 4 to 6) and 'superior' (scores 7 to 9). The Mini-CEX was designed to be short (10 to 20 min of observation and 5 to 10 min of feedback) and easy to apply.
The Mini-CEX was developed in a postgraduate context, with the purpose of facilitating the formative evaluation of core clinical competencies, and it can be used by assessors for the routine evaluation of students in any scenario.
Simulation scales
The OSCE consisted of six 15-minute stations, each with 10 minutes for the clinical interview and 5 minutes for a specific physical examination task. Four different domains were assessed using five different assessment scales. The first and second domains, Information Gathering (Hx) and Physical Examination (ExF) respectively, were assessed with checklist scales (Hx and ExF). The third domain was Communication, assessed with a Communication Assessment Scale developed at EM-UM and rated by two groups of assessors: faculty (CASF) and Standardized Patients (CASSP), all with assessment experience. Finally, the fourth domain was a post-encounter task. The final OSCE classification combines the five classifications obtained across the four domains.
Scale translation
The translation of the English version of the Mini-CEX into Portuguese was conducted by two independent Portuguese English teachers with expertise in translation. One of the translators worked together with a medical student from EM-UM to better understand the nuances of the topic being translated. Discrepancies between the two translators were discussed and resolved by the original translators, with the assistance of a physician specialized in medical education. The scale was pilot-tested in a small sample of physicians who usually supervise medical students, medical students, and EM-UM faculty, who provided feedback on each item. Suggestions from this group were incorporated into the preliminary version. This preliminary version was submitted to a committee of four physicians with expertise in medical education, which resulted in the final forward translation (Portuguese version).
After the translation process, the scale was back-translated by the same initial team and submitted to the committee of experts, which produced the final back-translation (English version). The final back-translation was then sent to the original authors to assess whether the original purpose of the scale was maintained.
After the translated scale passed through preliminary pilot testing and subsequent revisions, we conducted a final pilot test among the intended respondents for initial validation. In this final pilot test, the final version of the scale was administered in a workplace context, as would happen in a real WBA, and participants were asked whether the items were clear and their purpose understandable.
Participants
To study the validity of the scale, trained faculty rated each student during a high-stakes OSCE at the end of the 3rd year of our 6-year Medical Degree. This OSCE is composed of six stations in which students must collect the clinical history, perform a specific physical examination task and provide counseling about the next steps of patient management.
Each individual was assessed in 6 different stations. There was no significant time interval between assessments.
Sample size
For this work, the sample size was calculated using a ratio of 10:1 (10 respondents per scale item), yielding a minimum of 70 respondents for a total of 7 items [21].
The final sample consisted of 818 assessments of 143 3rd-year students assessed during the OSCE. The assessments were performed by 34 faculty members with experience in evaluating students' clinical performance.
Reliability-internal consistency
Internal consistency reflects the extent to which the scale items are intercorrelated, i.e. whether they consistently measure the same construct. In this work we used the coefficient alpha (Cronbach's alpha) to estimate the internal consistency of the scale. Only items P1 to P6 were included in this analysis, taking into consideration that P7 is a global performance item. Cronbach's alpha ranges from 0 to 1: α = 0 indicates no internal consistency (none of the items are correlated with one another), whereas α = 1 reflects perfect internal consistency (all items are perfectly correlated with one another). In practice, a Cronbach's alpha of at least 0.70 has been suggested to indicate adequate internal consistency. On the other hand, an alpha value that is too high (α ≥ 0.90) suggests that some questionnaire items may be redundant.
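For reference, the standard formulation of Cronbach's alpha for a scale with $k$ items (here $k = 6$, items P1 to P6) is:

$$\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_{i=1}^{k}\sigma^{2}_{i}}{\sigma^{2}_{T}}\right),$$

where $\sigma^{2}_{i}$ is the variance of item $i$ and $\sigma^{2}_{T}$ is the variance of the total score across the $k$ items.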
Validity
The validity of a scale assesses if that scale measures what it is intended to measure. In other words, it verifies if the inferences and conclusions made based on the results of the scale are valid. There are two major types of validity to be considered when validating a questionnaire: content validity and construct validity.
Content validity
Content validity refers to the extent to which the items in a scale are representative of the entire theoretical construct the questionnaire is designed to assess. Although the construct of interest determines which items are written and/or selected in the questionnaire development/translation phase, content validity of the scale should be evaluated after the initial form of the scale is available.
The scale was assessed by a committee of experts that judged whether the items adequately measure the construct they are intended to assess, and whether the items are sufficient to measure the domain of interest, classifying each item of the scale on three levels: not necessary, useful but not essential, essential.
The content validity ratio (CVR) was then calculated for each item using Lawshe's method [22]. The CVR proposed by Lawshe is a linear transformation of the proportion of experts within a panel who rate an item as 'essential'.
The final decision to retain an item based on its CVR depends on the total number of panel members. Table I shows the minimum CVR value required to retain an item.
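In Lawshe's formulation, with $n_e$ the number of panelists rating an item as 'essential' and $N$ the total number of panelists, the ratio is:

$$\mathrm{CVR} = \frac{n_e - N/2}{N/2}.$$

For example, an item rated 'essential' by 15 of 16 panelists yields $\mathrm{CVR} = (15 - 8)/8 = 0.875$.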
Participants' satisfaction
The participants' satisfaction analysis is intended to replace, to some extent, the previously used term 'face validity', which refers to the degree to which respondents judge the scale items to be valid. To assess satisfaction, students answered four 5-point Likert-scale questions and assessors answered one 9-point Likert-scale question (Table II) regarding their satisfaction with the scale.
Table II. Satisfaction questions.

| Group | Question |
|---|---|
| Students questions (1, low; 5, high) | 1. The Mini-CEX is a practical method |
| | 2. The answers are a fair assessment of your skills |
| | 3. This process is useful for your personal development |
| | 4. This process gave useful information about you to the assessor |
| Assessors question (1, low; 9, high) | 1. Assessors satisfaction with the Mini-CEX |
Construct validity
Construct validity refers to the extent to which a measure adequately assesses the construct it purports to assess [23]. The construct validity of a questionnaire can be evaluated by estimating its association with other variables with which it is expected, on theoretical grounds, to correlate positively, negatively, or not at all.
To assess the construct validity of the Portuguese Mini-CEX, an Exploratory Factor Analysis (EFA) was performed on a random sample of approximately 50% of the total sample (n = 376). EFA is a statistical analysis used to explore the underlying structure of, and relationships among, multiple variables, allowing a reduction in the number of variables.
To test the suitability of the data for factor analysis, we used: 1) the Kaiser-Meyer-Olkin (KMO) measure of sampling adequacy, which ranges from 0 to 1, where higher values mean higher suitability and 0.6 is a suggested minimum; and 2) Bartlett's test of sphericity, which tests the hypothesis that the correlation matrix is an identity matrix, which would indicate that the variables are unrelated and therefore unsuitable for structure detection. Taken together, these tests provide a minimum standard that should be met before a factor analysis is conducted. After the suitability tests, we used Principal Axis Factoring as the extraction method.
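As an illustration of this workflow (not the software actually used in the study), the sketch below runs the suitability tests and a one-factor extraction in Python with the factor_analyzer package. The DataFrame and file name are placeholders, and `method="principal"` is this package's closest equivalent to SPSS-style Principal Axis Factoring.

```python
# Sketch of the EFA workflow; assumes one row per assessment,
# with columns P1..P6 holding the six Mini-CEX item scores.
import pandas as pd
from factor_analyzer import FactorAnalyzer
from factor_analyzer.factor_analyzer import calculate_bartlett_sphericity, calculate_kmo

efa_sample = pd.read_csv("mini_cex_efa_sample.csv")[["P1", "P2", "P3", "P4", "P5", "P6"]]

# 1) Suitability tests: Bartlett's test of sphericity and KMO sampling adequacy.
chi_square, p_value = calculate_bartlett_sphericity(efa_sample)
kmo_per_item, kmo_total = calculate_kmo(efa_sample)
print(f"Bartlett: chi2 = {chi_square:.1f}, p = {p_value:.4f}; KMO = {kmo_total:.3f}")

# 2) Extraction: one factor, principal-axis-style extraction, no rotation.
fa = FactorAnalyzer(n_factors=1, method="principal", rotation=None)
fa.fit(efa_sample)
loadings = pd.Series(fa.loadings_[:, 0], index=efa_sample.columns)
print(loadings.sort_values(ascending=False))
```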
In a second step, a Confirmatory Factor Analysis (CFA) was performed on the remaining assessments (those not randomly selected for the EFA; n = 442). CFA is a multivariate statistical procedure used to test how well the measured variables represent the hypothesized constructs. In this case, the CFA was used to confirm the structure identified in the previous EFA.
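The confirmatory step could be sketched with a lavaan-style model specification in the semopy package, as below. The file and variable names are placeholders following the paper's item labels, and the residual covariances in the second specification mirror the respecified model described later in the Results; this is an illustrative sketch, not the software actually used.

```python
# Sketch of the CFA: one latent factor ("clinical competence") measured by P1..P6.
import pandas as pd
import semopy

cfa_sample = pd.read_csv("mini_cex_cfa_sample.csv")[["P1", "P2", "P3", "P4", "P5", "P6"]]

# Model 1: single latent variable, no residual covariances.
model_1 = semopy.Model("CC =~ P1 + P2 + P3 + P4 + P5 + P6")
model_1.fit(cfa_sample)
print(semopy.calc_stats(model_1))  # fit indices (e.g. CFI, RMSEA) to judge goodness of fit

# Model 2: respecified with the residual covariances suggested post hoc
# (errors of P1-P3, P1-P5 and P2-P4).
model_2 = semopy.Model(
    """
    CC =~ P1 + P2 + P3 + P4 + P5 + P6
    P1 ~~ P3
    P1 ~~ P5
    P2 ~~ P4
    """
)
model_2.fit(cfa_sample)
print(semopy.calc_stats(model_2))
```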
Criterion validity
Criterion validity refers to the degree to which there is a relationship between a given test score and performance on another measure of particular relevance, typically referred to as the criterion. There are two forms of criterion validity: predictive and concurrent validity.
Concurrent validity was assessed through the correlation between the simulation-based assessment scales usually used in the OSCE and the Mini-CEX.
The OSCE format and assessment scales were as described above (six 15-minute stations; the Information Gathering and Physical Examination checklists, the Communication Assessment Scale rated by faculty and Standardized Patients, and a post-encounter task), with the final OSCE classification combining the five classifications from the four domains.
To study the correlation between the two assessment methods, we calculated the Pearson correlation coefficient (R) and R² in two different analyses. For 'analysis 1' we used the following variables: HxX, ExFX, CASFX, CASSPX and P7_X (the overall clinical competence item) classification for each OSCE station. For 'analysis 2' we used the average classifications Hx_Average, ExF_Average, CASF_Average, CASSP_Average and P7_Average for each individual student and the corresponding Final OSCE Classification. R values of -1 or 1 were considered perfect correlation; values around ±0.95 strong; around ±0.5 medium; around ±0.1 weak; and 0 no correlation.
Normal distribution was assumed for all variables, given the large sample size (n > 30).
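A minimal sketch of 'analysis 2', assuming per-student averages are already available in a table with the hypothetical column names below:

```python
# Pearson correlations between the per-student Mini-CEX global item (P7_Average)
# and each OSCE domain average plus the final OSCE score ("analysis 2").
# File and column names are illustrative placeholders.
import pandas as pd
from scipy.stats import pearsonr

students = pd.read_csv("osce_student_averages.csv")

for osce_score in ["Hx_Average", "ExF_Average", "CASF_Average", "CASSP_Average", "Final_OSCE"]:
    paired = students[["P7_Average", osce_score]].dropna()
    r, p = pearsonr(paired["P7_Average"], paired[osce_score])
    print(f"P7_Average vs {osce_score}: R = {r:.3f}, R2 = {r**2:.3f}, p = {p:.4f}, n = {len(paired)}")
```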
The predictive validity was not assessed in this work.
Results
Scale translation
The final forward translation and back translation of the 7 items can be seen in Tables III and IV, respectively. The full documents of both forward and back translations can be found in Appendix 2.
Table III. Final forward translation (Portuguese version).

| Item | Não satisfaz (1-3) | Satisfaz (4-6) | Satisfaz bastante (7-9) | Não observado/aplicável |
|---|---|---|---|---|
| 1. Relação médico-doente | 1 2 3 | 4 5 6 | 7 8 9 | n/o |
| 2. Recolha de informação | 1 2 3 | 4 5 6 | 7 8 9 | n/o |
| 3. Competências no exame físico | 1 2 3 | 4 5 6 | 7 8 9 | n/o |
| 4. Raciocínio clínico | 1 2 3 | 4 5 6 | 7 8 9 | n/o |
| 5. Aconselhamento e planeamento | 1 2 3 | 4 5 6 | 7 8 9 | n/o |
| 6. Organização na abordagem ao doente | 1 2 3 | 4 5 6 | 7 8 9 | n/o |
| 7. Competência clínica global | 1 2 3 | 4 5 6 | 7 8 9 | |
Table IV. Final back translation (English version).

| Item | Unsatisfactory (1-3) | Satisfactory (4-6) | Very satisfactory (7-9) | Not observed/applicable |
|---|---|---|---|---|
| 1. Doctor-patient relationship | 1 2 3 | 4 5 6 | 7 8 9 | n/o |
| 2. Information gathering | 1 2 3 | 4 5 6 | 7 8 9 | n/o |
| 3. Physical examination skills | 1 2 3 | 4 5 6 | 7 8 9 | n/o |
| 4. Clinical judgment | 1 2 3 | 4 5 6 | 7 8 9 | n/o |
| 5. Counseling and planning | 1 2 3 | 4 5 6 | 7 8 9 | n/o |
| 6. Patient approach organization | 1 2 3 | 4 5 6 | 7 8 9 | n/o |
| 7. Overall clinical competence | 1 2 3 | 4 5 6 | 7 8 9 | |
The back translation was approved by the original authors of the scale, represented by John J. Norcini, PhD.
The scale was piloted in 5 medical students and no major problems were detected.
Validation process
For the final validation process, we had a total of 143 subjects, 34 faculty members and 818 assessments. Missing values (n = 101 of a total of 5,726 item scores; 1.76%) were replaced with the subject's average score across the 7 items. The descriptive statistics of each item can be found in Table V.
Table V. Descriptive statistics of each item.

| | P1 | P2 | P3 | P4 | P5 | P6 | P7 |
|---|---|---|---|---|---|---|---|
| Mean | 6.501 | 6.108 | 6.784 | 5.807 | 5.896 | 6.113 | 6.162 |
| Median | 7.00 | 6.00 | 7.00 | 6.00 | 6.00 | 6.00 | 6.00 |
| Standard deviation | 1.718 | 1.814 | 1.801 | 1.742 | 1.753 | 1.709 | 1.606 |
| Minimum | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| Maximum | 9 | 9 | 9 | 9 | 9 | 9 | 9 |
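The person-mean imputation described above (replacing a missing item with the subject's average over the 7 items of that assessment) could be expressed as in the sketch below; the data layout and file name are assumptions.

```python
# Person-mean imputation: each missing item score is replaced by the mean of the
# seven Mini-CEX items recorded for that same assessment (row-wise mean).
import pandas as pd

items = ["P1", "P2", "P3", "P4", "P5", "P6", "P7"]
assessments = pd.read_csv("mini_cex_assessments.csv")  # one row per assessment

row_means = assessments[items].mean(axis=1, skipna=True)
assessments[items] = assessments[items].apply(lambda col: col.fillna(row_means))
```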
Reliability
Table VI shows the results of the internal consistency analysis, with and without exclusion of items, as well as the corrected item-total correlations. The evaluation of the internal consistency of the scale by Cronbach's alpha revealed values above 0.70, both for the global scale (0.927) and for the individual items. For 6 of the 7 items (P1, P2, P4, P5, P6 and P7), the Cronbach's alpha if item deleted was lower than that of the global scale, suggesting that these items contribute substantially to the global scale. Only P3 had a higher Cronbach's alpha if item deleted (0.938), although very close to the global Cronbach's alpha. The corrected item-total correlations suggested good discriminative power, surpassing the critical value of 0.20, defined as the minimum value for a good correlation index.
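For transparency, the statistics reported in Table VI can be computed with the standard formulas, as in the sketch below; the DataFrame of item scores and its file name are placeholders.

```python
# Cronbach's alpha, alpha-if-item-deleted and corrected item-total correlations
# for items P1..P6 (P7 excluded as the global performance item).
import pandas as pd

def cronbach_alpha(items: pd.DataFrame) -> float:
    # alpha = k/(k-1) * (1 - sum of item variances / variance of the total score)
    k = items.shape[1]
    return (k / (k - 1)) * (1 - items.var(ddof=1).sum() / items.sum(axis=1).var(ddof=1))

scores = pd.read_csv("mini_cex_assessments.csv")[["P1", "P2", "P3", "P4", "P5", "P6"]].dropna()

print("Global alpha:", round(cronbach_alpha(scores), 3))
for item in scores.columns:
    rest = scores.drop(columns=item)
    alpha_deleted = cronbach_alpha(rest)
    # Corrected item-total correlation: item vs. sum of the remaining items.
    item_total = scores[item].corr(rest.sum(axis=1))
    print(f"{item}: alpha if deleted = {alpha_deleted:.3f}, corrected item-total r = {item_total:.3f}")
```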
Validity
Content validity was supported for every item by the 16 panel members, with a minimum Lawshe score of 0.50 (for P5) and a maximum of 1.00 (for P2). The results for each item are presented in Table VII.
Table VII. Content validity of each item.

| Item | Essential (n) | Useful but not essential (n) | Not necessary (n) | Lawshe score |
|---|---|---|---|---|
| P1 | 15 | 1 | 0 | 0.875 |
| P2 | 16 | 0 | 0 | 1.000 |
| P3 | 15 | 1 | 0 | 0.875 |
| P4 | 14 | 2 | 0 | 0.750 |
| P5 | 12 | 4 | 0 | 0.500 |
| P6 | 13 | 3 | 0 | 0.625 |
| P7 | 13 | 3 | 0 | 0.625 |
Overall satisfaction (Table VIII), assessed by both students and faculty, was considered satisfactory, with an average of 4.79 out of 5 for students and an average of 7.88 out of 9 for faculty.
Table VIII. Participants' satisfaction.

| Group | Question | Average answer |
|---|---|---|
| Students questions (1, low; 5, high) | 1. The Mini-CEX is a practical method | 4.80 |
| | 2. The answers are a fair assessment of your skills | 4.70 |
| | 3. This process is useful for your personal development | 4.90 |
| | 4. This process gave useful information about you to the assessor | 4.70 |
| | Total | 4.79 |
| Assessors question (1, low; 9, high) | 1. Assessors satisfaction with the Mini-CEX | 7.88 |
For the construct validity, test suitability was ensured by the KMO measure (KMO = 0.906) and Bartlett's test of sphericity (χ²(376) = 1784.114; p < 0.001).
Construct validity was first assessed by an EFA using Principal Axis Factoring as the extraction method (Table IX). One factor was extracted, including items P1 to P6. Of the six items analyzed, P6 had the highest factor loading, while P3 had the lowest.
Table IX. Factor matrix.

| Item | Factor 1 | Dimension |
|---|---|---|
| P1 | 0.792 | Clinical competence |
| P2 | 0.842 | |
| P3 | 0.625 | |
| P4 | 0.889 | |
| P5 | 0.859 | |
| P6 | 0.924 | |

Extraction method: principal axis factoring. One factor extracted; 5 iterations required.
A new variable, 'Clinical Competence' (CC), representing factor 1, was computed as the average of items P1 to P6. The correlation between CC and P7 was very strong and statistically significant (p < 0.001; R = 0.961). In fact, this correlation was maintained for the whole sample (p < 0.001; R = 0.959), and the predictive value of the 6 items was preserved when predicting P7.
To confirm the EFA, we performed a CFA on a first model comprising one latent variable with six observed variables, as suggested by the EFA. This model revealed inadequate goodness of fit. Post-hoc analysis of the model suggested additional covariances between e1-e3, e1-e5 and e2-e4. For that reason, a new model was developed, as seen in the Figure. The second model confirms the analysis performed with the EFA, but identifies additional covariances between items, drawing attention to their individual suitability and interdependence.
Concurrent validity
To better understand the strength of the relationship between the Mini-CEX and the OSCE results, we calculated Pearson's correlations (Table X), which demonstrated a significant correlation between the two assessment methods, not only for the final score (R(143) = 0.796), but for all the domains assessed in the OSCE. A more detailed analysis of each OSCE station can be found in Table XI.
Table X. Pearson correlations between the P7 average and the OSCE domain averages and final score.

| | | Hx average | ExF average | CASF average | CASSP average | Final score |
|---|---|---|---|---|---|---|
| P7 average | Pearson correlation | 0.689a | 0.595a | 0.926a | 0.747a | 0.796a |
| | Sig. (2-tailed) | 0.000 | 0.000 | 0.000 | 0.000 | 0.000 |
| | n | 143 | 143 | 143 | 143 | 139 |

a Correlation is significant at the 0.01 level (2-tailed).
Table XI. Pearson correlations between P7 and the OSCE scales, by station (Hx, ExF, CASF and CASSP refer to the scores of the corresponding station).

| Item | Statistic | Hx | ExF | CASF | CASSP |
|---|---|---|---|---|---|
| P7_1 | Pearson correlation | 0.543a | 0.025 | 0.798a | 0.469a |
| | Sig. (2-tailed) | 0.000 | 0.772 | 0.000 | 0.000 |
| | n | 141 | 141 | 141 | 141 |
| P7_2 | Pearson correlation | 0.573a | 0.217b | 0.826a | 0.381a |
| | Sig. (2-tailed) | 0.000 | 0.011 | 0.000 | 0.000 |
| | n | 137 | 137 | 137 | 137 |
| P7_3 | Pearson correlation | 0.563a | 0.275a | 0.756a | 0.440a |
| | Sig. (2-tailed) | 0.000 | 0.001 | 0.000 | 0.000 |
| | n | 143 | 143 | 143 | 143 |
| P7_4 | Pearson correlation | 0.523a | 0.232a | 0.730a | 0.442a |
| | Sig. (2-tailed) | 0.000 | 0.006 | 0.000 | 0.000 |
| | n | 138 | 138 | 138 | 138 |
| P7_5 | Pearson correlation | 0.456a | 0.225b | 0.822a | 0.613a |
| | Sig. (2-tailed) | 0.000 | 0.015 | 0.000 | 0.000 |
| | n | 117 | 117 | 117 | 117 |
| P7_6 | Pearson correlation | 0.513a | 0.163 | 0.812a | 0.602a |
| | Sig. (2-tailed) | 0.000 | 0.054 | 0.000 | 0.000 |
| | n | 141 | 141 | 141 | 141 |

a Correlation is significant at the 0.01 level (2-tailed); b correlation is significant at the 0.05 level (2-tailed).
Discussion
This work demonstrates that the Portuguese version of the Mini-CEX has good internal consistency and reliability. This observation is in line with other validity studies, mostly in the English language [10-13,24-26]. Importantly, the use of this scale enables better assessment of clinical skills and provides relevant feedback to the students.
Regarding the validation process, and starting with the internal consistency analysis, the scale has a very high Cronbach's alpha (0.927). An important caveat regarding the internal consistency of the Mini-CEX, however, is the large sample size, which might influence the analysis. To address this, we also analyzed the internal consistency in a much smaller sample of different students and faculty, which confirmed the good internal consistency (Cronbach's alpha: 0.889; n = 32).
The validity of the scale also proved satisfactory, with good content and construct validity. In fact, 16 experts were consulted for the qualitative analysis and no item was considered 'not necessary' by any of the panel members. Satisfaction, another subjective and qualitative analysis, also showed high values, especially among students but also among faculty. Given that this is a formative scale, this type of validity is important because it supports the scale's feasibility among its users and, most importantly, shows that students perceive it as a good and fair assessment method, providing useful information to their assessors and for their personal development. Faculty satisfaction is critical to the implementation of the scale. In fact, applying the scale requires faculty time, which might be an obstacle to its use; therefore, the high level of faculty satisfaction is relevant to a successful implementation of the Mini-CEX.
Another interesting finding emerged during the construct validity analysis. As previously explained, item P7 was excluded from the EFA given that it is a global performance question. During this analysis, only one factor was extracted, which led to a confirmatory analysis in which a new variable, 'CC', was computed. As expected, this new variable is strongly correlated with P7, suggesting that the first 6 items all contribute to the global performance rating. From a different perspective, and taking into account that P7 might be sufficient to measure clinical performance, one might question the relevance of the other 6 items. However, this reflection depends mainly on the purpose and context of the scale's use. If the scale is used in a formative context, in which providing high-quality, item-by-item feedback is the main goal, we believe that maintaining all the items strengthens the scale and its purpose. On the other hand, if the scale is used exclusively for summative assessment purposes, the present analysis suggests that a single global performance 5-point Likert-scale question might be sufficient.
The loading of each item was also assessed during the EFA and, interestingly, the item with the highest loading was P6, which refers to Patient Approach Organization. This finding was surprising and suggests that the student's ability to organize the clinical encounter is the factor that most influences the faculty's opinion of global performance. In contrast, P3 was the item with the lowest loading, suggesting that the physical examination, despite being an essential part of the clinical encounter, does not influence the global performance rating as strongly. The CFA confirmed these findings and identified additional covariances between variables. Again, we decided not to aggregate or exclude any of the variables, keeping in mind that formative feedback is the main goal of the Mini-CEX.
Finally, the decision to study concurrent validity was fruitful and strongly supports the Mini-CEX as a good clinical performance scale. These results show that the Mini-CEX correlates strongly with the assessment scales used in the OSCE, particularly with the domains that are not assessed with a checklist tool (CASF and CASSP). Interestingly, the strength of the correlation with these two domains is closer to that seen for the final score. In the future, we believe that a global performance Likert-scale question might be of value in the Information Gathering (Hx) and Physical Examination (ExF) domains, both assessed with checklists, from a summative perspective of the Mini-CEX applied to simulation exercises.
There are several limitations in this study that should be considered when interpreting these results: i) the design of the study did not allow convergent and divergent validation mechanisms, which could be important to establish construct validity; ii) we did not assess test-retest reliability, since the scale was administered in different simulation environments, by different assessors and sequentially during an OSCE; iii) we did not assess inter-rater reliability, because each student was assessed by different assessors in different environments and never under exactly the same conditions; iv) students were recruited from a single Medical School.
The strengths of the study include the involvement of multiple assessors and the large sample size.
In summary, our study concludes that the Portuguese Mini-CEX is a valid formative scale for assessing clinical performance, with good internal consistency and validity, and that it is ready to be implemented in the Portuguese context as a complement to other assessment methods.