Introduction
Knowing yourself and acting accordingly is an important personal characteristic often referred to as authenticity. As human beings live their lives, they develop a sense of self of which they become aware to a certain degree. However, becoming aware of the self does not imply acting accordingly, as people may be influenced by others’ expectations (Barrett-Lennard, 1998). Individual differences in authenticity are key factors of personal well-being (Wood et al., 2008) and have thus become of central interest in humanistic psychology (Rogers, 1961), psychodynamics (Winnicott, 1965), and positive psychology (Gable & Haidt, 2005). Empirical psychology has developed measures to assess dispositional levels of authenticity, such as the Authenticity Inventory (Kernis & Goldman, 2005) and the Authenticity Scale (AS; Wood et al., 2008). However, the former has faced criticism for its extensive item count (Nartova-Bochaver et al., 2021) and inconsistent validity evidence (Grégoire et al., 2014). The AS has been extensively used and adapted across various cultures. This study introduces distinct types of validity and reliability evidence of the AS in samples from Brazil and Portugal.
The Person-Centered Model of Authenticity
The AS is grounded in the person-centered model of authenticity. This model draws on Rogers (1961), an exponent of humanistic psychology and one of the first theorists to discuss the notion of authenticity in psychology and counseling. Based on the person-centered approach of Rogers (1961), Barrett-Lennard (1998) defined authenticity as a tripartite construct designating “consistency between the three levels of (a) a person’s primary experience, (b) their symbolized awareness, and (c) their outward behavior and communication” (p. 82). According to Wood et al. (2008), this model provides a widely accepted construct definition and dimensionality framework of authenticity. Thus, it is well suited as the theoretical background for the development of the AS.
The first dimension is referred to as self-alienation, which “involves the inevitable mismatch between the conscious awareness and actual experience” (Wood et al., 2008, p. 386). One’s actual experience encompasses one’s true self, which differs to some extent from one’s conscious awareness. High rates of self-alienation can contribute to psychopathologies, whereas congruence between the true self and conscious awareness reflects the subjective experience of being connected to oneself.
The second and third dimensions are named authentic living and accepting external influence, respectively. Authentic living “involves behaving and expressing emotions in such a way that is consistent with the conscious awareness of physiological states, emotions, beliefs, and cognitions” (Wood et al., 2008, p. 386). In turn, accepting external influence “involves the extent to which one accepts the influence of other people and the belief that one has to conform to the expectations of others” (Wood et al., 2008, p. 386). Both dimensions entail the degree to which one acts in line with one’s perceived self. Authentic living refers to behaving in accordance with one’s values and beliefs. Conversely, accepting external influences represents the opposite, i.e., the introjection of others’ expectations.
The Construction and Adaptations of the AS
The AS was originally developed in the UK and has three factors with four items each (Wood et al., 2008). The 12 items are answered on a 7-point scale ranging from 1 (does not describe me at all) to 7 (describes me very well); intermediate scale points are not anchored. The factors assess the three dimensions of the person-centered model of authenticity. The first version, with 25 items, was tested by exploratory factor analysis (EFA), resulting in three factors. A brief version containing the four items with the highest loadings per factor was tested in a confirmatory factor analysis (CFA) with a second-order factor. The model presented excellent fit and reliability, as tested by alpha and test-retest correlations. The AS showed discriminant validity from the Big Five personality traits. Associations with well-being (self-esteem, life satisfaction, positive affect, psychological well-being, and gratitude) and ill-being (anxiety, stress, and negative affect) showed convergent validity.
The AS has been adapted to Iranian (Shamsi et al., 2012), Turkish (İlhan & Özdemir, 2013), French (Canada; Grégoire et al., 2014), Italian (Di Fabio, 2014), Swedish (Vainio & Daukantaitė, 2016), Serbian (Grijak, 2017), Portuguese (Balbino et al., 2018), Ukrainian (Zlyvkov et al., 2019), Russian (Nartova-Bochaver et al., 2021), Sinhala (Zoysa et al., 2021), and Chinese (Xia et al., 2022). In Russia, the authentic living subscale was rephrased to assume a reverse-coded form. For example, item 1 “I think it is better to be yourself than to be popular” was rephrased as “I think it is better to be popular than to be yourself.” In Sweden, item 12 (“I feel alienated from myself”) was rephrased to a suitable sentence in Swedish, back-translated as “I feel like a stranger to myself.”
EFA was implemented in Iran (Shamsi et al., 2012), Canada (Grégoire et al., 2014), Portugal (Balbino et al., 2018), China (Xia et al., 2022), Russia (Nartova-Bochaver et al., 2021), and Sri Lanka (Zoysa et al., 2021). Item 1 yielded loadings below .50 in Canada and Sri Lanka and was removed in the latter. In Russia, item 1 migrated to accepting external influence. In addition, items 4 and 11 had loadings below .50 in Canada and Russia, respectively.
CFA was implemented in all adaptations except the Iranian (Shamsi et al., 2012), Swedish (Vainio & Daukantaitė, 2016), and Ukrainian (Zlyvkov et al., 2019) ones. In Turkey (İlhan & Özdemir, 2013), Canada (Grégoire et al., 2014), Italy (Di Fabio, 2014), and China (Xia et al., 2022), the original model with a second-order factor was selected. In Canada, item 1 had a loading below .50, whereas items 1, 5, 8, and 9 had low loadings in Turkey. We did not have access to the factor loadings in the Italian form, and they were not reported in the adaptation to China. The Russian form also selected a second-order model, yet with item 1 loading onto accepting external influence due to the results of EFA. In addition, item 4 was removed due to its high correlations with items 5 and 10. In Portugal (Balbino et al., 2018), the 3-factor solution with no second-order factors was selected, and all loadings were above .50. The 3-factor solution was also selected in Sri Lanka (Zoysa et al., 2021), yet item 1 was removed due to the results of EFA. In addition, item 8 was correlated with items 5 and 6. In Serbia, the bifactor model was selected, yet the factor loadings were not reported (Grijak, 2017).
Even though most studies tested the second-order model, CFA does not properly estimate second-order factors composed of only three first-order factors. In such a situation, the model has zero degrees of freedom and thus the fit indices may not be computed (Kline, 2011). Fit indices extracted in such circumstances pertain to the estimation of only the first-order factors. This is the reason why fit indices of second-order models and correlated-factor models are the same when the second-order factor has only three factors.
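The identification argument above can be illustrated by counting parameters. The following is a minimal sketch rather than part of the original analyses: the function names are hypothetical, and it assumes a single second-order factor whose variance is fixed to 1 for scaling, so that each first-order factor contributes one second-order loading and one disturbance.

```python
def unique_moments(k: int) -> int:
    """Number of unique variances and covariances among k variables."""
    return k * (k + 1) // 2


def second_order_df(n_first_order: int) -> int:
    """Degrees of freedom of the second-order part of a higher-order CFA.

    Assumption (not from the source): one second-order factor with its
    variance fixed to 1, so the free parameters are one loading and one
    disturbance variance per first-order factor.
    """
    information = unique_moments(n_first_order)  # first-order variances/covariances
    free_parameters = 2 * n_first_order          # loadings + disturbances
    return information - free_parameters


if __name__ == "__main__":
    for k in (3, 4, 5):
        print(f"{k} first-order factors -> df = {second_order_df(k)}")
```

With three first-order factors, the six variances and covariances are reproduced by six free parameters, so the second-order part adds no testable restriction; only from four first-order factors onward does it become over-identified.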
Regarding the bifactor model tested in Serbia, no justification for using such a factor structure was provided. A bifactor model includes a general factor onto which all items load. This general factor reflects what is common among all items in a scale. In addition to this general factor, a bifactor model includes orthogonal specific factors. Hence, each item is an indicator of both the general factor and one orthogonal specific factor. The specific factors are not equivalent to first-order factors in a higher-order structure. Rather, they represent the common variance of a subgroup of items once the common variance between all items has been partitioned out through the general factor (Dunn & McCray, 2020). Therefore, the use of bifactor models requires a rationale for a general factor underlying all items in a scale.
In all adaptations, reliability was assessed through alpha, with values ranging from 0.62 to 0.95. Other coefficients were extracted in Iran (Shamsi et al., 2012) and Russia (Nartova-Bochaver et al., 2021), with Spearman-Brown coefficient and omega indicating good reliability, respectively. Test-retest correlations were assessed in Canada (Grégoire et al., 2014), Serbia (Grijak, 2017) and China (Xia et al., 2022). In Serbia, low correlations were observed, which might be related to a larger interval between data collections.
Multigroup CFA has been used to test the invariance of the AS across different groups. In the construction study (Wood et al., 2008) and in the adaptations to China (Xia et al., 2022) and Sri Lanka (Zoysa et al., 2021), configural and metric invariance models were tested, whereas in Russia (Nartova-Bochaver et al., 2021) scalar invariance models were additionally tested. Metric invariance was observed across gender in the construction study and in the adaptations to Sinhala and Chinese; scalar invariance was observed in Russia. Metric invariance across ethnicity and occupation was observed in the construction study and in China, respectively. Metric invariance across age was observed in China and Russia. In Russia, scalar invariance across age was not observed, and a partial model with three freed item intercepts showed good fit. Lastly, in Russia, metric invariance across depression rates was observed, although scalar models did not demonstrate good fit.
Most adaptations assessed relations to other measures, except the Iranian (Shamsi et al., 2012), Portuguese (Balbino et al., 2018), and Sinhala (Zoysa et al., 2021) ones. The adaptations included measures of well-being (psychological well-being, subjective well-being, mental well-being, life satisfaction, harmony in life, sense of coherence, need satisfaction, self-esteem, and positive affect) and ill-being (psychological distress, anxiety, negative affect, depression, and stress). The Canadian adaptation (Grégoire et al., 2014) demonstrated discriminant validity from the Big Five personality traits.
To our knowledge, the AS has only been assessed by item response theory (IRT) in Ukraine (Zlyvkov et al., 2019). Three polytomous models were applied (i.e., graded response model, generalized partial credit model, and graded ratings scale model), with a primary focus on comparing the models. The item parameters were not reported.
Research Goals
This study introduces distinct types of validity and reliability evidence of the AS in Brazil and Portugal. To our knowledge, this is the first study assessing the psychometric properties of the AS in Brazil. Although the psychometric properties of the AS have been previously assessed in Portugal, this paper introduces a new version of the AS with a modification to one item (for more details, refer to the ‘Measures’ section). Moreover, conducting a study with samples from both Brazil and Portugal enables cross-cultural comparisons. The specific goals are: (a) to test the internal structure and reliability of the AS; (b) to test the invariance of the AS across culture, gender, age, education, occupation, and Covid-related concern and impact; (c) to assess the AS following an IRT approach; (d) to assess potential ceiling and floor effects; and (e) to identify convergent validity evidence based on the relation to presence of meaning. Moderate correlations with presence of meaning are expected. In addition, the correlations with self-alienation are expected to be the largest in magnitude, as this is the authenticity dimension most closely related to meaning in life.
Method
Participants and Procedures
Data collection took place as part of the project ‘Future Time Orientation and Life Project: A theoretical and transcultural approach from a psychosocial perspective.’ The dataset used in this paper is a subset composed of 1,786 participants who responded to the AS in Brazilian or European Portuguese. Data collection was entirely online, from March to December 2020, via the LimeSurvey platform. Incomplete answers (n = 2, 0.1%), foreigners (n = 23, 1.3%), and participants who selected the same response category for scales with reverse items (n = 43, 2.4%) were eliminated. Two participants exhibited unusual patterns of response (i.e., Mahalanobis distance per degree of freedom above 4.0) in multiple scales and were therefore eliminated. We examined participants whose responses in the AS had a Mahalanobis distance per degree of freedom above 2.5. Altogether, 19 participants were eliminated as they selected only extreme responses across all items. As the AS is anchored only at the extreme points, those participants might not have understood how to use the rating scale.
Therefore, 1,699 participants were analyzed. The overall sample had an age range of 18 to 72 years, with M = 31.1 (SD = 11.60). In Brazil (n = 1,077), ages ranged from 18 to 72 years old, with M = 32.3 (SD = 11.80); while in Portugal (n = 622) ages ranged from 18 to 72 years, with M = 29.0 (SD = 10.94). As seen in Table 1, participants were predominantly Caucasian, female, workers, and had a college degree.
Notes. TGNC = transgender and gender non-conforming people.
¹The question allowed for multiple answers.
²The category was not originally included in the study but was commonly reported by participants in the ‘other’ field.
Data collection occurred during the Covid-pandemic and two questions assessed (a) the level of concern regarding the pandemic; and (b) the extent to which the pandemic impacted the answers in the survey. As seen in Table 1, most participants expressed moderate to high levels of concern, but most indicated that the pandemic had little to no impact on their answers.
Measures
Authenticity Scale (Appendix 1)
The forms used in this study differ slightly from that adapted by Balbino et al. (2018). The Brazilian form was created with two items being slightly modified considering syntactic particularities in Brazil. Item 12 (“I feel alienated from myself”) was modified in both forms to enhance comprehension. Taking inspiration from the Swedish form (Vainio & Daukantaitė, 2016), a statement that aligns better with daily language was created and is back-translated as “I feel like a stranger to myself.”
Meaning in Life Questionnaire
The Meaning in Life Questionnaire (MLQ; Steger et al., 2006) measures presence of and search for meaning in life. However, in this study, only presence of meaning was used. It consists of a 5-item subscale answered on a 7-point rating scale ranging from ‘totally false’ to ‘totally true.’ The MLQ has been adapted in Brazil (Damásio & Koller, 2015) and Portugal (Portugal, 2017). The internal structure was tested in our samples by CFA with the maximum likelihood robust (MLR) estimator. The single-factor model yielded good fit in Brazil, χ² = 29.6(5), p < .001, CFI = .986, TLI = .972, RMSEA (90% C. I.) = .068 (.050; .087), SRMR = .019, and Portugal, χ² = 16.9(5), p = .005, CFI = .988, TLI = .977, RMSEA (90% C. I.) = .062 (.036; .089), SRMR = .019. Reliability analysis yielded excellent results both in Brazil, α = .90 and ω = .90, and Portugal, α = .91 and ω = .91.
Data Analysis
The AS was tested by CFA considering three models: the 3-factor solution with correlated factors, the unidimensional model, and the bifactor model. According to Barrett-Lennard (1998), authenticity is a tripartite construct. Therefore, we anticipated that the 3-factor solution would provide the best fit. The bifactor model was also tested to compare our findings with the Serbian version (Grijak, 2017), which supported the bifactor model. Additionally, the unidimensional model was tested to further support that the 3-factor solution best fit the data. The MLR estimator was chosen because the data violated multivariate normality in Brazil, Mskewness = 3,539.1, p < .001, Mkurtosis = 49.4, p < .001, and Portugal, Mskewness = 2,456.0, p < .001, Mkurtosis = 41.9, p < .001. We considered using weighted least squares mean and variance adjusted (WLSMV) or unweighted least squares mean and variance adjusted (ULSMV) estimators, as they are widely regarded as the most appropriate methods for ordinal data (Li, 2016; Rhemtulla et al., 2012). However, as a few items lacked responses in some response categories, WLSMV and ULSMV could not be used. The choice of MLR is supported by a Monte Carlo simulation study suggesting that 7-point scales are appropriately tested by MLR (Rhemtulla et al., 2012). Goodness-of-fit cutoffs were based on Schreiber et al. (2006), who recommended comparative fit index (CFI) and Tucker-Lewis index (TLI) values above or equal to .95, and standardized root mean square residual (SRMR) and root mean square error of approximation (RMSEA) values below .06. The following cutoffs were considered acceptable: CFI and TLI above or equal to .90, and RMSEA and SRMR below .080 (Brown, 2006).
Multigroup CFA tested the invariance of the AS’s factor structure (configural model), factor loadings (metric model), and item intercepts (scalar model) across groups. Consistent with previous studies, invariance models across gender, age, and occupation were tested. Transgender and gender non-conforming people were not considered for the models across gender because only a few participants belonged to this category. Invariance across age compared youths (up to 30 years old) and non-youths. Invariance across occupation compared workers and students; participants who were both workers and students were excluded from this comparison. Additionally, invariance models across education and culture were tested to ensure that individuals from diverse educational and cultural backgrounds respond to the AS in a similar psychometric pattern. Invariance across culture compared participants from Brazil and Portugal. Invariance across education compared participants with and without a college degree. Invariance models across gender, age, education, and occupation were tested in each country separately. Moreover, invariance models across Covid-related concern and impact were tested to ensure the pandemic did not influence participants’ response patterns. Invariance across Covid-related concern and impact considered the entire sample due to the reduced number of responses in some categories. The first two categories of Covid-related concern and the last two categories of Covid-related impact were collapsed. To establish invariance across groups, in addition to good fit, we expected no substantial differences between compared models (configural versus metric, and metric versus scalar), i.e., ΔRMSEA ≤ .050 and ΔCFI ≥ -.010 (Cheung & Rensvold, 2002).
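The decision rule for nested invariance models can be expressed as a small check. This is an illustrative sketch under the cutoffs stated above; the function name and the fit values in the usage lines are hypothetical and are not results from this study.

```python
def supports_invariance(cfi_less, cfi_more, rmsea_less, rmsea_more,
                        max_rmsea_increase=0.050, max_cfi_drop=0.010):
    """Compare a less constrained model with a more constrained one.

    Returns True when the more constrained model (e.g., metric vs. configural)
    does not worsen fit beyond the stated cutoffs. A small tolerance guards
    against floating-point noise at the boundary.
    """
    tolerance = 1e-9
    delta_cfi = cfi_more - cfi_less        # negative when fit worsens
    delta_rmsea = rmsea_more - rmsea_less  # positive when fit worsens
    return (delta_cfi >= -max_cfi_drop - tolerance
            and delta_rmsea <= max_rmsea_increase + tolerance)


if __name__ == "__main__":
    # Hypothetical fit indices for a configural and a metric model
    print(supports_invariance(cfi_less=0.970, cfi_more=0.968,
                              rmsea_less=0.045, rmsea_more=0.046))
```

The same function would be applied twice per grouping variable: once for configural versus metric, and once for metric versus scalar.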
The AS was tested by the graded response model (GRM; Samejima, 1969), an IRT approach for polytomous data. The first three response categories for authentic living items were collapsed because the first two were selected by a reduced number of participants, n < 19. According to Linacre (1999), distributions with long tails of relatively infrequently used categories may bias item calibration. Hence, only four difficulty thresholds were estimated for authentic living. A separate GRM per factor was preferred over a multidimensional approach because tridimensional item characteristic curves (ICC) and test information curves (TIC) are hardly interpretable. Moreover, because response categories were collapsed for the authentic living subscale, the number of response categories varied across items. In a multidimensional setting, the generalized partial credit model (GPCM) would be better suited to such data, as it allows for the calibration of items with different numbers of response categories. However, GPCM is restricted to unidimensional models in the mirt package (used in this study). Nevertheless, for the purpose of comparison, Appendix 2 shows the item parameters considering a multidimensional GRM, with no substantial differences being observed. In this model, response categories for authentic living were not collapsed, and self-alienation and accepting external influence items were reverse-coded to ensure all items were positively correlated. Lastly, GRM was preferred over Rasch models because it estimates two item parameters (discrimination and difficulty) rather than one (difficulty). In addition, the fit indices of GRM were contrasted with those of Rasch models, with GRM showing the best performance (Appendix 3).
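The recoding step for authentic living items can be sketched as follows. This is an illustration only: the function name and the example responses are hypothetical, and the study itself was run in R.

```python
def collapse_lowest(responses, n_merge=3):
    """Merge the n_merge lowest categories of a 1..K rating scale.

    Categories 1..n_merge become 1, and the remaining categories are
    renumbered consecutively, so a 7-point item becomes a 5-point item.
    """
    return [max(r - (n_merge - 1), 1) for r in responses]


if __name__ == "__main__":
    original = [1, 2, 3, 4, 5, 6, 7]       # hypothetical responses
    recoded = collapse_lowest(original)
    n_categories = len(set(recoded))
    print(recoded)                          # recoded responses
    print(n_categories - 1)                 # number of GRM thresholds (K - 1)
```

After merging, five categories remain, which is why four difficulty thresholds are estimated for these items.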
Statistical assumptions and fit indices were tested. Unidimensionality was tested by Loevinger’s H coefficient, with values above .30 being expected (Sijtsma & Molenaar, 2002). Local dependence was tested by the Q3 test, with values below |1/(L - 1)| (L meaning the length of the scale) being expected (Yen, 1993). Monotonicity was assessed by the scalability coefficient H, with values above .30 being expected (Mokken, 1971). Item fit was tested by RMSEA (Cook et al., 2009). Person fit was tested by the Zh statistic, with values below -3.0 suggesting potential aberrant response patterns (Paek & Cole, 2019). The reliability of the latent trait was tested by the Rho coefficient, with ρ > .70 being expected (Sijtsma & Molenaar, 1987).
GRM parameters were then interpreted. Discrimination (a) informs the degree to which the responses are able to distinguish individuals with different latent trait levels (θ). GRM provides an index of discrimination that can be interpreted as follows: a > 1.69, very high; a > 1.34, high; a > 0.64, moderate; otherwise, low (Baker & Kim, 2017). GRM estimates K - 1 difficulty thresholds, where K represents the number of response categories. Each difficulty threshold indicates the θ at which an individual has a 50% probability of responding above a given category. Hence, b1 designates the θ at which an individual is equally likely to respond in the first category or in any higher category; and so on. General item difficulty (b) suggests the θ at which one has the same chance to respond to the first and last categories. ICC and TIC were plotted to assess the adequacy of the rating scale and the range of θ that the test assesses most effectively.
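To make the role of the item parameters concrete, the sketch below computes category response probabilities under Samejima's graded response model for a single item. It is not the estimation routine used in the study (which relied on the mirt package); the function name and the parameter values are hypothetical.

```python
import math


def grm_category_probs(theta, a, thresholds):
    """Category probabilities for one item under the graded response model.

    theta: latent trait level; a: discrimination;
    thresholds: ordered difficulty thresholds b1 < b2 < ... (K - 1 values).
    """
    # Cumulative probabilities of responding in category k or higher;
    # the first is 1 by definition and a final 0 closes the sequence.
    cumulative = [1.0]
    cumulative += [1.0 / (1.0 + math.exp(-a * (theta - b))) for b in thresholds]
    cumulative.append(0.0)
    # Each category probability is the difference of adjacent cumulatives.
    return [cumulative[k] - cumulative[k + 1] for k in range(len(thresholds) + 1)]


if __name__ == "__main__":
    # Hypothetical 5-category item (four thresholds)
    probs = grm_category_probs(theta=0.0, a=1.8, thresholds=[-2.0, -1.0, 0.5, 1.5])
    print([round(p, 3) for p in probs])
```

Plotting these probabilities over a range of θ yields the item characteristic curves; categories whose curve is never the most likely at any θ are the ones respondents fail to distinguish.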
Reliability was tested by alpha (α), omega (ω), Spearman-Brown coefficient (rkk), and average variance extracted (AVE). The following cutoffs were used to interpret α, ω, and rkk: below .50, unacceptable; below .60, poor; below .70, questionable; below .80, moderate; below .90, good; otherwise, excellent (Gliem & Gliem, 2003). AVE values above .50 were expected (Fornell & Larcker, 1981). The percentages of participants with the minimum and maximum scores were computed to assess ceiling or floor effects, with percentages over 15% indicating a ceiling/floor effect (Terwee et al., 2007).
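The interpretation rules above can be written as two small helpers. This is a sketch with hypothetical function names and made-up scores; it assumes a 4-item subscale scored from 1 to 7, so total scores range from 4 to 28.

```python
def reliability_label(coefficient):
    """Label a reliability coefficient using the cutoffs stated above."""
    cutoffs = [(0.50, "unacceptable"), (0.60, "poor"), (0.70, "questionable"),
               (0.80, "moderate"), (0.90, "good")]
    for upper_bound, label in cutoffs:
        if coefficient < upper_bound:
            return label
    return "excellent"


def extreme_score_shares(total_scores, min_score, max_score):
    """Return the proportions of respondents at the minimum and maximum score."""
    n = len(total_scores)
    at_floor = sum(score == min_score for score in total_scores) / n
    at_ceiling = sum(score == max_score for score in total_scores) / n
    return at_floor, at_ceiling


if __name__ == "__main__":
    scores = [28, 28, 27, 25, 24, 28, 20, 26, 22, 4]   # hypothetical totals
    floor, ceiling = extreme_score_shares(scores, min_score=4, max_score=28)
    print(reliability_label(0.75))
    print(floor > 0.15, ceiling > 0.15)   # floor effect?, ceiling effect?
```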
Associations between the AS and presence of meaning were assessed via Pearson correlations. Factor scores were computed using the maximum a posteriori method. Pearson correlation was selected because the values of skewness and kurtosis were between -1.0 and 1.0. The following cutoffs were used: r < .30, weak; r < .50, moderate; otherwise, strong (Dancey & Reidy, 2007). Coefficients were compared with the AVE, with values lower than the square root of AVE being expected (Fornell & Larcker, 1981). Fisher r-to-z transformations tested whether correlations between presence of meaning and self-alienation were stronger than correlations between presence of meaning and the other two factors.
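The source does not state which r-to-z procedure was used. As one possible approach, the sketch below applies the Meng, Rosenthal, and Rubin (1992) z-test, which compares two correlations that share a variable and come from the same sample. This choice of test is an assumption, and the function names and input values are hypothetical. To compare magnitudes when the two correlations have opposite signs, one subscale would first need to be reflected so that both correlations have the same sign.

```python
import math


def fisher_z(r):
    """Fisher r-to-z transformation."""
    return math.atanh(r)


def compare_dependent_correlations(r1, r2, r12, n):
    """Meng-Rosenthal-Rubin z-test for two correlations sharing one variable.

    r1, r2: correlations of the shared variable with variables 1 and 2;
    r12: correlation between variables 1 and 2; n: sample size.
    """
    mean_r_squared = (r1 ** 2 + r2 ** 2) / 2
    f = min((1 - r12) / (2 * (1 - mean_r_squared)), 1.0)  # capped at 1
    h = (1 - f * mean_r_squared) / (1 - mean_r_squared)
    return (fisher_z(r1) - fisher_z(r2)) * math.sqrt((n - 3) / (2 * (1 - r12) * h))


if __name__ == "__main__":
    # Hypothetical values, not results from this study
    z = compare_dependent_correlations(r1=0.50, r2=0.30, r12=0.40, n=500)
    print(round(z, 2))
```

A |z| above 1.96 would indicate that the two correlations differ at the .05 level (two-tailed).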
The sample size was adequate for all analyses. For CFA, a sample size calculator (Soper, 2023) suggested a minimum of 100 participants for testing the model structure. Considering α = .05, β = .20, and the smallest subsample (n = 159), the CFA had sufficient power to detect significant parameters with an effect of .264. All CFA models were implemented with and without outliers. The outliers were retained as no substantial differences were observed after their removal. For GRM, the sample size exceeded 500 (Nunes & Primi, 2005).
All analyses were conducted using R software 4.1.3 (R Core Team, 2023). The following packages were used: lavaan (version 0.6-9; Rosseel, 2012) for CFA; mirt (version 1.34; Chalmers, 2012) for GRM; semTools (version 0.5-5; Jorgensen et al., 2021) and multicon (version 1.6; Sherman, 2015) for reliability.
Results
The model with three correlated factors achieved excellent fit indices in Brazil, χ² = 151.9(51), p < .001, CFI = .973, TLI = .965, RMSEA (90% C. I.) = .043 (.036; .050), SRMR = .038, and Portugal, χ² = 118.1(51), p < .001, CFI = .968, TLI = .958, RMSEA (90% C. I.) = .046 (.036; .056), SRMR = .041. The single-factor solution showed a poor fit in Brazil, χ² = 1,284.9(54), p < .001, CFI = .675, TLI = .602, RMSEA (90% C. I.) = .145 (.139; .152), SRMR = .113, and Portugal, χ² = 926.4(54), p < .001, CFI = .580, TLI = .487, RMSEA (90% C. I.) = .161 (.153; .170), SRMR = .115. The bifactor model reached the best fit indices in Brazil, χ² = 68.9(36), p < .001, CFI = .991, TLI = .984, RMSEA (90% C. I.) = .029 (.020; .038), SRMR = .021, and Portugal, χ² = 58.5(36), p < .001, CFI = .989, TLI = .980, RMSEA (90% C. I.) = .032 (.017; .045), SRMR = .025. However, in both samples, the covariance matrix was not positive definite, and several factor loadings were non-significant and below .50. Therefore, the 3-factor solution was retained. As seen in Figure 1, all factor loadings were above .500. Lastly, as seen in Table 2, the 3-factor solution demonstrated scalar invariance in all conditions.
As seen in Appendix 4, GRM assumptions were partially met, whereas item fit and person fit were good. Loevinger’s and scalability H coefficients were all above .30, which suggests unidimensionality and monotonicity. One item pair each in authentic living and self-alienation, and two item pairs in accepting external influence, exceeded the expected cutoff, indicating local dependence and potential biases in the GRM calibration. RMSEA values were all below .060, suggesting good item fit. As for person fit, fewer than 5% of respondents had Zh values below -3.0. All ρ coefficients were above .70.
As seen in Table 3, all items exhibited very high discrimination, except for item 1. Authentic living items were less difficult compared to self-alienation and accepting external influence items. That is, higher latent trait levels were needed to endorse the highest response categories for self-alienation and accepting external influence items. Conversely, in the case of authentic living items, the highest response categories were endorsed by participants with lower latent trait levels. As seen in Figure 2, the authentic living subscale was most effective in assessing participants with θ between -3.0 and 2.0, whereas the self-alienation and accepting external influence subscales demonstrated best performance among respondents with θ from -2.0 to 3.0. Furthermore, as seen in Figure 3, the 5-point rating scale with the first three categories merged did not fit item 1, from the authentic living subscale. Participants did not discriminate the first four categories. In the self-alienation subscale, the 7-point scale did not fit the response patterns of items 2 and 10, with no differentiation between the second and third categories. In addition, participants did not distinguish the sixth and seventh categories of item 2. The 7-point scale fitted the response patterns of all accepting external influence items.
Table 4 displays the results of ceiling/floor effects and reliability. The mean of authentic living was much higher compared to self-alienation and accepting external influence. Over 15% of participants in both samples reached the maximum score in authentic living, indicating a ceiling effect. Self-alienation and accepting external influence demonstrated good reliability. In the authentic living subscale, α, ω, and rkk reached moderate values, and AVE was below .50. With the exclusion of item 1, the reliability of authentic living would have yielded better results in Brazil, α = .75, ω = .75, rkk = .73, and AVE = .50, and Portugal, α = .77, ω = .77, rkk = .75, and AVE = .53.
Notes. *p < .05, **p < .001. Aut = authentic living, Ali = self-alienation, Ext = accepting external influence, Min. punct. = minimum score, Max. punct. = maximum score, MLQ = presence of meaning subscale.
Table 4 also displays the correlation results. As expected, self-alienation and accepting external influence were positively correlated with each other, as well as inversely correlated with authentic living. The correlation magnitudes were moderate and below the square root of the AVE. As expected, MLQ was positively correlated with authentic living and inversely correlated with self-alienation and accepting external influence. The magnitudes of correlations were moderate, except for accepting external influence in Portugal, which showed a weak correlation. Fisher r-to-z transformations suggested that correlations with self-alienation were stronger compared to authentic living (Z = 4.357, p < .001, in Brazil; Z = 4.947, p < .001, in Portugal) and accepting external influence (Z = 7.697, p < .001, in Brazil; Z = 8.319, p < .001, in Portugal).
Discussion
This study introduces distinct types of validity and reliability evidence of the AS in samples from Brazil and Portugal. Based on CFA, the 3-factor solution assessing self-alienation, authentic living, and accepting external influence was selected. The previous adaptation in Portugal (Balbino et al., 2018) also supported a 3-factor solution. To our knowledge, this is the first study testing the internal structure of the AS in Brazil. Unlike the original internal structure, a second-order factor was not extracted because CFA does not compute the fit indices of models with zero degrees of freedom (Kline, 2011). Two alternative models were tested: the unidimensional model and the bifactor model. The unidimensional model was rejected due to its poor fit to the data. In turn, despite the bifactor model showing the best fit indices, the covariance matrix in both samples was not positive definite, and several factor loadings were non-significant and below .50. Although the bifactor model was retained in Serbia (Grijak, 2017), the factor loadings were not reported, which limits the conclusions that can be drawn.
Multigroup CFA supported scalar invariance across culture, gender, age, education, occupation, and Covid-related concern and impact. The results are in line with previous studies that tested the invariance of the AS across gender (Nartova-Bochaver et al., 2021; Wood et al., 2008; Xia et al., 2022; Zoysa et al., 2021). In China (Xia et al., 2022), metric invariance across occupation was observed, yet the scalar model was not tested. Metric invariance across age was observed in China (Xia et al., 2022) and Russia (Nartova-Bochaver et al., 2021), though in the latter the scalar model reached good fit only after freeing three item intercepts. Thus, there might be aspects of Russian culture that influence intercepts across age, while item intercepts are equivalent across age groups in Brazil and Portugal. To our knowledge, this is the first study testing invariance across culture, education, and Covid-related concern and impact.
To our knowledge, this is the first study discussing the AS item parameters following an IRT approach. Discrimination was very high for nearly all items, meaning the AS is able to distinguish people with different trait levels. Item difficulties and TIC suggested the AS is most effective in assessing participants with medium to high levels of self-alienation and accepting external influence, as well as participants with medium to low levels of authentic living. Therefore, the AS demonstrates low reliability among participants with high authenticity traits. Lastly, ICC suggested the inadequacy of the rating scale. This might be attributed to the lack of anchoring for the intermediate points on the rating scale. Indeed, this was the reason for the elimination of 19 participants who potentially had difficulty understanding how to use the rating scale.
The results suggest a ceiling effect for the authentic living subscale. In both samples, the mean of direct scores was very high and the percentage of participants with the maximum score was above 15%. Moreover, a reduced number of participants selected the first two response categories for authentic living items, prompting the collapse of the first three response categories prior to GRM calibration. Hence, the ceiling effect might have exacerbated the poor performance in the assessment of people with high authentic living traits.
This study reported four types of reliability coefficients. Self-alienation and accepting external influence demonstrated good reliability, while authentic living achieved moderate reliability and an AVE below .50. The existing literature has mostly reported alpha, with values similar to those obtained in this study. In Russia (Nartova-Bochaver et al., 2021), omega was also assessed, with results similar to those of the present study. Lastly, the Spearman-Brown coefficient was assessed only in Iran, considering the whole set of items. To our knowledge, this is the first study to assess the AVE of the AS factors, which enables the assessment of discriminant validity by comparing the correlation coefficients with the square root of the AVE.
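The comparison between inter-factor correlations and the square root of the AVE can be sketched as follows. The loadings and the correlation are hypothetical values chosen for illustration, not the estimates obtained in this study, and the function names are our own.

```python
import math


def average_variance_extracted(std_loadings):
    """AVE: mean of the squared standardized factor loadings."""
    return sum(l * l for l in std_loadings) / len(std_loadings)


def discriminant_validity_supported(ave_a, ave_b, r_ab):
    """True when the absolute inter-factor correlation is smaller than
    the square root of the AVE of both factors."""
    return abs(r_ab) < math.sqrt(ave_a) and abs(r_ab) < math.sqrt(ave_b)


# Hypothetical standardized loadings for two four-item factors.
loadings_factor_a = [0.45, 0.70, 0.75, 0.72]  # one weak item lowers the AVE
loadings_factor_b = [0.80, 0.82, 0.78, 0.85]
r_factors = -0.45  # hypothetical inter-factor correlation

ave_a = average_variance_extracted(loadings_factor_a)
ave_b = average_variance_extracted(loadings_factor_b)
print(f"AVE A = {ave_a:.2f}, AVE B = {ave_b:.2f}")
print("discriminant validity supported:",
      discriminant_validity_supported(ave_a, ave_b, r_factors))
```

With these assumed values, a factor can show an AVE below .50 and still be empirically distinct from another factor, because the two criteria address different questions: the first concerns how much item variance the factor captures, the second whether the factors overlap excessively.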
The study identified associations between authenticity and presence of meaning. The results are in line with the literature on the relationship between authenticity and well-being (e.g., Di Fabio, 2014; Grijak, 2017), including studies with the MLQ (Akin & Taş, 2015). This study advances the literature by comparing the correlation coefficients between factors. The r-to-z transformation revealed that correlations with self-alienation were stronger than correlations with authentic living and accepting external influence. This corroborates our hypothesis that self-alienation would have higher correlation coefficients, as it is the authenticity dimension most closely related to knowledge about the self and, consequently, meaning in life.
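A comparison of two correlation coefficients through the r-to-z transformation can be sketched as below. This is an illustration under stated assumptions, not a reproduction of the analysis: the coefficients and sample sizes are hypothetical, and the test statistic is the textbook form for independent samples, which ignores the dependence that arises when both correlations come from the same respondents and share a variable.

```python
import math


def fisher_z(r):
    """Fisher r-to-z transformation (inverse hyperbolic tangent)."""
    return math.atanh(r)


def compare_correlations(r1, n1, r2, n2):
    """z statistic and two-tailed p value for the difference between two
    correlations, ASSUMING the two samples are independent."""
    se = math.sqrt(1.0 / (n1 - 3) + 1.0 / (n2 - 3))
    z = (fisher_z(r1) - fisher_z(r2)) / se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-tailed normal p value
    return z, p


# Hypothetical absolute correlations with presence of meaning.
r_self_alienation = 0.50
r_authentic_living = 0.30
n_hypothetical = 500

z_stat, p_value = compare_correlations(
    r_self_alienation, n_hypothetical, r_authentic_living, n_hypothetical
)
print(f"z = {z_stat:.2f}, p = {p_value:.4f}")
```

Absolute values are compared here because self-alienation is expected to relate negatively to presence of meaning, so the question is about the strength rather than the sign of the association.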
Finally, multiple findings indicated that item 1 performed questionably. In the CFA, item 1 yielded a factor loading below .50. In the GRM, item 1 was the only item that did not exhibit very high discrimination. In addition, ICC indicated that participants did not distinguish between the first four response categories, potentially exacerbating the ceiling effect. Lastly, the elimination of item 1 would have improved the reliability of authentic living in both samples. The poor performance of item 1 has also been observed in Turkey (İlhan & Özdemir, 2013), Canada (Grégoire et al., 2014), and Sri Lanka (Zoysa et al., 2021). In Russia (Nartova-Bochaver et al., 2021), item 1 migrated to the accepting external influence subscale. Item 1 assesses the respondent’s inclination to prioritize alignment with their values over popularity. The poor performance might be attributed to variations in how much individuals value popularity. In other words, individuals may value not being popular yet still accept and introject others’ expectations.
Limitations and Future Directions
This study has some limitations. First, the data were collected online from a convenience sample. Future studies employing representative samples in more controlled environments may assess the AS under less biased conditions. Second, participants’ responses were unevenly distributed across response categories, especially in the authentic living subscale. This uneven distribution might have affected the quality of data analysis. For instance, the lack of responses in certain categories precluded the use of ordinal estimators. Third, violations of local independence might have biased the GRM calibration. Lastly, given that the AS has only three factors, the fit indices for the original model with a second-order factor could not be computed. Therefore, we cannot be certain that the model without a second-order factor is actually better than the original model with a second-order factor.
Based on the results of this study, future versions of the AS may be constructed. First, alternative rating scales with fewer response categories and with all points properly anchored may be used. Second, items may be rephrased or created to properly distinguish individuals with high levels of authenticity. The new items should be more difficult, in the case of authentic living, and less difficult, in the case of self-alienation and accepting external influence. Third, because of its poor performance, item 1 should be removed or rephrased.
Conclusion
This study introduces distinct types of validity and reliability evidence of the AS in Brazil and Portugal, based on CFA, invariance models, GRM, four types of reliability coefficients, and relations to presence of meaning. The findings indicate that the internal structure of the AS has three factors assessing self-alienation, authentic living, and accepting external influence, with the three subscales showing moderate to good reliability. The internal structure, factor loadings, and item intercepts are invariant across different groups, and associations with presence of meaning provided additional validity evidence. All items properly assess authenticity, except for item 1, which showed a few unsatisfactory results. Despite the good evidence, the rating scale is inappropriate for some items, especially in the authentic living subscale, which is affected by a ceiling effect. Moreover, the three subscales are not able to distinguish among individuals with high levels of authenticity. Regardless of these limitations, the study suggests that the AS may be effectively employed to assess authenticity, especially if refined versions are created to overcome the limitations acknowledged in this study.