<?xml version="1.0" encoding="ISO-8859-1"?><article xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<front>
<journal-meta>
<journal-id>0213-9111</journal-id>
<journal-title><![CDATA[Gaceta Sanitaria]]></journal-title>
<abbrev-journal-title><![CDATA[Gac Sanit]]></abbrev-journal-title>
<issn>0213-9111</issn>
<publisher>
<publisher-name><![CDATA[Sociedad Española de Salud Pública y Administración Sanitaria (SESPAS)]]></publisher-name>
</publisher>
</journal-meta>
<article-meta>
<article-id>S0213-91112002000200010</article-id>
<title-group>
<article-title xml:lang="en"><![CDATA[Correspondence analysis of the Spanish National Health Survey]]></article-title>
<article-title xml:lang="es"><![CDATA[El análisis de correspondencias en la explotación de la Encuesta Nacional de Salud]]></article-title>
</title-group>
<contrib-group>
<contrib contrib-type="author">
<name>
<surname><![CDATA[Greenacre]]></surname>
<given-names><![CDATA[M.]]></given-names>
</name>
<xref ref-type="aff" rid="A01"/>
</contrib>
</contrib-group>
<aff id="A01">
<institution><![CDATA[Universitat Pompeu Fabra, Centre for Research in Health Economics, Department of Economics and Business]]></institution>
<addr-line><![CDATA[Barcelona ]]></addr-line>
</aff>
<pub-date pub-type="pub">
<day>00</day>
<month>04</month>
<year>2002</year>
</pub-date>
<pub-date pub-type="epub">
<day>00</day>
<month>04</month>
<year>2002</year>
</pub-date>
<volume>16</volume>
<numero>2</numero>
<fpage>160</fpage>
<lpage>170</lpage>
<copyright-statement/>
<copyright-year/>
<self-uri xlink:href="http://scielo.isciii.es/scielo.php?script=sci_arttext&amp;pid=S0213-91112002000200010&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.isciii.es/scielo.php?script=sci_abstract&amp;pid=S0213-91112002000200010&amp;lng=en&amp;nrm=iso"></self-uri><self-uri xlink:href="http://scielo.isciii.es/scielo.php?script=sci_pdf&amp;pid=S0213-91112002000200010&amp;lng=en&amp;nrm=iso"></self-uri><abstract abstract-type="short" xml:lang="en"><p><![CDATA[This report gives a comprehensive explanation of the multivariate technique called correspondence analysis, applied in the context of a large survey of a nation's state of health, in this case the Spanish National Health Survey. It is first shown how correspondence analysis can be used to interpret a simple cross-tabulation by visualizing the table in the form of a map of points representing the rows and columns of the table. Combinations of variables can also be interpreted by coding the data in the appropriate way. The technique can also be used to deduce optimal scale values for the levels of a categorical variable, thus giving quantitative meaning to the categories. Multiple correspondence analysis can analyze several categorical variables simultaneously, and is analogous to factor analysis of continuous variables. Other uses of correspondence analysis are illustrated using different variables of the same Spanish database: for example, exploring patterns of missing data and visualizing trends across surveys from consecutive years.]]></p></abstract>
<abstract abstract-type="short" xml:lang="es"><p><![CDATA[Este artículo desarrolla una amplia explicación de una técnica de análisis multivariada denominada análisis de correspondencias, aplicándola a datos de una encuesta nacional de salud, en este caso la Encuesta Nacional de Salud española (ENS). Primero se indica cómo puede utilizarse el análisis de correspondencias para interpretar una tabla de contingencia visualizándola en forma de un gráfico de puntos que representan las filas y columnas de la tabla. También pueden ser interpretadas diferentes combinaciones de las variables codificando los datos de la manera apropiada. Esta técnica puede emplearse también para obtener valores óptimos de escala para los niveles de una variable categórica, dándole de este modo un sentido cuantitativo a este tipo de variables. El análisis de correspondencias múltiple puede analizar varias variables categóricas simultáneamente, y es análogo al análisis de factores de las variables continuas. Otras aplicaciones del análisis de correspondencias se ilustran usando diferentes variables de la ENS; por ejemplo, para analizar pautas en los datos perdidos y visualizando tendencias entre encuestas de años consecutivos.]]></p></abstract>
<kwd-group>
<kwd lng="en"><![CDATA[Correspondence analysis]]></kwd>
<kwd lng="en"><![CDATA[Health survey]]></kwd>
<kwd lng="en"><![CDATA[Principal component analysis]]></kwd>
<kwd lng="en"><![CDATA[Statistical graphics]]></kwd>
<kwd lng="es"><![CDATA[Análisis de correspondencias]]></kwd>
<kwd lng="es"><![CDATA[Encuesta de salud]]></kwd>
<kwd lng="es"><![CDATA[Análisis de componentes principales]]></kwd>
<kwd lng="es"><![CDATA[Gráficos estadísticos]]></kwd>
</kwd-group>
</article-meta>
</front><body><![CDATA[ <div align="center"><font face="Arial, Helvetica, sans-serif" size="2"><b>SPECIAL</b></font>  </div> <hr size="2" noshade>     <p align="center"><font face="Arial, Helvetica, sans-serif" size="2"><b><font size="4">Correspondence analysis of the Spanish National Health Survey</font></b></font></p>     <p align="center"><font face="Arial, Helvetica, sans-serif" size="2"> <b>M. Greenacre</b>    <BR>   Department of Economics and Business. Centre for Research in Health Economics (CRES).     <br>   Universitat Pompeu Fabra. Barcelona.</font></p>     <p>&nbsp;</p>     <p><font face="Arial, Helvetica, sans-serif" size="2"><i>Correspondence:</i> Michael Greenacre.     <br>   Universitat Pompeu Fabra. C/ Ramon Trias Fargas, 25-27. 08005 Barcelona    <br>   E-mail: <a href="mailto:michael@upf.es">michael@upf.es</a></font></p>     <p align="right"><font face="Arial, Helvetica, sans-serif" size="2"><i>Received: 14 June 2001.    ]]></body>
<body><![CDATA[<br>   Accepted: 19 October 2001.</i></font></p>      <p><font face="Arial, Helvetica, sans-serif" size="2"><b>(El análisis de correspondencias    en la explotación de la Encuesta Nacional de Salud)</b></font></p> <hr size="2" noshade>     <p><font face="Arial, Helvetica, sans-serif" size="2"><b>Summary</b>    <br>   This report gives a comprehensive explanation of the multivariate technique    called correspondence analysis, applied in the context of a large survey of    a nation's state of health, in this case the Spanish National Health Survey.    It is first shown how correspondence analysis can be used to interpret a simple    cross-tabulation by visualizing the table in the form of a map of points representing    the rows and columns of the table. Combinations of variables can also be interpreted    by coding the data in the appropriate way. The technique can also be used to    deduce optimal scale values for the levels of a categorical variable, thus giving    quantitative meaning to the categories. Multiple correspondence analysis can    analyze several categorical variables simultaneously, and is analogous to factor    analysis of continuous variables. Other uses of correspondence analysis are    illustrated using different variables of the same Spanish database: for example,    exploring patterns of missing data and visualizing trends across surveys from    consecutive years.    <br>   <b> Keywords:</b> Correspondence analysis. Health survey. Principal component    analysis. Statistical graphics.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2"><b>Resumen</b>    <br>   Este artículo desarrolla una amplia explicación de una técnica de análisis multivariada    denominada análisis de correspondencias, aplicándola a datos de una encuesta    nacional de salud, en este caso la Encuesta Nacional de Salud española (ENS).    
Primero se indica cómo puede utilizarse el análisis de correspondencias para    interpretar una tabla de contingencia visualizándola en forma de un gráfico    de puntos que representan las filas y columnas de la tabla. También pueden ser    interpretadas diferentes combinaciones de las variables codificando los datos    de la manera apropiada. Esta técnica puede emplearse también para obtener valores    óptimos de escala para los niveles de una variable categórica, dándole de este    modo un sentido cuantitativo a este tipo de variables. El análisis de correspondencias    múltiple puede analizar varias variables categóricas simultáneamente, y es análogo    al análisis de factores de las variables continuas. Otras aplicaciones del análisis    de correspondencias se ilustran usando diferentes variables de la ENS; por ejemplo,    para analizar pautas en los datos perdidos y visualizando tendencias entre encuestas    de años consecutivos.    <br>   <b>Palabras clave:</b> Análisis de correspondencias. Encuesta de salud. Análisis    de componentes principales. Gráficos estadísticos.</font></p> <hr size="2" noshade>     <p><font face="Arial, Helvetica, sans-serif" size="2"><B>Introduction</B></font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">The Spanish National Health    Survey (Encuesta Nacional de Salud) is an example of a large complex social    survey designed to establish a picture of the Spanish nation's state of health    at a particular moment in time. We take the 1997 survey as an example, in order    to show how correspondence analysis may be applied systematically to gain insight    into the survey results.</font></p>     ]]></body>
<body><![CDATA[<p><font face="Arial, Helvetica, sans-serif" size="2">In the 1997 survey there are some 46 basic questions, many of which allow multiple responses, effectively increasing the total number of questions to 83. In addition, several questions are conditional on the responses to the basic questions, giving a maximum of 27 additional questions. Each of the 6,400 respondents interviewed thus provides between 83 and 110 items of information, so that the complete data file comprises approximately 640,000 numbers.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">The usual way to summarize such data is to count frequencies of response and present these in tables or in graphical form, usually bar or line charts. A second level of analysis is to explore relationships between different questions in the survey. Standard procedures are available when the questions involve quantitative responses, for example correlation-based methods such as regression analysis, principal component analysis and factor analysis. In the case of categorical responses, which predominate in questionnaire surveys, the way to proceed is less obvious: consider, for example, relating health status, a categorical variable with five possible responses, to the intake of medicines, where there are as many as 17 categories of medicine.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">We aim to show how correspondence analysis can be used to explore relationships between variables in a complex health survey and to suggest models for these relationships. Correspondence analysis is a method aimed specifically at quantifying categorical data, that is, assigning numerical scale values to the response categories of discrete variables, with certain optimal properties. 
These scale values have been shown to have interesting geometric properties and provide what are called «maps» of the relationships between variables.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">After introducing the method in «Correspondence analysis», we shall give a simple illustration in «Applications to crosstabulations» using a crosstabulation computed from the 1997 health survey. Further applications will be given using more complex crosstabulations. In «Correspondence analysis as a scaling method» we shall show how correspondence analysis can be used to develop scales which synthesize the responses to several questions with a common theme. This is of great use in model building, since several categorical variables can be replaced by a single scale, which can then be used in subsequent analyses, such as regression analysis, that require interval-scaled data. Several other issues will be dealt with, for example, the exploration of patterns of missing data («Exploring missing data») and how to explore trends between surveys from different years («Trend data»).</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2"><B>Correspondence analysis</B></font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">The theory of correspondence analysis is fully explained in several texts<SUP>1-6</SUP>, including one in the context of biomedical research<SUP>7</SUP>. Here a non-technical introduction will be presented in the context of the health survey data.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">In its simplest form, correspondence analysis applies to a two-way crosstabulation, like the one in <a href="#tab1">table 1</a>. This table summarizes the distribution of perceived health status categories in different age groups. 
The ultimate aim of the method is to produce a «map»    of this table, where each row and each column is represented by a point. This    approach is very similar to that of principal component analysis, in that a    measure of total variance of the table is defined and then this total is decomposed    optimally along so-called «principal axes». For mapping purposes it is usually    hoped that a large percentage of total variance is accounted for by the first    two principal axes, thereby allowing the table to be visualized in two dimensions.</font></p>     <p><a name="tab1"></a></p>     <p>&nbsp;</p>     <p align=center><font face="Arial, Helvetica, sans-serif" size="2"><img src="/img/gs/v16n2/a07tab01.gif"></font></p>     ]]></body>
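<body><![CDATA[<p><font face="Arial, Helvetica, sans-serif" size="2">The decomposition of total inertia along principal axes can be made concrete with a short Python sketch, assuming NumPy is available. The sketch obtains the principal axes from the singular value decomposition of the matrix of standardized residuals. This is a standard way of computing the weighted least-squares solution of correspondence analysis. The function name <I>correspondence_analysis</I> is our own and is not a library function. The 3 &times; 3 table of counts is invented for illustration; it is not the survey data of table 1.</font></p>
<pre>
```python
import numpy as np

def correspondence_analysis(table):
    """Simple CA of a two-way table of counts via the SVD (illustrative sketch)."""
    N = np.asarray(table, dtype=float)
    P = N / N.sum()                          # correspondence matrix
    r = P.sum(axis=1)                        # row masses
    c = P.sum(axis=0)                        # column masses
    expected = np.outer(r, c)
    S = (P - expected) / np.sqrt(expected)   # standardized residuals
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    principal_inertias = sv ** 2             # inertia along each principal axis
    total_inertia = principal_inertias.sum()
    # principal coordinates of the rows (F) and of the columns (G)
    F = U * sv / np.sqrt(r)[:, None]
    G = Vt.T * sv / np.sqrt(c)[:, None]
    return total_inertia, principal_inertias, F, G

# hypothetical counts: 3 groups (rows) x 3 response categories (columns)
counts = [[40, 50, 10],
          [25, 55, 20],
          [10, 50, 40]]
total, inertias, F, G = correspondence_analysis(counts)
percent = 100 * inertias / total
print("total inertia:", round(total, 4))
print("percentage of inertia per axis:", np.round(percent, 1))
```
</pre>
<p><font face="Arial, Helvetica, sans-serif" size="2">With a 3 &times; 3 table at most two axes carry inertia, so the third percentage is essentially zero. The sum of the principal inertias equals the Pearson &chi;<SUP>2</SUP> statistic divided by the grand total.</font></p>]]></body>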
<body><![CDATA[<p>&nbsp;</p>     <p><font face="Arial, Helvetica, sans-serif" size="2">Correspondence analysis rests on three basic concepts: a profile point in multidimensional space, a weight (or mass) assigned to each point, and a distance function between the points, called the &chi;<SUP>2</SUP> distance <I>(chi-square distance)</I>. Once these three concepts are defined, the method optimally reduces the dimensionality of the points by projecting them onto a subspace, usually a two-dimensional plane. This subspace is fitted to the points by weighted least squares, where each point is weighted by its respective mass and distances between points and the subspace are measured in terms of &chi;<SUP>2</SUP> distance.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">Let us look at each of these three concepts in turn. Since correspondence analysis is defined equivalently for rows or columns, we shall explain it in terms of the rows of <a href="#tab1">table 1</a>, with the understanding that the columns are analyzed in an identical fashion if we simply transpose the matrix at the start.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">Each row divided by its row total is a vector called a <I>profile</I>, that is, a set of proportions adding up to 1. In <a href="#tab2">table 2</a> we have expressed the elements of each profile in the more familiar form of percentages, which add up to 100%. It is the profiles which define the points in multidimensional space. 
The eventual map will attempt to show these points representing the rows, or age groups in this case, where each age group is described by a vector of five coordinates: its distribution across the health status categories.</font></p>     <p><a name="tab2"></a></p>     <p>&nbsp;</p>     <p align=center><font face="Arial, Helvetica, sans-serif" size="2"><img src="/img/gs/v16n2/a07tab02.gif"></font></p>     <p>&nbsp;</p>     <p><font face="Arial, Helvetica, sans-serif" size="2">Each row profile point is then given a weight, called the <I>mass</I>, which is essentially a measure of the importance of the point. The row mass is the frequency of the row category divided by the grand total. For example, since age group 16-24 has 1223 respondents out of the total of 6371, this row point is weighted by the mass 1223/6371 = 0.192. The row masses add up to 1 and are simply the row marginal proportions of the table.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">Finally, we measure distance between row points by the &chi;<SUP>2</SUP> <I>distance</I>, which is a slight variant of the usual physical (Euclidean) distance between points in vector space. The physical distance between two vectors x = &#91;<I>x</I><SUB>1</SUB> <I>x</I><SUB>2</SUB> ... <I>x</I><SUB>n</SUB>&#93; and y = &#91;<I>y</I><SUB>1</SUB> <I>y</I><SUB>2</SUB> ... <I>y</I><SUB>n</SUB>&#93; is measured as:</font></p>     ]]></body>
<body><![CDATA[<p align=center><font face="Arial, Helvetica, sans-serif" size="2"><img src="/img/gs/v16n2/a07for01.gif"></font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">The &chi;<SUP>2</SUP> distance, however, weights each squared term inversely by the corresponding column marginal proportion, as follows:</font></p>     <p align=center><font face="Arial, Helvetica, sans-serif" size="2"><img src="/img/gs/v16n2/a07for02.gif"></font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">where in our example (see <a href="#tab1">table 1</a>) c<SUB>1</SUB> = 817/6371 = 0.128, c<SUB>2</SUB> = 3542/6371 = 0.556, and so on. The idea is to compensate for the different variances in the columns of the profile matrix. The range of values in the first column of <a href="#tab2">table 2</a> tends to be small, since the percentages are smaller (they vary from 5.1 to 19.9, a range of 14.8 percentage points). The range in the second column is greater because the percentages are larger overall (they vary from 34.3 to 65.6, a range of 31.3 percentage points). Dividing by the column margin effectively equalizes these inherent differences in the column variances, and it can be argued that the <I>chi-square distance</I> is the natural Euclidean distance for frequency data.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">The total variance in correspondence analysis is measured by the inertia, which is equal to the usual Pearson &chi;<SUP>2</SUP> statistic calculated on the crosstabulation, divided by the total sample size <I>n</I>. 
It is this inertia, which measures the degree of difference between the age groups, that we are trying to represent optimally in the eventual map.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">As we have said, the map (usually two-dimensional) is obtained by weighted least squares, and the row profile points are projected onto it. The coordinates of these projected points are called principal coordinates, because they are the coordinates with respect to the principal axes of the space. Each principal axis accounts for a certain amount of the total inertia, called the principal inertia, usually expressed as a percentage of the total.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">The map also contains points representing the columns. There are different ways of representing the columns jointly with the rows, but the most common is known as the symmetric map. In this map the column profiles are analyzed in exactly the same way as we have just described, as if the matrix were transposed and the whole process repeated, leading to the principal coordinates of the columns. The rows and columns are then plotted jointly with respect to the same axes, both in principal coordinates. The merits and drawbacks of this joint display are discussed in many texts<SUP>3,6</SUP>. 
Rather than enter into such a discussion, we prefer to illustrate how to interpret such maps correctly using actual examples.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2"><B>Applications to crosstabulations</B></font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">As a first illustration of how correspondence analysis operates, <a href="/img/gs/v16n2/html/a07fig01.html">figure 1</a> shows the symmetric map of the age groups and health status categories of <a href="#tab1">table 1</a>.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">What can we conclude from this map? First we look at the amounts of inertia, and especially their percentages, along each axis. Clearly, the first (horizontal) axis is very important, accounting for 97.3% of the inertia, whereas the second is negligible, accounting for only 1.5% of the total inertia. Thus the essential information in the original table is captured by the horizontal spread of the points.</font></p>     ]]></body>
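<body><![CDATA[<p><font face="Arial, Helvetica, sans-serif" size="2">The three concepts used in this map can be checked with the Python sketch below, again assuming NumPy. These are profiles, masses and the &chi;<SUP>2</SUP> distance. The sketch also checks the statement that the inertia equals Pearson's &chi;<SUP>2</SUP> divided by <I>n</I>. The helper names are hypothetical. The counts 1223, 817, 3542 and 6371 are those quoted from table 1, while the small 3 &times; 3 table is invented.</font></p>
<pre>
```python
import numpy as np

def row_profiles_and_masses(table):
    N = np.asarray(table, dtype=float)
    n = N.sum()
    profiles = N / N.sum(axis=1, keepdims=True)   # each row sums to 1
    row_masses = N.sum(axis=1) / n                # row marginal proportions
    col_masses = N.sum(axis=0) / n                # c_j, also the average row profile
    return profiles, row_masses, col_masses

def chi2_distance(x, y, c):
    """Euclidean distance with each squared difference divided by c_j."""
    x, y, c = (np.asarray(v, dtype=float) for v in (x, y, c))
    return np.sqrt(np.sum((x - y) ** 2 / c))

def inertia(table):
    """Mass-weighted sum of squared chi2 distances of the row profiles to their centroid."""
    profiles, r, c = row_profiles_and_masses(table)
    return sum(ri * chi2_distance(p, c, c) ** 2 for p, ri in zip(profiles, r))

def pearson_chi2(table):
    N = np.asarray(table, dtype=float)
    expected = np.outer(N.sum(axis=1), N.sum(axis=0)) / N.sum()
    return np.sum((N - expected) ** 2 / expected)

# masses quoted in the text (table 1)
print(round(1223 / 6371, 3))                          # row mass of age group 16-24: 0.192
print(round(817 / 6371, 3), round(3542 / 6371, 3))    # c1 and c2: 0.128 0.556

# invented 3 x 3 table: the inertia equals chi2 / n
counts = [[40, 50, 10], [25, 55, 20], [10, 50, 40]]
print(np.isclose(inertia(counts), pearson_chi2(counts) / np.sum(counts)))   # True
```
</pre>
<p><font face="Arial, Helvetica, sans-serif" size="2">The centroid of the row profiles, weighted by their masses, is the vector of column masses. This is why the same vector <I>c</I> serves both as the reference point and as the set of weights in the &chi;<SUP>2</SUP> distance.</font></p>]]></body>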
<body><![CDATA[<p><font face="Arial, Helvetica, sans-serif" size="2">The ordering of the health status categories along this dimension agrees with their implied order, from «very good» to «very bad», and their relative positions give scale values which can be interpreted: for example, there is little difference between «bad» and «very bad» but a very large difference between «good» and «regular».</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">The age groups can now be interpreted relative to the same dimension. There is only a small change from age group 16-24 to age group 25-34, then a larger step to age group 35-44, an even larger step to age group 45-54, then the biggest step to age group 55-64, and then smaller steps to groups 65-74 and &ge;75.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">The health scale values along the first axis (i.e., the principal coordinates) are centred and standardized in a particular way in correspondence analysis (CA): their weighted mean is zero and their weighted variance equals the principal inertia. They can, however, be linearly transformed to any other scale to facilitate interpretation. For example, we can transform these values by a translation and change of scale so that the endpoints are 0 and 100, with 0 representing «very bad» and 100 «very good»:</font></p>     <p align=center><font face="Arial, Helvetica, sans-serif" size="2"><img src="/img/gs/v16n2/a07for03.gif"></font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">Notice that the category «regular» is not in the middle of the scale but very much towards its lower end, at least in the perceptions of the respondents. 
Put another way, admitting that one's health is «regular» rather than «good» is clearly a big step in the negative direction.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">Using the above scale values one can establish a health status index and calculate the average index value of the respondents in each age group:</font></p>     <p align=center><font face="Arial, Helvetica, sans-serif" size="2"><img src="/img/gs/v16n2/a07for04.gif"></font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2"><a href="#fig2">Figure 2</a>    shows a conventional line plot of these values.</font></p>     <p><a name="fig2"></a></p>     <p>&nbsp;</p> <table width="379" border="0" align="center">   <tr>     <td>        <hr size="2" noshade>           ]]></body>
<body><![CDATA[<div align="center"><font face="Arial, Helvetica, sans-serif" size="2"><b>Figure 2. Plot of health status index (first dimension of correspondence analysis) against age group.</b></font></div>       <hr size="1" noshade>     </td>   </tr>   <tr>     <td>            <div align="center"><font face="Arial, Helvetica, sans-serif" size="2"><img src="/img/gs/v16n2/a07fig02.gif"></font></div>       <hr size="1" noshade>     </td>   </tr> </table>     <p>&nbsp;</p>     <p><font face="Arial, Helvetica, sans-serif" size="2">Because of the large sample size of this survey, we can explore the data at least one level further by splitting the age groups according to another variable. «Sex» is the most obvious one, and <a href="#tab3">table 3</a> shows the crosstabulation of the seven age groups, split into males and females, against the health categories.</font></p>     <p><a name="tab3"></a></p>     <p>&nbsp;</p>     <p align=center><font face="Arial, Helvetica, sans-serif" size="2"><img src="/img/gs/v16n2/a07tab03.gif"></font></p>     <p>&nbsp;</p>     <p><font face="Arial, Helvetica, sans-serif" size="2">The symmetric map in <a href="/img/gs/v16n2/html/a07fig03.html">figure 3</a> shows immediately that females consistently rate their health as worse than their male counterparts do. The female points are always to the left of the male points of the corresponding age group, so that females aged 65-74, for example, rate their health worse than males aged 75 and over.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2"><a href="#tab1">Tables 1</a> and <a href="#tab3">3</a> are contingency tables where the total of the table is in each case the sample size. The next example concerns a question that allows multiple responses. The question asks whether respondents have had to reduce their normal leisure activities because of some pain or other symptom. 
For those who answer «yes», there follows a list of 18 possible symptoms: 17 specific ones and a category labelled «other». Since a respondent can indicate more than one ailment, «ailment» is not a single categorical variable but a set of 18 variables, one for each of the possible symptoms. There are various ways to handle such a situation. In <a href="#tab4">table 4</a> we have tabulated the distribution of the five health status categories for each subset of respondents reporting a given ailment. Since these subsets can overlap (a single respondent may mention more than one ailment), the table's total of 1369 is not the sample size but the total number of ailments mentioned. This is problematic if one wants to test the association between rows and columns, since the usual &chi;<SUP>2</SUP> test assumes that each respondent is counted only once. The table is still suitable for correspondence analysis, however, which simply depicts this association visually.</font></p>     ]]></body>
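<body><![CDATA[<p><font face="Arial, Helvetica, sans-serif" size="2">The way such a table is built can be sketched in Python (NumPy assumed). Each ailment gets one row, and a respondent is counted once for every ailment he or she mentions. The sketch assumes the multiple-response question is stored as a 0/1 respondent-by-ailment matrix and that health status is coded 0 (very good) to 4 (very bad). The function name and the four respondents are hypothetical. The sketch also shows why the grand total counts ailment mentions rather than respondents.</font></p>
<pre>
```python
import numpy as np

def crosstab_multiple_response(ailments, status, n_status=5):
    """Hypothetical helper. Rows: ailments (multiple-response question, one 0/1 column
    per ailment). Columns: categories of a single-response question coded 0..n_status-1."""
    A = np.asarray(ailments, dtype=int)                   # respondents x ailments
    S = np.eye(n_status, dtype=int)[np.asarray(status)]   # respondents x status (one-hot)
    return A.T @ S                                        # ailments x status counts

# four hypothetical respondents, three ailments, status 0 (very good) .. 4 (very bad)
ailments = [[1, 0, 1],
            [0, 1, 0],
            [1, 1, 0],
            [0, 0, 0]]
status = [0, 2, 4, 1]
table = crosstab_multiple_response(ailments, status)
print(table)
print("mentions:", table.sum(),
      "respondents with an ailment:", int(np.any(ailments, axis=1).sum()))
```
</pre>
<p><font face="Arial, Helvetica, sans-serif" size="2">Two of the invented respondents mention two ailments each, so the total of the table (5) exceeds the number of respondents reporting any ailment (3). This is the same situation as the 1369 mentions in table 4.</font></p>]]></body>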
<body><![CDATA[<p><a name="tab4"></a></p>     <p>&nbsp;</p>     <p align=center><font face="Arial, Helvetica, sans-serif" size="2"><img src="/img/gs/v16n2/a07tab04.gif"></font></p>     <p>&nbsp;</p>     <p><font face="Arial, Helvetica, sans-serif" size="2"><a href="/img/gs/v16n2/html/a07fig04.html">Figure 4</a> shows the symmetric map of this table. Again we find the five health status categories spread along the first principal axis, with relative positions similar to those in the previous analyses. The ailments are thus scaled from left to right according to the associated health status: «chest problems», «ankles», «breathing problems» and «nerves» on the «bad» left side, and «teeth», «injuries», «throat» and «fever» on the «good» right side. The second axis is more important here than in the previous analyses and is determined mostly by the status category «very good» and the three ailments in the upper part of the map: «diarrhea», «injuries» and «teeth». This indicates a subgroup of people who do report problems but who also report «very good» health more often than average. These people tend to have one of these afflictions, which are only temporary problems. Put another way, respondents with «very good» health lie far from most of the ailments and are characterized only by accidental injuries and dental problems. 
Notice the position of «diarrhea», which is associated with a mixed group of people. Some view their health at the «very good» end of the scale, others place it at the opposite «very bad» end, and fewer than expected report «regular» health.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2"><B>Correspondence analysis as a scaling method</B></font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">We have already seen an example in «Applications to crosstabulations» of what is called optimal scaling. There we obtained values for the health status categories which lead to maximum separation, or discrimination, of the age groups (or of the age-sex groups in the second example, or of the different ailments in the third example). In <a href="/img/gs/v16n2/html/a07fig04.html">figure 4</a> we can consider the positions of ailments along the horizontal axis as reflecting their degree of perceived severity, with the more severe ailments on the left. More generally, correspondence analysis can be used to obtain optimal scale values for several interrelated categorical variables.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">For example, a question in the health survey asks respondents which of 16 different types of medicines they have taken during the previous two weeks (of the original 17 types, we excluded birth-control pills, which apply only to women). More than half of the sample had not taken any medicines, so these respondents were excluded from this analysis. This situation differs from the previous ones, because we are not looking at the relation between medicine consumption and another variable, such as age or perceived health status. Here we are trying to reduce the dimensionality of a set of variables in much the same way as in factor analysis, that is, we are looking for common factors which capture the relationships between the different medicines. 
The objective is the same as in principal component analysis, except that the variables are categorical and have no obvious quantifications, or scale values, assigned to their categories.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">Multiple correspondence analysis (also known as <I>homogeneity analysis</I><SUP>8</SUP>) is a variant of correspondence analysis which looks for optimal scale values for a set of categorical variables. To explain the optimality criterion inherent in multiple correspondence analysis, let us suppose that we made the <I>ad hoc</I> decision to assign the scale value 1 to each medicine taken and 0 to each medicine not taken, for each of the 16 medicines. Then each of the <I>N</I> respondents has a set of 16 scale values (which can be considered to form an <I>N</I> &times; 16 matrix). We can calculate his or her overall score by adding up the scale values, giving an additional column consisting of the <I>N</I> scores. For this particular choice of scale values, the score is just the number of types of medicine taken. As in a reliability study, we can now calculate the correlation between the respondent score and each of the 16 scales, and measure how well the score reflects the 16 scales. This measure is typically the average of the squared correlations between the score vector and each of the 16 scales. Our 0/1 scale values are unlikely to maximize this criterion. Hence the objective of multiple correspondence analysis is to find the scale values that maximize this average squared correlation, so that in this sense the score explains, on average, as much variance as possible in the 16 scales. 
Once this score «factor» has been identified, we proceed to find another set of scale values and an associated score, uncorrelated with the first score, which again maximizes the average squared correlation, and so on.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">In this case the data set, consisting of the numbers of respondents taking each particular combination of medicines, is too large to report here. The basic numerical results of the analysis for the first three dimensions (i.e., factors) are given in <a href="#tab5">table 5</a>.</font></p>     ]]></body>
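<body><![CDATA[<p><font face="Arial, Helvetica, sans-serif" size="2">The criterion just described can be illustrated numerically. The following Python sketch is not the computation used for the survey. It assumes simulated yes/no data, in which the sample size, number of variables, latent trait, loadings and threshold are all hypothetical. It also uses a minimal implementation of multiple correspondence analysis, taken as the singular value decomposition of the standardized residuals of the indicator matrix. The helper names <code>indicator_matrix</code>, <code>mca</code> and <code>avg_squared_correlation</code> are illustrative only. The sketch compares the average squared correlation obtained with the <I>ad hoc</I> 0/1 scale values with the first principal inertia. It then evaluates the same criterion for the first multiple correspondence analysis score.</font></p>
<pre>
```python
import numpy as np


def indicator_matrix(X):
    """Expand an N x Q matrix of 0/1 answers into the N x 2Q indicator matrix
    (one 'no' and one 'yes' dummy column per variable)."""
    return np.hstack([np.column_stack([1 - X[:, q], X[:, q]]) for q in range(X.shape[1])])


def mca(Z):
    """Minimal MCA: SVD of the standardized residuals of an indicator matrix Z.
    Returns principal inertias, respondent scores (standard coordinates) and
    category scale values (standard coordinates), one column per dimension."""
    P = Z / Z.sum()                       # correspondence matrix
    r = P.sum(axis=1)                     # row masses (all equal to 1/N)
    c = P.sum(axis=0)                     # category masses
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    rows = U / np.sqrt(r)[:, None]        # respondent scores
    cols = Vt.T / np.sqrt(c)[:, None]     # optimal scale values of the categories
    return sv ** 2, rows, cols


def avg_squared_correlation(score, X):
    """Average squared correlation between a score and each 0/1 variable."""
    return np.mean([np.corrcoef(score, X[:, q])[0, 1] ** 2 for q in range(X.shape[1])])


# Hypothetical data: 2000 respondents, 6 yes/no items driven by one latent trait
rng = np.random.default_rng(0)
N, Q = 2000, 6
latent = rng.normal(size=N)
loadings = np.array([1.5, 1.2, 1.0, -0.8, 0.0, 0.5])
X = np.greater(latent[:, None] * loadings + rng.normal(size=(N, Q)), 1.0).astype(int)

adhoc = avg_squared_correlation(X.sum(axis=1), X)   # ad hoc 0/1 scale values
inertias, rows, cols = mca(indicator_matrix(X))
optimal = avg_squared_correlation(rows[:, 0], X)     # first MCA score

print(f"ad hoc 0/1 scoring:        {adhoc:.4f}")
print(f"first principal inertia:   {inertias[0]:.4f}")
print(f"MCA score, same criterion: {optimal:.4f}")
```
</pre>
<p><font face="Arial, Helvetica, sans-serif" size="2">With only two categories per variable, any choice of scale values is a linear rescaling of the 0/1 coding. What the analysis changes is the weight each variable receives in the score. In this binary case the first principal inertia should equal the largest eigenvalue of the correlation matrix of the 0/1 variables divided by the number of variables.</font></p>]]></body>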
<body><![CDATA[<p><a name="tab5"></a></p>     <p>&nbsp;</p>     <p align=center><font face="Arial, Helvetica, sans-serif" size="2"><img src="/img/gs/v16n2/a07tab05.gif"></font></p>     <p align=center>&nbsp;</p>     <p><font face="Arial, Helvetica, sans-serif" size="2">In this table the eigenvalues, or principal inertias, are the average squared correlations; for example, 0.1031 is the average of the squared correlations for the first dimension. Another way of thinking about the results is that the entries are coefficients of determination (<I>R</I><SUP>2</SUP>), giving the proportion of the variance of each variable explained by each dimension (factor). Since the factors are uncorrelated, these <I>R</I><SUP>2</SUP> values can be added across each row to give the variance explained by two factors, three factors, and so on. The dimensions are in descending order of eigenvalue, the quantity that is optimized at each step of the analysis.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">The optimal scale values for each medicine (not given here numerically) can be plotted, as before, in a map (<a href="#fig5">fig. 5</a>). This gives an interesting view of the interrelationships between the medicines: those for chronic diseases are grouped at the bottom right, those for psychiatric and digestive problems at the top, and those for the more common ailments of a transient nature on the left.</font></p>     <p><a name="fig5"></a></p>     <p>&nbsp;</p> <table width="376" border="0" align="center">   <tr>     <td>        <hr size="2" noshade>           <div align="center"><font face="Arial, Helvetica, sans-serif" size="2"><b>Figure 5. 
Multiple correspondence analysis, showing optimal scale values in two dimensions of «yes» responses to medicine types.</b></font></div>       <hr size="1" noshade>     </td>   </tr>   <tr>     <td>            <div align="center"><img src="/img/gs/v16n2/a07fig05.gif"></div>       <hr size="1" noshade>     </td>   </tr> </table>     ]]></body>
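<body><![CDATA[<p><font face="Arial, Helvetica, sans-serif" size="2">The additivity of the <I>R</I><SUP>2</SUP> values across dimensions can be checked on simulated data. In this hedged Python sketch the data are hypothetical: two latent traits drive five yes/no items. The analysis is a minimal SVD-based computation of multiple correspondence analysis. The matrix <code>R2</code> only plays the role of the entries of table 5, under the interpretation given in the text: squared correlations between each variable and the respondent scores on each dimension. The column means of <code>R2</code> should reproduce the principal inertias. The row sums over two dimensions should reproduce the <I>R</I><SUP>2</SUP> of a least-squares regression of each variable on the first two scores.</font></p>
<pre>
```python
import numpy as np

# Hypothetical data: 1500 respondents, 5 yes/no items driven by two latent traits
rng = np.random.default_rng(1)
N, Q = 1500, 5
traits = rng.normal(size=(N, 2))
loadings = np.array([[1.4, 0.0], [1.1, 0.3], [0.0, 1.3], [0.2, 1.0], [-0.7, 0.6]])
X = np.greater(traits @ loadings.T + rng.normal(size=(N, Q)), 0.8).astype(int)

# Minimal MCA: SVD of the standardized residuals of the N x 2Q indicator matrix
Z = np.hstack([np.column_stack([1 - X[:, q], X[:, q]]) for q in range(Q)])
P = Z / Z.sum()
r, c = P.sum(axis=1), P.sum(axis=0)
S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
U, sv, Vt = np.linalg.svd(S, full_matrices=False)
inertias = sv ** 2
scores = U[:, :3] / np.sqrt(r)[:, None]   # respondent scores on dimensions 1-3

# Squared correlation of each variable (rows) with each dimension (columns)
R2 = np.array([[np.corrcoef(scores[:, k], X[:, q])[0, 1] ** 2 for k in range(3)]
               for q in range(Q)])
print(np.round(R2, 3))
print("column means:", np.round(R2.mean(axis=0), 4))
print("inertias:    ", np.round(inertias[:3], 4))

# R^2 of an ordinary least-squares regression of each 0/1 variable on two scores
design = np.column_stack([np.ones(N), scores[:, :2]])
r2_two = []
for q in range(Q):
    coef, *_ = np.linalg.lstsq(design, X[:, q], rcond=None)
    resid = X[:, q] - design @ coef
    r2_two.append(1 - resid.var() / X[:, q].var())
print("row sums (2 dims):", np.round(R2[:, :2].sum(axis=1), 4))
print("regression R^2:   ", np.round(r2_two, 4))
```
</pre>
<p><font face="Arial, Helvetica, sans-serif" size="2">The equality between row sums and regression <I>R</I><SUP>2</SUP> relies only on the scores being uncorrelated. With correlated predictors, the squared correlations would not add up in this way.</font></p>]]></body>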
<body><![CDATA[<p>&nbsp;</p>     <p><font face="Arial, Helvetica, sans-serif" size="2">Using <a href="#tab5">table 5</a> to identify the important points in the map, we see that the first factor groups together the following medicines, in order of explained variance: medicines for blood pressure, for the heart, for lowering cholesterol and &#8211;to a lesser extent&#8211; for diabetes, as well as tranquillisers and sleeping pills. It is interesting to note that medicines for minor ailments such as throat infection &amp; flu, pains &amp; fever, and antibiotics lie on the opposite side of this dimension. In other words, people who have been taking the former medicines for chronic health complaints are usually not taking the latter ones for less serious, transient ailments.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">The second factor mainly groups tranquillisers &amp; sleeping pills and antidepressants; in other words, it is the «psychiatric» dimension. Although they are less well explained by this factor, medicines for diarrhea and laxatives also have high scale values on it.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">As a complement to the mapping procedure, we can perform a hierarchical cluster analysis of the 16 types of medicine. <a href="/img/gs/v16n2/html/a07fig06.html">Figure 6</a> shows the cluster tree, based on complete linkage and using the Jaccard index to measure similarity between the medicines. The same clusters can be seen as in <a href="#fig5">figure 5</a>.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">In the optimal scaling we can continue to interpret the factors beyond the second. For example, the third factor separates out the medicines for flu, throat, pains and fever on their own. 
These are respondents who have had a bacterial or viral infection and are not taking any other medicine.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">One issue that is fairly controversial in multiple correspondence analysis is the percentage of variance explained by each dimension. This problem has been investigated thoroughly by Greenacre<SUP>3,4,9</SUP>, and we give only the results here. If the percentages are calculated in the usual way, the multiple correspondence analysis gives 10.1, 8.1 and 7.5% for the first three dimensions, which seem quite pessimistic. However, after an adjustment that is fully explained in a practical context in Greenacre<SUP>3</SUP>, the percentages of inertia turn out to be 49.0, 10.7 and 5.0%, respectively. We can thus conclude that the two-dimensional map of <a href="#fig5">figure 5</a> explains at least 59.7% of the total inertia in the 16 variables, and not 18.2% as calculated otherwise.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2"><B>Exploring missing data</B></font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">Correspondence analysis is frequently used to explore patterns of missing data in a survey, and to answer questions such as: is there a specific group of respondents that tends to refuse to answer the same questions? Or, in other words, is non-response «correlated» between questions? One way to answer these questions is to set up a binary data matrix in which, for each respondent and each question, we simply code whether the respondent has replied or not, using a one for a missing response and a zero for an actual response, whatever that may be. 
We would code the data this way because we are more interested in the occurrence of a non-response than of a response. If we wished to treat these two possibilities equally, we would use the indicator coding of multiple correspondence analysis and introduce two columns for each variable, a dummy variable for non-response and a dummy variable for response. Either way, the analysis of these matrices shows which questions tend to be left unanswered by the same people, and also which respondents are associated with which non-responses.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">In this particular survey the level of non-response is very low, so such questions cannot be investigated. There is, however, one variable &#8211;«Income»&#8211; that does have a large number of non-responses, in fact 1382, or almost 25% of the sample. Income, including an additional category for non-response, was therefore crosstabulated with the following biographical variables, for which almost everyone gave complete responses: sex, marital status, level of schooling, personal work situation, and work situation of the head of family (for respondents who are not family heads). Although these are separate crosstabulations, the fact that they have one question in common allows us to stack the tables on top of one another (<a href="/img/gs/v16n2/a07tab06.gif">table 6</a>). The correspondence analysis map of this set of tables shows, as well as possible, the relationship of each question with income, and we are especially interested in the position of the income non-response category.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2"><a href="/img/gs/v16n2/html/a07fig07.html">Figure 7</a> shows the resulting map. The income categories, labelled I1 to I6 in the map, lie in their expected order, from the lowest income on the right to the highest income on the left. 
Notice that the signs of all the coordinates on the first axis could be reversed so that higher income is on the right; this does not alter the analysis or its substantive interpretation at all. It is interesting to see how the other categories are scaled from right to left in terms of their income profiles, from «illiterate», «pensioner» and «widowed» on the right to «head of household working», «working» and «student» on the left. The income non-response point (denoted by I? in the map) lies well on the higher-income side, just below category I4 (150.000-200.000 ptas./month) with respect to the first axis. This is an estimate of the average position of the non-response group with respect to the other income groups. It is likely, however, that there is a wide spread of incomes within the non-response group. More formal methods could be devised for estimating the income of individual respondents from the biographical data.</font></p>     ]]></body>
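<body><![CDATA[<p><font face="Arial, Helvetica, sans-serif" size="2">The stacking of crosstabulations that share the same income columns can be sketched in Python as follows. Everything here is an assumption for illustration. The respondents are simulated. The variables (sex, three levels of schooling, working or not) are hypothetical, and so is the rule that makes non-response more frequent at higher socioeconomic status. The helper functions <code>ca</code> and <code>crosstab</code> are minimal implementations, not the interface of any particular package. The code treats non-response as a seventh income category, I?, and stacks the three tables. It then lists the income categories by their principal coordinate on the first axis, so that the position of I? relative to I1-I6 can be read off as in figure 7.</font></p>
<pre>
```python
import numpy as np


def ca(table):
    """Simple correspondence analysis of a table of counts via the SVD of the
    standardized residuals. Returns principal inertias and the row and column
    principal coordinates (one column per dimension)."""
    P = table / table.sum()
    r, c = P.sum(axis=1), P.sum(axis=0)
    S = (P - np.outer(r, c)) / np.sqrt(np.outer(r, c))
    U, sv, Vt = np.linalg.svd(S, full_matrices=False)
    F = U * sv / np.sqrt(r)[:, None]      # row principal coordinates
    G = Vt.T * sv / np.sqrt(c)[:, None]   # column principal coordinates
    return sv ** 2, F, G


def crosstab(a, b, na, nb):
    """Cross-tabulate two integer-coded variables with na and nb categories."""
    T = np.zeros((na, nb))
    np.add.at(T, (a, b), 1)
    return T


# Simulated respondents (hypothetical; not the survey data)
rng = np.random.default_rng(2)
n = 4000
status = rng.normal(size=n)                        # latent socioeconomic status
income = np.digitize(status + rng.normal(scale=0.7, size=n),
                     [-1.2, -0.5, 0.0, 0.5, 1.2])  # codes 0-5 stand for I1-I6
refused = np.less(rng.random(n), 0.15 + 0.15 * np.greater(status, 0))
income[refused] = 6                                # code 6 stands for I? (non-response)
sex = rng.integers(0, 2, size=n)
school = np.digitize(status + rng.normal(size=n), [-0.5, 0.7])
working = np.greater(status + rng.normal(size=n), -0.3).astype(int)

# Stack the separate crosstabulations, which share the same income columns
stacked = np.vstack([crosstab(sex, income, 2, 7),
                     crosstab(school, income, 3, 7),
                     crosstab(working, income, 2, 7)])

inertias, F, G = ca(stacked)
labels = ["I1", "I2", "I3", "I4", "I5", "I6", "I?"]
for label, coord in sorted(zip(labels, G[:, 0]), key=lambda t: t[1]):
    print(f"{label}: {coord:+.3f}")
print("share of inertia on axis 1:", round(inertias[0] / inertias.sum(), 3))
```
</pre>
<p><font face="Arial, Helvetica, sans-serif" size="2">As noted above, the sign of the first axis is arbitrary, so the income order may come out reversed. The coordinate of I? only summarizes the average profile of the non-respondents, not the incomes of individual respondents.</font></p>]]></body>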
<body><![CDATA[<p><font face="Arial, Helvetica, sans-serif" size="2"><B>Trend data</B></font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">The usual way to display trends is a line plot, with the horizontal axis depicting time and the vertical axis depicting the variable being observed over time. For example, a typical graph would show the number of cases of measles reported in Spain over the years 1989 to 1997, as in figure 5.1.1 of Regidor &amp; Gutiérrez-Fisac<SUP>10</SUP>. But the table on which this figure is based (table 5.1.2 of that publication) gives the reported cases for each autonomous region of Spain for each year, 19 regions in all. Visualizing and comparing these trends would be difficult, since we would have to make 19 different line plots and then try to compare them with one another and with the overall trend. Correspondence analysis can be used to interpret the different trend lines jointly.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">The symmetric map of these data is given in <a href="/img/gs/v16n2/html/a07fig08.html">figure 8</a>. Without actually seeing the data, we can obtain an understanding of the differences between the autonomous regions during this period. In this figure the centre of the display corresponds to the trend of the whole country, or average row profile. Thus a complete trend line is reduced to a point, and the points representing the autonomous regions show how each region deviates from this overall pattern, with the year points facilitating the interpretation of these deviations.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">The centre point also represents the average year pattern across the regions, and because the years have a time order, we can connect them to show a trajectory that moves around the space. 
The trajectory traced out by the nine consecutive years is almost circular from 1989 to 1993. The years then move towards the centre of the map (1994 to 1996), which is closer to the average pattern, and 1997 returns to a position near 1993 and 1994. The most outlying autonomous regions are those that deviate most from the average: Asturias has higher than average incidence in the initial years, followed by Cantabria in 1991, then Galicia and Aragón, then the group formed by Ceuta, La Rioja, Navarra and Melilla in 1992, and Canarias in 1993. Regions near the centre, such as Baleares and Extremadura, do not differ as much from the average trend.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2"><B>Conclusions</B></font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">We have tried to give an overview of how correspondence analysis can assist in deciphering the complex information contained in a national health survey. From a simple cross-tabulation to a multiway table and a set of intercorrelated categorical variables, correspondence analysis provides a medium for exposing patterns in the data and suggesting hypotheses. It also facilitates the quantification of categorical data, which can assist with the model-building process. Optimal scales can be defined that capture a maximum percentage of variation while condensing the data, and these scales can be used in other analyses that require interval scales. The method also allows investigation of missing data, which can be treated as an additional categorical response. 
In the visualization of trend data, the points corresponding to successive time points are linked to show the pattern of change in the profiles over time.</font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2"><B>Acknowledgements</B></font></p>     <p><font face="Arial, Helvetica, sans-serif" size="2">This work appeared originally in an extended form as a report commissioned by the Fundación Banco Bilbao-Vizcaya-Argentaria. We wish to thank Profs. Guillem López, Jaume Puig and Ángel López for their assistance and comments on the manuscript.</font></p> <hr size="2" noshade>     <p><font face="Arial, Helvetica, sans-serif" size="2"><b>References</b></font></p>     <!-- ref --><p><font face="Arial, Helvetica, sans-serif" size="2">1. Benzécri JP. Analyse des données. Tome I: Analyse des correspondances. Tome II: La Classification. Paris: Dunod, 1973.</font><!-- end-ref --><!-- ref --><p><font face="Arial, Helvetica, sans-serif" size="2"> 2. Greenacre MJ. Theory and applications of correspondence analysis. London: Academic Press, 1984.</font><!-- end-ref --><!-- ref --><p><font face="Arial, Helvetica, sans-serif" size="2"> 3. Blasius J, Greenacre MJ. Visualization of categorical data. 
San Diego: Academic Press, 1998.</font><!-- end-ref --><!-- ref --><p><font face="Arial, Helvetica, sans-serif" size="2"> 4. Greenacre MJ. Correspondence analysis in practice. London: Academic Press, 1993.</font><!-- end-ref --><!-- ref --><p><font face="Arial, Helvetica, sans-serif" size="2"> 5. Greenacre MJ, Blasius J. Correspondence analysis in the social sciences. London: Academic Press, 1994.</font><!-- end-ref --><!-- ref --><p><font face="Arial, Helvetica, sans-serif" size="2"> 6. Lebart L, Morineau A, Warwick K. Multivariate descriptive statistical analysis. Chichester, UK: Wiley, 1984.</font><!-- end-ref --><!-- ref --><p><font face="Arial, Helvetica, sans-serif" size="2"> 7. Greenacre MJ. Correspondence analysis in medical research. 
Statistical Methods in Medical Research 1992;1:97-117.</font><!-- end-ref --><!-- ref --><p><font face="Arial, Helvetica, sans-serif" size="2"> 8. Gifi A. Nonlinear multivariate analysis. Chichester, UK: Wiley, 1990.</font><!-- end-ref --><!-- ref --><p><font face="Arial, Helvetica, sans-serif" size="2"> 9. Greenacre MJ. Correspondence analysis of multivariate categorical data by weighted least squares. Biometrika 1988;75:457-67.</font><!-- end-ref --><!-- ref --><p><font face="Arial, Helvetica, sans-serif" size="2"> 10. Regidor E, Gutiérrez-Fisac JL. Indicadores de salud. Cuarta evaluación en España del Programa Regional Europeo Salud para Todos. Madrid: Ministerio de Sanidad y Consumo, 1999.</font><!-- end-ref --> ]]></body><back>
<ref-list>
<ref id="B1">
<label>1</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Benzécri]]></surname>
<given-names><![CDATA[JP]]></given-names>
</name>
</person-group>
<source><![CDATA[Analyse des données: Analyse des correspondances]]></source>
<year>1973</year>
<publisher-loc><![CDATA[Paris ]]></publisher-loc>
<publisher-name><![CDATA[Dunod]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B2">
<label>2</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Greenacre]]></surname>
<given-names><![CDATA[MJ]]></given-names>
</name>
</person-group>
<source><![CDATA[Theory and applications of correspondence analysis]]></source>
<year>1984</year>
<publisher-loc><![CDATA[London ]]></publisher-loc>
<publisher-name><![CDATA[Academic]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B3">
<label>3</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Blasius]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
<name>
<surname><![CDATA[Greenacre]]></surname>
<given-names><![CDATA[MJ]]></given-names>
</name>
</person-group>
<source><![CDATA[Visualization of categorical data]]></source>
<year>1998</year>
<publisher-loc><![CDATA[San Diego ]]></publisher-loc>
<publisher-name><![CDATA[Academic]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B4">
<label>4</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Greenacre]]></surname>
<given-names><![CDATA[MJ]]></given-names>
</name>
</person-group>
<source><![CDATA[Correspondence analysis in practice]]></source>
<year>1993</year>
<publisher-loc><![CDATA[London ]]></publisher-loc>
<publisher-name><![CDATA[Academic]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B5">
<label>5</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Greenacre]]></surname>
<given-names><![CDATA[MJ]]></given-names>
</name>
<name>
<surname><![CDATA[Blasius]]></surname>
<given-names><![CDATA[J]]></given-names>
</name>
</person-group>
<source><![CDATA[Correspondence analysis in the social sciences]]></source>
<year>1994</year>
<publisher-loc><![CDATA[London ]]></publisher-loc>
<publisher-name><![CDATA[Academic]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B6">
<label>6</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Lebart]]></surname>
<given-names><![CDATA[L]]></given-names>
</name>
<name>
<surname><![CDATA[Morineau]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
<name>
<surname><![CDATA[Warwick]]></surname>
<given-names><![CDATA[K]]></given-names>
</name>
</person-group>
<source><![CDATA[Multivariate descriptive statistical analysis]]></source>
<year>1984</year>
<publisher-loc><![CDATA[Chichester ]]></publisher-loc>
<publisher-name><![CDATA[Wiley]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B7">
<label>7</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Greenacre]]></surname>
<given-names><![CDATA[MJ]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Correspondence analysis in medical research]]></article-title>
<source><![CDATA[Statistical Methods in Medical Research]]></source>
<year>1992</year>
<volume>1</volume>
<page-range>97-117</page-range></nlm-citation>
</ref>
<ref id="B8">
<label>8</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Gifi]]></surname>
<given-names><![CDATA[A]]></given-names>
</name>
</person-group>
<source><![CDATA[Nonlinear multivariate analysis]]></source>
<year>1990</year>
<publisher-loc><![CDATA[Chichester ]]></publisher-loc>
<publisher-name><![CDATA[Wiley]]></publisher-name>
</nlm-citation>
</ref>
<ref id="B9">
<label>9</label><nlm-citation citation-type="journal">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Greenacre]]></surname>
<given-names><![CDATA[MJ]]></given-names>
</name>
</person-group>
<article-title xml:lang="en"><![CDATA[Correspondence analysis of multivariate categorical data by weighted least squares]]></article-title>
<source><![CDATA[Biometrika]]></source>
<year>1988</year>
<volume>75</volume>
<page-range>457-67</page-range></nlm-citation>
</ref>
<ref id="B10">
<label>10</label><nlm-citation citation-type="book">
<person-group person-group-type="author">
<name>
<surname><![CDATA[Regidor]]></surname>
<given-names><![CDATA[E]]></given-names>
</name>
<name>
<surname><![CDATA[Gutiérrez-Fisac]]></surname>
<given-names><![CDATA[JL]]></given-names>
</name>
</person-group>
<article-title xml:lang="es"><![CDATA[Indicadores de salud]]></article-title>
<source><![CDATA[Cuarta evaluación en España del Programa Regional Europeo Salud para Todos]]></source>
<year>1999</year>
<publisher-loc><![CDATA[Madrid ]]></publisher-loc>
<publisher-name><![CDATA[Ministerio de Sanidad y Consumo]]></publisher-name>
</nlm-citation>
</ref>
</ref-list>
</back>
</article>
