Descriptive statistics

From iMMAP-Colombia Wiki

Revision as of 14:25, 9 Aug 2011 by Villavec

Descriptive statistics quantitatively describe the main features of a data set.<ref>(1995) Introductory Statistics, 2nd Edition, Wiley. ISBN 0-471-31009-3</ref> Descriptive statistics are distinguished from inferential statistics (or inductive statistics) in that they aim to summarize a data set, rather than use the data to learn about the statistical population that the data are meant to represent. This generally means that descriptive statistics, unlike inferential statistics, are not developed on the basis of probability theory.<ref>Dodge, Y (2003) The Oxford Dictionary of Statistical Terms OUP. ISBN 0-19-850994-4</ref> Even when a data analysis draws its main conclusions using inferential statistics, descriptive statistics are generally also reported. For example, a paper reporting on a study involving human subjects typically includes a table giving the overall sample size, the sample sizes in important subgroups (e.g., for each treatment or exposure group), and demographic or clinical characteristics such as the average age, the proportion of subjects of each gender, and the proportion of subjects with comorbidities.

Use in statistical analysis

Descriptive statistics provide summaries of the sample and the measurements. Together with simple graphical analysis, they form the basis of a quantitative analysis of the data.

Descriptive statistics summarize data. For example, the shooting percentage in basketball is a descriptive statistic that summarizes the performance of a player or a team. This number is the number of shots made divided by the number of shots attempted. A player who shoots 33% is making approximately one shot in every three. One who shoots 25% is making one in every four. The percentage summarizes or describes multiple discrete events. Or consider the bane of many students, the grade point average. This single number describes a student's performance across all of their course experiences.<ref name="trochim">Plantilla:Cite web</ref>

Describing a large set of observations with a single indicator risks distorting the original data or losing important detail. For example, the shooting percentage does not tell you whether the shots are three-pointers or two-pointers, and a grade point average does not tell you whether the student took difficult or easy courses. Despite these limitations, descriptive statistics offer a powerful summary that allows some comparisons across people or other units.<ref name="trochim"/>
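As a rough illustration of both points, the sketch below computes a shooting percentage and shows two players with the same percentage but different point totals. The shot logs and the helper name `shooting_percentage` are hypothetical, invented only for this example.

```python
def shooting_percentage(made, attempts):
    """Shots made divided by shots attempted, expressed as a percentage."""
    if attempts == 0:
        raise ValueError("no attempts recorded")
    return 100 * made / attempts


# Hypothetical shot logs: (points the shot is worth, whether it went in).
player_a = [(2, True), (2, False), (2, False)] * 4   # two-pointers only
player_b = [(3, True), (3, False), (3, False)] * 4   # three-pointers only

for name, shots in [("A", player_a), ("B", player_b)]:
    made = sum(1 for _, hit in shots if hit)
    points = sum(value for value, hit in shots if hit)
    pct = shooting_percentage(made, len(shots))
    # Same percentage for both players, but different points scored.
    print(f"player {name}: {pct:.0f}% shooting, {points} points")
```

The single percentage is identical for both players; the difference in points is exactly the kind of detail the summary hides.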

Univariate analysis

Univariate analysis involves the examination across cases of a single variable, focusing on three characteristics: the distribution; the central tendency; and the dispersion. It is common to compute all three for each study variable.

Distribution

The distribution is a summary of the frequency of individual values or ranges of values for a variable. The simplest distribution would list every value of a variable and the number of cases that had that value. For instance, computing the distribution of gender in the study population means computing the percentages that are male and female. The gender variable has only two values, making it possible and meaningful to list each one. However, this does not work for a variable such as income that has many possible values. Typically, specific values are not particularly meaningful (an income of 50,000 is typically not meaningfully different from one of 51,000). Grouping the raw scores into ranges of values reduces the number of categories to something more meaningful. For instance, we might group incomes into ranges of 0-10,000, 10,001-30,000, etc.
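The grouping step can be sketched in a few lines of Python. The incomes below are made up, and only the first two ranges come from the text; the remaining ranges and the helper `bin_label` are assumptions for illustration.

```python
from collections import Counter

# Hypothetical whole-number incomes; not data from the study described above.
incomes = [4_500, 9_800, 12_000, 27_500, 50_000, 51_000, 75_000]

# Inclusive (low, high) ranges. The first two appear in the text;
# the last two are assumed here to complete the example.
bins = [(0, 10_000), (10_001, 30_000), (30_001, 60_000), (60_001, 100_000)]


def bin_label(value, bins):
    """Return the label of the range that contains value."""
    for low, high in bins:
        if low <= value <= high:
            return f"{low:,}-{high:,}"
    return "other"


# Count how many cases fall in each range.
frequency = Counter(bin_label(v, bins) for v in incomes)

for low, high in bins:
    label = f"{low:,}-{high:,}"
    count = frequency[label]
    share = 100 * count / len(incomes)
    print(f"{label:>15}  {count:>2}  {share:5.1f}%")
```

Incomes of 50,000 and 51,000 land in the same range, which is the point of grouping: the distinction between them is dropped on purpose.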

Frequency distributions are depicted as a table or as a graph. Table 1 shows an age frequency distribution with five categories of age ranges defined. The same frequency distribution can be depicted in a graph as shown in Figure 2. This type of graph is often referred to as a histogram or bar chart.

Central tendency

The central tendency of a distribution locates the "center" of a distribution of values. The three major types of estimates of central tendency are the mean, the median, and the mode.

The mean is the most commonly used method of describing central tendency. To compute the mean, take the sum of the values and divide by the count. For example, the mean quiz score is determined by summing all the scores and dividing by the number of students taking the quiz. Consider the test score values:

15, 20, 21, 36, 15, 25, 15

The sum of these 7 values is 147, so the mean is 147 / 7 = 21.
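The same arithmetic, as a minimal Python sketch using the seven scores from the text:

```python
scores = [15, 20, 21, 36, 15, 25, 15]

total = sum(scores)          # sum of the values
count = len(scores)          # number of values
mean = total / count         # sum divided by count

print(f"sum = {total}, count = {count}, mean = {mean}")
```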

The median is the score found at the middle of the set of values, i.e., the value that has as many cases above it as below it. One way to compute the median is to sort the values in numerical order and then locate the value in the middle of the list. For example, if there are 500 values, the median is the average of the two values in the 250th and 251st positions. If there are 499 values, the value in the 250th position is the median. Sorting the 7 scores above produces:

15, 15, 15, 20, 21, 25, 36

There are 7 scores, and score #4 represents the halfway point. The median is 20. If there is an even number of observations, then the median is the mean of the two middle scores. In the example, if there were an 8th observation, with a value of 25, the median would become the average of the 4th and 5th scores, in this case 20.5.
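The sort-and-pick-the-middle procedure can be sketched as follows. The function name `median_of` is chosen for this example; Python's standard library offers `statistics.median` for the same purpose.

```python
def median_of(values):
    """Median by sorting: middle value, or mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd count: a single middle value exists.
        return ordered[mid]
    # Even count: average the two values around the midpoint.
    return (ordered[mid - 1] + ordered[mid]) / 2


scores = [15, 20, 21, 36, 15, 25, 15]
print(median_of(scores))           # the 7 scores from the text
print(median_of(scores + [25]))    # with the 8th observation of 25 added
```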

The mode is the most frequently occurring value in the set. To determine the mode, compute the distribution as above. The mode is the value with the greatest frequency. In the example, the modal value, 15, occurs three times. In some distributions there is a "tie" for the highest frequency, i.e., there are multiple modal values. These are called multi-modal distributions.
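A small sketch of finding the mode from the frequency counts, returning every tied value so that multi-modal data is handled too. The function name `modes` and the second, tied data set are hypothetical.

```python
from collections import Counter


def modes(values):
    """Return all values that share the highest frequency (non-empty input assumed)."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


scores = [15, 20, 21, 36, 15, 25, 15]
print(modes(scores))                  # single modal value

tied = [15, 15, 20, 20, 36]           # hypothetical data with a tie
print(modes(tied))                    # two modal values
```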

Notice that the three measures typically produce different results: for the scores above, the mean is 21, the median is 20, and the mode is 15. The term "average" obscures the difference between them and is better avoided. The three values are equal if the distribution is perfectly "normal" (i.e., bell-shaped).

Dispersion

Dispersion is the spread of values around the central tendency. There are two common measures of dispersion, the range and the standard deviation. The range is simply the highest value minus the lowest value. In our example distribution, the high value is 36 and the low is 15, so the range is 36 − 15 = 21.
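Because the range depends only on the two extreme values, a single unusual score can change it a great deal. A minimal sketch with the example scores, where dropping 36 is done purely to show that sensitivity:

```python
scores = [15, 20, 21, 36, 15, 25, 15]

# Range: highest value minus lowest value.
full_range = max(scores) - min(scores)

# Same computation with the single highest score left out.
without_top = [x for x in scores if x != max(scores)]
trimmed_range = max(without_top) - min(without_top)

print(f"range with all scores: {full_range}")
print(f"range without the top score: {trimmed_range}")
```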

The standard deviation is a more accurate and detailed estimate of dispersion because a single extreme value can greatly exaggerate the range (as is true in this example, where the value of 36 stands apart from the rest of the values). The standard deviation shows the relation that the set of scores has to the mean of the sample. Again let's take the set of scores:

15, 20, 21, 36, 15, 25, 15

To compute the standard deviation, we first find the distance between each value and the mean. We know from above that the mean is 21. So, the differences from the mean are:

15 − 21 = −6
20 − 21 = −1
21 − 21 = 0
36 − 21 = 15
15 − 21 = −6
25 − 21 = +4
15 − 21 = −6

Notice that values that are below the mean have negative differences and values above it have positive ones. Next, we square each difference:

(−6)² = 36
(−1)² = 1
(0)² = 0
(15)² = 225
(−6)² = 36
(+4)² = 16
(−6)² = 36

Now, we take these "squares" and sum them to get the sum of squares (SS) value. Here, the sum is 350. Next, we divide this sum by the number of scores minus 1. Here, the result is 350 / 6 = 58.3. This value is known as the variance. To get the standard deviation, we take the square root of the variance (remember that we squared the deviations earlier). This would be √58.3 ≈ 7.64 (7.63 if the result is truncated rather than rounded, which is the value used in the rest of this example).
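The whole procedure, step by step, as a short Python sketch. The variable names are chosen for this example; the final line cross-checks the result against the standard library's `statistics.stdev`, which also divides by the number of scores minus 1. Carried to more decimal places, the standard deviation is about 7.6376.

```python
import math
import statistics

scores = [15, 20, 21, 36, 15, 25, 15]

mean = sum(scores) / len(scores)                 # 147 / 7
deviations = [x - mean for x in scores]          # distance of each score from the mean
squares = [d ** 2 for d in deviations]           # squared deviations
sum_of_squares = sum(squares)                    # SS
variance = sum_of_squares / (len(scores) - 1)    # divide by n - 1
std_dev = math.sqrt(variance)                    # square root of the variance

print(f"SS = {sum_of_squares}, variance = {variance:.4f}, sd = {std_dev:.4f}")

# Cross-check against the library's sample standard deviation.
print(math.isclose(std_dev, statistics.stdev(scores)))
```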

Although this computation may seem convoluted, it's actually quite simple. In words, we can describe the standard deviation as:

the square root of the sum of the squared deviations from the mean divided by the number of scores minus one
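Written as a formula, the verbal description above reads as follows. The notation is an assumption introduced here, not taken from the text: the scores are denoted x_1, ..., x_n, their mean is x̄, and s is the standard deviation.

```latex
s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}
\qquad \text{and for the example scores} \qquad
s = \sqrt{\frac{350}{7 - 1}} \approx 7.6
```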

The standard deviation allows us to reach some conclusions about specific scores in our distribution. Assuming that the distribution of scores is close to "normal", the following conclusions can be reached:

  • approximately 68% of the scores in the sample fall within one standard deviation of the mean
  • approximately 95% of the scores in the sample fall within two standard deviations of the mean
  • approximately 99.7% of the scores in the sample fall within three standard deviations of the mean

For instance, since the mean in our example is 21 and the standard deviation is 7.63, we can estimate from the statements above that approximately 95% of the scores will fall in the range of 21 − (2×7.63) to 21 + (2×7.63), or between 5.74 and 36.26. Values beyond two standard deviations from the mean can be considered "outliers". By this criterion no value in our distribution qualifies, although 36 comes close: it lies about 1.96 standard deviations above the mean, just inside the upper bound of 36.26. Outliers help identify observations for further analysis or possible problems in the observations. Expressing values as a number of standard deviations from the mean also converts measures on very different scales, such as height and weight, into values that can be compared.
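A minimal sketch of the two-standard-deviation check, using the unrounded sample standard deviation from Python's `statistics` module. Each score is also converted to a standard score (its distance from the mean in units of standard deviations), which is the scale-free form that makes different measures comparable. With the unrounded value, 36 falls just inside the band.

```python
import statistics

scores = [15, 20, 21, 36, 15, 25, 15]

mean = statistics.mean(scores)
sd = statistics.stdev(scores)   # sample standard deviation (divides by n - 1)

# Band covering roughly 95% of scores if the distribution is close to normal.
lower, upper = mean - 2 * sd, mean + 2 * sd

# Standard score: how many standard deviations each value is from the mean.
z_scores = [(x - mean) / sd for x in scores]

# Values more than two standard deviations from the mean.
beyond_two_sd = [x for x, z in zip(scores, z_scores) if abs(z) > 2]

print(f"two-SD band: {lower:.2f} to {upper:.2f}")
for x, z in zip(scores, z_scores):
    print(f"{x:>3}  z = {z:+.2f}")
print("beyond two SDs:", beyond_two_sd)
```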

Other statistics

In research involving comparisons between groups, emphasis is often placed on the significance level for the hypothesis that the groups being compared differ to a degree greater than would be expected by chance. This significance level is often represented as a p-value, or sometimes as the standard score of a test statistic. In contrast, an effect size conveys the estimated magnitude and direction of the difference between groups, without regard to whether the difference is statistically significant. Reporting significance levels without effect sizes is problematic, since for large sample sizes even small effects of little practical importance can be statistically significant.

Examples of descriptive statistics

Most statistics can be used either as a descriptive statistic, or in an inductive analysis. For example, we can report the average reading test score for the students in each classroom in a school, to give a descriptive sense of the typical scores and their variation. If we perform a formal hypothesis test on the scores, we are doing inductive rather than descriptive analysis.

Some statistical summaries are especially common in descriptive analyses. Some examples follow.

Notes

<references group=""></references>