The statistics of competition results in Excel
Recently, my son participated in a local academic
competition alongside other kids of same grade from various schools. During the
group contest, held in the same classroom, his team which has 4 kids from same
class felt confident as they competed against other school teams. However, to
their surprise, their school didn’t secure a spot in the top eight. The team
they considered less capable outperformed them.
As a data enthusiast mom, I’m eager to delve into the
reasons behind this unexpected outcome. I'm going to divide the observants into
4 groups:
Group1: participants from school A,
Group2: participants from school B,
Group3: participants from school C,
Group4: All the participants.
First, I'm going to study the Descriptive Statistics of
each group's scores.
Descriptive statistics refer to
the analysis, summary, and presentation of findings related to a data set
derived from a sample or the entire population. These statistics help us
understand and organize characteristics of the data. Let’s explore the main
categories of descriptive statistics:
- Frequency Distribution: This concerns the frequency of
each value in the data set. It tells us how often each value occurs.
- Measures of Central Tendency: These provide information about
the averages of the values. Common measures include:
- Mean (Average): The sum of all values divided
by the total number of values.
- Median: The middle value when the data
is arranged in ascending or descending order.
- Mode: The value that appears most
frequently.
- Measures of Variability: These describe how spread out the
values are. Key measures include:
- Range: The difference between the
maximum and minimum values.
- Variance: A measure of how much the
values deviate from the mean.
- Standard Deviation: The square root of the
variance.
Descriptive statistics help us gain insights into data
patterns, compare variables, and summarize information effectively. They serve
as a foundation for further statistical analysis and decision-making.
I used the excel add-in Data Analysis to run the
"Descriptive Analysis", and I got the result for the 4 groups as
below:
School A is my son's school from which there are 25 kids participating.
The average score is 54.7. School B is the school that I mentioned above from
which 29 kids participated. Their average score is 55.3 which is 0.6
higher than school A. I found it interesting that school A's median score
58 and maximum score 88 are higher than school B's median score 53 and maximum
score 80. Apart from that, school B's lowest score was 5 which is much lower
than school A's lowest score 19.
Let's keep studying the information. School A's standard
deviation is 18.55, and school B's is 15.87. A large standard deviation indicates that there are a lot of variances
in the observed data around the mean. This indicates that the data observed is
quite spread out. A small or low standard deviation would indicate instead that
much of the data observed is clustered tightly around the mean. So, comparing
school A and school B's standard deviation, school A's scores spread wider than
school B's scores.
Now, I'm using the boxplot to see how the scores of each group spread.
A boxplot,
also known as a box plot, box-and-whisker plot, or
simply a box-and-whisker diagram, is a standardized way of
displaying the distribution of a data set based on its five-number
summary. Let’s break down what this means:
- Five-Number Summary:
- The five key
values that define a boxplot are:
- Minimum: The smallest data point.
- First Quartile (Q1): The value below which 25% of
the data falls.
- Median (Q2): The middle value of the data set (50th
percentile).
- Third Quartile (Q3): The value below which 75% of
the data falls.
- Maximum: The largest data point.
- These five values provide insights into the central tendency, spread, and skewness of the data.
- Interpretation:
- Boxplots help
answer questions such as:
- Are there outliers in the data?
- Is the data symmetrically distributed?
- How tightly are the data points grouped?
- Is the data skewed?
- By comparing
the boxplot’s components, you gain insights into the data’s
characteristics.
Checking on the school B's scores, 5
has been identified as an outlier at the bottom. Then the score at the bottom
of the whisker is 31. 1st quartile 48, mean is 53, 3rd quartile is 69.
It's obvious that the score distribution of school B is toward the higher end,
and more concentrated; school A's scores distribution is toward the opposite
direction.
School C is the top 1 school which has
the best scores. Its 1st quartile is 63.25, median score is 75, 1st quartile is
83.75. And they have the highest score among all the participants.
The boxplot for all participants scores' boxplot is on the far right of the above figure.
Last, I created the histograms and bell curve.
Histograms
provide insights into the density of the underlying distribution of the data.
They help with density estimation, which means estimating the probability
density function of the underlying variable.
The histogram can reveal whether the
data is:
·
Symmetric
·
Skewed right
·
Skewed left
·
Bimodal
·
Multimodal
Histograms are powerful tools for understanding the
distribution of data, especially when comparing different categories or ranges!
A bell curve, also known as
a normal distribution or Gaussian distribution, is a symmetrical
probability distribution frequently encountered in various fields. When plotted
on a graph, the data follows a bell shape, with most values clustering around a
central region and tapering off as they move away from the center. The curve is
symmetric, meaning that the left and right sides mirror each other.
There are 10 bins, each bin's range is 10.
The histograms clearly show the scores distribution of each school and overall scores. School A's scores spread wider than school B. School C's distribution is skewed right which means they have more higher scores. Only all participants scores' histogram has the most near symmetric bell curve.
The
analysis provides valuable insights into the scores. If the students at School
A put in a little more effort and perhaps a touch of luck, they could achieve a
better rank next year.
Ref: The definitions of the statistics concepts are searched from Bing Copilot.
Comments
Post a Comment