The statistics of competition results in Excel

Recently, my son participated in a local academic competition alongside other kids of same grade from various schools. During the group contest, held in the same classroom, his team which has 4 kids from same class felt confident as they competed against other school teams. However, to their surprise, their school didn’t secure a spot in the top eight. The team they considered less capable outperformed them.

As a data enthusiast mom, I’m eager to delve into the reasons behind this unexpected outcome. I'm going to divide the observants into 4 groups: 

Group1: participants from school A,

Group2: participants from school B, 

Group3: participants from school C,

Group4: All the participants.

First, I'm going to study the Descriptive Statistics of each group's scores. 

Descriptive statistics refer to the analysis, summary, and presentation of findings related to a data set derived from a sample or the entire population. These statistics help us understand and organize characteristics of the data. Let’s explore the main categories of descriptive statistics:

  1. Frequency Distribution: This concerns the frequency of each value in the data set. It tells us how often each value occurs.
  2. Measures of Central Tendency: These provide information about the averages of the values. Common measures include:
    • Mean (Average): The sum of all values divided by the total number of values.
    • Median: The middle value when the data is arranged in ascending or descending order.
    • Mode: The value that appears most frequently.
  3. Measures of Variability: These describe how spread out the values are. Key measures include:
    • Range: The difference between the maximum and minimum values.
    • Variance: A measure of how much the values deviate from the mean.
    • Standard Deviation: The square root of the variance.

Descriptive statistics help us gain insights into data patterns, compare variables, and summarize information effectively. They serve as a foundation for further statistical analysis and decision-making. 

I used the excel add-in Data Analysis to run the "Descriptive Analysis", and I got the result for the 4 groups as below:

 

School A is my son's school from which there are 25 kids participating. The average score is 54.7. School B is the school that I mentioned above from which 29 kids participated.  Their average score is 55.3 which is 0.6 higher than school A.  I found it interesting that school A's median score 58 and maximum score 88 are higher than school B's median score 53 and maximum score 80. Apart from that, school B's lowest score was 5 which is much lower than school A's lowest score 19. 

Let's keep studying the information. School A's standard deviation is 18.55, and school B's is 15.87. A large standard deviation indicates that there are a lot of variances in the observed data around the mean. This indicates that the data observed is quite spread out. A small or low standard deviation would indicate instead that much of the data observed is clustered tightly around the mean. So, comparing school A and school B's standard deviation, school A's scores spread wider than school B's scores. 

Now, I'm using the boxplot to see how the scores of each group spread.

boxplot, also known as a box plotbox-and-whisker plot, or simply a box-and-whisker diagram, is a standardized way of displaying the distribution of a data set based on its five-number summary. Let’s break down what this means:

  1. Five-Number Summary:
    • The five key values that define a boxplot are:
      • Minimum: The smallest data point.
      • First Quartile (Q1): The value below which 25% of the data falls.
      • Median (Q2): The middle value of the data set (50th percentile).
      • Third Quartile (Q3): The value below which 75% of the data falls.
      • Maximum: The largest data point.
    • These five values provide insights into the central tendency, spread, and skewness of the data.
  2. Interpretation:
    • Boxplots help answer questions such as:
      • Are there outliers in the data?
      • Is the data symmetrically distributed?
      • How tightly are the data points grouped?
      • Is the data skewed?
    • By comparing the boxplot’s components, you gain insights into the data’s characteristics.

 


From the boxplot figure, I can see the school A's lowest score 19 is at the bottom of the whisker, the number at the bottom right of the rectangle is 44 which is the first quartile which means 25% scores are below 44. 58 is the median score and 63 at the top right of the rectangle is the 3rd quartile which means 75% scores are below 63. 54.72 in the rectangle is the average score. the highest score 88 is at the top of the whisker.

Checking on the school B's scores, 5 has been identified as an outlier at the bottom. Then the score at the bottom of the whisker is 31.  1st quartile 48, mean is 53, 3rd quartile is 69. It's obvious that the score distribution of school B is toward the higher end, and more concentrated; school A's scores distribution is toward the opposite direction. 

School C is the top 1 school which has the best scores. Its 1st quartile is 63.25, median score is 75, 1st quartile is 83.75. And they have the highest score among all the participants.

The boxplot for all participants scores' boxplot is on the far right of the above figure.

Last, I created the histograms and bell curve.

Histograms provide insights into the density of the underlying distribution of the data. They help with density estimation, which means estimating the probability density function of the underlying variable.

The histogram can reveal whether the data is:

·         Symmetric

·         Skewed right

·         Skewed left

·         Bimodal

·         Multimodal

Histograms are powerful tools for understanding the distribution of data, especially when comparing different categories or ranges!

A bell curve, also known as a normal distribution or Gaussian distribution, is a symmetrical probability distribution frequently encountered in various fields. When plotted on a graph, the data follows a bell shape, with most values clustering around a central region and tapering off as they move away from the center. The curve is symmetric, meaning that the left and right sides mirror each other.

There are 10 bins, each bin's range is 10. 





The histograms clearly show the scores distribution of each school and overall scores. School A's scores spread wider than school B. School C's distribution is skewed right which means they have more higher scores. Only all participants scores' histogram has the most near symmetric bell curve.

The analysis provides valuable insights into the scores. If the students at School A put in a little more effort and perhaps a touch of luck, they could achieve a better rank next year.



Ref: The definitions of the statistics concepts are searched from Bing Copilot.






Comments

Popular posts from this blog

Schema and Security Control in SQL server

Washington State's Gasoline Price Hit the historical High - PowerBI Dashboard