Monday, November 10, 2014

Measures of Central Tendency


Mode, Median, & Mean

          Univariate statistics are not designed to inform people about relationships among multiple variables; instead, they focus on a single variable, hence the name univariate.  They are used to give information about the way the scores in a variable are distributed.  Among univariate statistics, the most common are those referred to as "averages," which are more correctly labeled measures of central tendency.  Because measures of central tendency are the averages of data collections, they focus on the middle of the data.  The three main measures are the mode, the median, and the mean.
         
          The first measure of central tendency to discuss is the mode.  It is the simplest to find and can be very useful.  The term mode basically stands for the most, as it is defined as the category with the greatest number of cases.  Because one can look at the set of data, count, and determine the mode, this average makes no arithmetic requirements.  No matter how many numbers there are, if one number appears even just one more time than any other number, it is the mode of that particular set of data.  While there is usually only one mode for any set of data and it is simple to find, things can get somewhat more complicated when two or more categories are tied for the largest number of cases.  If this is the case, there is more than one mode.  If only two categories are tied, the distribution is labeled bimodal.  If three are tied, the distribution is labeled trimodal.  If four or more categories are tied, the distribution is labeled multimodal.  The mode can be very useful because it requires only nominal-level data, the lowest level of measurement, so it can be used with any level of data.  One example of the way that a mode is useful is that it can tell us a most frequent value, such as the most frequent crime.  A search of the modal category of index crimes in the UCR would give us the correct answer (which is theft, by the way).  You could also use the mode when predicting the results of rolling a pair of dice (such as in the game of craps or even Monopoly).  When listing all of the combinations that give each possible sum, seven is the mode.  Seven can be rolled in six different ways (1+6, 2+5, 3+4, 4+3, 5+2, and 6+1), which is more than any other sum.  With that being said, a mode can be useful in predicting categories, or even outcomes.  That is pretty useful for such a simple statistic.
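          As a minimal sketch of finding the mode (my own illustration in Python, not something from the chapter), the function below counts each category and returns every category tied for the largest count, so it also handles bimodal and other tied cases.  The crime list is made-up sample data, and the name modes is just a label I chose.  The dice part lists all 36 combinations of two dice and counts how many give each sum.

```python
from collections import Counter
from itertools import product


def modes(values):
    """Return every value tied for the largest number of cases, sorted."""
    counts = Counter(values)
    top = max(counts.values())
    return sorted(v for v, c in counts.items() if c == top)


# Made-up nominal-level data: only counting is needed, no arithmetic.
crimes = ["theft", "burglary", "theft", "assault", "theft", "burglary"]
print(modes(crimes))           # ['theft']

# Two categories tied for the most cases: a bimodal distribution.
print(modes([1, 2, 2, 3, 3]))  # [2, 3]

# Every combination of two six-sided dice, reduced to its sum.
sums = [a + b for a, b in product(range(1, 7), repeat=2)]
dice_counts = Counter(sums)
print(modes(sums))             # [7]
print(dice_counts[7])          # 6 of the 36 combinations add up to seven
```

          Returning a list rather than a single value is a deliberate choice here, since a tie means there is no single mode to report.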
          The next measure of central tendency is the median, which stands for the middle of a set of data.  The median is defined as the midpoint case in an ordered distribution.  To obtain the median, one must place all of the numbers in a set of data in order from smallest to largest.  Once this is done, one simply finds the middle value by counting the numbers.  It is simplest to find the median if the number of values in a given set is odd, because there is one value in the exact middle.  For example, if the set has the numbers 0, 3, 5, 7, and 9, there are 5 numbers.  The third number would then be the median, which is 5.  However, finding the median can become a little more complicated when the number of values in a given set is even, because two values share the middle of the data.  If a case listed the numbers 1 through 12, then 6 and 7 would share the middle.  When this happens, the median is referred to as the artificial median, meaning that it is not an actual value in the data, but the halfway point between the two numbers found in the middle.  The median of the list containing the numbers 1 through 12 would then be 6.5, because that is halfway between 6 and 7.  If one wanted to locate the median using a formula, they could use (n + 1) / 2, where n represents the number of cases.  The result is the position of the median in the ordered list, not the median itself; for the 12 numbers above it gives 6.5, meaning halfway between the 6th and 7th cases.  This formula works well with large data sets; if the data set is smaller, it is easier to determine the median visually.
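          The steps for finding the median can be sketched in Python as follows.  This is my own illustration, not code from the chapter, and the function names are ones I made up.  It sorts the data, takes the single middle value when the count is odd, and takes the halfway point between the two middle values when the count is even.

```python
def median(values):
    """Return the midpoint case of the data after ordering it."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("the median of an empty data set is undefined")
    mid = n // 2
    if n % 2 == 1:
        # Odd number of cases: one value sits in the exact middle.
        return ordered[mid]
    # Even number of cases: halfway between the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


def median_position(n):
    """Position of the median in the ordered list, counting from 1."""
    return (n + 1) / 2


print(median([0, 3, 5, 7, 9]))  # 5
print(median(range(1, 13)))     # 6.5
print(median_position(5))       # 3.0 (the third case)
print(median_position(12))      # 6.5 (between the 6th and 7th cases)
```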
          Unlike the mode, the median requires at least ordinal-level data, which is why the values must be placed in order.  This is also why the median is referred to as the ordinal-level average.  One benefit of the median is that it is not influenced by extreme cases in the data set.  If a data set contains the numbers 7, 11, 14, 16, 20, 25, and 50, the number 50 is considered an extreme.  The median is a useful average here because it is 16, which actually fits the rest of the data set.  It reflects the characteristics of the data set and is a fair representation of it.  Any time an extreme is present, the median is the wisest average to use to give the most accurate result.
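          To see how little an extreme matters to the median, here is a small sketch (my own, using Python's built-in statistics module).  It takes the data set above and then swaps the extreme for a far larger value that I made up purely for illustration.

```python
from statistics import median

data = [7, 11, 14, 16, 20, 25, 50]
print(median(data))       # 16

# Replace the extreme 50 with a much larger, made-up value.
stretched = data[:-1] + [5000]
print(median(stretched))  # 16, the middle case has not moved
```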
          The last measure of central tendency to discuss is the mean.  The mean is the most common average used.  It is defined as the arithmetic average of a set of scores.  Basically, the mean is a calculated score that requires at least interval-level data because it uses “real” values or magnitudes.  That is the main thing to keep in mind about the mean: it cannot be calculated with only nominal or ordinal-level data, because arithmetic (adding, subtracting, multiplying, or dividing) is not meaningful without interval-level data, and the mean is an arithmetic average.  Any time there is interval or ratio data, the mean is an excellent measure of central tendency.  The mean is taught in school and is the most widely known, which is why it is most commonly used.  Students and teachers both use the mean to average a final grade in a class over a certain period of time.  The mean is calculated by summing all of the scores (or numbers) in a set and dividing the total by the number of scores present.  For example, if a student made an 80, 96, 93, and 89 on four tests, the student or teacher would find the final grade by adding the four scores to get a total of 358, and dividing by four because there are four tests graded.  The final grade would be 89.5.  This way of calculating the mean gives it one very important property because of its arithmetic base.  It is the one point that is closest to not one, or two, but all of the scores in a set of data: the distances of the scores above the mean exactly balance the distances of the scores below it.
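          The grade example can be worked through in a few lines of Python.  This is only my own sketch, not the chapter's method.  Besides the mean itself, it checks one way of reading the "closest to all of the scores" property: the deviations from the mean add up to zero, and the total squared distance from the scores is smaller at the mean than at nearby points.  The helper name total_squared_distance is one I invented.

```python
scores = [80, 96, 93, 89]

total = sum(scores)
mean = total / len(scores)
print(total)  # 358
print(mean)   # 89.5

# Deviations above and below the mean cancel out.
print(sum(x - mean for x in scores))  # 0.0


def total_squared_distance(center):
    """Sum of squared distances from every score to a candidate center."""
    return sum((x - center) ** 2 for x in scores)


print(total_squared_distance(mean))  # 145.0
print(total_squared_distance(89))    # 146
print(total_squared_distance(90))    # 146
```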
          Although the mean is the most common average used, it can be a problem.  Unlike the median, the mean can be greatly affected by an extreme in the data set.  If a data set contains the scores 1, 5, 7, 10, 2, 1, 4, 11, 8, 127, and 3, the total is 179.  After dividing by 11, the mean equals about 16.3, which is higher than every score in the data set except the extreme of 127.  An extreme throws the average way off and can create a poor representation of a data set.  This shows even more clearly that when a data set contains an extreme, it is wiser to use the median instead of the mean.  Apart from needing interval or ratio-level data and the problem with extremes, the mean is the most popular average and is also very easily interpreted.  It requires no ordering, just direct calculation.  More importantly, the mean is said to be the foundation of more sophisticated and more powerful statistics.
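          Here is a quick check of that example, again as my own sketch with Python's statistics module.  It compares the mean and the median of the eleven scores, then recomputes the mean with the extreme left out to show how much that one score pulls it.

```python
from statistics import mean, median

scores = [1, 5, 7, 10, 2, 1, 4, 11, 8, 127, 3]

print(sum(scores))             # 179
print(round(mean(scores), 2))  # 16.27
print(median(scores))          # 5

# How many of the eleven scores fall below the mean?
print(sum(1 for s in scores if s < mean(scores)))  # 10

# Leave out the extreme score and the mean drops sharply.
without_extreme = [s for s in scores if s != 127]
print(round(mean(without_extreme), 2))  # 5.2
```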
          While the mean is a measure of central tendency used in univariate statistics, it can also be used to compare two or more groups of individuals and tell whether they are alike or different.  Because this comparison is so simple and most people understand the mean, it is the one most commonly used.  The mean compares two or more groups by using their central points.  For example, one study used the mean to determine whether males or females were more likely to be victimized by drunk drivers.  The study showed that males had a mean occurrence of lifetime DWI victimization of 0.16, and females had a mean occurrence of lifetime DWI victimization of 0.14.  From this information, it was easy to tell that males in the study were slightly more likely than females to be a DWI victim in their lifetime.
          All three of the main measures of central tendency (mode, median, and mean) are easily calculated and simple to understand.  It is important to understand these terms when looking for an average or a proper representation of any set of data.  Although these measures are univariate statistics that describe a single variable, they can also be used to compare two or more groups of individuals to see if they are alike or different.  It is important that a person knows which measure of central tendency to use with which set of data, so that poor representations do not appear in research or when presenting studies to someone else, or even the public.
          My information came from Chapter 4: Measures of Central Tendency.  I feel that the author did an excellent job explaining each of the measurements throughout the chapter.  Each type was explained in great detail.  The author gave thorough examples, and also listed the problems or "glitches" of each average.  Being someone who already knew about these measures of central tendency, I thought I would be bored throughout this chapter; however, the author wrote in a way that made it simple yet interesting, and it kept me captivated.  I can also see that if someone had no previous knowledge of the subject matter, they would be able to follow along easily, stay captivated, and come away with a decent understanding of the material.  I can honestly say that I did not have a problem with this author's writing, and I hope that when people read mine in this blog, they feel the same way.

         
          

1 comment:

  1. Sounds like you're off to a great start, I hope to see some pictures in the future!
