
Defining Statistics & Statistical Thinking

posted Sep 1, 2014, 7:05 PM by Prof Kiernan

Statistics can be defined as the science of data. The study of statistics covers the whole process of working with data: generating it, analyzing it, presenting it, and interpreting it. This means statistics is not like any other math class you've ever taken. Urbandictionary.com put it most simply when it defined statistics as:

"The math course that is essentially the lovechild of mathematics and English... And sometimes psychology." (2011)

 

Statistics is much more than gambling and surveys. Everyone uses statistics on a daily basis, even without realizing it. Statistics is a course where the numbers are not nearly as important as the thought process used to generate the numbers. Essentially, the best statisticians are skeptical of any computation presented without its raw data, and they look for flaws in how the data was generated, prepared, and analyzed, and in the conclusions drawn from it. The following topics cover common concerns people have when deciding which statistics are flawed.

Prepare Data

To prepare data for use, you must answer a series of questions so that you don't waste time analyzing raw data that is flawed. The most important questions are about context: What do the data mean? What is the goal of the study? You should then consider the source of the data: Is the source objective? Is the source biased? The key idea here is to be vigilant and skeptical of studies from sources that may be biased. Sampling methods must be considered as well: Does the method chosen greatly influence the validity of the conclusion? Voluntary response (or self-selected) samples often have bias (those with a special interest are more likely to participate). Are other methods more likely to produce good results?

 

Analyze Data

The first step of actual data analysis is to create the appropriate graphs (covered in chapter 2). Once these graphs have been created, you should apply statistical methods (the rest of the book explains the statistical methods). Most of the formulas required to compute the numerical values are extremely daunting (some cannot be done by hand), so statisticians rely heavily on technology (computers, graphing calculators, tables). With technology, good analysis does not require strong math skills, but it does require using common sense and paying attention to sound statistical methods.

 

Make Conclusions

The whole point of statistical analysis is to decide whether the data is statistically significant (different from what you would normally expect). Results that are statistically significant are unlikely to have happened by coincidence. Occasionally data can be statistically significant without being practically significant (useful in the real world).
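The gap between statistical and practical significance can be sketched with a small calculation. This is only an illustration: the test (a one-sample z-test with a known standard deviation), the function name, and every number below are assumptions made up for the example, not material from the course.

```python
import math

def two_sided_p_value(sample_mean, expected_mean, sd, n):
    """Two-sided p-value for a one-sample z-test.

    Assumes the population standard deviation (sd) is known and the
    sample is a simple random sample of size n.
    """
    z = (sample_mean - expected_mean) / (sd / math.sqrt(n))
    # P(|Z| >= |z|) for a standard normal Z equals erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical exam: scores normally average 500 with sd 100.
# A study group averages 502, a 2-point bump that hardly matters in practice.
p_small_sample = two_sided_p_value(502, 500, 100, n=100)
p_huge_sample = two_sided_p_value(502, 500, 100, n=40_000)

print(f"n = 100:    p = {p_small_sample:.4f}")
print(f"n = 40,000: p = {p_huge_sample:.6f}")
```

With 100 students the 2-point difference looks like coincidence (p is roughly 0.84), while with 40,000 students the same 2-point difference is statistically significant (p is far below 0.05), even though it is no more useful in the real world.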

 

 

PEOPLE MISUSE STATISTICS
Simply put, people who don't understand statistics use them to prove points all the time. When people don't use the correct methods, their conclusions end up being flawed. Below is a list of the most common misuses of statistics.

MISLEADING CONCLUSIONS (Correlation does not imply causation)

A common mistake is concluding that one variable causes another when in fact the variables are only correlated, or associated with each other (covered in Chapter 10). For example, two variables that seem to be related are temperature and violent crime (as it gets hotter outside, the number of violent crimes increases). However, we cannot conclude that one causes the other based solely on the numerical calculation of the relationship between temperature and the number of violent crimes. There may be another factor involved (like discomfort) that explains the relationship. This is where the mantra used in many social science classes, "Correlation does not imply causation," comes from.


SMALL SAMPLE SIZES
Conclusions should never be based on tiny sample sizes. If you want to decide whether a study technique works for all college students and you only study 8 college students, your data is pretty much useless. If your data is to be useful, it should be based on a reasonably large sample of the population. In other words, the smaller the sample size (i.e., the number of participants in a study), the less useful the data will be. In the real world, you want an experiment with a reasonable number of participants.

For example, suppose Prof. K surveys 20 people to figure out which Walt Disney World restaurants people like eating at, and comes to the decision that Be Our Guest is the most popular counter service restaurant in the Magic Kingdom. Thanks to Google.com, Prof. K found that there are approximately 17 MILLION visitors to Walt Disney World every year. What does that say about the validity of Prof. K's data?
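To get a rough feel for how shaky a tiny sample is, here is a minimal sketch of the usual 95% margin of error for a survey percentage. It assumes a simple random sample and uses the normal approximation (which is itself crude for samples as small as 8 or 20); the function name and the larger sample sizes are made up for illustration.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate 95% margin of error for a sample proportion.

    Assumes a simple random sample of size n. p = 0.5 is the
    worst case (it gives the widest margin), and z = 1.96 is the
    usual multiplier for 95% confidence.
    """
    return z * math.sqrt(p * (1 - p) / n)

# 8 and 20 are the sample sizes from the examples above;
# 400 and 1000 are hypothetical larger surveys for comparison.
for n in (8, 20, 400, 1000):
    points = margin_of_error(n) * 100
    print(f"n = {n:>4}: about +/- {points:.1f} percentage points")
```

With 20 people the margin is about 22 percentage points either way, so a restaurant that "wins" the survey could easily be nowhere near the top choice of all visitors. With 1,000 people the margin shrinks to about 3 points.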

LOADED QUESTIONS
If survey questions are not worded well, the results can be misleading.
In a famous psychological experiment, Elizabeth Loftus & John Palmer found that changing one word in a question could change people's responses. In the experiment, participants watched several videos of car accidents, and then each participant was asked "How fast were the cars going when they ____?" Several different words were used to complete the question, and the responses changed based on the word used. When participants were asked "How fast were the cars going when they contacted?" the average response was 32 mph, yet when the participants were asked "How fast were the cars going when they smashed?" the average response was 42 mph. That's a 10 mph difference based on the same video footage.

Image Source: http://www.simplypsychology.org/loftus-results.jpg

ORDER OF QUESTIONS
The order of the questions can change the results as well. Sometimes, questions are unintentionally loaded by such factors as the order of the items being considered.
For Example: Would you say traffic contributes more or less to air pollution than industry?
Results in: traffic - 45%; industry - 27%
When the order is reversed the results change to: industry - 57%; traffic - 24%

NONRESPONSES
Your data may be flawed because the right mix of people did not answer the question. Nonresponse occurs when someone either refuses to respond to a survey question or is unavailable. People who refuse to talk to pollsters often have a view of the world around them that is markedly different from that of people who will let pollsters into their homes or have the time to answer the polls. Think about it: who has time to answer a poll that takes 15 minutes (without being paid for their time)?

MISSING DATA
Missing data can change your numbers drastically. People frequently drop out of studies for reasons that aren't related to the study. Think about how many reasons you could have for missing a final exam in a Biology course. Now think about how your course average would change due to a zero for the final exam. Finally, think about what would happen to your average if you missed both the Biology midterm and the final exam. Simply put, when people drop out of studies, they change the statistics.
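The Biology example can be worked through with a few lines of arithmetic. The scores below are hypothetical, and the sketch assumes the course average is a plain average of three equally weighted grades, which may not match how any real course is graded.

```python
def course_average(scores):
    """Plain average of equally weighted scores (an assumed grading scheme)."""
    return sum(scores) / len(scores)

# Hypothetical grades: quizzes, midterm, final exam.
all_taken = [88, 84, 92]
missed_final = [88, 84, 0]       # zero recorded for the missed final
missed_both = [88, 0, 0]         # zeros for the midterm and the final

print(f"All exams taken:       {course_average(all_taken):.1f}")
print(f"Missed the final:      {course_average(missed_final):.1f}")
print(f"Missed midterm, final: {course_average(missed_both):.1f}")
```

Under these assumptions the average falls from 88.0 to about 57.3 after one missed exam and to about 29.3 after two, even though nothing about the student's actual knowledge changed.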

PRECISE NUMBERS
Simply put, just because a number looks exact doesn't mean it hasn't been estimated. A number can be an estimate, but it should always be referred to as an estimate. Think about the many ways people can round a four-digit number (e.g., 1,675.43 could be rounded to 2,000, 1,700, or 1,680; these are all valid roundings, yet they are all different estimates). Now, think about how you could be manipulated into buying one computer over another by someone saying one computer costs $1,700 vs. $1,680.
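The three roundings of 1,675.43 can be reproduced with Python's built-in round function, where a negative second argument rounds to the left of the decimal point. This is only a quick illustration; the helper name round_to is made up for the example.

```python
def round_to(value, ndigits):
    """Round value with the built-in round(); negative ndigits
    rounds to tens (-1), hundreds (-2), thousands (-3), and so on."""
    return round(value, ndigits)

price = 1675.43

print(f"Nearest thousand: {round_to(price, -3):,.0f}")  # 2,000
print(f"Nearest hundred:  {round_to(price, -2):,.0f}")  # 1,700
print(f"Nearest ten:      {round_to(price, -1):,.0f}")  # 1,680
```

All three results come from the same starting number, so the only thing that changes is how much detail the person reporting the number chose to keep.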

PERCENTAGES
Simply put, many people don't understand fractions or percentages and misuse them regularly.

Misleading or unclear percentages are sometimes used. Textbook Example – Continental Airlines ran an ad claiming “We’ve already improved 100% in the last six months” with respect to lost baggage. Does this mean Continental made no mistakes?
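One way to see why the claim is unclear is to take it literally. The sketch below assumes "improved 100%" means the number of lost bags was reduced by 100%; the ad does not say what was measured, and the starting count of 200 lost bags is made up for illustration.

```python
def after_reduction(before, percent):
    """Value left after reducing `before` by `percent` percent."""
    return before * (1 - percent / 100)

lost_bags_before = 200  # hypothetical count, not from the ad

for percent in (25, 50, 100):
    remaining = after_reduction(lost_bags_before, percent)
    print(f"{percent:>3}% reduction: {remaining:.0f} lost bags remain")
```

Read this way, a 100% improvement leaves zero lost bags, which is almost certainly not what the airline achieved. The percentage sounds impressive only because the ad never says what it is a percentage of.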

