### New Stats 1 Tips

#### TIPS FOR MAKING NOTE SHEETS FOR CLASS

posted Oct 3, 2015, 3:09 PM by Prof Kiernan

SQUEEZE AS MUCH INFO IN AS POSSIBLE:

- Use the smallest font you can read without needing a magnifying glass (see the PDF at the bottom of the page with different-sized fonts).
- Leave space to handwrite equations or symbols.
- Draw pictures, use screenshots, and make tables smaller (but still readable).
- Use columns or tables to make sure you have used as much space as possible.
- Use the smallest margins possible. Most printers will print with 0.2" margins.

ORGANIZE IT:

- Use color coding. Make each topic a different color to save time when searching for information.
- Draw boxes around each topic/formula.

DON'T FORGET TO INCLUDE:

- VOCABULARY and SYMBOLS. Many people forget the vocabulary and symbols on their sheets.
- CALCULATOR INSTRUCTIONS. If you have the time, use a calculator font to make sure you know which button to push. Here are the TI-83 and TI-84 keys in a font. I've used this font to make the correlation calculator instructions.
- FORMULAS & EXAMPLES. Especially for topics you frequently mess up on.

TRY IT OUT BEFORE THE EXAM:

- Print a copy at home, or email yourself a copy saved as a PDF file so you don't lose any of your formatting.

#### Exam 3 Formula Sheet

posted Nov 22, 2014, 5:32 PM by Prof Kiernan

 Here's a copy of the suggested formula sheet for Exam 3. Make sure you know how to use the formulas on this sheet for the exam.

#### Defining Statistics & Statistical Thinking

posted Sep 1, 2014, 7:05 PM by Prof Kiernan

Statistics can be defined as the science of data. The study of statistics covers the whole process of generating, analyzing, presenting, and interpreting data. This means statistics is not like any other math class you've ever taken before. Urbandictionary.com put it most simply when it defined statistics as:

"The math course that is essentially the lovechild of mathematics and English... And sometimes psychology." (2011)

Statistics is much more than gambling and surveys. Everyone uses statistics on a daily basis, even without realizing it. Statistics is a course where the numbers are not nearly as important as the thought process used to generate the numbers. Essentially, the best statisticians are skeptical of all computations where raw data is not present, and they look for flaws in data generation, preparation, analysis, and the conclusions being made with the data. The following topics cover common concerns people have when deciding which statistics are flawed.

## Prepare Data

To prepare the data for use, you must consider the answers to a series of questions to avoid wasting time analyzing raw data that is flawed. The most important questions for preparing data are context-based questions: What do the data mean? What is the goal of the study? You should then consider the source of the data: Is the source objective? Is the source biased? The key idea here is to be vigilant and skeptical of studies from sources that may be biased. Sampling methods must be considered as well: Does the method chosen greatly influence the validity of the conclusion? Voluntary response (or self-selected) samples often have bias (those with a special interest are more likely to participate). Are other methods more likely to produce good results?

## Analyze Data

The first step of actual data analysis is to create the appropriate graphs (covered in chapter 2). Once these graphs have been created, you should apply statistical methods (the rest of the book explains the statistical methods). Most of the formulas required to compute the numerical values are extremely daunting (some are not possible by hand), thus statisticians rely heavily on technology (computers, graphing calculators, tables). With technology, good analysis does not require strong math skills, but it does require using common sense and paying attention to sound statistical methods.

## Make Conclusions

The whole point of statistical analysis is to decide if the data is significant (different from normal data). Data that is statistically significant is very unlikely to happen by coincidence alone. Occasionally data can be statistically significant without being practically significant (useful in the real world).

PEOPLE MISUSE STATISTICS
Simply put, people who don't understand statistics use them to prove points all of the time. When people don't use the correct methods, their conclusions end up being flawed. Below is a list of the most common misuses of statistics.

MISLEADING CONCLUSIONS (Correlation does not imply causation)

A common mistake is concluding that one variable causes another when in fact the variables are only correlated, or associated (covered in Chapter 10). For example, temperature and violent crime seem to be related: as it gets hotter outside, the number of violent crimes increases. However, we cannot conclude that one causes the other based solely on the numerical calculation of the relationship between temperature and the number of violent crimes. There may be another factor involved (like discomfort) that explains the relationship. This is where the mantra used in many social science classes, "Correlation does not imply causation," comes from.

SMALL SAMPLE SIZES
Conclusions should never be based on tiny sample sizes. If you are looking to decide if a study technique works for all college students, and you only study 8 college students, your data is pretty much useless. If your data is to be useful it should be based on a reasonable percentage of the population. In other words, the smaller the sample size (i.e. number of participants in a study) the less useful the data will be. In the real world, you want to have an experiment that has a reasonable number of participants.

For example, suppose Prof. K surveys 20 people to figure out which Walt Disney World restaurants people like eating at, and concludes that the Be Our Guest restaurant is the most popular counter service restaurant in the Magic Kingdom. Thanks to Google.com, Prof. K found that there are approximately 17 MILLION visitors to Walt Disney World every year. What does that say about the validity of Prof. K's data?

If survey questions are not worded well, the results can be misleading.
In a famous psychological experiment, Elizabeth Loftus & John Palmer found that changing one word in a question could change people's responses. In the experiment, participants watched several videos of car accidents, and then each participant was asked "How fast were the cars going when they ____?" Several different words were used to complete the question, and the responses changed based on the word used. When participants were asked "How fast were the cars going when they contacted?" the average response was 32 mph, yet when participants were asked "How fast were the cars going when they smashed?" the average response was 42 mph. That's a 10 mph difference based on the same video footage.

Image Source: http://www.simplypsychology.org/loftus-results.jpg

ORDER OF QUESTIONS
The order of the questions can change the results as well. Sometimes, questions are unintentionally loaded by such factors as the order of the items being considered.
For Example: Would you say traffic contributes more or less to air pollution than industry?
Results in: traffic - 45%; industry - 27%
When the order is reversed the results change to: industry - 57%; traffic - 24%

NONRESPONSES
Your data may be flawed because the wrong mix of people answered the question. This occurs when someone either refuses to respond to a survey question or is unavailable. People who refuse to talk to pollsters have a view of the world around them that is markedly different from those who will let pollsters into their homes or have the time to answer the polls. Think about it: who has time to answer a poll that takes 15 minutes (without getting reimbursed for their time)?

MISSING DATA
Missing data can change your numbers drastically. People frequently drop out of studies for many reasons that aren't related to the study. Think about how many reasons you could have for missing a final exam in a Biology course. Now think about how your course average would change due to a zero for the final exam. Finally think about what would happen to your average if you missed the Biology midterm and final exams. Simply put, when people drop out of studies they change the statistics.
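The exam scenario above can be sketched with a few lines of Python. The scores below are made up purely for illustration, and the sketch assumes the course average is a simple mean of three equally weighted scores, which may not match how any real course is graded.

```python
from statistics import mean

# Hypothetical scores (other graded work, midterm, final); not real data.
complete = [88, 92, 85]       # student took every exam
missed_final = [88, 92, 0]    # final exam recorded as a zero
missed_both = [88, 0, 0]      # midterm and final both recorded as zeros

# Compare the course average under each scenario.
for label, scores in [
    ("took everything", complete),
    ("missed final", missed_final),
    ("missed midterm and final", missed_both),
]:
    print(label, round(mean(scores), 1))
```

The same thing happens in a study: each participant who drops out changes the numbers that are left, even when the reason for dropping out has nothing to do with the study.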

PRECISE NUMBERS
Simply put, just because a number looks exact doesn't mean it hasn't been estimated. A number can be an estimate, but it should always be referred to as an estimate. Think about the many ways people can round any 4-digit number (e.g., 1,675.43 could be rounded to 2,000 or 1,700 or 1,680; these numbers are all correct, yet they are all different estimates). Now think about how you could be manipulated into buying one computer over another by someone saying one computer costs \$1,700 vs. \$1,680.
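As a quick illustration of the rounding example, here is a minimal Python sketch. It uses the built-in `round` function with a negative number of digits to round to the nearest thousand, hundred, and ten; the variable name `price` is just a label chosen for this sketch.

```python
price = 1675.43

# A negative second argument rounds to the left of the decimal point:
# -3 is the nearest thousand, -2 the nearest hundred, -1 the nearest ten.
nearest_thousand = round(price, -3)
nearest_hundred = round(price, -2)
nearest_ten = round(price, -1)

print(nearest_thousand, nearest_hundred, nearest_ten)
```

All three results are legitimate estimates of the same number, which is why it matters to know how a reported figure was rounded.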

PERCENTAGES
Simply put, many people don't understand fractions or percentages and misuse them regularly.

Misleading or unclear percentages are sometimes used. Textbook Example – Continental Airlines ran an ad claiming “We’ve already improved 100% in the last six months” with respect to lost baggage. Does this mean Continental made no mistakes?

#### Important things to remember when you get to Stats 2

posted May 3, 2014, 8:57 PM by Prof Kiernan

Here are some VITAL concepts you should remember when you get to Stats 2. This information may be helpful while you are preparing for the final.

#### Hypothesis Testing 101

posted Apr 22, 2014, 7:35 PM by Prof Kiernan   [ updated Jul 14, 2014, 3:30 PM ]

Steps for Hypothesis testing:

1.     Write down what’s given

i.e. sample standard deviation, sample size, population proportion

2.     Figure out what table & formula you should use

3.     Draw the picture!

Left tail is less than

Right tail is greater than

Two tail is equal to or not equal to

4.     Use the tables to find the critical value and add it to the picture

Use α to find the critical value in a one tailed test

Use α/2 to find the critical value in a two tailed test

5.     Write the hypothesis

a.     The null hypothesis H0 always has an equal sign

b.     The alternative hypothesis has either a less than sign (<), a greater than sign (>), or a not equal sign (≠)

6.     Use the formula to find the test statistic (Z, t, χ², etc.)

7.     Decide whether to reject or fail to reject the null hypothesis.

If the test statistic falls in the shaded critical region, reject the null hypothesis

If the test statistic does NOT fall in the shaded critical region, “Fail to reject” the null hypothesis

8.     State your conclusion:

| | Original claim is the null hypothesis | Original claim is the alternative hypothesis |
|---|---|---|
| Reject H0 | “There is sufficient evidence to warrant rejection of the claim that… (original claim)” This is the only time that the original claim can be rejected. | “The sample data supports the claim that… (original claim)” This is the only time that the original claim can be supported. |
| Fail to reject H0 | “There is NOT sufficient evidence to warrant rejection of the claim that… (original claim)” | “There is NOT sufficient sample evidence to support the claim that… (original claim)” |

Source:  Triola, M. F. (2003). Elementary Statistics (9th ed.), Pearson Education, Inc.
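Steps 4 and 7 can be sketched in Python for the special case of a Z test. This is only a rough illustration, not part of the course materials: it assumes the test statistic is already computed, it uses the standard library's `statistics.NormalDist` in place of the printed Z table, and the function name `z_test_decision` and its `tail` labels are made up for this sketch.

```python
from statistics import NormalDist


def z_test_decision(z, alpha, tail):
    """Return (critical value, reject?) for a Z test.

    tail is "left", "right", or "two" (labels invented for this sketch).
    """
    std_normal = NormalDist()  # standard normal: mean 0, standard deviation 1
    if tail == "left":
        # One-tailed test: all of alpha sits in the left tail.
        critical = std_normal.inv_cdf(alpha)
        reject = z < critical
    elif tail == "right":
        # One-tailed test: all of alpha sits in the right tail.
        critical = std_normal.inv_cdf(1 - alpha)
        reject = z > critical
    elif tail == "two":
        # Two-tailed test: alpha/2 sits in each tail.
        critical = std_normal.inv_cdf(1 - alpha / 2)
        reject = abs(z) > critical
    else:
        raise ValueError("tail must be 'left', 'right', or 'two'")
    return critical, reject


# Hypothetical test statistic of 2.10 with alpha = 0.05, two-tailed.
critical, reject = z_test_decision(2.10, 0.05, "two")
print(round(critical, 2), "reject H0" if reject else "fail to reject H0")
```

For t or χ² tests the critical value comes from a different distribution, so this sketch would not apply as written.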

#### Finding the mean and variance of a probability distribution

posted Mar 17, 2014, 6:22 PM by Prof Kiernan

To find the mean of a probability distribution you need to use this formula:

$$\mu = \sum \left[ x \cdot P(x) \right]$$

In English this means you need to multiply each x value by the corresponding probability and get the sum of the results.

To find the variance of a probability distribution you need to use this formula:

$$\sigma^2 = \sum \left[ (x - \mu)^2 \cdot P(x) \right]$$

In English this means you need to do the following:
First: subtract the mean from each x value and square each answer
Second: Multiply each answer from the first step by each probability
Third: get the sum of the answers from the second step

Note: remember the standard deviation is the square root of the variance.

Example 1: Find the mean and variance of the following probability distribution.

| x | P(x) |
|---|---|
| 0 | 0.1 |
| 1 | 0.3 |
| 2 | 0.4 |
| 3 | 0.2 |

To find the mean of the distribution we need to add another vertical column onto our table and a total row at the bottom of our table. Then we compute x * P(x) for each x value and get the total of the column as our mean.

| x | P(x) | x * P(x) |
|---|---|---|
| 0 | 0.1 | 0 * 0.1 = 0 |
| 1 | 0.3 | 1 * 0.3 = 0.3 |
| 2 | 0.4 | 2 * 0.4 = 0.8 |
| 3 | 0.2 | 3 * 0.2 = 0.6 |
| Total | | 1.7 |

Thus the mean of example 1 is 1.7.

To find the variance of the distribution we need to add 3 new vertical columns onto our original table and a total row at the bottom of the table. The computations in each of the new columns are as follows:
In the first new column, subtract the mean from each x value (remember our mean is 1.7).

| x | P(x) | x - mean |
|---|---|---|
| 0 | 0.1 | 0 - 1.7 = -1.7 |
| 1 | 0.3 | 1 - 1.7 = -0.7 |
| 2 | 0.4 | 2 - 1.7 = 0.3 |
| 3 | 0.2 | 3 - 1.7 = 1.3 |
| Total | | |

In the second new column, square each answer from the first new column.

| x | P(x) | x - mean | (x - mean)² |
|---|---|---|---|
| 0 | 0.1 | 0 - 1.7 = -1.7 | (-1.7)² = 2.89 |
| 1 | 0.3 | 1 - 1.7 = -0.7 | (-0.7)² = 0.49 |
| 2 | 0.4 | 2 - 1.7 = 0.3 | 0.3² = 0.09 |
| 3 | 0.2 | 3 - 1.7 = 1.3 | 1.3² = 1.69 |
| Total | | | |

In the third new column, multiply each answer from the second new column by each probability and finally get the sum of the answers from this step.

| x | P(x) | x - mean | (x - mean)² | (x - mean)² * P(x) |
|---|---|---|---|---|
| 0 | 0.1 | 0 - 1.7 = -1.7 | (-1.7)² = 2.89 | 2.89 * 0.1 = 0.289 |
| 1 | 0.3 | 1 - 1.7 = -0.7 | (-0.7)² = 0.49 | 0.49 * 0.3 = 0.147 |
| 2 | 0.4 | 2 - 1.7 = 0.3 | 0.3² = 0.09 | 0.09 * 0.4 = 0.036 |
| 3 | 0.2 | 3 - 1.7 = 1.3 | 1.3² = 1.69 | 1.69 * 0.2 = 0.338 |
| Total | | | | 0.81 |

Thus our variance for example 1 is 0.81, which means our standard deviation for this example is the square root of our variance, or:

$$\sigma = \sqrt{0.81} = 0.9$$
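If you want to double-check the table work for Example 1, the same steps can be sketched in Python. This is just one possible way to write it; the function names `distribution_mean` and `distribution_variance` are made up for this sketch.

```python
from math import sqrt


def distribution_mean(xs, probs):
    # Multiply each x value by its probability, then add up the results.
    return sum(x * p for x, p in zip(xs, probs))


def distribution_variance(xs, probs):
    mu = distribution_mean(xs, probs)
    # Subtract the mean from each x, square it, multiply by P(x), then sum.
    return sum((x - mu) ** 2 * p for x, p in zip(xs, probs))


# The distribution from Example 1.
xs = [0, 1, 2, 3]
probs = [0.1, 0.3, 0.4, 0.2]

mean = distribution_mean(xs, probs)
variance = distribution_variance(xs, probs)
std_dev = sqrt(variance)  # standard deviation is the square root of the variance

print(round(mean, 2), round(variance, 2), round(std_dev, 2))
```

The results are rounded before printing because computers store decimals like 0.1 only approximately, so the raw values can be off by a tiny amount.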

#### Checking a probability distribution for validity

posted Mar 17, 2014, 4:50 PM by Prof Kiernan

When you are asked if a probability distribution (table) is valid you need to answer 3 questions.

1. Does the sum of P(x) add up to any number other than 1?
2. Are there any negative probabilities?
3. Are there any probabilities larger than 1?

If you answer YES to any of the questions above your table is NOT a probability distribution.
If you answer NO to all of the questions above your table is a probability distribution.

Example 1:

| x | P(x) |
|---|---|
| 0 | 0.129 |
| 1 | 0.257 |
| 2 | 0.659 |
| 3 | 0.008 |
| 4 | -0.053 |

1. No, the sum of P(x) adds up to 1
0.129+0.257+0.659+0.008+(-0.053) = 1.000
2. Yes, there are negative probabilities.
3. No, there aren't any probabilities larger than 1.
Since we said there are negative probabilities, example 1 is NOT a probability distribution.

Example 2:

| x | P(x) |
|---|---|
| -2 | 0.2 |
| -1 | 0.2 |
| 0 | 0.2 |
| 1 | 0.2 |
| 2 | 0.2 |

1. No, the sum of P(x) adds up to 1
0.2+0.2+0.2+0.2+0.2 = 1.000
2. No, there aren't any negative probabilities. The negative numbers are values for x, not the probability of x.
3. No, there aren't any probabilities larger than 1.
Since we said No to all of the questions, example 2 is a probability distribution.

Example 3:

| x | P(x) |
|---|---|
| 1 | 0.200 |
| 2 | 0.200 |
| 3 | 0.200 |
| 4 | 0.200 |
| 5 | 0.199 |

1. Yes, the sum of P(x) adds up to 0.999
0.2+0.2+0.2+0.2+0.199 = 0.999
2. No, there aren't any negative probabilities.
3. No, there aren't any probabilities larger than 1.
Since we said Yes to the first question, example 3 is NOT a probability distribution.
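The three questions can also be sketched as a short Python check. This is a minimal illustration, not a required method; the function name `is_valid_distribution` and the small tolerance used when comparing the sum to 1 are choices made for this sketch (the tolerance is there because computers store decimals only approximately).

```python
from math import isclose


def is_valid_distribution(probs, tol=1e-9):
    # Question 1: the probabilities must add up to 1.
    sums_to_one = isclose(sum(probs), 1.0, abs_tol=tol)
    # Question 2: no probability may be negative.
    no_negatives = all(p >= 0 for p in probs)
    # Question 3: no probability may be larger than 1.
    none_above_one = all(p <= 1 for p in probs)
    return sums_to_one and no_negatives and none_above_one


# Only the P(x) column matters; the x values are not checked.
example_1 = [0.129, 0.257, 0.659, 0.008, -0.053]
example_2 = [0.2, 0.2, 0.2, 0.2, 0.2]
example_3 = [0.200, 0.200, 0.200, 0.200, 0.199]

for name, probs in [("Example 1", example_1),
                    ("Example 2", example_2),
                    ("Example 3", example_3)]:
    print(name, is_valid_distribution(probs))
```

Only Example 2 passes all three checks, which matches the answers worked out by hand above.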

#### Correlation does not imply causation

posted Feb 6, 2014, 7:42 PM by Prof Kiernan   [ updated Feb 6, 2014, 7:44 PM ]

An alternate explanation of why having a strong correlation doesn't imply causation. Source: XKCD Comics

#### Finding Weighted Means and Averages For Frequency Distributions

posted Jan 31, 2014, 1:51 PM by Prof Kiernan   [ updated Feb 5, 2014, 5:58 PM ]

Find the mean of this frequency distribution:

| Class | Frequency |
|---|---|
| 10-19 | 8 |
| 20-29 | 16 |
| 30-39 | 21 |
| 40-49 | 11 |
| 50-59 | 4 |

1. The first thing you need to do is to find the midpoint for each class.

| Class | Midpoint |
|---|---|
| 10-19 | (10+19)/2 = 14.5 |
| 20-29 | (20+29)/2 = 24.5 |
| 30-39 | (30+39)/2 = 34.5 |
| 40-49 | (40+49)/2 = 44.5 |
| 50-59 | (50+59)/2 = 54.5 |

2. Next multiply each midpoint by the corresponding frequency and get the total.

| Midpoint | Frequency | Midpoint * frequency |
|---|---|---|
| 14.5 | 8 | 14.5 * 8 = 116 |
| 24.5 | 16 | 24.5 * 16 = 392 |
| 34.5 | 21 | 34.5 * 21 = 724.5 |
| 44.5 | 11 | 44.5 * 11 = 489.5 |
| 54.5 | 4 | 54.5 * 4 = 218 |
| | Total: 60 | Total: 1940 |

3. Finally divide the total of the Midpoint * frequency column by the total of the frequency column to get your mean.
1940 / 60 = 32.333333333...
Always round to one more decimal place than your original data had (since our classes were whole numbers, we should round to 1 decimal place). So our final answer for the mean of this frequency distribution is 32.3.
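The three steps above can be sketched in Python as a way to check your work. The function name `grouped_mean` and the way each class is stored as a (lower limit, upper limit, frequency) tuple are choices made for this sketch.

```python
def grouped_mean(classes):
    """Estimate the mean of a frequency distribution from class midpoints."""
    # Total of the frequency column.
    total_frequency = sum(freq for _, _, freq in classes)
    # Total of the midpoint * frequency column.
    weighted_total = sum((low + high) / 2 * freq for low, high, freq in classes)
    return weighted_total / total_frequency


# Each entry is (lower class limit, upper class limit, frequency).
classes = [
    (10, 19, 8),
    (20, 29, 16),
    (30, 39, 21),
    (40, 49, 11),
    (50, 59, 4),
]

# Round to one more decimal place than the original whole-number data.
print(round(grouped_mean(classes), 1))
```

Keep in mind that this is an estimate of the mean, since every value in a class is treated as if it were the class midpoint.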

#### Calculator Shortcuts for graphs and histograms

posted Jan 26, 2014, 1:53 PM by Prof Kiernan

To enter data into a list:

1. Press stat
2. Press enter
3. Put data into a list (remember the list number that displays at the top of the screen: L1, L2, L3, L4, L5, or L6)

To view graphs:

1. Press 2nd
2. Press Y= (it's just below your screen)
3. Press enter
4. Press enter
5. Press the down arrow
6. Use your left and right arrow keys to highlight the histogram or type of graph you want to see
7. Once you’ve highlighted the correct graph, press enter
8. Press the down arrow
9. The Xlist: should say the list number (to type in a list number you need to press 2nd, then the number of the list)
10. Freq should be: 1
11. Press zoom
12. Press 9
13. Press trace
14. Use your left and right arrow keys to view the different values for the histogram or graph. Min = is the lower limit, Max < is the upper limit, and n = is the frequency for that class.

To clear a list:

1. Press stat
2. Press 4
3. Press 2nd
4. Press the number of the list you want to clear
5. Press enter
