# Mini-Project 3, Spring 2016

Mini-Project 3 Math 243 Spring 2016 (Total: 60 points)

Overview: You will produce an experimental binomial distribution, compare it with the theoretical approximation, then look at confidence intervals based on some of your data, and play a “guess my number” game with a classmate. Then, similarly, you will make experimental views of sampling distributions for means, and make confidence intervals, using the data set from mini-project one.

We use bold face type to indicate work which will ultimately be turned in.

## Part one, simulation of a binomial distribution, and confidence intervals

Take the last two digits of your UO ID. We will call this P, and use it (divided by 100) as your basic proportion. You can think of this as the “parameter” for this part of the mini-project, the true proportion (say, of voters who support some candidate).

Produce a table of random numbers between 1 and 100, which is 50 columns wide by 120 rows deep. On Google Sheets you will need to widen the spreadsheet, which you can do by clicking the square under the fx symbol and above the 1st row label in order to highlight the sheet, and then inserting 28 columns.

In a column to the right use a counting function, such as COUNTIF on Google Sheets, to count the number of entries in each row which are less than P + 1 (that is, at most P). Then make a histogram of this column of counts, with bin size one.

**Copy this histogram to a Word document and then in the first section of that document answer the following questions:**

• **How does the mean of this data compare with the theoretical prediction?**
• **How does the standard deviation of this data compare with the standard deviation of the normal approximation?**
• **How does the overall shape of this data compare with the normal approximation?**

Next, we take your first twelve rows and think of them as twelve random samples. For each one of these twelve, take your observed value, convert it to a proportion, and calculate a 90% confidence interval based on that proportion.
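
As a cross-check on the spreadsheet work, the steps above can be sketched in Python. This is only a sketch under stated assumptions: `P = 37` is a hypothetical value, the function names are illustrative, and the interval used is the large-sample one, p-hat plus or minus z* times the square root of p-hat(1 - p-hat)/n, with z* of about 1.645 for 90% confidence. Use whichever interval formula your class uses.

```python
import math
import random
import statistics

P = 37        # hypothetical value; substitute the last two digits of your own ID
N_COLS = 50   # entries per row (trials per sample)
N_ROWS = 120  # number of rows (samples)
Z_90 = 1.645  # approximate critical value for 90% confidence


def simulate_counts(p_percent, n_rows=N_ROWS, n_cols=N_COLS, seed=0):
    """For each row, count how many of n_cols random integers in 1..100 are <= p_percent."""
    rng = random.Random(seed)
    return [
        sum(1 for _ in range(n_cols) if rng.randint(1, 100) <= p_percent)
        for _ in range(n_rows)
    ]


def proportion_ci(count, n=N_COLS, z=Z_90):
    """Large-sample confidence interval for a proportion from one observed count (assumed formula)."""
    p_hat = count / n
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


counts = simulate_counts(P)
p = P / 100

# Compare the experiment with the binomial predictions n*p and sqrt(n*p*(1-p)).
print("mean: theory", N_COLS * p, "observed", statistics.mean(counts))
print("sd:   theory", math.sqrt(N_COLS * p * (1 - p)), "observed", statistics.stdev(counts))

# Treat the first twelve rows as twelve samples and check coverage of the true proportion.
intervals = [proportion_ci(c) for c in counts[:12]]
covered = sum(lo <= p <= hi for lo, hi in intervals)
print(f"{covered} of 12 intervals contain {p}")
```

Changing `seed` plays the role of regenerating the random table in the spreadsheet, so the observed mean, standard deviation, and coverage count will vary from run to run.
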
(This will go quickly after you’ve done the first couple.)

**In the next section of your Word document, list your twelve confidence intervals (as percentages). Report in how many of those twelve the true percentage P lies. Is what you observed surprising or not?**

Then play a game with a classmate: tell them the first confidence interval and have them guess your number P. Record their guess but do not tell them whether they were right or wrong. Have them guess again after learning of the second confidence interval, the third one, etc.

**Record your classmate’s guesses, in order, in the Word document. At which point were they “sure” of the true P?**

Finally, calculate how many trials would be needed to produce a proportion which is within one percentage point (so a margin of error of 0.005) with 90% confidence.

**Give your answer for this number of trials with a bit of explanation in the Word document.** (You do not need to use equation editor; you can use the words “square root” instead of the symbol, etc. You also do not need to run the experiment to see if this sample size gets to within one percent.)

## Part two, sampling distributions

For this part you are to use again the data set from the first mini-project which was assigned to you, if there is a column with at least 200 entries. If there is no such column, use a “nearby” data set from the list which does have some column with at least 200 entries.

Calculate the mean and standard deviation of this column, and make a histogram. (You can use what you did for mini-project one.)

Warm up by first learning how to take a single random sample from this column. Make a random whole number between 1 and the height of the column (at least 200), say in cell A1. Then use an index function to call up the value of the cell indexed by the random number in A1. In Google Sheets, if your data is in column C, this will be done with the function “=INDEX($C$1:$C$300,A1)”. Here 300 was chosen because the column has fewer than 300 entries.
Any number greater than or equal to the number of entries would do. Note that if the values of your column started in the fourth row, then the command would be “=INDEX($C$4:$C$300,A1)”.

We use this to make experimental sampling distributions, one of sample size 9, with forty samples, and one of sample size 25, with forty samples. Here is one way to do this. First, take your column and delete any empty cells (moving things up). Make a single table of random numbers between 1 and the size of your column (again at least 200), which is 25 columns wide by 40 rows tall.

For the sample size of 9, make a 9 by 40 table of numbers all sampled from your column, based on the first 9 columns. In Google Sheets, the copy function allows one to do this. (Keeping the column fixed even when we use the copy function is the reason we use the $’s in the INDEX command.) It is best to label this as “Sampled values” in your spreadsheet. Then for each of those rows take the mean. That will give a column of size 40, which is your sampled distribution of sample means. Calculate its mean and its standard deviation, and make a histogram.

Next, make a full 25 by 40 table of sampled values. Take their means to again get a column of size 40, calculate its mean and its standard deviation, and make a histogram.

**Copy your original histogram, the histogram for sample size 9, and the histogram for sample size 25 all to a Word document. In that document, report as well the mean and standard deviation of each. Then discuss how this compares with the theoretical predictions:**

• **What happens to the mean for the sample distributions?**
• **What happens to the standard deviation for the sample distributions?**
• **What happens to the shape as the sample size increases?**

**Finally, report 90% confidence intervals based on the first twelve samples (the first twelve values in each column where you recorded sample means) of each size. How many contain the true mean?**
**Is that result surprising or expected?**

Of course, in this case we have the original data set and so access to any parameters, not just the statistics obtained by sampling. But by looking at sampling distributions and confidence intervals in this case, we can understand what happens even when the population is large.

## Summary

You will turn in one Word document for the binomial/proportion/polling work, and one Word document for the sampling distribution work.

The binomial/proportion document will contain a histogram showing the results of counting random numbers less than or equal to your number over 120 trials, some discussion of how the observed experiment compares with theory, as well as twelve confidence intervals and discussions of those. Finally, you will report how large a sample would be needed to be 90% confident to within half a percentage point.

The sampling distribution document will contain three histograms showing the original data set, an experimental view of the sampling distribution of size 9, and an experimental view of the sampling distribution of size 25. You will also give some discussion of how the observed experiments compare with theory, and twenty-four confidence intervals (twelve for each sample size) and discussions of those.
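
The sampling-distribution work in part two can also be sketched in Python as a cross-check on the spreadsheet. Everything named here is an assumption for illustration: the `population` list is placeholder data standing in for your own column, the function names are hypothetical, and the interval treats the population standard deviation as known (sample mean plus or minus z* times sigma divided by the square root of n, with z* of about 1.645). Your class may instead use the sample standard deviation with a t critical value.

```python
import math
import random
import statistics

Z_90 = 1.645  # approximate critical value for 90% confidence


def sample_means(population, n_samples=40, max_size=25, sizes=(9, 25), seed=0):
    """Draw an n_samples x max_size table of values with replacement and return
    {size: row means over the first `size` columns}, mirroring the one-table approach."""
    rng = random.Random(seed)
    # Each entry mimics =INDEX(column, random index); Python indices run from 0.
    table = [
        [population[rng.randrange(len(population))] for _ in range(max_size)]
        for _ in range(n_samples)
    ]
    return {k: [statistics.mean(row[:k]) for row in table] for k in sizes}


def mean_ci(x_bar, sigma, n, z=Z_90):
    """Interval x_bar +/- z * sigma / sqrt(n), treating sigma as known (assumed formula)."""
    margin = z * sigma / math.sqrt(n)
    return x_bar - margin, x_bar + margin


# Placeholder population of 200 values; substitute your own data column.
data_rng = random.Random(1)
population = [round(data_rng.uniform(0, 100), 1) for _ in range(200)]
mu = statistics.mean(population)
sigma = statistics.pstdev(population)  # the column is the whole population here

means = sample_means(population)
for n, xs in means.items():
    intervals = [mean_ci(x, sigma, n) for x in xs[:12]]
    covered = sum(lo <= mu <= hi for lo, hi in intervals)
    print(f"n={n}: mean of sample means {statistics.mean(xs):.2f} (mu {mu:.2f}), "
          f"sd of sample means {statistics.stdev(xs):.2f} "
          f"(sigma/sqrt(n) {sigma / math.sqrt(n):.2f}), "
          f"{covered} of 12 intervals contain mu")
```

The size-9 means reuse the first 9 columns of the same 25-column table, as described in part two, so the two sets of sample means are not independent of each other.
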