QUIZ 2

Part I:
The R function hist() can handle only quantitative data. For qualitative/categorical data, in one of the hw problems, I showed you a trick for making a histogram using the plot() function. Maybe you didn't do that hw problem at all, so to make sure you know how to make a histogram for categorical data:

a) Write code to read the categorical data in qz2_1_dat.txt (on the course website) into R, call the first column x, and use the plot() function to make a histogram for x. Hint: you need to literally look at the txt file.

    dat = read.table("http://sites.stat.washington.edu/marzban/390/spring21/qz2_1_dat.txt", header=TRUE)   # the first line of the file holds the column names
    x = dat[,1]
    plot(as.factor(x))

    # set.seed(123)
    # x = c(rep("A",10), rep("B", 20), rep("C",15), rep("D",10), rep("F",2))
    # x = sample(x, length(x), replace=FALSE)
    # write(paste("grade1","grade2"), file="qz2_1_dat.txt")
    # write(t(cbind(x,sort(x))), file="qz2_1_dat.txt", ncolumns=2, append=TRUE)

Part II:
The function plot() does more than just make histograms. To get some practice with it, do this:

    X = c(1, 2, 4, 7)
    Y = c(2, 4, 6, 8)
    plot(X, Y, type="p")
    plot(Y, type="p")

Don't get distracted by the x, y labels. I can explain what plot() has done, and you can even look at the help pages on plot() to see what it is designed to do; BUT DON'T! Instead, learn how to "reverse engineer" things. Specifically, look at the figure that was made by plot(), and figure out what it did.

There are many other ways to make a histogram of categorical data. Another one involves the function table(). Don't look at the help pages, because you'll be overwhelmed with mumbo jumbo. Instead,

b) Run table(x), look at what it has returned, and reverse engineer what it has done in this case. Explain it in words.

It has returned the frequency (count) of each level of x.

c) Write code to make a histogram of x, this time using the function table().
Feel free to experiment, and again, don't worry about the labels of the x, y axes, or the width of the bars, because for us anything that graphically displays frequencies is a good-enough histogram.

    plot(table(x))
    # Note that the following "works," but it does not give the correct histogram; the frequencies are on the x-axis!
    hist(table(x), breaks=20)

Part III:
All R functions, even the graphical ones, return something. And the returned values can be used to do more things. To learn how to do that, run the following:

    dat = read.table("http://sites.stat.washington.edu/marzban/390/spring21/qz2_2_dat.txt")
    y = dat[,1]
    H = hist(y)   # direct the returned values into something we call H.
    names(H)      # This tells us what is in H.

Most of what's returned you should recognize already. To confirm, let's look at the values of one of them, say, mids. This is how it's done:

    H$mids

By comparing the values of H$mids with the actual histogram, and by using common sense (e.g., the name "mids"), you can guess that these numbers denote the middle of the bars.

d) Based on everything we learned above, write code to make a *relative* frequency histogram of y. Although a histogram usually does not have to be displayed with vertical bars, for this question it should be. So, consult the help pages to see how to display vertical bars. Hint: Look at "type." The width of the bars, however, is NOT important. The axis labels and titles are also NOT important.

    plot(H$mids, H$counts/length(y), type="h")   # relative frequency = count / sample size

e) Write code to make a histogram of y with about 100 breaks; BUT, instead of showing the frequencies on the y-axis, show the log() of the frequencies.

    H = hist(y, breaks=100)
    plot(H$mids, log(H$counts), type="h")   # note: an empty bin gives log(0) = -Inf

Morals:
The important content in a histogram is a graphical display of frequencies. The rest, e.g., whether the frequencies are displayed as bars, or points, etc., is all irrelevant. And those frequencies can be obtained by different R functions.
Also, the function hist() may not make the kind of histogram you want, but by saving what it returns, one can make a histogram of anything one desires by transforming the x and/or y coordinates. Specifically, when the histogram of some variable is exponential-looking, it's common practice to show the log of the frequencies; and if/when the resulting histogram looks "linear," then you're in luck, because the histogram of that variable belongs to a special class of histograms called "exponential" (not just exponential-looking).
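
To see the last moral in action, here is a minimal sketch that uses simulated data rather than the quiz file. Everything in it is an assumption made for illustration: the data come from rexp(), and the seed, the rate of 2, the sample size, and the names y_sim and H_sim are all hypothetical. The idea is that for exponential data the count in a bin near x is roughly proportional to exp(-rate * x), so the log of the counts should fall roughly on a straight line with slope near -rate.

```r
# Illustrative sketch only: simulated data, not the quiz's qz2_2_dat.txt.
set.seed(1)                        # arbitrary seed, for reproducibility
rate = 2                           # hypothetical rate parameter
y_sim = rexp(5000, rate = rate)    # draws from an exponential distribution

H_sim = hist(y_sim, breaks = 50, plot = FALSE)   # save the returned values without drawing
keep = H_sim$counts > 0            # drop empty bins, since log(0) = -Inf

mids = H_sim$mids[keep]
log_counts = log(H_sim$counts[keep])

plot(mids, log_counts, type = "h") # log-frequency histogram with vertical bars

# Fit a straight line to the log frequencies and overlay it.
fit = lm(log_counts ~ mids)
abline(fit)
coef(fit)[2]                       # fitted slope; it should be negative
```

The sparse bins in the far tail are noisy, so the fitted slope is only a rough estimate of -rate; the point is the approximately linear shape, not the exact number.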