For this week’s optional practice problems, we have provided another dataset for you to use. midwest is a dataset that’s available in R as part of the ggplot2 package, similar to the complete_old dataset we have been using from the ratdat package. The midwest dataset has demographic information for counties in several Midwest states, taken from the 2000 U.S. Census.
Use ?midwest to read the documentation for the dataset.
Use str to learn more about data types in the midwest dataset. Of the primary vector types we discussed in this week’s workshop (character, integer, numeric, logical), which are represented in this dataset?
Answer: character, integer, and numeric. There are no logical vectors.
str(midwest)
## tibble [437 × 28] (S3: tbl_df/tbl/data.frame)
## $ PID : int [1:437] 561 562 563 564 565 566 567 568 569 570 ...
## $ county : chr [1:437] "ADAMS" "ALEXANDER" "BOND" "BOONE" ...
## $ state : chr [1:437] "IL" "IL" "IL" "IL" ...
## $ area : num [1:437] 0.052 0.014 0.022 0.017 0.018 0.05 0.017 0.027 0.024 0.058 ...
## $ poptotal : int [1:437] 66090 10626 14991 30806 5836 35688 5322 16805 13437 173025 ...
## $ popdensity : num [1:437] 1271 759 681 1812 324 ...
## $ popwhite : int [1:437] 63917 7054 14477 29344 5264 35157 5298 16519 13384 146506 ...
## $ popblack : int [1:437] 1702 3496 429 127 547 50 1 111 16 16559 ...
## $ popamerindian : int [1:437] 98 19 35 46 14 65 8 30 8 331 ...
## $ popasian : int [1:437] 249 48 16 150 5 195 15 61 23 8033 ...
## $ popother : int [1:437] 124 9 34 1139 6 221 0 84 6 1596 ...
## $ percwhite : num [1:437] 96.7 66.4 96.6 95.3 90.2 ...
## $ percblack : num [1:437] 2.575 32.9 2.862 0.412 9.373 ...
## $ percamerindan : num [1:437] 0.148 0.179 0.233 0.149 0.24 ...
## $ percasian : num [1:437] 0.3768 0.4517 0.1067 0.4869 0.0857 ...
## $ percother : num [1:437] 0.1876 0.0847 0.2268 3.6973 0.1028 ...
## $ popadults : int [1:437] 43298 6724 9669 19272 3979 23444 3583 11323 8825 95971 ...
## $ perchsd : num [1:437] 75.1 59.7 69.3 75.5 68.9 ...
## $ percollege : num [1:437] 19.6 11.2 17 17.3 14.5 ...
## $ percprof : num [1:437] 4.36 2.87 4.49 4.2 3.37 ...
## $ poppovertyknown : int [1:437] 63628 10529 14235 30337 4815 35107 5241 16455 13081 154934 ...
## $ percpovertyknown : num [1:437] 96.3 99.1 95 98.5 82.5 ...
## $ percbelowpoverty : num [1:437] 13.15 32.24 12.07 7.21 13.52 ...
## $ percchildbelowpovert: num [1:437] 18 45.8 14 11.2 13 ...
## $ percadultpoverty : num [1:437] 11.01 27.39 10.85 5.54 11.14 ...
## $ percelderlypoverty : num [1:437] 12.44 25.23 12.7 6.22 19.2 ...
## $ inmetro : int [1:437] 0 0 0 1 0 0 0 0 0 1 ...
## $ category : chr [1:437] "AAR" "LHR" "AAR" "ALU" ...
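To see where these type labels come from, here is a minimal sketch on a small made-up data frame. The toy data frame and its metro column are hypothetical and not part of midwest; the other values echo the first rows shown above. It also illustrates that a 0/1 column such as inmetro is stored as integer, not logical, unless it is converted.

```r
# Toy data frame (hypothetical); values echo rows shown in the str() output
toy <- data.frame(
  county   = c("ADAMS", "BOND"),   # character
  poptotal = c(66090L, 14991L),    # integer (note the L suffix)
  area     = c(0.052, 0.022),      # numeric (double)
  metro    = c(FALSE, TRUE),       # logical (hypothetical column)
  stringsAsFactors = FALSE
)

# class() of every column, returned as a named character vector
col_types <- sapply(toy, class)
print(col_types)

# A 0/1 integer column is not logical until it is converted
inmetro_like <- c(0L, 1L)
is.logical(inmetro_like)   # FALSE
as.logical(inmetro_like)   # FALSE TRUE
```

Running sapply(midwest, class) on the real dataset gives the same kind of summary for all 28 columns.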
Create a graph from the midwest data that compares population density between states. Follow the instructions below:
Map state to the x-axis and popdensity to the y-axis.
Use geom_jitter to create the scatter plot layer.
Color the points by state.
Change the point shape (shape = ). Shape is identified using integers.
Change the point size (size = ).
Use geom_boxplot to create the boxplot layer.
Change the fill color of the boxes.
Use outlier.shape to avoid double-plotting outliers.
Apply a complete theme (e.g., theme_classic(), theme_bw()).
Use the theme function so that the plot title is bold and the legend is hidden.
The code and graph you created should look similar to the following:
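# midwest and the plotting functions come from ggplot2
library(ggplot2)

ggplot(data = midwest,
       aes(x = state,
           y = popdensity)) +
  # every county as a point, spread horizontally within its state
  geom_jitter(aes(color = state),
              shape = 18,
              alpha = 0.6,
              size = 3) +
  # transparent boxes; outliers hidden because the jitter layer already draws them
  geom_boxplot(outlier.shape = NA,
               fill = NA) +
  theme_bw() +
  labs(title = "Midwest Population Demographics (2000)",
       x = "State",
       y = "Population Density (person/unit area)") +
  theme(plot.title = element_text(size = 16,
                                  face = "bold"),
        legend.position = "none")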
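The reason for outlier.shape = NA can be sketched in base R. geom_jitter already draws every observation, so any point the boxplot layer also marks as an outlier would appear twice. The example below uses made-up values and boxplot.stats (from base R's grDevices, whose hinge calculation can differ slightly from the one ggplot2 uses) to show which points a boxplot would flag.

```r
# Made-up values (hypothetical): five typical observations and one extreme one
density <- c(1, 2, 3, 4, 5, 100)

# boxplot.stats() flags points more than 1.5 * IQR beyond the hinges
flagged <- boxplot.stats(density)$out
print(flagged)

# A jitter layer draws all observations, so each flagged point would be
# drawn a second time by the boxplot layer unless outliers are hidden
length(density)   # points drawn by the jitter layer
length(flagged)   # extra copies a boxplot would add
```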
Compute the following calculations on the midwest dataset:
What is the maximum total population (poptotal)?
max(midwest$poptotal)
## [1] 5105067
What is the minimum population density (popdensity)?
min(midwest$popdensity)
## [1] 85.05
What are the 25th, 50th, and 75th percentiles of the percentage of college-educated residents (percollege)? The quantile function will be helpful here.
quantile(midwest$percollege, probs = c(0.25, 0.5, 0.75))
## 25% 50% 75%
## 14.11372 16.79756 20.54989
What is the mean number of adults (popadults)?
mean(midwest$popadults)
## [1] 60972.61
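A natural follow-up is finding which county a summary value belongs to. This is a minimal sketch on a small named vector built from the first four poptotal values in the str output above; the names pop, biggest, and q are hypothetical. which.max returns the position of the maximum, which can then be used to index another vector (for the full dataset, midwest$county).

```r
# First four poptotal values from the str() output, named by county
pop <- c(ADAMS = 66090, ALEXANDER = 10626, BOND = 14991, BOONE = 30806)

# Largest value, and the name stored at the position of that value
max(pop)
biggest <- names(pop)[which.max(pop)]
print(biggest)

# The 50% quantile is the same number that median() returns
q <- quantile(pop, probs = c(0.25, 0.5, 0.75))
print(q)
median(pop)
```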
Create a sequence from 8 to 85 in steps of 7 and assign it to weird_seq.
weird_seq <- seq(from = 8, to = 85, by = 7)
# print the vector values
weird_seq
## [1] 8 15 22 29 36 43 50 57 64 71 78 85
Create a sequence of years from 1900 to 2025 in steps of 5 and assign it to year_seq.
year_seq <- seq(from = 1900, to = 2025, by = 5)
# print the vector values
year_seq
## [1] 1900 1905 1910 1915 1920 1925 1930 1935 1940 1945 1950 1955 1960 1965 1970
## [16] 1975 1980 1985 1990 1995 2000 2005 2010 2015 2020 2025
Create a sequence of 47 evenly spaced values from 12 to 22 and assign it to seq_length.
seq_length <- seq(from = 12, to = 22, length.out = 47)
# print the vector values
seq_length
## [1] 12.00000 12.21739 12.43478 12.65217 12.86957 13.08696 13.30435 13.52174
## [9] 13.73913 13.95652 14.17391 14.39130 14.60870 14.82609 15.04348 15.26087
## [17] 15.47826 15.69565 15.91304 16.13043 16.34783 16.56522 16.78261 17.00000
## [25] 17.21739 17.43478 17.65217 17.86957 18.08696 18.30435 18.52174 18.73913
## [33] 18.95652 19.17391 19.39130 19.60870 19.82609 20.04348 20.26087 20.47826
## [41] 20.69565 20.91304 21.13043 21.34783 21.56522 21.78261 22.00000
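The by and length.out arguments are two ways of pinning down the same sequence. As a rough check, the spacing of a length.out sequence is (to - from) / (length.out - 1), and the number of values in a by sequence is (to - from) / by + 1 when by divides the range evenly. The variable names below are hypothetical.

```r
# length.out fixes the number of values; seq() works out the spacing
s <- seq(from = 12, to = 22, length.out = 47)
step <- (22 - 12) / (47 - 1)
all.equal(diff(s), rep(step, 46))   # TRUE: every gap matches the formula

# by fixes the spacing; the number of values follows from the range
w <- seq(from = 8, to = 85, by = 7)
n_values <- (85 - 8) / 7 + 1
length(w) == n_values               # TRUE
```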
What was the most challenging aspect of this week’s workshop? Were you able to overcome it? If not, what assistance do you need to continue working through it?
What was the most rewarding aspect of this week’s workshop?