If you already have access to a copy of R and RStudio, you can skip to Required Packages.
For OSU learners: since your access to TIGER was only temporary, you will need to download R and RStudio on your own computer if you want to continue working in this program.
For UCO learners: you may choose to download R and RStudio on your own computer, or continue to work in Buddy. If you want to download your own copies of the software, follow the instructions in this section.
You can find installations for Windows, MacOS, and Linux at the Comprehensive R Archive Network website.
After you have R installed on your computer, you can then install RStudio Desktop from the Posit website. If you scroll down, you can find installations for MacOS and Linux.
For OSU learners: since these practice problems are not taking place on TIGER, make sure you install the necessary packages on your personal version of R and RStudio.
For UCO learners: if you are continuing to use Buddy, you should already have tidyverse installed. If you are working on your personal versions of R and RStudio, you will also need to install the necessary packages if you do not have them installed already.
install.packages("tidyverse") # only if you do not have it installed
library(tidyverse) # load the package
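If you are not sure whether tidyverse is already installed, one common pattern is to check first and install only when the package is missing. The following is a minimal sketch of that idea using base R's requireNamespace(); it assumes you have an internet connection if the install step runs.

```r
# Install tidyverse only when it is missing, then load it
if (!requireNamespace("tidyverse", quietly = TRUE)) {
  install.packages("tidyverse")
}
library(tidyverse)
```

Running this more than once is harmless, since the install step is skipped whenever the package is found.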
For this week’s optional practice problems, we have provided another dataset for you to use. This time, you will be downloading this file yourself and importing it into R.
This dataset is from Studying African Farmer-Led Irrigation (SAFI), a study that conducted interviews in Tanzania and Mozambique to assess farming and irrigation methods. Learn more about the dataset variables.
Click Download to download the dataset.
If you are familiar with setting your working directory in R, use your preferred method to set your working directory to the folder that the dataset is in and skip to Read in the Dataset.
For Buddy users: You will need to import the file into Buddy using the import tool.
If you are new to R or have limited experience importing data, you will need to set your working directory. We outline this process below.
Put simply, this process was seamless in TIGER and Buddy because all of our files were in the main “folder” on our “computer”. If you are using your own computer and folders, you need to make sure R is looking at the correct location on your computer so it can find the files.
The simplest way to do this is to set the “Working Directory”. This is the location on your computer that R focuses its attention on. You can do this programmatically through code if you know the exact path to the folder with your data in it. For example:
setwd("C:/Users/username/OSU/Workshop_Files/Intro_R")
Alternatively, you can manually search for and set your working directory by going to the Session menu in RStudio >> Set Working Directory >> Choose Directory >> select the folder that contains your dataset.
You can check that your working directory is set to the correct location by running the following code:
getwd()
Projects are an alternative to repeatedly setting the working directory. If you want to learn more, read about RStudio Projects in the RStudio User Guide.
Now that your working directory is set, you should be able to see the dataset we downloaded if you go to the Files tab on the lower right of the RStudio interface. If you see a file called SAFI_clean.csv, everything is in order.
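If you would rather confirm this from the console than from the Files tab, a minimal check might look like the following. It uses only base R functions and assumes the file kept its original name, SAFI_clean.csv.

```r
# Folder R is currently looking in
getwd()

# CSV files visible in that folder
list.files(pattern = "\\.csv$")

# TRUE if the dataset is where R expects it, FALSE otherwise
found <- file.exists("SAFI_clean.csv")
found
```

If the last line prints FALSE, the working directory is probably not the folder that holds the download.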
Import the dataset using the read_csv() function since it is a CSV file. Don’t forget to put the file name in " " and include the file extension .csv. Store the dataset as an object called survey_data so that we can reference it in the next section.
survey_data <- read_csv("SAFI_clean.csv")
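After an import, it can help to confirm that the rows and columns arrived as expected. The sketch below shows one way to do that on a hypothetical three-row file written to a temporary location; the column names site and members and all values are made up for illustration and are not part of SAFI_clean.csv.

```r
library(readr)
library(dplyr)

# Hypothetical stand-in for a downloaded CSV: three made-up rows in a temporary file
toy_path <- tempfile(fileext = ".csv")
write_csv(tibble(site = c("A", "B", "A"), members = c(3, 7, 5)), toy_path)

# Read the file back with read_csv(), suppressing the column type message
toy_data <- read_csv(toy_path, show_col_types = FALSE)

dim(toy_data)      # number of rows and columns
glimpse(toy_data)  # column names, types, and first values
```

With the real file, the equivalent checks would be dim(survey_data) and glimpse(survey_data).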
In this section, you will be writing code to subset, transform, and create new variables from the survey_data dataset we imported.
Create a subset called bigger_households:
- Keep the columns village, no_membrs, respondent_wall_type, and rooms
- Filter by household size (no_membrs)
Create a subset called earth_households:
- Keep the rows where respondent_wall_type is “muddaub”, “burntbricks”, or “sunbricks”
- Create a new variable called membrs_per_room that contains a calculation of the number of household members per room (no_membrs / rooms)
- Create a new variable called wall_as_factor that contains the same data as respondent_wall_type but converted into a factor instead
- Set the order of the factor levels (fct_relevel())
- Keep the columns no_membrs, rooms, membrs_per_room, and wall_as_factor
Use the group_by() and summarize() approach to make the following comparisons between villages (village) from the original dataset (survey_data):
- Years lived in the village (years_liv)
- Number of households (n())
Create a subset called survey_50_years:
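- Use str to verify that interview_date is a date data type
- Create a new variable called year by extracting the year from interview_date. (Hint: the function year() can extract the year from a date)
- Convert year to a factor
Create a graph from the survey_50_years subset that has the following features:
- year (as a factor) on the x-axis and no_membrs on the y-axis
- A boxplot layer with the outliers hidden (outlier.shape = NA)
- A geom_jitter layer to plot the raw data points, and specify that the points are distinguished by village (but only for the jitter layer)
- Transparency for the points (alpha = )
- Size for the points (size = )
- Stored as an object called years_graph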
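The tasks above lean on a small set of dplyr, forcats, lubridate, and ggplot2 functions. As a rough illustration of how those pieces fit together, here is a sketch on hypothetical toy data. Every name and value in it (plots, region, crop, area, workers, visit_date) is made up and unrelated to SAFI_clean.csv, and mapping color to the grouping column in the jitter layer is an assumption about one reasonable way to distinguish the points.

```r
library(dplyr)
library(forcats)
library(lubridate)
library(ggplot2)

# Hypothetical toy data; none of these names or values come from SAFI_clean.csv
plots <- tibble(
  region     = c("north", "north", "south", "south", "south", "north"),
  crop       = c("maize", "beans", "rice", "maize", "cassava", "rice"),
  area       = c(2.0, 0.5, 1.5, 3.0, 1.0, 2.5),
  workers    = c(4, 1, 3, 3, 2, 5),
  visit_date = as.Date(c("2016-11-17", "2016-12-01", "2017-04-26",
                         "2017-05-03", "2017-06-04", "2016-11-21"))
)

# Subset columns and rows, then add derived columns
grain_plots <- plots %>%
  select(region, crop, area, workers, visit_date) %>%
  filter(crop %in% c("maize", "rice")) %>%  # keep rows matching any listed value
  mutate(
    workers_per_area = workers / area,                     # ratio of two numeric columns
    crop_as_factor   = fct_relevel(factor(crop), "rice"),  # factor with "rice" moved to the first level
    visit_year       = factor(year(visit_date))            # year pulled from a Date, stored as a factor
  )

# Compare groups: one summary row per region
region_summary <- plots %>%
  group_by(region) %>%
  summarize(mean_area = mean(area), n_plots = n())

# Boxplot with outliers hidden, plus jittered raw points colored by region
# (the color mapping applies to the jitter layer only)
toy_graph <- ggplot(grain_plots, aes(x = visit_year, y = workers)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(aes(color = region), alpha = 0.6, size = 2, width = 0.2)
```

Hiding the boxplot outliers avoids drawing the same observations twice once the jitter layer adds every raw point. Placing aes(color = region) inside geom_jitter() rather than ggplot() keeps the boxes themselves from being split by region.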
Your exact customizations will vary, but the graph generally should look like the following:
[Figure: example of the finished graph]
Export survey_50_years and the graph you just made. The functions ggsave and write_csv will be useful here.
What was the most challenging aspect of this week’s workshop? Were you able to overcome it? If not, what assistance do you need to continue working through it?
What was the most rewarding aspect of this week’s workshop?