1 Before you start…
Please make a brand-new project for this exercise. This will be our exercise project where we’ll do all our exercises for the remainder of the course.
- To make a new project: Go to File -> New Project -> New Directory -> New Project
Pick a new directory for your project to live in. Choose a location that makes sense to you so you’ll be able to find it again. If you don’t change the location, the default setting will be used and your project will be created in your Documents/ folder.
Click Create project.
Now that you have a new project set up (you should see your project name in the top right of RStudio), create a new R script and save it.
- To make a new R script:
  - Go to File -> New File -> R Script
  -or-
  - Click the top left icon with the green plus sign
You can now save the script by pressing CTRL + S or by clicking the floppy disk icon. Name your script something pithy like ex_1_day_1.R or frankie.R.
You’re ready to go!
2 Inventory duty
We love our job as baking-bot, but the thing we love more than anything is taking inventory of ingredients. We like to play with the inventory data and present beautiful summaries to the lead baking-bot.
The bakery is expecting some big orders for the upcoming holidays and wants to be prepared. We’ve been asked to download the current inventory and determine if we are running low on any ingredients.
Let’s read in the inventory data!
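The download location isn't shown here, so the following is only a sketch. It assumes the inventory was saved as a CSV file and reads it with base R's read.csv(); if the tidyverse is loaded, readr's read_csv() takes a file path the same way. The file name and the sample rows are made up so the block runs on its own; only the column names are the ones used throughout this exercise.

```r
# Stand-in for the real download: a tiny CSV with made-up rows.
# The file name and values are hypothetical; only the column names
# match the inventory data used in this exercise.
inventory_file <- file.path(tempdir(), "inventory.csv")
writeLines(
  c("INGREDIENT,PRICE_PER_UNIT,SIZE_UNITS,NUMBER_OF_UNITS_IN_STOCK,ORDER_THRESHOLD",
    "Flour,2.50,lb,12,10",
    "Walnuts,6.00,lb,3,5",
    "TOTAL,,,15,"),
  inventory_file
)

# Read the CSV into a data frame; swap in the path to your own download
inventory <- read.csv(inventory_file, stringsAsFactors = FALSE)

# Print the first rows to confirm the read worked
head(inventory)
```

Assigning the result to inventory matters: without the `<-`, R prints the data once and then forgets it.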
3 Getting to know the data
Now that we have the data let’s take some steps to better understand the information in the file.
- How many rows are there, how many columns?
- What data types are the columns?
- What’s the general range of the values?
- ⭐ The lead baking-bot especially wants to know what the maximum value is for NUMBER_OF_UNITS_IN_STOCK for very important planning purposes.
Feel free to use any and all of the functions you know, but also feel free to use the hints below for inspiration!
Get to know the data
When we first get a dataset it is fun to sit down and get to know it better. Below are a few functions to choose from:
glimpse
summary
head
tail
class
# glimpse() gives us the data type for each column and
# a few sample values for each column
glimpse()
# summary() shows us missing values and stats for numeric columns
summary()

glimpse(inventory)
summary(inventory)
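If you'd rather pull out single answers than scan a printed summary, base R has a function for each question. A small sketch, using a made-up stand-in for the inventory data (the rows and values are hypothetical):

```r
# Made-up stand-in for the inventory data (hypothetical values)
inventory <- data.frame(
  INGREDIENT = c("Flour", "Walnuts", "Cocoa"),
  NUMBER_OF_UNITS_IN_STOCK = c(12, 3, 8),
  stringsAsFactors = FALSE
)

nrow(inventory)            # number of rows
ncol(inventory)            # number of columns
sapply(inventory, class)   # data type of each column

# Largest value in one column; `$` pulls the column out as a vector
max_stock <- max(inventory$NUMBER_OF_UNITS_IN_STOCK)
max_stock
```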
WHOA. We have 378 units of something in stock?! What are we making, cupcakes for a whole school?! A cake big enough to cover a bus?! Time to go investigate…
4 Filtering to the good stuff
How do we go about trying to figure out what’s going on with that wacky number? We need to be able to narrow down just to the row that meets the condition where NUMBER_OF_UNITS_IN_STOCK is equal to 378. What tool do we have that will let us do that?
Investigation underway
Filter the inventory data to just the row that meets the condition of NUMBER_OF_UNITS_IN_STOCK is equal to 378.
# Remember to use the double equal sign (==) for "is equal to"
filter(inventory, PLACE A CONDITION HERE)
filter(inventory, NUMBER_OF_UNITS_IN_STOCK == 378)
Look at that. We have a TOTAL row, and that would throw a wrench in our summaries. That row is also adding NAs to other columns, which might make some of our analysis harder in the future. Let’s get rid of this summary row and keep only the real data.
NOTE: Lots of functions you’ll meet in R will have the additional argument na.rm = TRUE, which allows us to ignore NA values when calculating things like means, maxes, etc. Super handy!
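To see what na.rm = TRUE changes, here is a minimal sketch on a made-up price vector (the numbers are hypothetical, not taken from the inventory file):

```r
# Hypothetical prices with one missing value
prices <- c(2.50, NA, 6.00)

# Without na.rm, a single NA makes the whole result NA
mean(prices)

# With na.rm = TRUE, the NA is dropped before calculating
mean_price <- mean(prices, na.rm = TRUE)
max_price  <- max(prices, na.rm = TRUE)

mean_price
max_price
```

The default is na.rm = FALSE, so a missing value gets noticed instead of being dropped silently.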
Filter time
Filter the inventory data to drop the TOTAL row.
# Remember the not equal sign is "!="
inventory <- filter(inventory, PLACE A CONDITION HERE)
inventory <- filter(inventory, INGREDIENT != "TOTAL")
Let’s try summary again and see if we can get some better results, hopefully no more NAs or outrageous numbers.
## INGREDIENT PRICE_PER_UNIT SIZE_UNITS
## Length:28 Min. : 0.490 Length:28
## Class :character 1st Qu.: 1.890 Class :character
## Mode :character Median : 2.795 Mode :character
## Mean : 3.639
## 3rd Qu.: 3.515
## Max. :17.790
## NUMBER_OF_UNITS_IN_STOCK ORDER_THRESHOLD
## Min. : 2.00 Min. : 2
## 1st Qu.: 8.75 1st Qu.: 5
## Median :12.00 Median :10
## Mean :13.50 Mean :11
## 3rd Qu.:18.25 3rd Qu.:15
## Max. :37.00 Max. :30
Much better. Okay, let’s see what we’ve been asked to do.
5 Place that order!
We need to use the data we received to determine what items to order. If the number of units we currently have in-stock is less than the order threshold, we need to place an order right away.
- Do we have any ingredients (rows in the data) that meet this criterion?
Time to order?
Filter the inventory data to the rows that meet the condition where the number of units in stock is less than the order threshold.
# You will need to compare 2 columns in the data
order_now <- filter(inventory, PLACE A CONDITION HERE)
order_now
order_now <- filter(inventory, NUMBER_OF_UNITS_IN_STOCK < ORDER_THRESHOLD)
order_now
That’s just what I expected. We’ve been getting lots of orders for our Gluten-free Applesauce and Walnut muffins, and also for our Triple-Chocolate Brownies. Time for an urgent supply order so we don’t run out!
The last thing to do is to write/save the filtered data.
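The file name and save location aren't given here, so treat this as a sketch, not the official step. It assumes a CSV file is wanted and uses base R's write.csv(); readr's write_csv() is a tidyverse alternative. The order_now row below is a made-up stand-in so the block runs on its own.

```r
# Made-up stand-in for the filtered data (hypothetical values)
order_now <- data.frame(
  INGREDIENT = "Walnuts",
  NUMBER_OF_UNITS_IN_STOCK = 3,
  ORDER_THRESHOLD = 5,
  stringsAsFactors = FALSE
)

# Hypothetical output path; replace tempdir() with a folder in your project
out_file <- file.path(tempdir(), "order_now.csv")

# row.names = FALSE keeps R from adding an extra column of row numbers
write.csv(order_now, out_file, row.names = FALSE)

# Confirm the file was written
file.exists(out_file)
```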
Excellent! We’ll send this out so we get those ingredients right away. Even better, next inventory time we’ll have these scripted steps that we can use again and again and again…well you get the idea.
NICE WORK!