class: center, bottom
background-image: url("start.jpg")

<span style="font-size: 50pt; ">See What You R</span>
<br><br>
<span style="font-size: 20pt; ">Budhaditya Basu</span>
<img src="rgcb.png" width="15%" align="left"/>

---
class: center, middle

# What is the purpose of this hands-on session?

--

To generate the following plots

--

To get the hang of R code

--

To build an R community within RGCB

---
class: center, middle
background-image: url("stack.jpg")
background-size: contain

---
class: center, middle

# Let's have a glimpse of the plots that R can deliver...

---
class: center, middle
background-image: url("PlotsCombined.jpg")
background-size: contain

---
class: center, middle
background-image: url("R_course.jpg")
background-size: contain

---
class: center, middle

.footnote[
] # R is a language and environment <br>generally used for -- Data Analysis -- Data Visualization -- Statistics -- Making professional CV/Resume -- Visual Blogs -- Professional Website -- Academic Paper/Book writing -- And many more.. --- # What is R for a Biologist? .footnote[
]

--

.dark-pink[High-throughput data analysis]

- RNA-Seq, ChIP-Seq etc.
- Single-cell RNA-Seq
- Proteomics studies

--

.dark-pink['Out-of-the-box' plotting capabilities]

- Better representation of data
- Better insight into complex data
- Interesting data visualization

--

.dark-pink[Hundreds of .underline['Bioconductor'] packages for biological data analysis]

- An open-source platform for R packages related to biological data analysis

---
class: center, middle

<span style="font-size: 90pt; color: steelblue ">1. Install R </span>

--
--

Install **R** from CRAN, the R project's download site: https://cran.r-project.org/

--

We recommend using the [RStudio IDE](https://www.rstudio.com/products/rstudio/)

---

# RStudio has four main panels

--

![](r_panels.png)

---

# RStudio has four main panels

--

- .dark-pink[Scripts] are records of code (like a recipe)

--

- .dark-pink[Console] is where code is executed (cooking)

--

- .dark-pink[Environment] is like a kitchen where ingredients (data) and finished dishes (output) can be found.

--

- .dark-pink[Files] are the ingredients and .dark-pink[packages] are like tools (saucepan) in a kitchen.

---

# How to execute code?

--

- To execute the line of code where the cursor currently resides, press <br> **Ctrl + Enter** or use the **Run** toolbar button

--

<img src="run_code.jpg" width="80%" align="center"/>

--

- Multiple lines of code can be selected and executed in the same way.

---
class: center, middle

# Set the working directory

.footnote[
] -- Before running any code, we need to set the working directory. -- Go to .dark-blue[Session > Set Working Directory > Choose Directory] -- <img src="setwd.jpg" width="80%" align="center"/> --- class: center, middle # Set the working directory .footnote[
] Alternatively, we can set the working directory by using <br> `setwd("path to directory")` -- <img src="setwd_1.png" width="80%" align="center"/> --- class: center, middle # Get Started .footnote[
]

--

Open RStudio

--

.dark-blue[File > New File > R Script]

---

#Variable Assignment

.footnote[
]

--

.pull-left[
```r
num <- 2
```

- `num` holds the value 2
]

--

.pull-right[
- Here, `num` is a <span style="color: deeppink;"> "variable"</span>
- `<-` is the <span style="color: deeppink;">"assignment operator"</span>
]

--

.pull-left[
Now, if we execute the variable `num`, it will return the value 2
]

--

.pull-right[
```r
num
```

```
## [1] 2
```
]

--

.pull-left[
```r
150 -> fruits
fruits
```

```
## [1] 150
```
]

--

.pull-right[
- `<-` and `->` do the same job.
- They are simply the "left form" and "right form" of the assignment operator; `<-` is the more common convention.
]

---

#Variable types in R

.footnote[
] -- - <span style="color: blue;">int</span> for integers. - <span style="color: blue;">dbl</span> for doubles or real numbers. - <span style="color: blue;">chr</span> for character - <span style="color: blue;">dttm</span> for date-time - <span style="color: blue;">lgl</span> for logical vectors like TRUE or FALSE - <span style="color: blue;">fct</span> for categorical variables --- #Convention for naming variables .footnote[
]

--

.pull-left[
```r
my_name <- "Budhaditya"
my_weight <- 66.678
print(my_name)
```

```
## [1] "Budhaditya"
```
]

--

.pull-right[
- A variable name cannot include a <span style="color: deeppink;">space</span>. <br><br> e.g. <span style="color: blue;">my name</span> is not valid, but <br><br><span style="color: blue;">my_name</span> or <span style="color: blue;">my.name</span> is fine
]

--

.pull-left[
```r
"cat" == "cat"
```

```
## [1] TRUE
```

```r
"cat" == "CAT"
```

```
## [1] FALSE
```
]

--

.pull-right[
<br><br><br><br>
- R is <span style="color: deeppink;">case sensitive</span>, both for text values (as shown here) and for variable names: `my_name` and `My_Name` are different variables.
]

---

#Vector

.footnote[
]

--

.pull-left[
```r
my_value <- c(1, 3, 9, 27)
my_value
```

```
## [1]  1  3  9 27
```
]

--

.pull-right[
- A variable that stores multiple values is a <span style="color: deeppink;">"Vector"</span>
- `c()` is called the <span style="color: deeppink;">"Combine function"</span>
]

--

.pull-left[
```r
my_value[3]
```

```
## [1] 9
```
]

--

.pull-right[
- Square brackets `[]` identify the position within a vector **(index)**
- This index can be used to extract relevant values
]

--

.pull-left[
```r
my_value[3] <- "gene"
my_value
```

```
## [1] "1"    "3"    "gene" "27"
```
]

--

.pull-right[
- The **index** can be used to modify the vector
- A vector holds only one type of data, so inserting the text `"gene"` converts the remaining numbers to text as well
]

---
class: center, middle

<span style="font-size: 90pt; color: steelblue; ">2. Packages </span>

<img src="package.jpg" width="30%" align="center"/>

---

#Installing and Loading of Packages

.footnote[
]

<img src="floppy.jpg" width="30%" align="right"/>

--

- Packages extend the capabilities of <span style="font-size: 15pt; color: deeppink; "> R </span>

--

- <span style="font-size: 15pt; color: deeppink; "> "Installing" </span> is downloading the package onto your computer.

--

- <span style="font-size: 15pt; color: deeppink; "> "Loading" </span> is telling R to use it. <br>

--

- ...you need to <span style="font-size: 15pt; color: deeppink; "> "install" </span> a package only once.

--

- ...but you have to <span style="font-size: 15pt; color: deeppink; "> "load" </span> it every time you start a new R session. <br>

--

- <span style="font-size: 15pt; color: deeppink; "> "Install" </span> using `install.packages()`

--

- <span style="font-size: 15pt; color: deeppink; "> "Load" </span> using `library()`

---
class: center, middle
background-image: url("thoughts.jpg")
background-size: cover

#...Time for a break

---

#R Packages

.footnote[
]

--

- "Packages" are collections of functions and data sets

--

- <span style="font-size: 15pt; color: deeppink; "> "dplyr" </span> is used for data manipulation.

--

- <span style="font-size: 15pt; color: deeppink; "> "ggplot2" </span> is used for data visualization.

--

- <span style="font-size: 15pt; color: deeppink; "> "tidyverse" </span> is a bundle of 8 packages together. <br>

--

<img src="tidyverse_packages.png" width="60%" height="50%" style="display: block; margin: auto;" />

---

#R function

.footnote[
]

--

- <span style="font-size: 15pt; color: deeppink; "> "Functions" </span> perform a specific task, such as a calculation.

--

.pull-left[
```r
sqrt(25)
```

```
## [1] 5
```

```r
log2(8)
```

```
## [1] 3
```

```r
round(x = 3.141592, digits = 2)
```

```
## [1] 3.14
```
]

--

<br><br>
.pull-right[
- `sqrt()`, `log2()` and `round()` are examples of <span style="font-size: 15pt; color: deeppink; "> functions </span><br><br>
- (<span style="font-size: 15pt; color: seagreen; "> "Arguments" </span>) are placed within the parentheses.<br><br>
- If there are multiple arguments, they must be separated by commas.
]

---
class: inverse, center, middle
background-image: url('data.jpg')

<span style="font-size: 90pt; color: white; ">3. Data Frames </span>

---

#Data Frames

.footnote[
] <img src="palmer.png" width="30%" align="right"/> -- .pull-left[ `install.packages("palmerpenguins")` ```r library(palmerpenguins) *df <- penguins ``` <br><br> - `penguins` data set is assigned to the variable named `df` ] --- #Data Frames .footnote[
] - `head()` function shows first few rows of the data frame `df` -- ```r head(df) ``` --- #Data Frames .footnote[
] -- .pull-left[ ```r df[1, 3] ``` ``` ## # A tibble: 1 x 1 ## bill_length_mm ## <dbl> ## 1 39.1 ``` ] -- .pull-right[ - Give me the value stored in <span style="font-size: 15pt; color: deeppink; "> first row, third column</span> of `df` ] --- #Data Frames .footnote[
] - Give me the values of <span style="font-size: 15pt; color: deeppink; "> first row</span> of `df` (all columns) ```r df[1,] ``` --- #Data Frames .footnote[
] -- .pull-left[ ```r df[,3] ``` ``` ## # A tibble: 344 x 1 ## bill_length_mm ## <dbl> ## 1 39.1 ## 2 39.5 ## 3 40.3 ## 4 NA ## 5 36.7 ## 6 39.3 ## 7 38.9 ## 8 39.2 ## 9 34.1 ## 10 42 ## # i 334 more rows ``` ] -- .pull-right[ - Give me the values of <span style="font-size: 15pt; color: deeppink; "> third column </span> of `df` (all rows) ] --- #Data Frames .footnote[
]

--

.pull-left[
```r
df[1:2, 2:4]
```

```
## # A tibble: 2 x 3
##   island    bill_length_mm bill_depth_mm
##   <fct>              <dbl>         <dbl>
## 1 Torgersen           39.1          18.7
## 2 Torgersen           39.5          17.4
```
]

--

.pull-right[
- Give me the values of <span style="font-size: 15pt; color: deeppink; "> rows 1-2 and columns 2-4 </span> of `df`
]

---

#Data Frames

.footnote[
]

--

.pull-left[
```r
df[, "island"]
```

```
## # A tibble: 344 x 1
##    island
##    <fct>
##  1 Torgersen
##  2 Torgersen
##  3 Torgersen
##  4 Torgersen
##  5 Torgersen
##  6 Torgersen
##  7 Torgersen
##  8 Torgersen
##  9 Torgersen
## 10 Torgersen
## # i 334 more rows
```
]

--

.pull-right[
- Prints the data stored in the "island" column
]

---

# Matrix

--

- In a data frame, different columns can contain different types of data, but in a matrix all the elements are of the same type.

--

- A matrix in R is like a mathematical matrix: every element is the same kind of thing (usually numbers).

--

- For example, when we work with RNA-Seq data we use a matrix of read counts, so it is worth our time to learn to use matrices as well.

--

```r
is.data.frame(mtcars)
```

```
## [1] TRUE
```

--

```r
is.matrix(mtcars)
```

```
## [1] FALSE
```

---

# Matrix

--

### Convert Data frame to matrix

```r
mat <- as.matrix(mtcars)
```

--

### Check the matrix

```r
is.matrix(mat)
```

```
## [1] TRUE
```

---

# Generic X-Y Plotting

```r
barplot(c(2,5,7,10))
```

![](intro2R_files/figure-html/unnamed-chunk-43-1.png)<!-- -->

---

# Generic X-Y Plotting

```r
library(RColorBrewer)
```

```
## Warning: package 'RColorBrewer' was built under R version 4.1.3
```

```r
barplot(c(2,5,7,10), col = brewer.pal(n = 4, name = "RdBu"))
```

![](intro2R_files/figure-html/unnamed-chunk-44-1.png)<!-- -->

--

- To know more about `RColorBrewer` use `help("RColorBrewer")`

---

# Generic X-Y Plotting

```r
head(ToothGrowth)
```

```
##    len supp dose
## 1  4.2   VC  0.5
## 2 11.5   VC  0.5
## 3  7.3   VC  0.5
## 4  5.8   VC  0.5
## 5  6.4   VC  0.5
## 6 10.0   VC  0.5
```

--

- The length of odontoblasts (cells responsible for tooth growth) in 60 guinea pigs. Each animal received one of three Vitamin C dose levels (0.5, 1, and 2 mg/day) by one of two delivery methods: orange juice (`OJ`) or ascorbic acid (`VC`).
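---

# Generic X-Y Plotting

- Before plotting `ToothGrowth`, it can help to check how the observations are spread across delivery methods and doses. The following is a minimal sketch using base R only; the names `counts` and `means` are introduced here purely for illustration.

```r
# Number of observations for each delivery method (rows) and dose (columns)
counts <- table(ToothGrowth$supp, ToothGrowth$dose)
counts

# Mean measured length for each delivery method and dose combination
means <- aggregate(len ~ supp + dose, data = ToothGrowth, FUN = mean)
means
```

- Each of the six cells in `counts` holds 10 observations, so a box plot drawn per dose summarises 20 values.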
---

# Generic X-Y Plotting

```r
boxplot(len ~ dose, data = ToothGrowth, frame = FALSE,
        col = brewer.pal(n = 3, name = "Blues"))
```

![](intro2R_files/figure-html/unnamed-chunk-46-1.png)<!-- -->

---

# Generic X-Y Plotting

```r
boxplot(len ~ dose, data = ToothGrowth, frame = FALSE,
        col = c("#999999", "#E69F00", "#56B4E9"))
```

![](intro2R_files/figure-html/unnamed-chunk-47-1.png)<!-- -->

---

# Generic X-Y Plotting

```r
dat <- data.frame(group = c("Male", "Female", "Child"),
                  value = c(25, 25, 50))
```

--

```r
# The $ sign is used to select a column
pie(dat$value, labels = dat$group, radius = 1,
    col = brewer.pal(n = 3, name = "PiYG"))
```

![](intro2R_files/figure-html/unnamed-chunk-49-1.png)<!-- -->

---

# Generic X-Y Plotting

```r
hist(mtcars$mpg, col = "steelblue", xlab = "", main = "Histogram")
```

![](intro2R_files/figure-html/unnamed-chunk-50-1.png)<!-- -->

---

# Generic X-Y Plotting

```r
# Compute the density data
dens <- density(mtcars$mpg)
# Plot the density
plot(dens, frame = FALSE, col = "steelblue",
     main = "Density plot of mpg")
```

![](intro2R_files/figure-html/unnamed-chunk-51-1.png)<!-- -->

---

# Generic X-Y Plotting

```r
# Create some variables
x <- 1:10
y1 <- x*x
y2 <- 2*y1
```

---

# Generic X-Y Plotting

```r
# Create a first line
# type: "p" = points, "l" = lines, "b" = both
plot(x, y1, type = "b", frame = FALSE, pch = 19,
     col = "red", xlab = "x", ylab = "y")
# Add a second line
# lty = 1: solid line, lty = 2: dashed line
lines(x, y2, pch = 18, col = "blue", type = "b", lty = 2)
# Add a legend to the plot
legend("topleft", legend = c("Line 1", "Line 2"),
       col = c("red", "blue"), lty = 1:2, cex = 0.8)
```

![](intro2R_files/figure-html/unnamed-chunk-53-1.png)<!-- -->

---

# Generic X-Y Plotting

```r
wt <- mtcars$wt   # weight in units of 1000 lbs
mpg <- mtcars$mpg # miles per gallon
```

--

```r
plot(wt, mpg, main = "Scatter plot",
     xlab = "X axis title", ylab = "Y axis title",
     pch = 19, frame = FALSE)
```

![](intro2R_files/figure-html/unnamed-chunk-55-1.png)<!-- -->

---

# Generic X-Y Plotting

- We have so far
created the following plots ![](intro2R_files/figure-html/unnamed-chunk-56-1.png)<!-- --> --- # How to save a plot? - To save the plot as .tiff file, use ```r tiff(filename = "Boxplot.tiff", height = 4, width = 6, units = "in", res = 300) boxplot(len ~ dose, data = ToothGrowth, frame = FALSE, col = c("#999999", "#E69F00", "#56B4E9")) dev.off() ``` - To save the plot as .jpg file, use ```r jpeg(filename = "Boxplot.jpg", height = 4, width = 6, units = "in", res = 300) boxplot(len ~ dose, data = ToothGrowth, frame = FALSE, col = c("#999999", "#E69F00", "#56B4E9")) dev.off() ``` --- #Pipe %>% .footnote[
] -- .pull-left[ - We <span style="font-size: 15pt; color: deeppink; "> "pipe" </span> our data through several functions. - The data set flows through the cascade of operations <img src="pipe.jpg" width="70%" style="display: block; margin: auto;" /> ] -- .pull-right[ ```r library(tidyverse) df %>% group_by(species) %>% drop_na()%>% summarize(mean(bill_length_mm)) ``` ``` ## # A tibble: 3 x 2 ## species `mean(bill_length_mm)` ## <fct> <dbl> ## 1 Adelie 38.8 ## 2 Chinstrap 48.8 ## 3 Gentoo 47.6 ``` ] -- <img src="culmen_depth.png" width="35%" style="display: block; margin: auto;" /> --- class: inverse, center, middle background-image: url('tidy.jpg') <span style="font-size: 90pt; color: white; ">4. Tidying Data </span> --- #dplyr verbs .footnote[
] -- .pull-left[ - `select()` <span style="color: deeppink; "> << </span> - `select()` function selects only user defined columns from a data frame <br><br> <img src="dplyr_logo.png" width="35%" style="display: block; margin: auto;" /> ] -- .pull-right[ ```r iris %>% * select(Sepal.Length, * Petal.Width)%>% head() ``` ``` ## Sepal.Length Petal.Width ## 1 5.1 0.2 ## 2 4.9 0.2 ## 3 4.7 0.2 ## 4 4.6 0.2 ## 5 5.0 0.2 ## 6 5.4 0.4 ``` ] --- #dplyr verbs .footnote[
]

--

.pull-left[
- `select()`
- `filter()` <span style="color: deeppink; "> << </span>
- `filter()` keeps only the rows of a data frame that satisfy a user-defined condition.
<br><br>
<img src="filter.jpg" width="100%" height="100%" style="display: block; margin: auto;" />
]

--

.pull-right[
```r
iris %>%
  select(Sepal.Length, Petal.Width)%>%
* filter(Petal.Width > 2.3)
```

```
##   Sepal.Length Petal.Width
## 1          6.3         2.5
## 2          7.2         2.5
## 3          5.8         2.4
## 4          6.3         2.4
## 5          6.7         2.4
## 6          6.7         2.5
```
]

---

#dplyr verbs

.footnote[
] -- .pull-left[ - `select()` - `filter()` - `arrange()` <span style="color: deeppink; "> << </span> <br><br><br> - It arranges low to high ] -- .pull-right[ ```r iris %>% select(Sepal.Length, Petal.Width)%>% filter(Petal.Width > 2.3)%>% * arrange(Petal.Width) ``` ``` ## Sepal.Length Petal.Width ## 1 5.8 2.4 ## 2 6.3 2.4 ## 3 6.7 2.4 ## 4 6.3 2.5 ## 5 7.2 2.5 ## 6 6.7 2.5 ``` ] --- #dplyr verbs .footnote[
] -- .pull-left[ - `select()` - `filter()` - `arrange(desc())` <span style="color: deeppink; "> << </span> <br><br><br> - It arranges high to low ] -- .pull-right[ ```r iris %>% select(Sepal.Length, Petal.Width)%>% filter(Petal.Width > 2.3)%>% * arrange(desc(Petal.Width)) ``` ``` ## Sepal.Length Petal.Width ## 1 6.3 2.5 ## 2 7.2 2.5 ## 3 6.7 2.5 ## 4 5.8 2.4 ## 5 6.3 2.4 ## 6 6.7 2.4 ``` ] --- #dplyr verbs .footnote[
]

--

.pull-left[
- `select()`
- `filter()`
- `arrange(desc())`
- `rename()` <span style="color: deeppink; "> << </span>
<br><br><br>
- It renames columns, using the form `new_name = old_name`
]

--

.pull-right[
```r
iris %>%
  select(Sepal.Length, Petal.Width)%>%
  filter(Petal.Width > 2.3)%>%
  arrange(desc(Petal.Width))%>%
* rename(petal_width = Petal.Width,
*        sepal_length = Sepal.Length)
```

```
##   sepal_length petal_width
## 1          6.3         2.5
## 2          7.2         2.5
## 3          6.7         2.5
## 4          5.8         2.4
## 5          6.3         2.4
## 6          6.7         2.4
```
]

---

#dplyr verbs

.footnote[
]

--

.pull-left[
- `select()`
- `filter()`
- `arrange(desc())`
- `rename()`
- `mutate()` <span style="color: deeppink; "> << </span>
<br><br><br>
- It creates one or more new columns
]

--

.pull-right[
```r
iris %>%
  select(Petal.Length, Petal.Width)%>%
  filter(Petal.Width > 2.3)%>%
  arrange(desc(Petal.Width))%>%
  rename(petal_width = Petal.Width,
         petal_length = Petal.Length)%>%
* mutate(petal = petal_width + petal_length)
```

```
##   petal_length petal_width petal
## 1          6.0         2.5   8.5
## 2          6.1         2.5   8.6
## 3          5.7         2.5   8.2
## 4          5.1         2.4   7.5
## 5          5.6         2.4   8.0
## 6          5.6         2.4   8.0
```
]

---

#dplyr verbs

.footnote[
] -- .pull-left[ - `select()` - `filter()` - `arrange(desc())` - `rename()` - `mutate()` - `summarise()`<span style="color: deeppink; "> << </span> ] --- #dplyr verbs .footnote[
]

```r
penguins %>%
  group_by(species) %>%
  drop_na()%>%
  select(bill_length_mm, body_mass_g, flipper_length_mm)%>%
* summarize(across(everything(), mean))
```

```
## # A tibble: 3 x 4
##   species   bill_length_mm body_mass_g flipper_length_mm
##   <fct>              <dbl>       <dbl>             <dbl>
## 1 Adelie              38.8       3706.              190.
## 2 Chinstrap           48.8       3733.              196.
## 3 Gentoo              47.6       5092.              217.
```

- `across()` is nested inside `summarize()` here to apply the same summary function (`mean`) to multiple columns<br><br>
- `everything()` is used here to select all the remaining (non-grouping) columns

---

#dplyr verbs

.footnote[
] -- .pull-left[ - `select()` - `filter()` - `arrange(desc())` ] .pull-right[ - `rename()` - `mutate()` - `summarise()` - `join()`<span style="color: deeppink; "> << </span> ] -- .pull-left[ ```r df1 <- data.frame( Gene = c("gene1", "gene2", "gene3"), Sample1 = c(5, 6, 13)) df1 ``` ``` ## Gene Sample1 ## 1 gene1 5 ## 2 gene2 6 ## 3 gene3 13 ``` ] -- .pull-right[ ```r df2 <- data.frame( Gene = c("gene1", "gene2", "gene3"), Sample2 = c(15, 38, 3)) df2 ``` ``` ## Gene Sample2 ## 1 gene1 15 ## 2 gene2 38 ## 3 gene3 3 ``` ] --- #dplyr verbs .footnote[
]

--

.pull-left[
- `select()`
- `filter()`
- `arrange(desc())`
]

.pull-right[
- `rename()`
- `mutate()`
- `summarise()`
- `join()`<span style="color: deeppink; "> << </span>
]

```r
full_join(df1, df2, by = "Gene")
```

```
##    Gene Sample1 Sample2
## 1 gene1       5      15
## 2 gene2       6      38
## 3 gene3      13       3
```

- dplyr has no function literally called `join()`; it is a family of functions. `full_join()` keeps all rows from both data frames and matches them on the `Gene` column.

---

#Changing rownames to column with tibble

.footnote[
]

--

```r
data <- datasets::USArrests
head(data, 2)
```

```
##         Murder Assault UrbanPop Rape
## Alabama   13.2     236       58 21.2
## Alaska    10.0     263       48 44.5
```

```r
data <- USArrests %>%
* rownames_to_column(var = "State")
head(data, 2)
```

```
##     State Murder Assault UrbanPop Rape
## 1 Alabama   13.2     236       58 21.2
## 2  Alaska   10.0     263       48 44.5
```

- `rownames_to_column()` is provided by the tibble package, which `library(tidyverse)` loads alongside dplyr.

---

#Reshaping data with tidyr

.footnote[
]

--

.pull-left[
- `gather()` <span style="color: deeppink; "> << </span>
- `gather()` makes "wide" data longer
- In current tidyr, `gather()` is superseded by `pivot_longer()`, although it still works
<br><br>
<img src="tidyr.png" width="35%" style="display: block; margin: auto;" />
]

--

.pull-right[
```r
*tidy_data <- gather(
  data = data,
  key = "Crime",
  value = "Estimate",
  - State)
head(tidy_data)
```

```
##        State  Crime Estimate
## 1    Alabama Murder     13.2
## 2     Alaska Murder     10.0
## 3    Arizona Murder      8.1
## 4   Arkansas Murder      8.8
## 5 California Murder      9.0
## 6   Colorado Murder      7.9
```
]

---

#Reshaping data with tidyr

.footnote[
]

--

.pull-left[
- `gather()`
- `spread()` <span style="color: deeppink; "> << </span>
- `spread()` makes "long" data wider
- In current tidyr, `spread()` is superseded by `pivot_wider()`, although it still works
<br><br>
<img src="tidyr.png" width="35%" style="display: block; margin: auto;" />
]

--

.pull-right[
```r
*orig_data <- spread(
  data = tidy_data,
  key = "Crime",
  value = "Estimate")
head(orig_data[,1:3])
```

```
##        State Assault Murder
## 1    Alabama     236   13.2
## 2     Alaska     263   10.0
## 3    Arizona     294    8.1
## 4   Arkansas     190    8.8
## 5 California     276    9.0
## 6   Colorado     204    7.9
```
]

---

#Data Import and Export using readr

.footnote[
] -- .pull-left[ <img src="readr.png" width="30%" style="display: block; margin: auto;" /> - `read_csv()` is used for data importing into R environment. - `write_csv()` is meant for saving any data frame as csv file into the current working directory. ] -- .pull-right[ ```r library(readr) *file <- read_csv( "filename.csv") *write_csv(x = data, file = "filename.csv") ``` ] --- #Data Import and Export using readxl .footnote[
] .pull-left[ <img src="readxl.png" width="30%" style="display: block; margin: auto;" /> - `read_xlsx()` is used for data importing into R environment. ] -- .pull-right[ ```r library(readxl) *file <- read_xlsx( "filename.xlsx") ``` ] --- class: center, middle background-image: url("break.jpg") background-size: cover --- class: inverse, center, middle background-image: url('viz1.jpg') <span style="font-size: 90pt; color: white; ">5. Data Visualization </span> --- #Data Visualization with {ggplot2} .footnote[
]

.pull-left[
<img src="ggplot.png" width="35%" style="display: block; margin: auto;" />
]

--

.pull-right[
<img src="layers.png" width="100%" height="100%" style="display: block; margin: auto;" />

- A graphic is built from distinct layers
]

--

<br><br>

- Resources: [Introduction to ggplot2](https://englelab.gatech.edu/useRguide/introduction-to-ggplot2.html)

---

#Data Visualization with {ggplot2}

.footnote[
]

.pull-left[
<img src="ge_data.png" width="100%" height="100%" style="display: block; margin: auto;" />

- Data layer
- It contains only the input data, so ggplot2 does not yet know what to plot.
- It returns an empty plot
]

--

.pull-right[
```r
library(ggplot2)
ggplot(data = penguins)
```

![](intro2R_files/figure-html/unnamed-chunk-104-1.png)<!-- -->
]

---

#Data Visualization with {ggplot2}

.footnote[
] .pull-left[ <img src="ge_aes.png" width="100%" height="100%" style="display: block; margin: auto;" /> - Aesthetic Layer - X and Y coordinates are defined ] -- .pull-right[ ```r library(ggplot2) g <- ggplot(data = penguins, aes( x = flipper_length_mm, y = body_mass_g)) g ``` ![](intro2R_files/figure-html/unnamed-chunk-107-1.png)<!-- --> ] --- #Data Visualization with {ggplot2} .footnote[
] .pull-left[ <img src="ge_geom.png" width="100%" height="100%" style="display: block; margin: auto;" /> - Geometries Layer - Here, we define what kind of plot we want (boxplot, histogram etc.) ] -- .pull-right[ ```r *g + geom_point() ``` ![](intro2R_files/figure-html/unnamed-chunk-110-1.png)<!-- --> ] --- #Data Visualization with {ggplot2} .footnote[
] .pull-left[ <img src="ge_facet.png" width="100%" height="100%" style="display: block; margin: auto;" /> - Facet wrapping creates subplots ] -- .pull-right[ ```r g + geom_point()+ * facet_wrap(~ species) ``` ![](intro2R_files/figure-html/unnamed-chunk-113-1.png)<!-- --> ] --- #Data Visualization with {ggplot2} .footnote[
] -- ```r *g + geom_point(aes(color = sex))+ facet_wrap(~ species) ``` ![](intro2R_files/figure-html/unnamed-chunk-115-1.png)<!-- --> --- #Themes .footnote[
] ![](intro2R_files/figure-html/unnamed-chunk-117-1.png)<!-- --> --- #Data Visualization with {ggplot2} .footnote[
] -- ```r g + geom_point(aes(color = sex))+ * facet_wrap(~ species)+ theme_minimal() ``` ![](intro2R_files/figure-html/unnamed-chunk-119-1.png)<!-- --> --- #Data Visualization with {ggplot2} .footnote[
] -- ```r *ggplot(data = penguins %>% drop_na(), aes(x = flipper_length_mm, y = body_mass_g)) + geom_point(aes(color = sex))+ facet_wrap(~ species)+ theme_minimal() ``` ![](intro2R_files/figure-html/unnamed-chunk-121-1.png)<!-- --> --- #Data Visualization with {ggplot2} .footnote[
] -- ```r ggplot(data = penguins %>% drop_na(), aes(x = flipper_length_mm, y = body_mass_g)) + geom_point(aes(color = sex))+ facet_wrap(~ species)+ theme_minimal()+ * labs(title = "Penguin flipper and body mass", x = "Flipper length (mm)",#renaming x axis y = "Body mass (g)",#renaming y axis color = "Penguin sex") #renaming legend title ``` ![](intro2R_files/figure-html/unnamed-chunk-123-1.png)<!-- --> --- #Data Visualization with {ggplot2} .footnote[
] .panelset[.panel[.panel-name[] .panel[.panel-name[Code] ```r ggplot(data = penguins %>% drop_na(), aes(x = flipper_length_mm, y = body_mass_g)) + geom_point(aes(color = sex), alpha = 0.50)+ facet_wrap(~ species)+ theme_minimal()+ labs(title = "Penguin flipper and body mass", x = "Flipper length (mm)", y = "Body mass (g)", color = "Penguin sex")+ scale_color_manual(values = c("darkorange","cyan4"))+ theme(axis.title.x = element_text(size = 20), axis.title.y = element_text(size = 20), legend.title = element_text(size = 15), plot.title = element_text(size = 25)) ``` ] .panel[.panel-name[Plot] ![](intro2R_files/figure-html/unnamed-chunk-125-1.png)<!-- --> ] ]] --- #Data Visualization with {ggplot2} .footnote[
] ![](intro2R_files/figure-html/unnamed-chunk-129-1.png)<!-- --> --- #Data Visualization with {ggplot2} .footnote[
] -- ```r ggplot(data = penguins, aes(x = species, y = flipper_length_mm))+ geom_boxplot(width = 0.2)+ geom_jitter(width = 0.1)+ * ggdist::stat_halfeye(width = 0.3,justification = - .4) ``` ![](intro2R_files/figure-html/unnamed-chunk-131-1.png)<!-- --> --- #Data Visualization with {ggplot2} .footnote[
] .panelset[.panel[.panel-name[] .panel[.panel-name[Code] ```r ggplot(data = penguins, aes(x = species, y = flipper_length_mm, fill = species))+ geom_boxplot(width = 0.2, alpha = 0.25, aes(color = species) )+ geom_jitter(width = 0.1, size = 2, alpha = 0.25, aes(color = species))+ ggdist::stat_halfeye(width = 0.3,justification = - .4, alpha = 0.25)+ scale_fill_manual(values = c("darkorange","purple","cyan4"))+ scale_color_manual(values = c("darkorange","purple","cyan4"))+ theme_minimal()+ theme(legend.position = "none")+ labs(x = "", y = "Flipper length (mm)") ``` ] .panel[.panel-name[Plot] ![](intro2R_files/figure-html/unnamed-chunk-133-1.png)<!-- --> ] ]] --- #Data Visualization with {ggplot2} .footnote[
]

--

.panelset[.panel[.panel-name[]
.panel[.panel-name[Code]
```r
ggplot(data = penguins,
       aes(x = species, y = flipper_length_mm, fill = species))+
  geom_boxplot(width = 0.2, alpha = 0.25, aes(color = species))+
  # geom_point() has no width argument, so only shape, size and alpha are set
  geom_point(shape = 95, size = 10, alpha = 0.25, aes(color = species))+
  ggdist::stat_halfeye(width = 0.3, justification = - .4, alpha = 0.25)+
  scale_fill_manual(values = c("darkorange","purple","cyan4"))+
  scale_color_manual(values = c("darkorange","purple","cyan4"))+
  theme_minimal()+
  theme(legend.position = "none")+
  labs(x = "", y = "Flipper length (mm)")
```
]
.panel[.panel-name[Plot]
![](intro2R_files/figure-html/unnamed-chunk-136-1.png)<!-- -->
]
]]

---

#Acknowledgement

- I owe every bit of my coding knowledge to the `rstats` community.
- This slide deck was designed using [xaringan](https://github.com/yihui/xaringan)
- This presentation is mostly inspired by [Allison Horst](https://www.allisonhorst.com/), [Alison Hill](https://www.apreshill.com/), [Cédric Scherer](https://www.cedricscherer.com/), [Danielle Navarro](https://djnavarro.net/)
- The data sets used in this presentation were <br><br> 1. penguins from the `palmerpenguins` package <br><br> 2. iris and USArrests from the `datasets` package, which is built into R.

---

#My Inspiration

<img src="danielle.jpg" width="30%" align= "left"/><br><br><br>
<pre class="tab"><span style="font-size: 30pt; ">Danielle Navarro</span></pre>
<br><br><br>
<img src="cedric.png" width="30%" align="right"/><br><br><br>
<pre class="tab"><span style="font-size: 30pt; "> Cédric Scherer </span></pre>

---
class: center, middle
background-image: url("rocket.jpg")
background-size: cover

---
class: center, middle
background-image: url("end.jpg")
background-size: cover