palmerpenguins (https://allisonhorst.github.io/palmerpenguins) is an R package that provides a great dataset for data exploration and visualization, as an alternative to iris.
Let's take a look!
```r
library(palmerpenguins)
library(dplyr)
library(ggplot2)
```
The `penguins` data frame contains the following columns:
```r
glimpse(penguins)
## Rows: 344
## Columns: 8
## $ species <fct> Adelie, Adelie, Adelie, Adelie, Adelie, Adelie, Adel~
## $ island <fct> Torgersen, Torgersen, Torgersen, Torgersen, Torgerse~
## $ bill_length_mm <dbl> 39.1, 39.5, 40.3, NA, 36.7, 39.3, 38.9, 39.2, 34.1, ~
## $ bill_depth_mm <dbl> 18.7, 17.4, 18.0, NA, 19.3, 20.6, 17.8, 19.6, 18.1, ~
## $ flipper_length_mm <int> 181, 186, 195, NA, 193, 190, 181, 195, 193, 190, 186~
## $ body_mass_g <int> 3750, 3800, 3250, NA, 3450, 3650, 3625, 4675, 3475, ~
## $ sex <fct> male, female, female, NA, female, male, female, male~
## $ year <int> 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007, 2007~
```
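The glimpse output already shows some `NA` values, for example in the fourth row. The sketch below counts the missing values in every column. It only assumes the column names shown above and uses dplyr's `across()`, as the rest of this post does.

```r
library(palmerpenguins)
library(dplyr)

# Count the missing values (NA) in every column of penguins
na_counts <- penguins %>%
  summarize(across(everything(), ~ sum(is.na(.x))))

na_counts
```

Missing values matter for the summaries later in this post. That is why `na.rm = TRUE` is passed to `mean()`.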
The dataset contains three species, Adelie, Gentoo and Chinstrap, which are not equally represented:
```r
penguins %>%
  count(species)
```
| species | n |
|---|---|
| Adelie | 152 |
| Chinstrap | 68 |
| Gentoo | 124 |
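To make the imbalance easier to read, the counts can be turned into proportions of the whole dataset. This is a minimal sketch. The `prop` column name is our own choice, not part of the package.

```r
library(palmerpenguins)
library(dplyr)

# Share of each species in the dataset, as a proportion of all rows
species_share <- penguins %>%
  count(species) %>%
  mutate(prop = n / sum(n))

species_share
```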
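We can compute the mean of every numeric measurement for each species, ignoring missing values:
```r
mean_metric <- penguins %>%
  group_by(species) %>%
  select(-year) %>%
  # pass na.rm through a lambda; passing extra arguments via across()'s ... is deprecated
  summarize(across(where(is.numeric), ~ mean(.x, na.rm = TRUE)))
mean_metric
```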
| species | bill_length_mm | bill_depth_mm | flipper_length_mm | body_mass_g |
|---|---|---|---|---|
| Adelie | 38.79139 | 18.34636 | 189.9536 | 3700.662 |
| Chinstrap | 48.83382 | 18.42059 | 195.8235 | 3733.088 |
| Gentoo | 47.50488 | 14.98211 | 217.1870 | 5076.016 |
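Gentoo penguins are, on average, much heavier than the other two species (about 5,076 g versus roughly 3,700 g for Adelie and Chinstrap).
Is this weight difference the same for males and females?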
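A mean alone says nothing about spread. The sketch below adds the standard deviation and the number of penguins with a recorded body mass per species. The output column names (`mean_body_mass_g`, `sd_body_mass_g`, `n_measured`) are our own, chosen for illustration.

```r
library(palmerpenguins)
library(dplyr)

# Mean, standard deviation and number of non-missing body masses per species
body_mass_summary <- penguins %>%
  group_by(species) %>%
  summarize(
    mean_body_mass_g = mean(body_mass_g, na.rm = TRUE),
    sd_body_mass_g   = sd(body_mass_g, na.rm = TRUE),
    n_measured       = sum(!is.na(body_mass_g))
  )

body_mass_summary
```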
```r
ggplot(penguins, aes(x = flipper_length_mm,
                     y = body_mass_g)) +
  geom_point(aes(color = sex)) +
  theme_minimal() +
  scale_color_manual(values = c("darkorange", "cyan4"), na.translate = FALSE) +
  labs(title = "Penguin flipper and body mass",
       subtitle = "Dimensions for male and female Adelie, Chinstrap and Gentoo Penguins at Palmer Station LTER",
       x = "Flipper length (mm)",
       y = "Body mass (g)",
       color = "Penguin sex") +
  theme(legend.position = "bottom",
        legend.background = element_rect(fill = "white", color = NA),
        plot.title.position = "plot",
        plot.caption = element_text(hjust = 0, face = "italic"),
        plot.caption.position = "plot") +
  facet_wrap(~species)
```
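The plot answers the question visually. A numeric counterpart is the mean body mass for each species and sex combination. In the sketch below, penguins whose sex was not recorded are dropped, which mirrors `na.translate = FALSE` in the plot. The name `mass_by_sex` is hypothetical.

```r
library(palmerpenguins)
library(dplyr)

# Mean body mass per species and sex, excluding penguins with unknown sex
mass_by_sex <- penguins %>%
  filter(!is.na(sex)) %>%
  group_by(species, sex) %>%
  summarize(mean_body_mass_g = mean(body_mass_g, na.rm = TRUE),
            n = n(),
            .groups = "drop")

mass_by_sex
```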
```r
ggplot(penguins, aes(x = island, fill = species)) +
  geom_bar(alpha = 0.8) +
  # guide = FALSE is deprecated in recent ggplot2; "none" hides the legend
  scale_fill_manual(values = c("darkorange", "purple", "cyan4"),
                    guide = "none") +
  theme_minimal() +
  facet_wrap(~species, ncol = 1) +
  coord_flip()
```
Adelie penguins live on all three islands (Biscoe, Dream and Torgersen), whereas Gentoo penguins are found only on Biscoe and Chinstrap penguins only on Dream.
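The same pattern can be checked without a plot. The sketch counts penguins per island and species, then counts how many distinct islands each species was observed on. The intermediate names are our own.

```r
library(palmerpenguins)
library(dplyr)

# Number of penguins of each species on each island
species_by_island <- penguins %>%
  count(island, species)

# Number of distinct islands on which each species was observed
islands_per_species <- species_by_island %>%
  group_by(species) %>%
  summarize(n_islands = n_distinct(island))

islands_per_species
```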
Thanks to Allison Horst and Alison Presmanes Hill for the palmerpenguins package and for the article content from which the examples above are taken and adapted.
Thanks to Allison Horst for the two illustrations.
Source of content: https://allisonhorst.github.io/palmerpenguins