Data Analysis Practice of Regional Water Environment Pollution
2025-04-09
… is an R package for data visualization, created by Hadley Wickham in 2005
… is part of the {tidyverse}
| Component | Function | Explanation |
|---|---|---|
| Data | `ggplot(data)` | The raw data that you want to visualise. |
| Aesthetics | `aes()` | Aesthetic mappings between variables and visual properties. |
| Geometries | `geom_*()` | The geometric shapes representing the data. |
| Statistics | `stat_*()` | The statistical transformations applied to the data. |
| Scales | `scale_*()` | Maps between the data and the aesthetic dimensions. |
| Coordinate System | `coord_*()` | Maps data into the plane of the data rectangle. |
| Facets | `facet_*()` | The arrangement of the data into a grid of plots. |
| Visual Themes | `theme()` / `theme_*()` | The overall visual defaults of a plot. |
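To make the table concrete, here is a minimal sketch that uses one function from each component in a single plot. The data frame `toy` and its values are hypothetical stand-ins generated for illustration, not the course data set, and the colors are arbitrary choices.

```r
library(ggplot2)

# Hypothetical stand-in data (not the course data set)
set.seed(1)
toy <- data.frame(
  temp   = runif(60, 0, 30),
  season = factor(rep(c("winter", "summer"), each = 30))
)
toy$count <- round(500 + 40 * toy$temp + rnorm(60, sd = 100))

p <- ggplot(toy, aes(x = temp, y = count)) +                  # data + aesthetics
  geom_point(aes(color = season), alpha = .6) +               # geometry
  stat_smooth(method = "lm", formula = y ~ x, se = FALSE) +   # statistic
  scale_color_manual(
    values = c(winter = "#3ca7d9", summer = "#F7B01B")        # scale
  ) +
  coord_cartesian(xlim = c(0, 30)) +                          # coordinate system
  facet_wrap(~ season) +                                      # facets
  theme_minimal()                                             # visual theme

p
```

Each `+` adds one component, so any line except the first can be removed or swapped without touching the others.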
Bike sharing counts in London, UK, powered by TfL Open Data
| Variable | Description | Class |
|---|---|---|
| date | Date encoded as `YYYY-MM-DD` | date |
| day_night | `day` (6:00am–5:59pm) or `night` (6:00pm–5:59am) | character |
| year | `2015` or `2016` | factor |
| month | `1` (January) to `12` (December) | factor |
| season | `winter`, `spring`, `summer`, or `autumn` | factor |
| count | Sum of reported bikes rented | integer |
| is_workday | `TRUE` being Monday to Friday and no bank holiday | logical |
| is_weekend | `TRUE` being Saturday or Sunday | logical |
| is_holiday | `TRUE` being a bank holiday in the UK | logical |
| temp | Average air temperature (°C) | double |
| temp_feel | Average feels like temperature (°C) | double |
| humidity | Average air humidity (%) | double |
| wind_speed | Average wind speed (km/h) | double |
| weather_type | Most common weather type | character |
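ggplot2::ggplot()
aes(.) = link variables to graphical properties
positions (x, y)
colors (color, fill)
shapes (shape, linetype)
size (size)
transparency (alpha)
groupings (group)
aes() can also be added outside ggplot() as a separate component
geom_*() = interpret aesthetics as graphical representations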
Setting a constant color inside the geom:
ggplot(
  bikes,
  aes(x = temp_feel, y = count)
) +
  geom_point(
    color = "#28a87d",
    alpha = .5
  )
Mapping color to a variable, either locally in the geom or globally in ggplot():
ggplot(
  bikes,
  aes(x = temp_feel, y = count)
) +
  geom_point(
    aes(color = season),
    alpha = .5
  )
ggplot(
  bikes,
  aes(x = temp_feel, y = count,
      color = season)
) +
  geom_point(
    alpha = .5
  )
Every stat_*() has a default geom and every geom_*() has a default stat, so each of the following pairs draws the same plot:
ggplot(bikes, aes(x = temp_feel, y = count)) +
  stat_smooth(geom = "smooth")
ggplot(bikes, aes(x = temp_feel, y = count)) +
  geom_smooth(stat = "smooth")
ggplot(bikes, aes(x = season)) +
  stat_count(geom = "bar")
ggplot(bikes, aes(x = season)) +
  geom_bar(stat = "count")
ggplot(bikes, aes(x = date, y = temp_feel)) +
  stat_identity(geom = "point")
ggplot(bikes, aes(x = date, y = temp_feel)) +
  geom_point(stat = "identity")
# A tibble: 1,454 × 14
date day_night year month season count is_workday is_weekend
<date> <chr> <fct> <fct> <fct> <int> <lgl> <lgl>
1 2015-01-04 day 2015 1 winter 6830 FALSE TRUE
2 2015-01-04 night 2015 1 winter 2404 FALSE TRUE
3 2015-01-05 day 2015 1 winter 14763 TRUE FALSE
4 2015-01-05 night 2015 1 winter 5609 TRUE FALSE
5 2015-01-06 day 2015 1 winter 14501 TRUE FALSE
6 2015-01-06 night 2015 1 winter 6112 TRUE FALSE
7 2015-01-07 day 2015 1 winter 16358 TRUE FALSE
8 2015-01-07 night 2015 1 winter 4706 TRUE FALSE
9 2015-01-08 day 2015 1 winter 9971 TRUE FALSE
10 2015-01-08 night 2015 1 winter 5630 TRUE FALSE
# ℹ 1,444 more rows
# ℹ 6 more variables: is_holiday <lgl>, temp <dbl>, temp_feel <dbl>,
# humidity <dbl>, wind_speed <dbl>, weather_type <chr>
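The equivalence of a `stat_*()` call and its matching `geom_*()` call can be checked on the computed layer data rather than by eye. The sketch below does this for `stat_count(geom = "bar")` and `geom_bar(stat = "count")`; the data frame `toy` is a hypothetical stand-in for the `season` column of `bikes`.

```r
library(ggplot2)

# Hypothetical stand-in for the `season` column of `bikes`
toy <- data.frame(
  season = factor(c("winter", "winter", "spring", "summer", "summer", "summer"))
)

by_stat <- ggplot(toy, aes(x = season)) + stat_count(geom = "bar")
by_geom <- ggplot(toy, aes(x = season)) + geom_bar(stat = "count")

# layer_data() returns the data frame a layer is drawn from,
# i.e. the result after the statistical transformation has run
counts_stat <- layer_data(by_stat)$count
counts_geom <- layer_data(by_geom)$count

counts_stat
all.equal(counts_stat, counts_geom)
```

Both calls build the same layer (bar geometry, count statistic); they only differ in which of the two is named explicitly.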
Modified from canva.com
facet_*() = split variables to multiple panels
Facets are also known as small multiples, trellis graphs, or lattice plots.
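As a minimal sketch of faceting, the example below splits one scatter plot into panels with `facet_wrap()` (one variable) and `facet_grid()` (two variables). The data frame `toy` is hypothetical; it only borrows the column names `season` and `day_night` from the data dictionary.

```r
library(ggplot2)

# Hypothetical data that reuses column names from the bikes data dictionary
set.seed(42)
toy <- data.frame(
  temp      = runif(80, 0, 30),
  count     = rpois(80, lambda = 200),
  season    = rep(c("winter", "spring", "summer", "autumn"), each = 20),
  day_night = rep(c("day", "night"), times = 40)
)

base <- ggplot(toy, aes(x = temp, y = count)) +
  geom_point(alpha = .6)

# One panel per season, wrapped into two columns
p_wrap <- base + facet_wrap(~ season, ncol = 2)

# Rows defined by day_night, columns defined by season
p_grid <- base + facet_grid(day_night ~ season)

p_wrap
p_grid
```

`facet_wrap()` suits a single variable with many levels, while `facet_grid()` lays out the combinations of two variables as rows and columns.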
scale_*() = translate between variable ranges and property ranges
The scale_*() components control the properties of all the
aesthetic dimensions mapped to the data.
Consequently, there are scale_*() functions for all aesthetics such as:
positions via scale_x_*() and scale_y_*()
colors via scale_color_*() and scale_fill_*()
sizes via scale_size_*() and scale_radius_*()
shapes via scale_shape_*() and scale_linetype_*()
transparency via scale_alpha_*()
The suffix (*) can be filled by, for example:
continuous(), discrete(), reverse(), log10(), sqrt(), date() for positions
continuous(), discrete(), manual(), gradient(), gradient2(), brewer() for colors
continuous(), discrete(), manual(), ordinal(), area(), date() for sizes
continuous(), discrete(), manual(), ordinal() for shapes
continuous(), discrete(), manual(), ordinal(), date() for transparency
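The naming scheme above (aesthetic plus suffix) can be sketched with three scales on one plot: a date scale and a log10 scale for the positions and a manual scale for the color. The data frame `toy` and the two hex colors are illustrative assumptions.

```r
library(ggplot2)

# Hypothetical data: 90 dates spread over one year
set.seed(7)
toy <- data.frame(
  date      = as.Date("2015-01-01") + seq(0, 356, by = 4),
  count     = round(exp(rnorm(90, mean = 8, sd = .6))),
  day_night = rep(c("day", "night"), times = 45)
)

p <- ggplot(toy, aes(x = date, y = count, color = day_night)) +
  geom_point() +
  # position scale for dates: scale_x_ + date
  scale_x_date(date_breaks = "3 months", date_labels = "%b %Y") +
  # position scale with a log10 transformation: scale_y_ + log10
  scale_y_log10() +
  # color scale with hand-picked values: scale_color_ + manual
  scale_color_manual(values = c(day = "#cc0000", night = "#000080"))

p
```

Only aesthetics that are mapped inside `aes()` are affected by a scale; a constant such as `color = "grey65"` set in a geom is left untouched.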
Illustration by Allison Horst
Continuous: quantitative or numerical data
Discrete: qualitative or categorical data
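Which kind of scale ggplot2 picks by default depends on the class of the mapped variable. A small sketch using `scale_type()`, the ggplot2 helper that reports the default scale type for a vector, shows why it matters that `year` and `month` are stored as factors in the data dictionary; the example vectors are made up.

```r
library(ggplot2)

# Numeric vectors (double or integer) get a continuous scale
type_double  <- scale_type(c(12.5, 18.1))
type_integer <- scale_type(2015:2016)

# Factors, characters, and logicals get a discrete scale
type_factor    <- scale_type(factor(c("2015", "2016")))
type_character <- scale_type(c("day", "night"))
type_logical   <- scale_type(c(TRUE, FALSE))

# Dates get a date scale, which is a special continuous scale
type_date <- scale_type(as.Date("2015-01-04"))

type_integer
type_factor
type_date
```

A year stored as an integer would therefore produce a color gradient, while the same year stored as a factor produces distinct colors per level.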
colour x y PANEL group
1 #3ca7d9 16439 6830 1 1
2 #3ca7d9 16439 2404 1 1
3 #3ca7d9 16440 14763 1 1
4 #3ca7d9 16440 5609 1 1
5 #3ca7d9 16441 14501 1 1
200 #1ec99b 16538 8830 1 2
201 #1ec99b 16539 24019 1 2
202 #1ec99b 16539 10500 1 2
203 #1ec99b 16540 25640 1 2
204 #1ec99b 16540 11830 1 2
205 #1ec99b 16541 22216 1 2
400 #F7B01B 16638 12079 1 3
401 #F7B01B 16639 26646 1 3
402 #F7B01B 16639 12446 1 3
403 #F7B01B 16640 11312 1 3
404 #F7B01B 16640 4722 1 3
405 #F7B01B 16641 22748 1 3
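A table with the columns `colour`, `x`, `y`, `PANEL`, and `group`, like the one above, is what ggplot2 holds for a layer once the scales have been applied: dates become numbers and categories become hex colors. The sketch below shows one way to obtain such a table with `layer_data()`. The first two rows of `toy` are copied from the `bikes` printout; the third row and the assignment of colors to seasons are assumptions made for illustration.

```r
library(ggplot2)

toy <- data.frame(
  date   = as.Date(c("2015-01-04", "2015-01-05", "2015-07-22")),
  count  = c(6830, 14763, 12000),            # third value is made up
  season = c("winter", "winter", "summer")   # third label is made up
)

p <- ggplot(toy, aes(x = date, y = count, color = season)) +
  geom_point() +
  # Assumed color assignment, chosen only for this example
  scale_color_manual(values = c(winter = "#3ca7d9", summer = "#F7B01B"))

# Data as used for drawing: scaled aesthetics instead of raw variables
built <- layer_data(p)
built[, c("colour", "x", "y", "PANEL", "group")]

# Dates are stored as days since 1970-01-01
as.numeric(as.Date("2015-01-04"))
```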
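coord_*() = interpret the position aesthetics
Linear coordinate systems preserve the geometrical shapes:
coord_cartesian()
coord_fixed()
coord_flip()
Non-linear coordinate systems likely change the geometrical shapes:
coord_polar()
coord_map() and coord_sf()
coord_trans()
coord_cartesian() zooms into the plot without dropping data, whereas scale limits remove out-of-range values before the statistics are computed:
ggplot(
  bikes,
  aes(x = season, y = count)
) +
  geom_boxplot() +
  coord_cartesian(
    ylim = c(NA, 15000)
  )
ggplot(
  bikes,
  aes(x = season, y = count)
) +
  geom_boxplot() +
  scale_y_continuous(
    limits = c(NA, 15000)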
  )
coord_fixed() fixes the ratio between the units on the y and x axes (1 by default):
ggplot(
  bikes,
  aes(x = temp_feel, y = temp)
) +
  geom_point() +
  coord_fixed()
ggplot(
  bikes,
  aes(x = temp_feel, y = temp)
) +
  geom_point() +
  coord_fixed(ratio = 4)
coord_flip() swaps the x and y axes:
ggplot(
  bikes,
  aes(x = weather_type)
) +
  geom_bar() +
  coord_cartesian()
ggplot(
  bikes,
  aes(x = weather_type)
) +
  geom_bar() +
  coord_flip()
Mapping the variable to y gives the same result without flipping the coordinates:
ggplot(
  bikes,
  aes(y = weather_type)
) +
  geom_bar() +
  coord_cartesian()
ggplot(
  bikes,
  aes(x = weather_type)
) +
  geom_bar() +
  coord_flip()
coord_polar() draws the same bars in polar coordinates, with x mapped to the angle by default:
ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = weather_type,
      fill = weather_type)
) +
  geom_bar() +
  coord_polar()
ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = weather_type,
      fill = weather_type)
) +
  geom_bar() +
  coord_cartesian()
ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = fct_infreq(weather_type),
      fill = weather_type)
) +
  geom_bar(width = 1) +
  coord_polar()
ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = fct_infreq(weather_type),
      fill = weather_type)
) +
  geom_bar(width = 1) +
  coord_cartesian()
ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = fct_infreq(weather_type),
      fill = weather_type)
) +
  geom_bar() +
  coord_polar(theta = "x")
ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = fct_infreq(weather_type),
      fill = weather_type)
) +
  geom_bar() +
  coord_polar(theta = "y")
A single stacked bar in polar coordinates with theta = "y" gives a pie chart:
ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = 1, fill = weather_type)
) +
  geom_bar(position = "stack") +
  coord_polar(theta = "y")
ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = 1, fill = weather_type)
) +
  geom_bar(position = "stack") +
  coord_cartesian()
ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = 1,
      fill = fct_rev(fct_infreq(weather_type)))
) +
  geom_bar(position = "stack") +
  coord_polar(theta = "y") +
  scale_fill_discrete(name = NULL)
ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = 1,
      fill = fct_rev(fct_infreq(weather_type)))
) +
  geom_bar(position = "stack") +
  coord_cartesian() +
  scale_fill_discrete(name = NULL)
coord_trans() transforms the coordinate system after the statistics are computed, whereas scale_y_log10() transforms the data before they are computed:
ggplot(
  bikes,
  aes(x = temp, y = count,
      group = day_night)
) +
  geom_point() +
  geom_smooth(method = "lm") +
  coord_trans(y = "log10")
ggplot(
  bikes,
  aes(x = temp, y = count,
      group = day_night)
) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_y_log10()
Illustration by Allison Horst
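The difference between zooming with a coordinate system and limiting a scale, as in the `coord_cartesian()` versus `scale_y_continuous()` pair earlier in this section, can also be checked numerically. This is a minimal sketch on a hypothetical five-value data frame; it compares the median that `geom_boxplot()` computes in both cases.

```r
library(ggplot2)

# Hypothetical data with one extreme value
toy <- data.frame(group = "a", value = c(1, 2, 3, 4, 100))

base <- ggplot(toy, aes(x = group, y = value)) +
  geom_boxplot()

# Zoom: all five values still enter the boxplot statistics
zoomed <- base + coord_cartesian(ylim = c(NA, 10))

# Scale limits: the value 100 is set to NA and dropped before the statistics
limited <- base + scale_y_continuous(limits = c(NA, 10))

median_zoomed  <- layer_data(zoomed)$middle                     # 3
median_limited <- suppressWarnings(layer_data(limited)$middle)  # 2.5

median_zoomed
median_limited
```

`suppressWarnings()` is used only to silence the notice about the removed row; in practice that warning is a useful hint that scale limits changed the summary.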
theme_std <- theme_set(theme_minimal(base_size = 18))
theme_update(
  # text = element_text(family = "Pally"),
  panel.grid = element_blank(),
  axis.text = element_text(color = "grey50", size = 12),
  axis.title = element_text(color = "grey40", face = "bold"),
  axis.title.x = element_text(margin = margin(t = 12)),
  axis.title.y = element_text(margin = margin(r = 12)),
  axis.line = element_line(color = "grey80", linewidth = .4),
  legend.text = element_text(color = "grey50", size = 12),
  plot.tag = element_text(size = 40, margin = margin(b = 15)),
  plot.background = element_rect(fill = "white", color = "white")
)
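`theme_set()` returns the previously active theme invisibly, which is what an assignment such as `theme_std <- theme_set(...)` captures. A minimal sketch of how that return value can be used to restore the earlier defaults; the name `old_theme` is chosen here for illustration.

```r
library(ggplot2)

# theme_set() activates the new theme and hands back the previous one
old_theme <- theme_set(theme_minimal(base_size = 18))

# theme_update() modifies the currently active theme in place
theme_update(panel.grid = element_blank())

# ... plots created here use the customized theme ...

# Restore the theme that was active before
theme_set(old_theme)
```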
bikes_sorted <-
  bikes %>%
  filter(!is.na(weather_type)) %>%
  group_by(weather_type) %>%
  mutate(sum = sum(count)) %>%
  ungroup() %>%
  mutate(
    weather_type = forcats::fct_reorder(
      str_to_title(str_wrap(weather_type, 5)), sum
    )
  )
p1 <- ggplot(
  bikes_sorted,
  aes(x = weather_type, y = count, color = weather_type)
) +
  geom_hline(yintercept = 0, color = "grey80", linewidth = .4) +
  stat_summary(
    geom = "point", fun = "sum", size = 12
  ) +
  stat_summary(
    geom = "linerange", ymin = 0, fun.max = function(y) sum(y),
    linewidth = 2, show.legend = FALSE
  ) +
  coord_flip(ylim = c(0, NA), clip = "off") +
  scale_y_continuous(
    expand = c(0, 0), limits = c(0, 8500000),
    labels = scales::comma_format(scale = .001, suffix = "K")
  ) +
  scale_color_viridis_d(
    option = "magma", direction = -1, begin = .1, end = .9, name = NULL,
    guide = guide_legend(override.aes = list(size = 7))
  ) +
  labs(
    x = NULL, y = "Sum of reported bike shares", tag = "P1"
  ) +
  theme(
    axis.line.y = element_blank(),
    axis.text.y = element_text(family = "Pally", color = "grey50", face = "bold",
                               margin = margin(r = 15), lineheight = .9)
  )
p1
p2 <- bikes_sorted %>%
  filter(season == "winter", is_weekend == TRUE, day_night == "night") %>%
  group_by(weather_type, .drop = FALSE) %>%
  mutate(id = row_number()) %>%
  ggplot(
    aes(x = weather_type, y = id, color = weather_type)
  ) +
  geom_point(size = 4.5) +
  scale_color_viridis_d(
    option = "magma", direction = -1, begin = .1, end = .9, name = NULL,
    guide = guide_legend(override.aes = list(size = 7))
  ) +
  labs(
    x = NULL, y = "Reported bike shares on\nweekend winter nights", tag = "P2"
  ) +
  coord_cartesian(ylim = c(.5, NA), clip = "off")
p2
my_colors <- c("#cc0000", "#000080")
p3 <- bikes %>%
  group_by(week = lubridate::week(date), day_night, year) %>%
  summarize(count = sum(count), .groups = "drop") %>%
  group_by(week, day_night) %>%
  mutate(avg = mean(count)) %>%
  ggplot(aes(x = week, y = count, group = interaction(day_night, year))) +
  geom_line(color = "grey65", linewidth = 1) +
  geom_line(aes(y = avg, color = day_night), stat = "unique", linewidth = 1.7) +
  annotate(
    geom = "text", label = c("Day", "Night"), color = my_colors,
    x = c(5, 18), y = c(125000, 29000), size = 8, fontface = "bold", family = "Pally"
  ) +
  scale_x_continuous(breaks = c(1, 1:10*5)) +
  scale_y_continuous(labels = scales::comma_format()) +
  scale_color_manual(values = my_colors, guide = "none") +
  labs(
    x = "Week of the Year", y = "Reported bike shares\n(cumulative # per week)", tag = "P3"
  )
p3
text <- tibble::tibble(
  x = 0, y = 0, label = "Lorem ipsum dolor sit amet, **consectetur adipiscing elit**, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation <b style='color:#000080;'>ullamco laboris nisi</b> ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat <b style='color:#cc0000;'>cupidatat non proident</b>, sunt in culpa qui officia deserunt mollit anim id est laborum."
)
pt <- ggplot(text, aes(x = x, y = y)) +
  ggtext::geom_textbox(
    aes(label = label),
    box.color = NA, width = unit(23, "lines"),
    color = "grey40", size = 6.5, lineheight = 1.4
  ) +
  coord_cartesian(expand = FALSE, clip = "off") +
  theme_void()
pt
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
p <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(color = species, shape = species)) +
  geom_smooth(method = "lm") +
  labs(
    title = "Body mass and flipper length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Flipper length (mm)", y = "Body mass (g)",
    color = "Species", shape = "Species"
  ) +
  ggthemes::scale_color_colorblind()