Data Analysis Practice of Regional Water Environment Pollution (《区域水环境污染数据分析实践》)
2025-04-09
… is an R package for data visualization, created by Hadley Wickham in 2005
… is part of the {tidyverse}
Component | Function | Explanation |
---|---|---|
Data | `ggplot(data)` | The raw data that you want to visualise. |
Aesthetics | `aes()` | Aesthetic mappings between variables and visual properties. |
Geometries | `geom_*()` | The geometric shapes representing the data. |
Statistics | `stat_*()` | The statistical transformations applied to the data. |
Scales | `scale_*()` | Maps between the data and the aesthetic dimensions. |
Coordinate System | `coord_*()` | Maps data into the plane of the data rectangle. |
Facets | `facet_*()` | The arrangement of the data into a grid of plots. |
Visual Themes | `theme()` / `theme_*()` | The overall visual defaults of a plot. |
Bike sharing counts in London, UK, powered by TfL Open Data
Variable | Description | Class |
---|---|---|
date | Date encoded as `YYYY-MM-DD` | date |
day_night | `day` (6:00am–5:59pm) or `night` (6:00pm–5:59am) | character |
year | `2015` or `2016` | factor |
month | `1` (January) to `12` (December) | factor |
season | `winter`, `spring`, `summer`, or `autumn` | factor |
count | Sum of reported bikes rented | integer |
is_workday | `TRUE` being Monday to Friday and no bank holiday | logical |
is_weekend | `TRUE` being Saturday or Sunday | logical |
is_holiday | `TRUE` being a bank holiday in the UK | logical |
temp | Average air temperature (°C) | double |
temp_feel | Average feels like temperature (°C) | double |
humidity | Average air humidity (%) | double |
wind_speed | Average wind speed (km/h) | double |
weather_type | Most common weather type | character |
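ggplot2::ggplot()
Aesthetics: aes() = link variables to graphical properties
positions (x, y)
colors (color, fill)
shapes (shape, linetype)
size (size)
transparency (alpha)
groupings (group)
aes() can be passed inside ggplot() or added outside as a separate component.
Geometries: geom_*() = interpret aesthetics as graphical representations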
ggplot(
  bikes,
  aes(x = temp_feel, y = count)
) +
  geom_point(
    color = "#28a87d",
    alpha = .5
  )

ggplot(
  bikes,
  aes(x = temp_feel, y = count)
) +
  geom_point(
    aes(color = season),
    alpha = .5
  )

ggplot(
  bikes,
  aes(x = temp_feel, y = count,
      color = season)
) +
  geom_point(
    alpha = .5
  )
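Where the color mapping sits starts to matter once a plot has more than one layer: a mapping inside `ggplot()` is inherited by every layer, while a mapping inside a geom applies to that layer only. The sketch below illustrates this with the built-in `mtcars` data (an assumption, chosen so the example runs without the `bikes` data) and counts the groups that `geom_smooth()` fits in each case.

```r
library(ggplot2)

# Local mapping: color applies to the points only.
local_map <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point(aes(color = factor(cyl))) +
  geom_smooth(method = "lm", formula = y ~ x)

# Global mapping: color is inherited by the points and the smoother.
global_map <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)

# layer_data(plot, i) returns the computed data of the i-th layer;
# layer 2 is the smoother in both plots.
n_local  <- length(unique(layer_data(local_map, i = 2)$group))
n_global <- length(unique(layer_data(global_map, i = 2)$group))

n_local   # one fit through all points
n_global  # one fit per cylinder class
```

With the local mapping the smoother fits a single line; with the global mapping it fits one line per level of the color variable.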
ggplot(bikes, aes(x = temp_feel, y = count)) +
  stat_smooth(geom = "smooth")

ggplot(bikes, aes(x = temp_feel, y = count)) +
  geom_smooth(stat = "smooth")

ggplot(bikes, aes(x = season)) +
  stat_count(geom = "bar")

ggplot(bikes, aes(x = season)) +
  geom_bar(stat = "count")

ggplot(bikes, aes(x = date, y = temp_feel)) +
  stat_identity(geom = "point")

ggplot(bikes, aes(x = date, y = temp_feel)) +
  geom_point(stat = "identity")
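The pairs above suggest that each `stat_*()` call and its `geom_*()` counterpart describe the same layer from two directions. A minimal check, assuming the built-in `mtcars` data as a stand-in for `bikes`, compares the counts that both spellings compute:

```r
library(ggplot2)

# The same bar layer, written once from the stat side and once from the geom side.
via_stat <- ggplot(mtcars, aes(x = factor(cyl))) +
  stat_count(geom = "bar")
via_geom <- ggplot(mtcars, aes(x = factor(cyl))) +
  geom_bar(stat = "count")

# layer_data() exposes the values produced by the statistical transformation.
counts_stat <- layer_data(via_stat)$count
counts_geom <- layer_data(via_geom)$count

identical(counts_stat, counts_geom)
```

Both plots carry the same computed counts, so the choice between the two spellings is a matter of emphasis rather than result.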
# A tibble: 1,454 × 14
date day_night year month season count is_workday is_weekend
<date> <chr> <fct> <fct> <fct> <int> <lgl> <lgl>
1 2015-01-04 day 2015 1 winter 6830 FALSE TRUE
2 2015-01-04 night 2015 1 winter 2404 FALSE TRUE
3 2015-01-05 day 2015 1 winter 14763 TRUE FALSE
4 2015-01-05 night 2015 1 winter 5609 TRUE FALSE
5 2015-01-06 day 2015 1 winter 14501 TRUE FALSE
6 2015-01-06 night 2015 1 winter 6112 TRUE FALSE
7 2015-01-07 day 2015 1 winter 16358 TRUE FALSE
8 2015-01-07 night 2015 1 winter 4706 TRUE FALSE
9 2015-01-08 day 2015 1 winter 9971 TRUE FALSE
10 2015-01-08 night 2015 1 winter 5630 TRUE FALSE
# ℹ 1,444 more rows
# ℹ 6 more variables: is_holiday <lgl>, temp <dbl>, temp_feel <dbl>,
# humidity <dbl>, wind_speed <dbl>, weather_type <chr>
Modified from canva.com
Facets: facet_*() = split variables to multiple panels
Facets are also known as small multiples, trellis graphs, or lattice plots.
Scales: scale_*() = translate between variable ranges and property ranges
The scale_*() components control the properties of all the aesthetic dimensions mapped to the data.
Consequently, there are scale_*() functions for all aesthetics, such as:
positions via scale_x_*() and scale_y_*()
colors via scale_color_*() and scale_fill_*()
sizes via scale_size_*() and scale_radius()
shapes via scale_shape_*() and scale_linetype_*()
transparency via scale_alpha_*()
The extensions (*) can be filled by e.g.:
continuous(), discrete(), reverse(), log10(), sqrt(), date() for positions
continuous(), discrete(), manual(), gradient(), gradient2(), brewer() for colors
continuous(), discrete(), manual(), ordinal(), area(), date() for sizes
continuous(), discrete(), manual(), ordinal() for shapes
continuous(), discrete(), manual(), ordinal(), date() for transparency
Illustration by Allison Horst
Continuous: quantitative or numerical data
Discrete: qualitative or categorical data
colour x y PANEL group
1 #3ca7d9 16439 6830 1 1
2 #3ca7d9 16439 2404 1 1
3 #3ca7d9 16440 14763 1 1
4 #3ca7d9 16440 5609 1 1
5 #3ca7d9 16441 14501 1 1
200 #1ec99b 16538 8830 1 2
201 #1ec99b 16539 24019 1 2
202 #1ec99b 16539 10500 1 2
203 #1ec99b 16540 25640 1 2
204 #1ec99b 16540 11830 1 2
205 #1ec99b 16541 22216 1 2
400 #F7B01B 16638 12079 1 3
401 #F7B01B 16639 26646 1 3
402 #F7B01B 16639 12446 1 3
403 #F7B01B 16640 11312 1 3
404 #F7B01B 16640 4722 1 3
405 #F7B01B 16641 22748 1 3
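The `colour` column in the layer data above holds the hex values that the color scale assigned to each category. The following sketch shows how such a table can arise, assuming the built-in `mtcars` data and a hypothetical assignment of the three hex codes to its cylinder classes:

```r
library(ggplot2)

# Hypothetical mapping of categories to the hex codes seen in the output above.
my_palette <- c("4" = "#3ca7d9", "6" = "#1ec99b", "8" = "#F7B01B")

p_scaled <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point() +
  scale_color_manual(values = my_palette)

# After the scale is applied, each row carries its final hex color.
built <- layer_data(p_scaled)
head(built[, c("colour", "x", "y", "PANEL", "group")])
```

The scale translates the data range (the category levels) into the property range (the hex codes), which is the translation step described for `scale_*()`.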
Coordinate systems: coord_*() = interpret the position aesthetics
coord_cartesian()
coord_fixed()
coord_flip()
coord_polar()
coord_map() and coord_sf()
coord_trans()
ggplot(
  bikes,
  aes(x = season, y = count)
) +
  geom_boxplot() +
  coord_cartesian(
    ylim = c(NA, 15000)
  )

ggplot(
  bikes,
  aes(x = season, y = count)
) +
  geom_boxplot() +
  scale_y_continuous(
    limits = c(NA, 15000)
  )
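The two plots above look alike in code but behave differently: limits set in `coord_cartesian()` only zoom the view, whereas limits set in a scale drop out-of-range values before the statistics are computed. A minimal sketch, assuming the built-in `mtcars` data in place of `bikes`, compares the upper whisker of the boxplots in both cases:

```r
library(ggplot2)

base_plot <- ggplot(mtcars, aes(x = factor(cyl), y = mpg)) +
  geom_boxplot()

# Zooming: all observations still enter the boxplot statistics.
zoomed <- base_plot + coord_cartesian(ylim = c(NA, 25))

# Scale limits: values above 25 are turned into NA and removed first.
censored <- base_plot + scale_y_continuous(limits = c(NA, 25))

upper_zoomed   <- layer_data(zoomed)$ymax
# suppressWarnings() hides the notice about the removed rows.
upper_censored <- suppressWarnings(layer_data(censored)$ymax)

upper_zoomed
upper_censored
```

In the zoomed version the whiskers still reach beyond the visible range, while in the censored version the boxplot statistics are recomputed from the remaining values only.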

ggplot(
  bikes,
  aes(x = temp_feel, y = temp)
) +
  geom_point() +
  coord_fixed()

ggplot(
  bikes,
  aes(x = temp_feel, y = temp)
) +
  geom_point() +
  coord_fixed(ratio = 4)

ggplot(
  bikes,
  aes(x = weather_type)
) +
  geom_bar() +
  coord_cartesian()

ggplot(
  bikes,
  aes(x = weather_type)
) +
  geom_bar() +
  coord_flip()

ggplot(
  bikes,
  aes(y = weather_type)
) +
  geom_bar() +
  coord_cartesian()

ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = weather_type,
      fill = weather_type)
) +
  geom_bar() +
  coord_polar()

ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = weather_type,
      fill = weather_type)
) +
  geom_bar() +
  coord_cartesian()

ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = fct_infreq(weather_type),
      fill = weather_type)
) +
  geom_bar(width = 1) +
  coord_polar()

ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = fct_infreq(weather_type),
      fill = weather_type)
) +
  geom_bar(width = 1) +
  coord_cartesian()

ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = fct_infreq(weather_type),
      fill = weather_type)
) +
  geom_bar() +
  coord_polar(theta = "x")

ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = fct_infreq(weather_type),
      fill = weather_type)
) +
  geom_bar() +
  coord_polar(theta = "y")

ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = 1, fill = weather_type)
) +
  geom_bar(position = "stack") +
  coord_polar(theta = "y")

ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = 1, fill = weather_type)
) +
  geom_bar(position = "stack") +
  coord_cartesian()

ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = 1,
      fill = fct_rev(fct_infreq(weather_type)))
) +
  geom_bar(position = "stack") +
  coord_polar(theta = "y") +
  scale_fill_discrete(name = NULL)

ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = 1,
      fill = fct_rev(fct_infreq(weather_type)))
) +
  geom_bar(position = "stack") +
  coord_cartesian() +
  scale_fill_discrete(name = NULL)

ggplot(
  bikes,
  aes(x = temp, y = count,
      group = day_night)
) +
  geom_point() +
  geom_smooth(method = "lm") +
  coord_trans(y = "log10")

ggplot(
  bikes,
  aes(x = temp, y = count,
      group = day_night)
) +
  geom_point() +
  geom_smooth(method = "lm") +
  scale_y_log10()
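The last pair differs in when the log transformation happens: `scale_y_log10()` transforms the data before the linear model is fitted, while `coord_trans()` fits the model on the raw values and only bends the drawn result afterwards. The sketch below, assuming the built-in `mtcars` data rather than `bikes`, inspects the fitted values that each variant hands on to the drawing step:

```r
library(ggplot2)

smooth_plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_smooth(method = "lm", formula = y ~ x)

# Coordinate transformation: the fit is computed in original units (mpg).
fit_coord <- layer_data(smooth_plot + coord_trans(y = "log10"))$y

# Scale transformation: the fit is computed on log10(mpg).
fit_scale <- layer_data(smooth_plot + scale_y_log10())$y

range(fit_coord)
range(fit_scale)
```

The fitted values under `coord_trans()` stay on the original scale, so the straight regression line appears curved in the plot; under `scale_y_log10()` the values are in log10 units and the line stays straight.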
Illustration by Allison Horst
# theme_set() returns the previously active theme, stored here as theme_std
theme_std <- theme_set(theme_minimal(base_size = 18))
theme_update(
  # text = element_text(family = "Pally"),
  panel.grid = element_blank(),
  axis.text = element_text(color = "grey50", size = 12),
  axis.title = element_text(color = "grey40", face = "bold"),
  axis.title.x = element_text(margin = margin(t = 12)),
  axis.title.y = element_text(margin = margin(r = 12)),
  # line widths use linewidth; size is deprecated for lines since ggplot2 3.4.0
  axis.line = element_line(color = "grey80", linewidth = .4),
  legend.text = element_text(color = "grey50", size = 12),
  plot.tag = element_text(size = 40, margin = margin(b = 15)),
  plot.background = element_rect(fill = "white", color = "white")
)
bikes_sorted <-
  bikes %>%
  filter(!is.na(weather_type)) %>%
  group_by(weather_type) %>%
  mutate(sum = sum(count)) %>%
  ungroup() %>%
  mutate(
    weather_type = forcats::fct_reorder(
      str_to_title(str_wrap(weather_type, 5)), sum
    )
  )
p1 <- ggplot(
  bikes_sorted,
  aes(x = weather_type, y = count, color = weather_type)
) +
  geom_hline(yintercept = 0, color = "grey80", linewidth = .4) +
  stat_summary(
    geom = "point", fun = "sum", size = 12
  ) +
  stat_summary(
    geom = "linerange", ymin = 0, fun.max = function(y) sum(y),
    linewidth = 2, show.legend = FALSE
  ) +
  coord_flip(ylim = c(0, NA), clip = "off") +
  scale_y_continuous(
    expand = c(0, 0), limits = c(0, 8500000),
    # scale = .001 converts counts to thousands so they match the "K" suffix
    labels = scales::label_comma(scale = .001, suffix = "K")
  ) +
  scale_color_viridis_d(
    option = "magma", direction = -1, begin = .1, end = .9, name = NULL,
    guide = guide_legend(override.aes = list(size = 7))
  ) +
  labs(
    x = NULL, y = "Sum of reported bike shares", tag = "P1"
  ) +
  theme(
    axis.line.y = element_blank(),
    axis.text.y = element_text(family = "Pally", color = "grey50", face = "bold",
                               margin = margin(r = 15), lineheight = .9)
  )
p1
p2 <- bikes_sorted %>%
  filter(season == "winter", is_weekend == TRUE, day_night == "night") %>%
  group_by(weather_type, .drop = FALSE) %>%
  mutate(id = row_number()) %>%
  ggplot(
    aes(x = weather_type, y = id, color = weather_type)
  ) +
  geom_point(size = 4.5) +
  scale_color_viridis_d(
    option = "magma", direction = -1, begin = .1, end = .9, name = NULL,
    guide = guide_legend(override.aes = list(size = 7))
  ) +
  labs(
    x = NULL, y = "Reported bike shares on\nweekend winter nights", tag = "P2"
  ) +
  coord_cartesian(ylim = c(.5, NA), clip = "off")
p2
my_colors <- c("#cc0000", "#000080")
p3 <- bikes %>%
  group_by(week = lubridate::week(date), day_night, year) %>%
  # .groups = "drop" avoids the regrouping message; groups are set again below
  summarize(count = sum(count), .groups = "drop") %>%
  group_by(week, day_night) %>%
  mutate(avg = mean(count)) %>%
  ggplot(aes(x = week, y = count, group = interaction(day_night, year))) +
  geom_line(color = "grey65", linewidth = 1) +
  geom_line(aes(y = avg, color = day_night), stat = "unique", linewidth = 1.7) +
  annotate(
    geom = "text", label = c("Day", "Night"), color = my_colors,
    x = c(5, 18), y = c(125000, 29000), size = 8, fontface = "bold", family = "Pally"
  ) +
  scale_x_continuous(breaks = c(1, 1:10*5)) +
  scale_y_continuous(labels = scales::label_comma()) +
  scale_color_manual(values = my_colors, guide = "none") +
  labs(
    x = "Week of the Year", y = "Reported bike shares\n(cumulative # per week)", tag = "P3"
  )
p3
text <- tibble::tibble(
x = 0, y = 0, label = "Lorem ipsum dolor sit amet, **consectetur adipiscing elit**, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation <b style='color:#000080;'>ullamco laboris nisi</b> ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat <b style='color:#cc0000;'>cupidatat non proident</b>, sunt in culpa qui officia deserunt mollit anim id est laborum."
)
pt <- ggplot(text, aes(x = x, y = y)) +
  ggtext::geom_textbox(
    aes(label = label),
    box.color = NA, width = unit(23, "lines"),
    color = "grey40", size = 6.5, lineheight = 1.4
  ) +
  coord_cartesian(expand = FALSE, clip = "off") +
  theme_void()
pt
# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
p <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(color = species, shape = species)) +
  geom_smooth(method = "lm") +
  labs(
    title = "Body mass and flipper length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Flipper length (mm)", y = "Body mass (g)",
    color = "Species", shape = "Species"
  ) +
  # scale_color_colorblind() is provided by the ggthemes package, not ggplot2
  ggthemes::scale_color_colorblind()