Data Analysis Practice of Regional Water Environment Pollution
2026-05-21
… is an R package to visualize data created by Hadley Wickham in 2005
… is part of the {tidyverse}
| Component | Function | Explanation |
|---|---|---|
| Data | `ggplot(data)` | The raw data that you want to visualise. |
| Aesthetics | `aes()` | Aesthetic mappings between variables and visual properties. |
| Geometries | `geom_*()` | The geometric shapes representing the data. |
| Statistics | `stat_*()` | The statistical transformations applied to the data. |
| Scales | `scale_*()` | Maps between the data and the aesthetic dimensions. |
| Coordinate System | `coord_*()` | Maps data into the plane of the data rectangle. |
| Facets | `facet_*()` | The arrangement of the data into a grid of plots. |
| Visual Themes | `theme()` / `theme_*()` | The overall visual defaults of a plot. |
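The components in the table can be stacked one per line. The following is a minimal sketch rather than the course's own example: it assumes the `mpg` dataset bundled with ggplot2 as a stand-in for the course data, and the axis label and limits are illustrative choices.

```r
library(ggplot2)

# Assumption: `mpg` (shipped with ggplot2) stands in for the course data.
p <- ggplot(data = mpg) +                                  # Data
  aes(x = displ, y = hwy, color = class) +                 # Aesthetics
  geom_point(alpha = 0.6) +                                # Geometries
  stat_summary(fun = mean, geom = "point", size = 3) +     # Statistics
  scale_x_continuous(name = "Engine displacement (l)") +   # Scales
  coord_cartesian(ylim = c(10, 45)) +                      # Coordinate system
  facet_wrap(~ year) +                                     # Facets
  theme_minimal()                                          # Visual theme

p
```

Only the data, aesthetics, and one geometry are required; the other components fall back to defaults when omitted.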
Bike sharing counts in London, UK, powered by TfL Open Data
| Variable | Description | Class |
|---|---|---|
| date | Date encoded as `YYYY-MM-DD` | date |
| day_night | `day` (6:00am–5:59pm) or `night` (6:00pm–5:59am) | character |
| year | `2015` or `2016` | factor |
| month | `1` (January) to `12` (December) | factor |
| season | `winter`, `spring`, `summer`, or `autumn` | factor |
| count | Sum of reported bikes rented | integer |
| is_workday | `TRUE` being Monday to Friday and no bank holiday | logical |
| is_weekend | `TRUE` being Saturday or Sunday | logical |
| is_holiday | `TRUE` being a bank holiday in the UK | logical |
| temp | Average air temperature (°C) | double |
| temp_feel | Average feels like temperature (°C) | double |
| humidity | Average air humidity (%) | double |
| wind_speed | Average wind speed (km/h) | double |
| weather_type | Most common weather type | character |
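ggplot2::ggplot()
Aesthetic mappings: aes() = link variables to graphical properties
- positions (x, y)
- colors (color, fill)
- shapes (shape, linetype)
- size (size)
- transparency (alpha)
- groupings (group)
aes() can be set inside ggplot() or added outside as a component.
Geometries: geom_*() = interpret aesthetics as graphical representations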
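To make the placement of aesthetic mappings concrete, here is a small sketch. It assumes the bundled `mpg` dataset in place of the bike data, and the object names `p_inside`, `p_outside`, and `p_local` are made up for illustration.

```r
library(ggplot2)

# Mapping supplied inside ggplot(): inherited by every layer
p_inside <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# The same mapping added outside as its own component
p_outside <- ggplot(mpg) +
  aes(x = displ, y = hwy) +
  geom_point()

# Mapping supplied per layer: only the points see `color`,
# so the regression line is fitted to all observations at once
p_local <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = class)) +
  geom_smooth(method = "lm", formula = y ~ x, se = FALSE)
```

`p_inside` and `p_outside` should draw the same plot; the choice is mostly about readability.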
# A tibble: 1,454 × 14
date day_night year month season count is_workday is_weekend
<date> <chr> <fct> <fct> <fct> <int> <lgl> <lgl>
1 2015-01-04 day 2015 1 winter 6830 FALSE TRUE
2 2015-01-04 night 2015 1 winter 2404 FALSE TRUE
3 2015-01-05 day 2015 1 winter 14763 TRUE FALSE
4 2015-01-05 night 2015 1 winter 5609 TRUE FALSE
5 2015-01-06 day 2015 1 winter 14501 TRUE FALSE
6 2015-01-06 night 2015 1 winter 6112 TRUE FALSE
7 2015-01-07 day 2015 1 winter 16358 TRUE FALSE
8 2015-01-07 night 2015 1 winter 4706 TRUE FALSE
9 2015-01-08 day 2015 1 winter 9971 TRUE FALSE
10 2015-01-08 night 2015 1 winter 5630 TRUE FALSE
# ℹ 1,444 more rows
# ℹ 6 more variables: is_holiday <lgl>, temp <dbl>, temp_feel <dbl>,
# humidity <dbl>, wind_speed <dbl>, weather_type <chr>
Facets: facet_*() = split variables to multiple panels
Facets are also known as small multiples, trellis graphs, or lattice plots.
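A minimal sketch of the two faceting functions, assuming the bundled `mpg` dataset rather than the bike data; the variables `drv` and `year` are chosen only because they are categorical.

```r
library(ggplot2)

base <- ggplot(mpg, aes(x = displ, y = hwy)) +
  geom_point()

# One variable, wrapped into rows and columns as space allows
p_wrap <- base + facet_wrap(~ drv)

# Two variables spanning a grid: rows by `drv`, columns by `year`
p_grid <- base + facet_grid(drv ~ year)

# Let each panel choose its own y range
p_free <- base + facet_wrap(~ drv, scales = "free_y")
```

`facet_wrap()` tends to suit a single variable with many levels, while `facet_grid()` suits the crossing of two variables.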
Scales: scale_*() = translate between variable ranges and property ranges
The scale_*() components control the properties of all the aesthetic dimensions mapped to the data.
Consequently, there are scale_*() functions for all aesthetics such as:
- positions via scale_x_*() and scale_y_*()
- colors via scale_color_*() and scale_fill_*()
- sizes via scale_size_*() and scale_radius_*()
- shapes via scale_shape_*() and scale_linetype_*()
- transparency via scale_alpha_*()
The extensions (*) can be filled by e.g.:
- continuous(), discrete(), reverse(), log10(), sqrt(), date() for positions
- continuous(), discrete(), manual(), gradient(), gradient2(), brewer() for colors
- continuous(), discrete(), manual(), ordinal(), area(), date() for sizes
- continuous(), discrete(), manual(), ordinal() for shapes
- continuous(), discrete(), manual(), ordinal(), date() for transparency
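The naming pattern `scale_<aesthetic>_<extension>()` can be sketched as follows. This assumes the bundled `mpg` dataset; the palette, breaks, and labels are illustrative choices, not taken from the slides.

```r
library(ggplot2)

p <- ggplot(mpg, aes(x = displ, y = hwy, color = class, size = cyl)) +
  geom_point(alpha = 0.5) +
  # position, continuous: custom axis title and breaks
  scale_x_continuous(name = "Displacement (l)", breaks = 2:7) +
  # position, transformed: log10 axis
  scale_y_log10(name = "Highway mpg (log scale)") +
  # color, discrete ColorBrewer palette
  scale_color_brewer(palette = "Dark2", name = NULL) +
  # size, point area proportional to the value
  scale_size_area(max_size = 4, name = "Cylinders")

p
```

Each mapped aesthetic gets exactly one scale; adding a second scale for the same aesthetic replaces the first.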
Continuous: quantitative or numerical data
Discrete: qualitative or categorical data
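Whether ggplot2 treats a variable as continuous or discrete follows from its type. The sketch below assumes the bundled `mpg` dataset and maps the same column, `cyl`, once as a number and once as a factor.

```r
library(ggplot2)

# `cyl` stored as a number: a continuous color gradient is used
p_cont <- ggplot(mpg, aes(x = displ, y = hwy, color = cyl)) +
  geom_point()

# The same variable as a factor: a discrete palette is used,
# and each level becomes its own group
p_disc <- ggplot(mpg, aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point()
```

Pairing a continuous variable with a discrete scale (for example, adding `scale_color_discrete()` to `p_cont`) is expected to fail when the plot is built, so converting the variable first is usually the simpler route.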
x y colour PANEL group
1 16439 6830 #3ca7d9 1 1
2 16439 2404 #3ca7d9 1 1
3 16440 14763 #3ca7d9 1 1
4 16440 5609 #3ca7d9 1 1
5 16441 14501 #3ca7d9 1 1
200 16538 8830 #1ec99b 1 2
201 16539 24019 #1ec99b 1 2
202 16539 10500 #1ec99b 1 2
203 16540 25640 #1ec99b 1 2
204 16540 11830 #1ec99b 1 2
205 16541 22216 #1ec99b 1 2
400 16638 12079 #F7B01B 1 3
401 16639 26646 #F7B01B 1 3
402 16639 12446 #F7B01B 1 3
403 16640 11312 #F7B01B 1 3
404 16640 4722 #F7B01B 1 3
405 16641 22748 #F7B01B 1 3
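A table like the one above can be obtained with `layer_data()`, which returns the data after scales have been applied. The sketch below uses a hypothetical six-row toy data frame in place of `bikes`, and the assignment of the three hex colors to seasons is an assumption made for illustration.

```r
library(ggplot2)

# Hypothetical toy data standing in for the bike counts
toy <- data.frame(
  date   = as.Date("2015-01-04") + 0:5,
  count  = c(6830, 2404, 14763, 5609, 14501, 6112),
  season = factor(
    rep(c("winter", "spring", "summer"), each = 2),
    levels = c("winter", "spring", "summer")
  )
)

p <- ggplot(toy, aes(x = date, y = count, color = season)) +
  geom_point() +
  scale_color_manual(
    # Assumed color-to-season assignment
    values = c(winter = "#3ca7d9", spring = "#1ec99b", summer = "#F7B01B")
  )

# Dates become day counts since 1970-01-01, seasons become
# hex colors and integer group ids
built <- layer_data(p, i = 1)
built[, c("x", "y", "colour", "PANEL", "group")]
```

Inspecting the built data this way is a convenient check that a scale maps values to the intended colors.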
Coordinate systems: coord_*() = interpret the position aesthetics
- coord_cartesian()
- coord_fixed()
- coord_flip()
- coord_polar()
- coord_map() and coord_sf()
- coord_trans()

ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = 1, fill = fct_rev(fct_infreq(weather_type)))
) +
  geom_bar(position = "stack") +
  coord_polar(theta = "y") +
  scale_fill_discrete(name = NULL)
ggplot(
  filter(bikes, !is.na(weather_type)),
  aes(x = 1, fill = fct_rev(fct_infreq(weather_type)))
) +
  geom_bar(position = "stack") +
  coord_cartesian() +
  scale_fill_discrete(name = NULL)

# theme_set() returns the previously active theme, stored here as theme_std
theme_std <- theme_set(theme_minimal(base_size = 18))
theme_update(
  # text = element_text(family = "Pally"),
  panel.grid = element_blank(),
  axis.text = element_text(color = "grey50", size = 12),
  axis.title = element_text(color = "grey40", face = "bold"),
  axis.title.x = element_text(margin = margin(t = 12)),
  axis.title.y = element_text(margin = margin(r = 12)),
  # line widths use `linewidth`; `size` is deprecated for lines since ggplot2 3.4.0
  axis.line = element_line(color = "grey80", linewidth = .4),
  legend.text = element_text(color = "grey50", size = 12),
  plot.tag = element_text(size = 40, margin = margin(b = 15)),
  plot.background = element_rect(fill = "white", color = "white")
)
bikes_sorted <-
  bikes %>%
  filter(!is.na(weather_type)) %>%
  group_by(weather_type) %>%
  mutate(sum = sum(count)) %>%
  ungroup() %>%
  mutate(
    weather_type = forcats::fct_reorder(
      str_to_title(str_wrap(weather_type, 5)),
      sum
    )
  )
p1 <- ggplot(
  bikes_sorted,
  aes(x = weather_type, y = count, color = weather_type)
) +
  geom_hline(yintercept = 0, color = "grey80", linewidth = .4) +
  # one large point per weather type at the summed count
  stat_summary(
    geom = "point",
    fun = "sum",
    size = 12
  ) +
  # a line from zero to the summed count
  stat_summary(
    geom = "linerange",
    ymin = 0,
    fun.max = function(y) sum(y),
    linewidth = 2,
    show.legend = FALSE
  ) +
  coord_flip(ylim = c(0, NA), clip = "off") +
  scale_y_continuous(
    expand = c(0, 0),
    limits = c(0, 8500000),
    # scale = .001 so that the "K" suffix denotes thousands
    labels = scales::comma_format(scale = .001, suffix = "K")
  ) +
  scale_color_viridis_d(
    option = "magma",
    direction = -1,
    begin = .1,
    end = .9,
    name = NULL,
    guide = guide_legend(override.aes = list(size = 7))
  ) +
  labs(
    x = NULL,
    y = "Sum of reported bike shares",
    tag = "P1"
  ) +
  theme(
    axis.line.y = element_blank(),
    axis.text.y = element_text(
      family = "Pally",
      color = "grey50",
      face = "bold",
      margin = margin(r = 15),
      lineheight = .9
    )
  )

p1

p2 <- bikes_sorted %>%
  filter(season == "winter", is_weekend == TRUE, day_night == "night") %>%
  group_by(weather_type, .drop = FALSE) %>%
  mutate(id = row_number()) %>%
  ggplot(
    aes(x = weather_type, y = id, color = weather_type)
  ) +
  geom_point(size = 4.5) +
  scale_color_viridis_d(
    option = "magma",
    direction = -1,
    begin = .1,
    end = .9,
    name = NULL,
    guide = guide_legend(override.aes = list(size = 7))
  ) +
  labs(
    x = NULL,
    y = "Reported bike shares on\nweekend winter nights",
    tag = "P2"
  ) +
  coord_cartesian(ylim = c(.5, NA), clip = "off")

p2

my_colors <- c("#cc0000", "#000080")
p3 <- bikes %>%
  group_by(week = lubridate::week(date), day_night, year) %>%
  summarize(count = sum(count)) %>%
  group_by(week, day_night) %>%
  mutate(avg = mean(count)) %>%
  ggplot(aes(x = week, y = count, group = interaction(day_night, year))) +
  # one grey line per year and time of day
  geom_line(color = "grey65", linewidth = 1) +
  # colored line for the average across years
  geom_line(aes(y = avg, color = day_night), stat = "unique", linewidth = 1.7) +
  annotate(
    geom = "text",
    label = c("Day", "Night"),
    color = my_colors,
    x = c(5, 18),
    y = c(125000, 29000),
    size = 8,
    fontface = "bold",
    family = "Pally"
  ) +
  scale_x_continuous(breaks = c(1, 1:10 * 5)) +
  scale_y_continuous(labels = scales::comma_format()) +
  scale_color_manual(values = my_colors, guide = "none") +
  labs(
    x = "Week of the Year",
    y = "Reported bike shares\n(cumulative # per week)",
    tag = "P3"
  )

p3

text <- tibble::tibble(
  x = 0,
  y = 0,
  label = "Lorem ipsum dolor sit amet, **consectetur adipiscing elit**, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation <b style='color:#000080;'>ullamco laboris nisi</b> ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat <b style='color:#cc0000;'>cupidatat non proident</b>, sunt in culpa qui officia deserunt mollit anim id est laborum."
)

pt <- ggplot(text, aes(x = x, y = y)) +
  ggtext::geom_textbox(
    aes(label = label),
    box.color = NA,
    width = unit(23, "lines"),
    color = "grey40",
    size = 6.5,
    lineheight = 1.4
  ) +
  coord_cartesian(expand = FALSE, clip = "off") +
  theme_void()

pt

# A tibble: 344 × 8
species island bill_length_mm bill_depth_mm flipper_length_mm body_mass_g
<fct> <fct> <dbl> <dbl> <int> <int>
1 Adelie Torgersen 39.1 18.7 181 3750
2 Adelie Torgersen 39.5 17.4 186 3800
3 Adelie Torgersen 40.3 18 195 3250
4 Adelie Torgersen NA NA NA NA
5 Adelie Torgersen 36.7 19.3 193 3450
6 Adelie Torgersen 39.3 20.6 190 3650
7 Adelie Torgersen 38.9 17.8 181 3625
8 Adelie Torgersen 39.2 19.6 195 4675
9 Adelie Torgersen 34.1 18.1 193 3475
10 Adelie Torgersen 42 20.2 190 4250
# ℹ 334 more rows
# ℹ 2 more variables: sex <fct>, year <int>
p <- ggplot(
  data = penguins,
  mapping = aes(x = flipper_length_mm, y = body_mass_g)
) +
  geom_point(aes(color = species, shape = species)) +
  geom_smooth(method = "lm") +
  labs(
    title = "Body mass and flipper length",
    subtitle = "Dimensions for Adelie, Chinstrap, and Gentoo Penguins",
    x = "Flipper length (mm)",
    y = "Body mass (g)",
    color = "Species",
    shape = "Species"
  ) +
  # scale_color_colorblind() comes from the {ggthemes} package
  ggthemes::scale_color_colorblind()
