# Step plots with ggplot2

16 Aug 2015

I have been reviewing studies of infant body composition. There are several different ways to measure body composition in infants, and I wanted to get a sense of how popular the methods were and how this had changed over time. This is the plot I eventually arrived at.

Here is how I made it.

Load the necessary packages and simulate the data. Each row represents an individual publication, described by its **Type** (e.g. the measurement method it used) and the **Year** it was published.

```
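library(plyr)          # ddply() for the grouped cumulative counts
library(ggplot2)
library(RColorBrewer)  # palettes used by scale_color_brewer()

# Simulate 100 publications, each with a random Type and Year
Df <- data.frame(Type = replicate(100,
                                  sample(c("A", "B", "C", "D", "E"), 1)),
                 Year = sample(c(1980:2014), 100, replace = TRUE))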
```

You also need a variable reflecting the cumulative number of each type of publication for each year, and another variable giving the final tally for each method.

```
Df$count <- 1
Df <- Df[order(Df$Year), ] # Sort by year
# Running total of publications within each Type
Df <- ddply(Df, .(Type), transform, cumsumType = cumsum(count))
# Final tally for each Type
Df <- ddply(Df, .(Type), transform, maxCount = max(cumsumType))
# Order the factor levels by final tally, largest first
Df$Type <- reorder(Df$Type, Df$maxCount, max)
Df$Type <- factor(Df$Type, levels = rev(levels(Df$Type)))
```

The last two lines reorder the levels of `Df$Type` by the final tally, in descending order. This makes the legend order match the order in which the lines appear on the plot in the final year.
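For readers without plyr, the same columns can probably be built in base R with `ave()`. The sketch below is only an illustration on a hypothetical six-row data frame (`toy` and its values are made up for the example); it is not the code used for the plot.

```r
# Hypothetical toy data: six publications of three types
toy <- data.frame(Type = c("A", "B", "A", "C", "A", "B"),
                  Year = c(1990, 1985, 1982, 2001, 1999, 2010))

toy$count <- 1
toy <- toy[order(toy$Year), ]  # sort by year so running totals follow time

# Running total within each Type, then the final tally per Type
toy$cumsumType <- ave(toy$count, toy$Type, FUN = cumsum)
toy$maxCount   <- ave(toy$cumsumType, toy$Type, FUN = max)

# Levels ordered by final tally, largest first
toy$Type <- reorder(toy$Type, toy$maxCount, max)
toy$Type <- factor(toy$Type, levels = rev(levels(toy$Type)))

levels(toy$Type)
```

Here A has three publications, B two and C one, so the levels come out in the order A, B, C.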

First I tried a simple line plot.

```
ggplot(Df, aes(x = Year, y = cumsumType, colour = Type,
               group = Type)) +
  geom_line() +
  ylab("Total number of publications") +
  ggtitle("Cumulative number of publications by measurement method") +
  scale_color_brewer(name = "Method", palette = "Set1")
```

Then I decided I wanted it to look like a step function, so I tried `geom_step`.

```
ggplot(Df, aes(x = Year, color = Type)) +
  geom_step(aes(y = cumsumType)) +
  ylab("Total number of publications") +
  ggtitle("Cumulative number of publications by measurement method") +
  scale_color_brewer(name = "Method", palette = "Set1")
```

However, I wanted all the lines to extend horizontally to the end of the plot, so I tried `stat_bin`, but this happened.

```
ggplot(Df, aes(x = Year, color = Type)) +
  stat_bin(aes(y = cumsum(..count..)), geom = "step") +
  ylab("Total number of publications") +
  ggtitle("Cumulative number of publications by measurement method") +
  scale_color_brewer(name = "Method", palette = "Set1")
```
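What seems to go wrong here is that `cumsum()` is evaluated once over the whole layer, so the running total carries over from one **Type** to the next instead of restarting at zero. A minimal sketch for checking this with `layer_data()` follows. It simulates its own data so that it runs by itself; the seed, `binwidth = 1`, and the use of `after_stat(count)` (the newer spelling of `..count..`, which assumes ggplot2 3.3.0 or later) are my assumptions rather than part of the plot above.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the sketch is reproducible
Df <- data.frame(
  Type = sample(c("A", "B", "C", "D", "E"), 100, replace = TRUE),
  Year = sample(1980:2014, 100, replace = TRUE)
)

p <- ggplot(Df, aes(x = Year, color = Type)) +
  stat_bin(aes(y = after_stat(cumsum(count))),
           geom = "step", binwidth = 1)

# layer_data() returns the values ggplot2 computes for a layer
ld <- layer_data(p)

# Highest y reached by each group versus the true number of rows per Type
drawn  <- tapply(ld$y, ld$group, max)
actual <- table(Df$Type)
```

If that explanation holds, the largest value in `drawn` equals `nrow(Df)`, whereas `actual` holds the per-Type counts the lines were meant to reach.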

The best solution I could come up with was to plot each line in its own layer with `stat_bin`. Here is the complete code.

```
ggplot(Df, aes(x = Year, color = Type)) +
  # Wide, translucent lines underneath
  stat_bin(data = subset(Df, Type == "A"),
           aes(y = cumsum(..count..)),
           geom = "step", size = 3, alpha = 0.3) +
  stat_bin(data = subset(Df, Type == "B"),
           aes(y = cumsum(..count..)),
           geom = "step", size = 3, alpha = 0.3) +
  stat_bin(data = subset(Df, Type == "C"),
           aes(y = cumsum(..count..)),
           geom = "step", size = 3, alpha = 0.3) +
  stat_bin(data = subset(Df, Type == "D"),
           aes(y = cumsum(..count..)),
           geom = "step", size = 3, alpha = 0.3) +
  stat_bin(data = subset(Df, Type == "E"),
           aes(y = cumsum(..count..)),
           geom = "step", size = 3, alpha = 0.3) +
  # Narrow, opaque lines on top
  stat_bin(data = subset(Df, Type == "A"),
           aes(y = cumsum(..count..)),
           geom = "step", size = 1) +
  stat_bin(data = subset(Df, Type == "B"),
           aes(y = cumsum(..count..)),
           geom = "step", size = 1) +
  stat_bin(data = subset(Df, Type == "C"),
           aes(y = cumsum(..count..)),
           geom = "step", size = 1) +
  stat_bin(data = subset(Df, Type == "D"),
           aes(y = cumsum(..count..)),
           geom = "step", size = 1) +
  stat_bin(data = subset(Df, Type == "E"),
           aes(y = cumsum(..count..)),
           geom = "step", size = 1) +
  coord_cartesian(xlim = c(1980, 2015)) +
  ylab("Total number of publications") +
  ggtitle("Cumulative number of publications by measurement method") +
  scale_color_brewer(name = "Method", palette = "Set1",
                     breaks = levels(Df$Type))
```

Because the line for each **Type** is plotted in its own layer, you need to specify the breaks for the color scale to get the legend in the correct order (descending, by final tally).
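The repeated `stat_bin` calls could probably be written more compactly. The sketch below shows two possibilities that are not part of the original code: building the per-Type layers in a loop, and a single layer that restarts the running total within each group via `ave()`. The helper name `step_layers`, the seed, `binwidth = 1`, and the use of `after_stat()` and `linewidth` (which assume ggplot2 3.4.0 or later) are all my assumptions. The simulated `Type` here is a plain factor with alphabetical levels, not one reordered by final tally.

```r
library(ggplot2)

set.seed(1)  # assumed seed, only so the sketch is reproducible
Df <- data.frame(
  Type = factor(sample(c("A", "B", "C", "D", "E"), 100, replace = TRUE)),
  Year = sample(1980:2014, 100, replace = TRUE)
)

# Hypothetical helper: one cumulative step layer per level of Type.
# Extra arguments (linewidth, alpha, ...) are passed on to stat_bin().
step_layers <- function(data, ...) {
  lapply(levels(data$Type), function(type) {
    stat_bin(data = data[data$Type == type, ],
             aes(y = after_stat(cumsum(count))),
             geom = "step", binwidth = 1, ...)
  })
}

# Variant 1: a list of layers can be added to a plot with `+`
p_layers <- ggplot(Df, aes(x = Year, color = Type)) +
  step_layers(Df, linewidth = 3, alpha = 0.3) +  # wide translucent underlay
  step_layers(Df, linewidth = 1) +               # narrow line on top
  coord_cartesian(xlim = c(1980, 2015)) +
  ylab("Total number of publications") +
  scale_color_brewer(name = "Method", palette = "Set1",
                     breaks = levels(Df$Type))

# Variant 2: one layer, with the cumulative sum taken within each group
p_grouped <- ggplot(Df, aes(x = Year, color = Type)) +
  stat_bin(aes(y = after_stat(ave(count, group, FUN = cumsum))),
           geom = "step", binwidth = 1) +
  ylab("Total number of publications") +
  scale_color_brewer(name = "Method", palette = "Set1")
```

The second variant keeps everything in one layer, so the legend should follow the factor levels without setting `breaks`, but it gives up the two-pass underlay unless the layer is repeated with different widths.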