Boxplot in R Programming

The box plot (or boxplot) in R programming is a convenient way to graphically visualize the distribution of numerical data, optionally grouped by a categorical variable. Let us see how to create an R boxplot, hide its outliers, format its colors, add names, add the mean, and draw a horizontal boxplot in the R programming language, with examples.

Syntax of a Boxplot in R

The formula syntax to draw a boxplot in R programming is

boxplot(formula, data = NULL, ..., subset, na.action = NULL)

and the full syntax of the default method behind this R boxplot is:

boxplot(x, ..., range = 1.5, width = NULL, varwidth = FALSE,
     notch = FALSE, outline = TRUE, col = NULL, log = "",
     border = par("fg"), names, plot = TRUE, 
     pars = list(boxwex = 0.8, staplewex = 0.5, outwex = 0.5), 
     horizontal = FALSE, add = FALSE, at = NULL)

The R boxplot function supports many arguments; the following are some of the most useful ones:

  • data: Please specify the data frame or list that contains the data to draw the boxplot. In this example, it is airquality.
  • subset: An optional vector that restricts which observations are used (formula interface only). For example, you can restrict the boxplot to the month of August.
  • x: Please specify the data from which you want to draw the R box plot. Here, you can use a numeric vector or a list of numeric vectors.
  • range: Decides how far the whiskers extend out of the box. With the default of 1.5, the whiskers reach the most extreme data point that is no more than 1.5 times the interquartile range from the box. A value of 0 extends the whiskers to the data extremes.
  • width: It is optional. Use this to specify a vector that contains the relative widths of the boxes.
  • varwidth: It is a Boolean argument. If it is TRUE, the boxes are drawn with widths proportional to the square roots of the number of observations in each group.
  • border: It is an optional argument. Please specify a vector of colors for the outlines (borders) of the boxes.
  • plot: It is a Boolean argument. If it is FALSE, no plot is drawn and boxplot returns the summaries on which the boxplots are based.
  • log: A character string with one of three options: “x” if the X-Axis should be logarithmic, “y” if the Y-Axis should be logarithmic, and either “xy” or “yx” if both axes should be logarithmic.
  • add: It is a Boolean argument, and by default, it is FALSE. If it is TRUE, the boxplot is added to an already existing plot.
  • horizontal: It is a Boolean argument. If it is FALSE (the default), the R boxplot is drawn vertically. If it is TRUE, the boxplot is drawn horizontally.
  • at: It is a numeric vector that gives the locations where the boxes are drawn (by default, positions 1 to n). It is very helpful when adding a new boxplot to an existing plot region.
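To see what range and varwidth actually change, the minimal sketch below asks boxplot() for its statistics with plot = FALSE, assuming the built-in airquality data grouped by Month. With range = 0 the whiskers run to the minimum and maximum of each group, so no values are reported as outliers; the final call draws that version with box widths proportional to the square root of each group size.

```r
# Summaries with the default whisker rule (range = 1.5)
default_stats <- boxplot(Wind ~ Month, data = airquality, plot = FALSE)

# range = 0: whiskers extend to the data extremes, so 'out' is empty
extreme_stats <- boxplot(Wind ~ Month, data = airquality,
                         range = 0, plot = FALSE)

# With range = 0, rows 1 and 5 of 'stats' are the group minima and maxima
group_min <- tapply(airquality$Wind, airquality$Month, min)
group_max <- tapply(airquality$Wind, airquality$Month, max)

# Draw the range = 0 version; varwidth scales box width by sqrt(n)
boxplot(Wind ~ Month, data = airquality, range = 0, varwidth = TRUE,
        border = "darkblue")
```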

Before we get into the examples, let us look at the data we are going to use for this R box plot example. airquality is a data set that ships with R (in the datasets package), so you can view it by typing airquality in the console.

Return Value of a Boxplot in R Programming

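Before we start creating R boxplots, let us see how the box plot summarizes the data. Besides drawing the plot, boxplot() returns a list with the components stats, n, conf, out, group, and names: the five statistics of each box, the number of observations, the notch limits, the outliers, the group each outlier belongs to, and the group names.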

# R Boxplot Data 
airquality

return.value <- boxplot(airquality$Wind)
return.value
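It is often handier to get these numbers without drawing anything. The sketch below uses plot = FALSE and inspects the list components listed above for the Wind column of airquality.

```r
# plot = FALSE computes the statistics without drawing anything
wind_stats <- boxplot(airquality$Wind, plot = FALSE)

names(wind_stats)    # components of the returned list
wind_stats$stats     # 5 x 1 matrix: lower whisker, lower hinge, median,
                     # upper hinge, upper whisker
wind_stats$n         # number of non-missing observations
wind_stats$out       # values drawn as individual outlier points
```

Every value in out lies beyond one of the whiskers, which is exactly why it is drawn as a separate point.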

Create a Boxplot in R Programming

In this example, we create a boxplot using the airquality data set, which ships with R itself (not only with RStudio). If you need to import data from external files, refer to the R Read CSV article to understand how to import a CSV file.

# R Boxplot Data 
airquality

boxplot(airquality$Wind)

The airquality data set is a data frame, which R stores as a list of columns. So, we use $ to extract the Wind column as a numeric vector.

boxplot(airquality$Wind)
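The short sketch below confirms that airquality is a data frame (and therefore a list), and that $ and [[ ]] extract the same column.

```r
# airquality is a data frame; a data frame is a list of equal-length columns
class(airquality)
is.list(airquality)

# $ and [[ ]] both extract a single column as a plain numeric vector
wind <- airquality$Wind
identical(wind, airquality[["Wind"]])

boxplot(wind)
```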

Use Formula to create a Boxplot in R

In this example, we create a Boxplot using the formula argument.

  • formula: It should be something like value ~ group, where value is the vector of numeric values and group is the column you want to group by. For example, to draw a boxplot of sales for each country, use value = sales and group = country.
# R Boxplot Example 
airquality

boxplot(airquality$Wind~airquality$Month)
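The formula interface also accepts the data and subset arguments described earlier. A minimal sketch, assuming August is Month == 8 in airquality: passing data keeps the formula short, and subset restricts the rows before the boxes are computed.

```r
# Same plot, letting 'data' supply the columns so the formula stays short
boxplot(Wind ~ Month, data = airquality)

# 'subset' keeps only the rows that match a condition (August: Month == 8)
august <- boxplot(Wind ~ Month, data = airquality,
                  subset = Month == 8, plot = FALSE)
august$names  # a single group
august$n      # number of August observations
```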

Assigning names to Boxplot in R Programming

In this example, we assign names to the R box plot, X-Axis, and Y-Axis using main, xlab, and ylab.

  • main: You can change, or provide, the title for your boxplot.
  • xlab: Please specify the label for the X-Axis.
  • ylab: Please specify the label for the Y-Axis.
  • las: Sets the orientation of the axis tick labels. Here, las = 1 makes the Y-axis values horizontal instead of parallel to the axis.
# R Boxplot Example - Changing Names
airquality

boxplot(airquality$Wind~airquality$Month,
        main = "Airquality Boxplot",
        xlab = "Months",
        ylab = "Wind",
        las = 1
        )
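The las argument accepts four values: 0 (labels parallel to the axis, the default), 1 (always horizontal), 2 (perpendicular to the axis), and 3 (always vertical). A quick way to compare them is to draw the same airquality boxplot once per value in a 2 x 2 grid, as sketched below.

```r
# Draw the same boxplot once for each las value, in a 2 x 2 grid
old_par <- par(mfrow = c(2, 2))
for (las_value in 0:3) {
  boxplot(Wind ~ Month, data = airquality,
          main = paste("las =", las_value), las = las_value)
}
par(old_par)  # restore the previous single-plot layout
```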

Change Colors of a Boxplot in R

In this example, we change the R boxplot box colors using the col argument.

  • col: Please specify the colors you want to use for your boxplot. Type colors() in your console to get the list of colors available in R programming.
  • names: Please specify the names for the boxes. Here, we are changing the month numbers to month names.
# R Boxplot Example - Changing Colors, Assigning new Names
airquality

boxplot(airquality$Wind~airquality$Month,
        main = "Airquality Boxplot",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1", 
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", "August", "September")
        )
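Rather than typing the month names by hand, you can take them from R's built-in month.name constant, and colors() can confirm the color names are valid before plotting. This sketch assumes the five airquality months, 5 to 9.

```r
# Color names used above, checked against R's built-in color list
box_cols <- c("violetred", "steelblue1", "salmon1", "palegoldenrod", "olivedrab")
all(box_cols %in% colors())   # TRUE if every name is recognized

# month.name is a built-in constant; elements 5 to 9 are May to September
month_labels <- month.name[5:9]

boxplot(Wind ~ Month, data = airquality,
        main = "Airquality Boxplot", xlab = "Months", ylab = "Wind",
        las = 1, col = box_cols, names = month_labels)
```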

Removing Outliers from a Boxplot in R

In this R box plot example, we hide the outliers using the outline argument.

  • outline: It is a Boolean argument. If it is TRUE (the default), the boxplot draws the outliers (the extra dots beyond the whiskers), and if it is FALSE, the outliers are not drawn. The boxes and whiskers themselves do not change.
# R Boxplot Example - Removing Outlines
airquality

boxplot(airquality$Wind~airquality$Month,
        outline = FALSE,
        main = "Airquality Boxplot",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1",
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", 
                  "August", "September")
        )
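One way to confirm that outline = FALSE only changes what is drawn is to compare the statistics that boxplot() returns invisibly with and without it. A minimal sketch, assuming the same airquality grouping:

```r
# Draw both versions and keep the (invisibly) returned statistics
with_out    <- boxplot(Wind ~ Month, data = airquality)
without_out <- boxplot(Wind ~ Month, data = airquality, outline = FALSE)

# The box/whisker statistics and the list of outlying values are identical;
# outline = FALSE simply skips drawing those points
identical(with_out$stats, without_out$stats)
identical(with_out$out, without_out$out)
```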

Calculating & Adding Mean to Boxplot in R

In this R boxplot example, we calculate the mean of each box and add those mean values to the existing boxplot using the points function.

# R Boxplot Example - Adding Mean
airquality

boxplot(airquality$Wind~airquality$Month,
        main = "Airquality Boxplot",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1",
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", 
                  "August", "September")
        )

meanval <- by(airquality$Wind, airquality$Month, mean)
points(meanval, col = "white", pch = 8, cex = 1.5)

The following statement finds the mean value of Wind, grouped by month number.

meanval <- by(airquality$Wind, airquality$Month, mean)

The following statement adds those mean values to the boxes. pch = 8 draws a star symbol, cex sets the size of the symbol, and col sets its color. Because the boxes are drawn at positions 1 to 5 by default, each mean lands on its own box.

points(meanval, col = "white", pch = 8, cex = 1.5)
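If you prefer a plain named vector over a by object, tapply() computes the same group means. This sketch swaps in tapply() as an alternative to by() and passes the x positions explicitly, which makes the link between each mean and its box visible.

```r
# tapply() gives the same group means as by(), as a named numeric vector
mean_wind <- tapply(airquality$Wind, airquality$Month, mean)
mean_wind

boxplot(Wind ~ Month, data = airquality)
# Boxes are drawn at x = 1, 2, ..., 5 by default, so plot each mean there
points(seq_along(mean_wind), mean_wind, col = "red", pch = 8, cex = 1.5)
```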

Notch argument in R Boxplot

In this example, we draw a notch on each side of the boxes using the notch argument.

  • notch: It is a Boolean argument. If it is TRUE, a notch is drawn on each side of the box. If the notches of two boxes do not overlap, that is strong evidence that their medians differ. Overlapping notches do not prove that the medians are the same.
# R Boxplot Example - Notch
airquality

boxplot(airquality$Wind~airquality$Month,
        notch = TRUE,
        main = "Airquality Boxplot",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1",
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", 
                  "August", "September")
        )

meanval <- by(airquality$Wind, airquality$Month, mean)
points(meanval, col = "white", pch = 8, cex = 1.5)
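The notch limits are returned in the conf component. As documented for boxplot.stats(), each notch spans median +/- 1.58 * (upper hinge - lower hinge) / sqrt(n). The sketch below checks that for May and uses a hypothetical helper, notches_overlap(), to compare two months.

```r
wind_stats <- boxplot(Wind ~ Month, data = airquality, plot = FALSE)

# Row 1 / row 2 of $conf hold the lower / upper notch for each month
wind_stats$conf

# Recompute the notch for the first group (May) by hand
may <- 1
manual_conf <- wind_stats$stats[3, may] +
  c(-1.58, 1.58) * (wind_stats$stats[4, may] - wind_stats$stats[2, may]) /
  sqrt(wind_stats$n[may])

# Hypothetical helper: TRUE when the notches of groups i and j overlap
notches_overlap <- function(s, i, j) {
  s$conf[1, i] <= s$conf[2, j] && s$conf[1, j] <= s$conf[2, i]
}
notches_overlap(wind_stats, 1, 2)  # May vs June
```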

Horizontal Boxplot in R Programming

In this R example, we change the default vertical boxplot into a horizontal box plot using the horizontal argument.

# R Boxplot Example - Horizontal Boxplot
airquality

boxplot(airquality$Wind~airquality$Month,
        notch = TRUE,
        horizontal = TRUE,
        main = "Airquality Boxplot",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1",
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", 
                  "August", "September")
        )

Creating R Boxplot using CSV File

Let us see how to create an R boxplot using external data. For this, we import data from a CSV file using the read.csv function. Refer to the R Read CSV article to learn more about importing CSV files.

# R Boxplot Example - Using CSV

# header = TRUE: the first row holds the column names
# na.strings = "NA": cells containing NA are treated as missing values
products <- read.csv("Products.csv", header = TRUE, sep = ",", 
                     na.strings = "NA")

boxplot(products$SalesAmount ~ products$EnglishCountryRegionName,
        main = "Products Boxplot",
        col = c("steelblue", "tomato3", "yellow2", 
                "orange4", "lawngreen", "skyblue4")
        )

The code above draws a boxplot of SalesAmount grouped by country (the EnglishCountryRegionName column).
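If you do not have Products.csv at hand, the self-contained sketch below writes a small hypothetical sample (the Country and SalesAmount columns and their values are made up for illustration) to a temporary CSV file, reads it back with read.csv, and draws one box per country.

```r
# Hypothetical sample data standing in for Products.csv
sales <- data.frame(
  Country     = rep(c("Australia", "Canada", "France"), each = 4),
  SalesAmount = c(120, 340, 560, 230, 150, 180, 210, 900, 75, 95, 130, 110)
)

# Write it to a temporary CSV file, then read it back
csv_path <- tempfile(fileext = ".csv")
write.csv(sales, csv_path, row.names = FALSE)
products <- read.csv(csv_path, header = TRUE, na.strings = "NA")

# One box per country; plot = FALSE returns the statistics only
sales_stats <- boxplot(SalesAmount ~ Country, data = products, plot = FALSE)
sales_stats$names
sales_stats$n

boxplot(SalesAmount ~ Country, data = products,
        main = "Sales by Country (sample data)",
        col = c("steelblue", "tomato3", "yellow2"))
```

Groups are ordered alphabetically by country, so the colors are applied in that order.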