Boxplot in R Programming

A box plot (boxplot) in R programming is a convenient way to visualize numeric data graphically, optionally grouped by another variable. Let us see how to create an R boxplot, remove outliers, format its colors, add names, add the mean, and draw a horizontal boxplot in the R programming language, with examples.

Syntax of a Boxplot in R

The syntax to draw a boxplot in R programming using a formula is

boxplot(formula, data = NULL, ..., subset, na.action = NULL)

and the more detailed syntax of the default R boxplot function has the following arguments

boxplot(x, ..., range = 1.5, width = NULL, varwidth = FALSE,
        notch = FALSE, outline = TRUE, col = NULL, log = "",
        border = par("fg"), names, plot = TRUE,
        pars = list(boxwex = 0.8, staplewex = 0.5, outwex = 0.5),
        horizontal = FALSE, add = FALSE, at = NULL)

The boxplot function in R supports many arguments; the following are some of them:

  • data: The data frame or list that contains the data for the boxplot. In this example, it is airquality.
  • subset: An optional vector that restricts the observations used for the boxplot. In this example, you could restrict the boxplot to the month of August.
  • x: The data from which to draw the R box plot. It can be a numeric vector or a list containing numeric vectors.
  • range: Decides how far the whiskers extend out of the box. With the default of 1.5, each whisker reaches the most extreme data point that is no more than 1.5 times the interquartile range from the box.
  • width: Optional. A vector that contains the relative widths of the boxes.
  • varwidth: A Boolean argument. If it is TRUE, the boxes are drawn with widths proportional to the square roots of the number of observations in each group.
  • border: Optional. A vector of colors for the outlines of the boxes.
  • plot: A Boolean argument. If it is FALSE, no plot is drawn and the function returns the summaries on which the boxplot is based.
  • log: A character string that indicates which axes are logarithmic: “x” for the X-Axis, “y” for the Y-Axis, and “xy” or “yx” for both.
  • add: A Boolean argument that is FALSE by default. If it is TRUE, the boxplot is added to an already existing plot.
  • horizontal: A Boolean argument. If it is FALSE, the R boxplot is drawn vertically. If it is TRUE, it is drawn horizontally.
  • at: A numeric vector that gives the locations where the boxes are drawn. It is very helpful when adding a new box to an existing plot region.
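A few of these arguments can be tried together in a short sketch. It assumes the built-in airquality data set and uses the formula interface with the data argument; the object name aug is only illustrative.

```r
# Restrict the plot to August (Month == 8) with subset
boxplot(Wind ~ Month, data = airquality, subset = Month == 8)

# Box widths proportional to the square root of each group's size;
# whiskers reach at most 1 x IQR from the box instead of the default 1.5
boxplot(Ozone ~ Month, data = airquality, varwidth = TRUE, range = 1)

# plot = FALSE skips drawing and only returns the summaries
aug <- boxplot(Wind ~ Month, data = airquality,
               subset = Month == 8, plot = FALSE)
aug$n      # number of observations in the August group
aug$names  # label of the single group that was kept
```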

Before we get into the examples, let us look at the data that we are going to use for these R boxplot examples. airquality is a data set provided by R.
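As a quick way to inspect the data before plotting, the following sketch uses only base R functions on the built-in airquality data frame.

```r
# airquality ships with base R (datasets package)
str(airquality)            # structure: column names and types
head(airquality)           # first six rows
summary(airquality$Wind)   # numeric summary of the Wind column
dims <- dim(airquality)    # number of rows and columns
dims
```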


Return Value of a Boxplot in R Programming

Before we start creating an R boxplot, let us see what the boxplot function returns. It returns a list with the stats (whisker ends, hinges, and median), the number of observations, the notch limits, the outliers, their groups, and the group names.

airquality

return.value <- boxplot(airquality$Wind)
return.value
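The individual components of the returned list can also be read one by one. This is a minimal sketch; the variable name b is only illustrative, and plot = FALSE is used so that nothing is drawn.

```r
b <- boxplot(airquality$Wind, plot = FALSE)

b$stats   # lower whisker, lower hinge, median, upper hinge, upper whisker
b$n       # number of non-missing observations
b$conf    # lower and upper limits of the notch
b$out     # values that would be drawn as outlier points
names(b)  # all components of the returned list
```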

Create a Boxplot in R Programming

In this example, we create a boxplot using the airquality data set, which is provided by R. If you need to import data from external files, refer to the R Read CSV article to understand how to import a CSV file.

airquality

boxplot(airquality$Wind)

The airquality data set is a data frame, so we use the $ operator to extract the Wind column from it.

boxplot(airquality$Wind)
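To make the extraction step explicit, here is a small sketch. A data frame is a list of equal-length columns, so both $ and [[ ]] return the column as a plain vector; the variable name wind is only illustrative.

```r
class(airquality)                       # type of the data set
wind <- airquality$Wind                 # extract one column as a numeric vector
identical(wind, airquality[["Wind"]])   # [[ ]] gives the same vector
boxplot(wind)                           # same plot as boxplot(airquality$Wind)
```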

Use Formula to create a Boxplot in R

In this example, we create a boxplot using the formula argument.

  • formula: It should be something like value ~ group, where value is the vector of numeric values and group is the column you want to group by. For example, to draw a box of sales for each country, use value = sales and group = country.

airquality

boxplot(airquality$Wind~airquality$Month)
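The same plot can be written more compactly by passing the data frame through the data argument, so that the column names in the formula are looked up in airquality. A minimal sketch, where by_month is an illustrative name:

```r
# Equivalent grouping to boxplot(airquality$Wind ~ airquality$Month)
by_month <- boxplot(Wind ~ Month, data = airquality)
by_month$names   # group labels taken from the Month column
```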

Assigning names to Boxplot in R Programming

In this example, we assign names to the R box plot, X-Axis, and Y-Axis using main, xlab, and ylab.

  • main: The title of the plot.
  • xlab: The label for the X-Axis.
  • ylab: The label for the Y-Axis.
  • las: The orientation of the axis tick labels. las = 1 draws them horizontally on both axes.

airquality

boxplot(airquality$Wind~airquality$Month,
        main = "Airquality",
        xlab = "Months",
        ylab = "Wind",
        las = 1
        )

Change Colors of a Boxplot in R

In this example, we change the R boxplot box colors using the col argument.

  • col: The color or vector of colors for the boxes. Type colors() in your console to get the list of colors available in R programming.
  • names: The names for the boxes. Here, we are changing the month numbers to month names.

#  Changing Colors, Assigning new Names
airquality

boxplot(airquality$Wind~airquality$Month,
        main = "Airquality",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1", 
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", "August", "September")
        )
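Instead of typing the month names by hand, they can be derived from the month numbers with the built-in month.name constant. This is a sketch of that alternative; months_present is an illustrative name.

```r
# Month numbers that actually occur in the data, in plotting order
months_present <- sort(unique(airquality$Month))

boxplot(Wind ~ Month, data = airquality,
        main = "Airquality", xlab = "Months", ylab = "Wind", las = 1,
        col = c("violetred", "steelblue1", "salmon1",
                "palegoldenrod", "olivedrab"),
        names = month.name[months_present])   # numbers mapped to names
```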

Removing Outliers of Boxplot in R

In this R box plot example, we remove the outliers using the outline argument.

  • outline: It is a Boolean argument. If it is TRUE, the outliers (the extra points beyond the whiskers) are drawn; if it is FALSE, the outliers are not drawn.

airquality

boxplot(airquality$Wind~airquality$Month,
        outline = FALSE,
        main = "Airquality",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1",
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", 
                  "August", "September")
        )
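Note that outline = FALSE only hides the points; the outliers are still computed and stay available in the returned list. A minimal sketch, with b as an illustrative name:

```r
b <- boxplot(Wind ~ Month, data = airquality, outline = FALSE)

b$out     # outlier values that were computed but not drawn
b$group   # index of the box (1 = first month) each outlier belongs to
```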

Calculating & Adding Mean to Boxplot in R

In this R example, we calculate the mean of each group and add those mean values to the existing boxplot using the points function.

airquality

boxplot(airquality$Wind~airquality$Month,
        main = "Airquality",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1",
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", 
                  "August", "September")
        )

meanval <- by(airquality$Wind, airquality$Month, mean)
points(meanval, col = "white", pch = 8, cex = 1.5)

The following statement finds the mean value of Wind, grouped by Month number.

meanval <- by(airquality$Wind, airquality$Month, mean)

The following statement adds those mean values to the boxes. pch = 8 selects the star character, cex is the size of the character, and col is its color.

points(meanval, col = "white", pch = 8, cex = 1.5)
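An alternative sketch of the same step uses tapply, which returns a plain named vector, and passes the x positions to points explicitly. The name mean_wind and the red color are illustrative choices, not part of the original example.

```r
boxplot(Wind ~ Month, data = airquality)

# One mean per month, named by the month number
mean_wind <- tapply(airquality$Wind, airquality$Month, mean)

# Boxes sit at x = 1, 2, ..., so plot each mean at its box position
points(seq_along(mean_wind), mean_wind, col = "red", pch = 8, cex = 1.5)
```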

Notch argument in R Boxplot

In this example, we draw a notch on each side of the boxes using the notch argument.

  • notch: It is a Boolean argument. If it is TRUE, a notch is drawn on each side of the box. If the notches of two boxes do not overlap, this is strong evidence that their medians differ. Overlapping notches do not show such a difference.

# Notch
airquality

boxplot(airquality$Wind~airquality$Month,
        notch = TRUE,
        main = "Airquality",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1",
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", 
                  "August", "September")
        )

meanval <- by(airquality$Wind, airquality$Month, mean)
points(meanval, col = "white", pch = 8, cex = 1.5)
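The notch limits are also part of the returned list, in the conf component. The sketch below recomputes them from the hinges, assuming the formula documented for boxplot.stats (median plus or minus 1.58 times the hinge spread divided by the square root of n); the names b, spread, lower, and upper are illustrative.

```r
b <- boxplot(Wind ~ Month, data = airquality, notch = TRUE)

spread <- b$stats[4, ] - b$stats[2, ]              # upper hinge - lower hinge
lower  <- b$stats[3, ] - 1.58 * spread / sqrt(b$n) # lower notch limit
upper  <- b$stats[3, ] + 1.58 * spread / sqrt(b$n) # upper notch limit

rbind(lower, upper)   # compare with b$conf
b$conf
```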

Horizontal Boxplot in R Programming

In this R example, we change the default vertical boxplot into a horizontal box plot using the horizontal argument.

airquality

boxplot(airquality$Wind~airquality$Month,
        notch = TRUE,
        horizontal = TRUE,
        main = "Airquality",
        xlab = "Wind",
        ylab = "Months",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1",
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", 
                  "August", "September")
        )

Creating R Boxplot using CSV File

Let us see how to create an R boxplot using external data. For this, we import data from a CSV file using the read.csv function. Refer to the Read CSV article to learn how to import a CSV file.

employee <- read.csv("Products.csv", header = TRUE, sep = ",",
                     na.strings = "NA")

boxplot(employee$SalesAmount~employee$EnglishCountryRegionName,
        main = "Products",
        col = c("steelblue", "tomato3", "yellow2", 
                "orange4", "lawngreen", "skyblue4")
        )

The above code snippet draws the boxplot for the Sales Amount, grouped by Country.
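Since Products.csv is not included here, the following self-contained sketch writes a small, made-up data set to a temporary CSV file and then reads and plots it. The sales figures and country values are hypothetical and exist only to make the example runnable; the column names follow the snippet above.

```r
# Hypothetical sample data, written to a temporary CSV file
csv_path <- tempfile(fileext = ".csv")
sample_sales <- data.frame(
  EnglishCountryRegionName = rep(c("Australia", "Canada", "France"), each = 4),
  SalesAmount = c(120, 340, 95, 410, 210, 180, 60, 305, 150, 275, 330, 90)
)
write.csv(sample_sales, csv_path, row.names = FALSE)

# Read the file back and plot SalesAmount grouped by country
products <- read.csv(csv_path, header = TRUE)
boxplot(SalesAmount ~ EnglishCountryRegionName, data = products,
        main = "Products",
        col = c("steelblue", "tomato3", "yellow2"))
```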
