Boxplot in R Programming

A box plot (boxplot) in R is a convenient way to visualize the distribution of numeric data, optionally grouped by a categorical variable. Let us see how to create a boxplot, remove outliers, change the box colors, add names, add the mean, and draw a horizontal boxplot in R, with examples.

R Boxplot Syntax

The syntax to draw a boxplot in R using the formula interface is

boxplot(formula, data = NULL, ..., subset, na.action = NULL)

The default method, which takes a numeric vector or a list of vectors, has the following arguments

boxplot(x, ..., range = 1.5, width = NULL, varwidth = FALSE,
        notch = FALSE, outline = TRUE, col = NULL, log = "",
        border = par("fg"), names, plot = TRUE,
        pars = list(boxwex = 0.8, staplewex = 0.5, outwex = 0.5),
        horizontal = FALSE, add = FALSE, at = NULL)

The boxplot function supports many arguments. The most commonly used ones are:

  • data: The data frame or list that contains the variables used in the formula. In this article, it is airquality.
  • subset: An optional vector that selects which observations to plot. For example, you can restrict the plot to the month of August.
  • x: The data to plot. It can be a numeric vector or a list containing numeric vectors.
  • range: Determines how far the whiskers extend from the box. With the default of 1.5, each whisker reaches the most extreme data point that lies no more than 1.5 times the interquartile range from the box, and points beyond that are drawn as outliers. A value of 0 extends the whiskers to the data extremes.
  • width: Optional. A vector giving the relative widths of the boxes.
  • varwidth: A Boolean argument. If TRUE, the boxes are drawn with widths proportional to the square roots of the number of observations in each group.
  • border: Optional. A vector of colors for the outlines of the boxes.
  • plot: A Boolean argument. If FALSE, nothing is drawn, and the function only returns the summaries on which the boxplots are based.
  • log: A character string that selects logarithmic axes: "x" for the X-axis, "y" for the Y-axis, and "xy" or "yx" for both.
  • add: A Boolean argument, FALSE by default. If TRUE, the boxplot is added to the existing plot.
  • horizontal: A Boolean argument. If FALSE (the default), the boxes are drawn vertically. If TRUE, they are drawn horizontally.
  • at: A numeric vector giving the locations where the boxes are drawn. It is particularly helpful when adding boxes to an existing plot region.
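
As a rough illustration of a few of these arguments, the sketch below runs boxplot on the built-in airquality data with plot = FALSE, so nothing is drawn and only the summaries come back. The variable names s and s0 are placeholders chosen for this example.

```r
# Summaries only, no drawing: Wind by Month, restricted to August and September
s <- boxplot(Wind ~ Month, data = airquality,
             subset = Month >= 8, plot = FALSE)
s$names   # labels of the groups that survived the subset
s$stats   # one column of five summary values per group

# range = 0 extends the whiskers to the data extremes,
# so no point is reported as an outlier
s0 <- boxplot(Wind ~ Month, data = airquality, range = 0, plot = FALSE)
length(s0$out)
```

Using plot = FALSE is a convenient way to check what a combination of arguments does before drawing anything.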

Before we get into the examples, let us look at the airquality dataset, which ships with R and is used throughout this article. It holds daily air quality measurements taken in New York from May to September 1973.

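
A quick way to inspect the dataset from the console is sketched below. It uses only base R functions, and airquality needs no import because it is part of the datasets package that R loads by default.

```r
# Structure of the data frame: 153 observations of 6 variables
str(airquality)

# First rows, showing the columns Ozone, Solar.R, Wind, Temp, Month, Day
head(airquality, 3)

# The months covered by the data, stored as numbers
unique(airquality$Month)
```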

Return Value of a Boxplot in R Programming

Before we start customizing the plot, let us see what the function returns. Besides drawing the plot, boxplot returns a list with the components stats, n, conf, out, group, and names.

airquality

return.value <- boxplot(airquality$Wind)
return.value
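
The components of the returned list can be looked at one by one, as in the following sketch. The name rv is a placeholder, and plot = FALSE is used here only to skip the drawing step.

```r
rv <- boxplot(airquality$Wind, plot = FALSE)

names(rv)   # the six components of the returned list
rv$stats    # lower whisker, lower hinge, median, upper hinge, upper whisker
rv$n        # number of non-missing observations
rv$conf     # lower and upper ends of the notch
rv$out      # values that lie beyond the whiskers
```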

Create a Boxplot in R Programming

In this example, we create a boxplot using the airquality dataset, which ships with R. If you need to import data from external files, refer to the R Read CSV article to understand how to import a CSV file.

airquality

boxplot(airquality$Wind)

airquality is a data frame, so we use the $ operator to extract the Wind column as a numeric vector.

boxplot(airquality$Wind)

Boxplot using Formula argument

In this example, we use the formula argument to draw one box per group.

  • formula: It should be something like value ~ group, where value is the vector of numeric values and group is the column you want to group by. For example, to draw one box per country for sales data, use value = sales and group = country.

airquality

boxplot(airquality$Wind~airquality$Month)
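
The same grouping can also be written with the data argument, so the column names do not need the airquality$ prefix. The sketch below compares both forms. The names a and b are placeholders for this example.

```r
# Columns referenced through the data frame explicitly
a <- boxplot(airquality$Wind ~ airquality$Month)

# Columns looked up in the data frame passed as data
b <- boxplot(Wind ~ Month, data = airquality)

# Both calls compute the same summaries for the same five groups
identical(a$stats, b$stats)
b$names
```

The data form is shorter and also works together with the subset argument.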

Assigning names to Boxplot

In this example, we assign a title and axis labels to the boxplot using main, xlab, and ylab.

  • main: The title of the plot.
  • xlab: The label for the X-axis.
  • ylab: The label for the Y-axis.
  • las: Sets the orientation of the axis tick labels. las = 1 draws them horizontally on both axes.

airquality

boxplot(airquality$Wind~airquality$Month,
        main = "Airquality",
        xlab = "Months",
        ylab = "Wind",
        las = 1
        )

Change Colors of a Boxplot in R

In this R example, we change the Boxplot box colors using the col argument

  • col: The fill colors of the boxes. Type colors() in your console to get the list of available color names.
  • names: The labels printed under the boxes. Here, we replace the month numbers with month names.

# Changing colors, assigning new names
airquality

boxplot(airquality$Wind~airquality$Month,
        main = "Airquality",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1", 
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", "August", "September")
        )
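
Instead of typing the month names by hand, they can be derived from the month numbers with the built-in month.name constant. This is only a sketch, and the variable names month_nums, month_labels, and fills are placeholders. The check before the plot guards against supplying a different number of colors than groups.

```r
month_nums   <- sort(unique(airquality$Month))   # month numbers found in the data
month_labels <- month.name[month_nums]           # matching English month names
fills <- c("violetred", "steelblue1", "salmon1",
           "palegoldenrod", "olivedrab")

# One color per box
stopifnot(length(fills) == length(month_labels))

boxplot(Wind ~ Month, data = airquality,
        main = "Airquality", xlab = "Months", ylab = "Wind",
        las = 1, col = fills, names = month_labels)
```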

Removing Outliers from a Boxplot

In this R Boxplot example, we remove the outliers using the outline argument.

  • outline: It is a Boolean argument. If it is TRUE, the outliers (the extra points drawn beyond the whiskers) are shown. If it is FALSE, the outliers are not drawn.

airquality

boxplot(airquality$Wind~airquality$Month,
        outline = FALSE,
        main = "Airquality",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1",
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", 
                  "August", "September")
        )
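
Note that outline = FALSE only affects the drawing. The sketch below suggests that the outliers are still computed and returned, so they remain available for inspection. The names shown and hidden are placeholders.

```r
shown  <- boxplot(Wind ~ Month, data = airquality, outline = TRUE)
hidden <- boxplot(Wind ~ Month, data = airquality, outline = FALSE)

# The returned outliers are the same in both cases
identical(shown$out, hidden$out)

hidden$out     # outlier values
hidden$group   # index of the box each outlier belongs to
```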

Calculating & Adding Mean to Boxplot in R

In this R example, we calculate the mean of each group and add those mean values to the existing boxplot using the points function.

airquality

boxplot(airquality$Wind~airquality$Month,
        main = "Airquality",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1",
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", 
                  "August", "September")
        )

meanval <- by(airquality$Wind, airquality$Month, mean)
points(meanval, col = "white", pch = 8, cex = 1.5)

The following statement finds the mean value of Wind, grouped by Month numbers.

meanval <- by(airquality$Wind, airquality$Month, mean)

The following statement adds the mean values to the boxes. pch = 8 selects the star symbol, cex sets the symbol size, and col sets the color.

points(meanval, col = "white", pch = 8, cex = 1.5)
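
An alternative sketch uses tapply, which returns the group means as a plain named vector, and passes the x positions to points explicitly. It relies on the boxes being drawn at positions 1, 2, 3, and so on, which is the default when at is not set. The name mean_wind and the red color are choices made for this example.

```r
boxplot(Wind ~ Month, data = airquality, las = 1)

# Mean Wind per Month, named by month number
mean_wind <- tapply(airquality$Wind, airquality$Month, mean)

# One point per box, at x = 1, 2, ..., in the same order as the groups
points(seq_along(mean_wind), mean_wind, col = "red", pch = 8, cex = 1.5)
```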

Notch argument in R Boxplot

In this example, we draw a notch on each side of the boxes using the notch argument.

  • notch: It is a Boolean argument. If it is TRUE, a notch is drawn on each side of the box, around the median. If the notches of two boxes do not overlap, this is strong evidence that their medians differ. Overlapping notches do not show that the medians are equal, only that the plot gives no clear evidence of a difference.

# Notch
airquality

boxplot(airquality$Wind~airquality$Month,
        notch = TRUE,
        main = "Airquality",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1",
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", 
                  "August", "September")
        )

meanval <- by(airquality$Wind, airquality$Month, mean)
points(meanval, col = "white", pch = 8, cex = 1.5)
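
The ends of the notches are returned in the conf component. According to the boxplot.stats documentation, they extend to the median plus or minus 1.58 times the interquartile range divided by the square root of the group size. The sketch below recomputes them from the other components under that assumption. The names hinge_spread, half_width, and manual are placeholders.

```r
s <- boxplot(Wind ~ Month, data = airquality, plot = FALSE)

# Spread between the upper and lower hinge of each box
hinge_spread <- s$stats[4, ] - s$stats[2, ]

# Half-width of each notch
half_width <- 1.58 * hinge_spread / sqrt(s$n)

# Lower and upper notch ends around each median
manual <- rbind(s$stats[3, ] - half_width,
                s$stats[3, ] + half_width)

all.equal(manual, s$conf, check.attributes = FALSE)
```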

Horizontal Boxplot in R Programming

In this R example, we change the default vertical boxplot into a horizontal one using the horizontal argument.

airquality

boxplot(airquality$Wind~airquality$Month,
        notch = TRUE,
        horizontal = TRUE,
        main = "Airquality",
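        xlab = "Wind",      # with horizontal = TRUE, the values run along the X-axis
        ylab = "Months",    # and the groups are listed along the Y-axis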
        las = 1,
        col = c("violetred", "steelblue1", "salmon1",
                "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", 
                  "August", "September")
        )

Creating R Boxplot using CSV File

Let us see how to create the R Boxplot using external data. For this, we are importing data from the CSV file using the read.csv function. Refer to the Read CSV article to import the CSV file.

# header = TRUE: the first row of the file holds the column names
employee <- read.csv("Products.csv", header = TRUE, sep = ",")

boxplot(employee$SalesAmount~employee$EnglishCountryRegionName,
        main = "Products",
        col = c("steelblue", "tomato3", "yellow2", 
                "orange4", "lawngreen", "skyblue4")
        )

The above code snippet will draw the boxplot for the Sales Amount, grouped by Country.

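
Because Products.csv is not included here, the following sketch writes a small made-up file to a temporary location and reads it back, to show the whole round trip. The column names match the example above, but the countries, the sales values, and the names demo, csv_path, sales, and res are all hypothetical.

```r
# Hypothetical data standing in for Products.csv; the values are made up
demo <- data.frame(
  EnglishCountryRegionName = rep(c("Australia", "Canada", "France"), each = 4),
  SalesAmount = c(120, 340, 560, 90, 75, 410, 220, 310, 150, 95, 480, 260)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(demo, csv_path, row.names = FALSE)

# Import the file and draw one box per country
sales <- read.csv(csv_path, header = TRUE)
res <- boxplot(SalesAmount ~ EnglishCountryRegionName, data = sales,
               main = "Products",
               col = c("steelblue", "tomato3", "yellow2"))
res$names   # one entry per country found in the file
```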