The box plot (or boxplot) in R programming is a convenient way to graphically visualize numerical data grouped by a categorical variable. Let us see how to create an R boxplot, remove outliers, format its colors, add names, add the mean, and draw a horizontal boxplot in the R programming language, with examples.
Syntax of a Boxplot in R
The syntax to draw a boxplot in R programming is
boxplot(formula, data = NULL, ..., subset, na.action = NULL)
and the full syntax of the default boxplot method is:
boxplot(x, ..., range = 1.5, width = NULL, varwidth = FALSE, notch = FALSE, outline = TRUE, col = NULL, log = "", border = par("fg"), names, plot = TRUE, pars = list(boxwex = 0.8, staplewex = 0.5, outwex = 0.5), horizontal = FALSE, add = FALSE, at = NULL)
There are many arguments supported by the boxplot in the R programming language; the following are some of them:
- data: Specify the data frame or list that contains the data to draw the boxplot. In this example, it is airquality.
- subset: An optional vector that restricts which observations are used. For example, you can restrict the boxplot to the month of August.
- x: Specify the data from which you want to draw the R boxplot. Here, you can use a numeric vector or a list of numeric vectors.
- range: Decides how far the whiskers extend out from the box. The whiskers reach the most extreme data point that is no more than range times the interquartile range from the box; range = 0 extends them to the data extremes.
- width: It is optional. Use it to specify a vector giving the relative widths of the boxes.
- varwidth: It is a Boolean argument. If it is TRUE, the box widths are proportional to the square roots of the number of observations in each group.
- border: It is an optional argument. Specify the vector of colors to use for the outlines of the boxes.
- plot: It is a Boolean argument. If it is FALSE, nothing is drawn; instead, boxplot() returns the summaries the boxes would be based on.
- log: Specify a character string: "x" if the X-Axis is to be logarithmic, "y" if the Y-Axis is to be logarithmic, and "xy" or "yx" if both are.
- add: It is a Boolean argument, FALSE by default. If it is TRUE, the boxplot is added to an already existing plot.
- horizontal: It is a Boolean argument. If it is FALSE (the default), the R boxplot is drawn vertically; if it is TRUE, it is drawn horizontally.
- at: It is a numeric vector giving the locations where the boxes are drawn. It is especially helpful when adding a new boxplot to an existing plot region.
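To see a few of these arguments in action, the following is a minimal sketch using the airquality Wind column. It uses plot = FALSE to return only the summaries, compares the default whisker rule with range = 0, and then draws two boxes with varwidth = TRUE. The variable names (wind, default_stats, full_stats) are illustrative choices, not part of the original example.

```r
# Compare whisker rules using plot = FALSE, so only the summaries are returned
wind <- airquality$Wind

# Default range = 1.5: whiskers reach the most extreme point within 1.5 * IQR of the box
default_stats <- boxplot(wind, plot = FALSE)$stats

# range = 0: whiskers extend to the data extremes, so no point is treated as an outlier
full_stats <- boxplot(wind, range = 0, plot = FALSE)$stats

print(default_stats[, 1])  # lower whisker, lower hinge, median, upper hinge, upper whisker
print(full_stats[, 1])

# Draw two boxes from a list of numeric vectors; varwidth scales widths by sqrt(n)
boxplot(list(all = wind, may = wind[airquality$Month == 5]), varwidth = TRUE)
```

With range = 0, the first and last rows of stats are simply the minimum and maximum of the data.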
Before we get into the examples, let us look at the data we are going to use for these R boxplot examples. airquality is a data set that ships with R: daily air quality measurements in New York from May to September 1973.
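A quick way to inspect the data set before plotting is sketched below. It only uses base R functions (str, head, table) on the built-in airquality data frame.

```r
# airquality ships with R in the datasets package, so no import is needed
str(airquality)            # 153 obs. of 6 variables: Ozone, Solar.R, Wind, Temp, Month, Day
head(airquality)           # first six rows
table(airquality$Month)    # observations per month (May = 5 ... September = 9)
```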
Return Value of a Boxplot in R Programming
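Before we start creating an R boxplot, let us see how the boxplot summarizes the data. Besides drawing the plot, boxplot() invisibly returns a list containing the stats (whisker ends, hinges, and median of each box), n (number of observations), conf (notch limits), out (outliers), group, and names.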
# R Boxplot Data airquality
return.value <- boxplot(airquality$Wind)
return.value
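The returned list can also be obtained without drawing anything. This sketch passes plot = FALSE and checks that the stats column matches boxplot.stats(), the helper that computes the per-box summaries; the name bp is an illustrative choice.

```r
# plot = FALSE returns the same list without drawing anything
bp <- boxplot(airquality$Wind, plot = FALSE)

names(bp)   # includes stats, n, conf, out, group and names

# stats is a 5-row matrix: lower whisker, lower hinge, median, upper hinge, upper whisker
bp$stats[, 1]

# The same five numbers come from boxplot.stats()
boxplot.stats(airquality$Wind)$stats

# Outliers: points beyond the whiskers
bp$out
```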
Create a Boxplot in R Programming
In this example, we create a boxplot using the airquality data set, which ships with R itself (in the datasets package) rather than with RStudio. If you need to import data from external files, refer to the R Read CSV article to understand how to import a CSV file.
# R Boxplot Data airquality
boxplot(airquality$Wind)
The airquality data set is a data frame, which in R is a list of equal-length columns. So we use the $ operator to extract the Wind column from it.
boxplot(airquality$Wind)
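Because a data frame is a list of columns, $ is only one of several equivalent ways to pull out the Wind vector. The sketch below compares three of them; the names w1, w2 and w3 are illustrative.

```r
# A data frame is a list of equal-length columns, so these all return the Wind vector
w1 <- airquality$Wind
w2 <- airquality[["Wind"]]
w3 <- airquality[, "Wind"]

is.list(airquality)  # TRUE: data frames are lists underneath
is.numeric(w1)       # TRUE: boxplot() needs numeric data here

boxplot(w2)          # same plot as boxplot(airquality$Wind)
```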
Use Formula to create a Boxplot in R
In this example, we create a Boxplot using the formula argument.
- formula: It should be of the form value ~ group, where value is the numeric vector and group is the column you want to group by. For example, to draw a boxplot of sales per country, use sales ~ country.
# R Boxplot Example airquality
boxplot(airquality$Wind ~ airquality$Month)
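The formula can also name the columns directly when the data frame is passed through the data argument, which is where the data and subset arguments described earlier come in. The following sketch uses plot = FALSE to inspect the groups and restricts the second call to August with subset = Month == 8; the variable names are illustrative.

```r
# Same grouping, but the column names are resolved inside airquality via data =
bp_all <- boxplot(Wind ~ Month, data = airquality, plot = FALSE)
bp_all$names   # one box per month: "5" "6" "7" "8" "9"

# subset restricts the rows before grouping; here only August (Month == 8)
bp_aug <- boxplot(Wind ~ Month, data = airquality, subset = Month == 8, plot = FALSE)
bp_aug$n       # number of August observations

# Drawing the grouped plot
boxplot(Wind ~ Month, data = airquality)
```

A side effect of data = is that the axis labels read Month and Wind instead of airquality$Month and airquality$Wind.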
Assigning Names to a Boxplot in R Programming
In this example, we add a title to the R boxplot and labels to the X-Axis and Y-Axis using main, xlab, and ylab.
- main: Provide or change the title of your boxplot.
- xlab: Specify the label for the X-Axis.
- ylab: Specify the label for the Y-Axis.
- las: Controls the orientation of the axis tick labels. las = 1 makes them always horizontal, so the Y-axis values read left to right.
# R Boxplot Example - Changing Names (airquality)
boxplot(airquality$Wind ~ airquality$Month,
        main = "Airquality Boxplot",
        xlab = "Months",
        ylab = "Wind",
        las = 1)
Change Colors of a Boxplot in R
In this example, we change the R Boxplot box colors using the col argument
- col: Please specify the color you want to use for your Boxplot. Type colors() in your console to get the list of colors available in R programming
- names: Please specify the names for the boxes. Here, we are changing the Month numbers to Month names
# R Boxplot Example - Changing Colors, Assigning new Names (airquality)
boxplot(airquality$Wind ~ airquality$Month,
        main = "Airquality Boxplot",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1", "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", "August", "September"))
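Instead of typing the month names and colors by hand, they can be derived from the data. This is a sketch under the assumption that the Month codes 5 to 9 map to calendar months, which holds for airquality; it uses the built-in month.name constant and hcl.colors() (available since R 3.6.0) as an alternative palette, not the colors of the original example.

```r
# Build the box labels from the built-in month.name constant instead of typing them
months_present <- sort(unique(airquality$Month))   # 5 6 7 8 9
box_names <- month.name[months_present]
box_names   # "May" "June" "July" "August" "September"

# One color per box; hcl.colors() generates a palette of the requested length
box_cols <- hcl.colors(length(box_names))

boxplot(Wind ~ Month, data = airquality,
        main = "Airquality Boxplot", xlab = "Months", ylab = "Wind", las = 1,
        col = box_cols, names = box_names)
```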
Removing Outliers from a Boxplot in R
In this R boxplot example, we hide the outliers using the outline argument.
- outline: It is a Boolean argument. If it is TRUE (the default), the boxplot draws the outliers (the extra dots beyond the whiskers); if it is FALSE, the outliers are not drawn.
# R Boxplot Example - Removing Outliers (airquality)
boxplot(airquality$Wind ~ airquality$Month,
        outline = FALSE,
        main = "Airquality Boxplot",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1", "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", "August", "September"))
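Note that outline = FALSE only changes what is drawn: the outliers are still computed and returned. The sketch below captures the return value and lists which month each outlier belongs to; the names bp and beyond are illustrative.

```r
# outline only controls drawing; the outliers are still computed and returned
bp <- boxplot(Wind ~ Month, data = airquality, outline = FALSE)

# bp$out holds the outlying values and bp$group tells which box each belongs to
data.frame(month = bp$names[bp$group], wind = bp$out)

# Every reported outlier lies beyond its box's whiskers (rows 1 and 5 of stats)
beyond <- bp$out < bp$stats[1, bp$group] | bp$out > bp$stats[5, bp$group]
all(beyond)
```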
Calculating & Adding Mean to Boxplot in R
In this R boxplot example, we calculate the mean of each box and add those mean values to the existing boxplot using the points function.
# R Boxplot Example - Adding Mean (airquality)
boxplot(airquality$Wind ~ airquality$Month,
        main = "Airquality Boxplot",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1", "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", "August", "September"))
meanval <- by(airquality$Wind, airquality$Month, mean)
points(meanval, col = "white", pch = 8, cex = 1.5)
The following statement finds the mean value of Wind, grouped by Month.
meanval <- by(airquality$Wind, airquality$Month, mean)
The following statement adds those mean values to the boxes. pch = 8 draws a star symbol, cex sets the symbol size, and col sets its color.
points(meanval, col = "white", pch = 8, cex = 1.5)
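An alternative way to compute the group means is tapply(), which returns a plain named numeric vector. This sketch swaps in tapply() for by() and passes explicit x positions to points(), relying on the fact that boxplot() places the boxes at 1, 2, ..., n by default; the red color is an illustrative choice so the stars stay visible on the default box fill.

```r
# tapply() returns a named numeric vector of group means (names "5" to "9")
meanval <- tapply(airquality$Wind, airquality$Month, mean)
round(meanval, 2)

boxplot(Wind ~ Month, data = airquality)
# Boxes sit at x = 1, 2, ..., so the means are plotted at those positions
points(seq_along(meanval), meanval, col = "red", pch = 8, cex = 1.5)
```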
Notch argument in R Boxplot
In this example, we draw a notch on each side of the boxes using the notch argument.
- notch: It is a Boolean argument. If it is TRUE, a notch is drawn on each side of the box around the median. If the notches of two boxes do not overlap, this is strong evidence that their medians differ; overlapping notches do not prove that the medians are the same.
# R Boxplot Example - Notch (airquality)
boxplot(airquality$Wind ~ airquality$Month,
        notch = TRUE,
        main = "Airquality Boxplot",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1", "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", "August", "September"))
meanval <- by(airquality$Wind, airquality$Month, mean)
points(meanval, col = "white", pch = 8, cex = 1.5)
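The notch limits themselves are returned in the conf component. As documented for boxplot.stats(), they extend to median +/- 1.58 * IQR / sqrt(n), where the IQR is the distance between the hinges. The sketch below recomputes them and adds a small helper, notches_overlap(), which is a hypothetical name introduced here for illustration.

```r
# Notch limits for each month, as returned by boxplot() in the conf matrix
bp <- boxplot(Wind ~ Month, data = airquality, plot = FALSE)
bp$conf   # row 1 = lower notch, row 2 = upper notch, one column per box

# Recompute them: median +/- 1.58 * (upper hinge - lower hinge) / sqrt(n)
half_width <- 1.58 * (bp$stats[4, ] - bp$stats[2, ]) / sqrt(bp$n)
manual <- rbind(bp$stats[3, ] - half_width,
                bp$stats[3, ] + half_width)

# Hypothetical helper: two notches overlap when neither lies entirely above the other
notches_overlap <- function(i, j) {
  bp$conf[1, i] <= bp$conf[2, j] && bp$conf[1, j] <= bp$conf[2, i]
}
notches_overlap(1, 4)   # May vs August
```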
Horizontal Boxplot in R Programming
In this R example, we turn the default vertical boxplot into a horizontal one using the horizontal argument.
# R Boxplot Example - Horizontal Boxplot (airquality)
boxplot(airquality$Wind ~ airquality$Month,
        notch = TRUE,
        horizontal = TRUE,
        main = "Airquality Boxplot",
        xlab = "Months",
        ylab = "Wind",
        las = 1,
        col = c("violetred", "steelblue1", "salmon1", "palegoldenrod", "olivedrab"),
        names = c("May", "June", "July", "August", "September"))
Creating R Boxplot using CSV File
Let us see how to create an R boxplot using external data. For this, we import data from a CSV file using the read.csv function. Refer to the R Read CSV article to learn more about importing CSV files.
# R Boxplot Example - Using CSV
employee <- read.csv("Products.csv", header = TRUE, sep = ",", na.strings = "NA")
boxplot(employee$SalesAmount ~ employee$EnglishCountryRegionName,
        main = "Products Boxplot",
        col = c("steelblue", "tomato3", "yellow2", "orange4", "lawngreen", "skyblue4"))
The above code snippet draws a boxplot of SalesAmount, grouped by country.
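Since Products.csv is not included here, the following self-contained sketch writes a tiny hypothetical stand-in file (made-up sales figures for two countries, reusing the column names from the example) to a temporary path, reads it back with read.csv, and draws the grouped boxplot.

```r
# Hypothetical stand-in for Products.csv: made-up values written to a temporary file
sales <- data.frame(
  SalesAmount = c(120, 340, 90, 410, 150, 280, 300, 75, 500, 220),
  EnglishCountryRegionName = rep(c("Australia", "Canada"), each = 5)
)
csv_path <- tempfile(fileext = ".csv")
write.csv(sales, csv_path, row.names = FALSE)

# Read it back and draw one box per country
products <- read.csv(csv_path, header = TRUE)
bp <- boxplot(SalesAmount ~ EnglishCountryRegionName, data = products,
              main = "Products Boxplot", col = c("steelblue", "tomato3"))
bp$names   # "Australia" "Canada"
```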