A data frame in R is a table, or two-dimensional data structure. Data is stored in rows and columns, and we can access the elements using the row index and column index. The following are some of the characteristics of the R Data Frame:
- A data frame is a list of variables (columns) that must all have the same number of rows, with unique row names.
- The column names should not be empty.
- Although an R data frame supports duplicate column names by using check.names = FALSE, it is always preferable to use unique column names.
- The data stored in a data frame can be of character, numeric, or factor type. Other vector types, such as logical, are also allowed.
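The effect of check.names can be seen in a small sketch. The column name a and the variables df_default and df_dup are made up for illustration and are not part of the examples in this article.

```r
# Default behaviour (check.names = TRUE): duplicate names are made unique
df_default <- data.frame(a = 1:2, a = 3:4)
print(names(df_default))

# check.names = FALSE keeps the duplicate names exactly as supplied
df_dup <- data.frame(a = 1:2, a = 3:4, check.names = FALSE)
print(names(df_dup))

# With duplicate names, access by name only reaches the first matching column
first_a <- df_dup[["a"]]
print(first_a)
```

Because the second column named a cannot be reached by name, unique column names are usually the safer choice.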
In this article, we show you how to create a data frame, how to access its columns and rows, how to manipulate individual, row-level, or column-level elements, and how to create named data frames. We also explain some of the important functions supported by the data frame in R programming, with examples.
Create Data Frame in R
In this example, we create a data frame from elements of different types. The following code snippet shows the most common way to create a data frame in R programming.
# R Create Data Frame
Id <- 1:10
Name <- c("John", "Rob", "Ruben", "Christy", "Johnson",
          "Miller", "Carlson", "Ruiz", "Yang", "Zhu")
Occupation <- c("Professional", "Programmer", "Management", "Clerical", "Developer",
                "Programmer", "Management", "Clerical", "Developer", "Programmer")
Salary <- c(80000, 70000, 90000, 50000, 60000, 75000, 92000, 68000, 55000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)
From the above code snippet, you can observe that we first created four vectors of different types, and then created the data frame from those four vectors.
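To confirm that each column keeps the type of the vector it came from, the class of every column can be inspected. This is a minimal sketch on a shortened version of the data; the variable column_types is introduced here for illustration only.

```r
# Three vectors of different types
Id <- 1:3                             # integer
Name <- c("John", "Rob", "Ruben")     # character
Salary <- c(80000, 70000, 90000)      # numeric (double)

employee <- data.frame(Id, Name, Salary)

# Apply class() to every column; the result is a named character vector
column_types <- sapply(employee, class)
print(column_types)
```

The Name column is reported as character in R 4.0.0 and later. In earlier versions, where stringsAsFactors defaulted to TRUE, it is reported as factor.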
Create Named Data Frame in R
In this example, we show the steps involved in creating a named data frame in R programming. The syntax behind this is:
DataFrame_Name <- data.frame("index_Name1" = Item1, "index_Name2" = Item2, ..., "index_NameN" = ItemN)
# Create Named Data Frame in R Programming
Id <- 1:6
Name <- c("John", "Rob", "Christy", "Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

# We are assigning new names to the Columns
employee <- data.frame("Empid" = Id, "Full_Name" = Name,
                       "Profession" = Occupation, "income" = Salary)
print(employee)

# Names function will display the Index Names of each Item
print(names(employee))
Access R Data Frame Elements
In R programming, we can access data frame items in multiple ways. In this example, we show how to access them using the index position. The index value starts at 1 and ends at n, where n is the number of items (columns) in the data frame.
For example, if we declare a data frame that stores 10 items (10 columns), the index starts at 1 and ends at 10. To access the 1st item use DataFrame_Name[1], and to access the 10th item use DataFrame_Name[10].
Single brackets [ accept either a column name or an index position, and they return the result as a data frame rather than a vector.
# Accessing R Data Frame Elements
Id <- 1:6
Name <- c("John", "Rob", "Christy", "Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Accessing all the Elements (Rows) Present in the Name Item (Column)
employee["Name"]

# Accessing all the Elements (Rows) Present in the 3rd Column (i.e., Occupation)
# Index Values: 1 = Id, 2 = Name, 3 = Occupation, 4 = Salary
employee[3]
Access Elements of a Data Frame
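We can also access the data frame elements using the [[ double brackets. In this example, we show how to access the data frame items using [[. This returns the column as a vector: a character vector in R 4.0.0 and later, or a factor with level information in earlier versions, where stringsAsFactors defaulted to TRUE.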
# Accessing R Data Frame Elements
Id <- 1:6
Name <- c("John", "Rob", "Christy", "Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Double brackets return the column itself as a vector
employee[["Name"]]
employee[[3]]
These calls return the same values as the single-bracket example above, but as a vector rather than a data frame.
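The difference between the two bracket forms can be checked directly with class(). This is a minimal sketch on a shortened data frame; stringsAsFactors = FALSE is set explicitly so the result does not depend on the R version, and the variables single and double are illustrative names.

```r
employee <- data.frame(Id = 1:3,
                       Name = c("John", "Rob", "Christy"),
                       stringsAsFactors = FALSE)

# Single brackets keep the data frame structure (one column, three rows)
single <- employee["Name"]
print(class(single))   # "data.frame"

# Double brackets extract the column itself
double <- employee[["Name"]]
print(class(double))   # "character"
```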
Accessing R Data Frame items using $
In R programming, we can also access the data frame elements using the $ dollar symbol. In this example, we show how to access the columns of a data frame using $. It returns the column as a vector (for text columns, a character vector in R 4.0.0 and later, or a factor with level information in earlier versions). The syntax behind this is: <Data Frame>$Column_Name
# Accessing R Data Frame Elements
Id <- 1:6
Name <- c("John", "Rob", "Christy", "Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Accessing all the Elements (Rows) Present in the Name Item (Column)
employee$Name

# Accessing all the Elements (Rows) Present in the Salary Item (Column)
employee$Salary
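As a small aside, $ and [[ return the same vector when the full column name is given, but they differ in how they treat an abbreviated name. The sketch below uses a shortened data frame, and the variables same, partial, and exact are illustrative names.

```r
employee <- data.frame(Id = 1:3, Salary = c(80000, 90000, 75000))

# With the full column name, $ and [[ give the same vector
same <- identical(employee$Salary, employee[["Salary"]])
print(same)

# $ falls back to partial matching of column names ...
partial <- employee$Sal
print(partial)

# ... while [[ matches names exactly and returns NULL here
exact <- employee[["Sal"]]
print(exact)
```

Spelling column names out in full avoids surprises from partial matching.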
Accessing Low Level elements of R Data Frame
In R programming, we can access the lower-level elements (individual cells) of a data frame using the index position. The row index starts at 1 and ends at the number of rows, and the column index starts at 1 and ends at the number of columns. The syntax behind this is: <Data Frame>[Row_Number, Column_Number]
For example, suppose we declare a data frame that has six rows and four columns. To access or alter the 1st value, use DataFrame_Name[1, 1]; to access the value in the 2nd row and 3rd column, use DataFrame_Name[2, 3]; and to access the value in the 6th row and 4th column, use DataFrame_Name[6, 4].
# Accessing Low level elements in R Data Frame
Id <- 1:6
Name <- c("John", "Rob", "Christy", "Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Accessing Element at 1st Row and 2nd Column
employee[1, 2]

# Accessing Element at 4th Row and 3rd Column
employee[4, 3]

# Accessing All Elements at 5th Row
employee[5, ]

# Accessing All Items of the 4th Column
employee[, 4]
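One detail of the [Row_Number, Column_Number] form is worth a short sketch: selecting a whole row returns a data frame, while selecting a single whole column returns a vector unless drop = FALSE is passed. The data frame is shortened, and row_df, col_vec, and col_df are illustrative names.

```r
employee <- data.frame(Id = 1:3,
                       Name = c("John", "Rob", "Christy"),
                       Salary = c(80000, 90000, 75000),
                       stringsAsFactors = FALSE)

# A whole row stays a data frame (its columns have different types)
row_df <- employee[2, ]
print(class(row_df))

# A single whole column is simplified to a vector by default
col_vec <- employee[, 3]
print(class(col_vec))

# drop = FALSE keeps the one-column result as a data frame
col_df <- employee[, 3, drop = FALSE]
print(class(col_df))
```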
Accessing Multiple Values from R Data Frame
In our previous example, we showed how to access a single element from the data frame. In this example, we show how to access multiple items at once. To do so, we pass a vector of row indexes, a vector of column indexes, or both.
# Accessing Subset of elements in R Data Frame
Id <- 1:6
Name <- c("John", "Rob", "Christy", "Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Accessing Items at 1st, 2nd Rows and 3rd, 4th Columns
employee[c(1, 2), c(3, 4)]

# Accessing Items at 2nd, 3rd, 4th Rows and 2nd, 4th Columns
employee[2:4, c(2, 4)]

# Accessing All Items at 2nd, 3rd, 4th, 5th Rows
employee[2:5, ]

# Accessing All Items of 2nd and 4th Columns
employee[c(2, 4)]
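The index vectors do not have to be numeric positions. As a hedged sketch on a shortened data frame, the same subset can be requested with a vector of column names, and a negative index excludes rows instead of selecting them. The variables by_position, by_name, and without_first are illustrative names.

```r
employee <- data.frame(Id = 1:3,
                       Name = c("John", "Rob", "Christy"),
                       Salary = c(80000, 90000, 75000),
                       stringsAsFactors = FALSE)

# Rows 2 and 3, columns 2 and 3, selected by position
by_position <- employee[2:3, c(2, 3)]

# The same rows and columns, selected by column name
by_name <- employee[2:3, c("Name", "Salary")]
print(by_name)

# A negative row index drops that row and keeps the rest
without_first <- employee[-1, ]
print(without_first)
```

Selecting by name tends to be more robust than selecting by position, because it keeps working if the column order changes.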
Access R Data Frame Elements at Lower Level using $
We can also access the data frame elements at a lower level (individual cells) using the $ dollar symbol. In this example, we first select a column with $ and then index into the resulting vector. The result is a vector (for text columns, a character vector in R 4.0.0 and later, or a factor with level information in earlier versions).
# Accessing elements in R Data Frame
Id <- 1:6
Name <- c("John", "Rob", "Christy", "Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Accessing Items at 2nd, 4th Rows of Name Column
employee$Name[c(2, 4)]

# Accessing Items at 2nd, 3rd, 4th, 5th Rows of Occupation Column
employee$Occupation[2:5]
Modifying R Data Frame Elements
In R programming, the same index positions that we use to access data frame elements can also be used to change them. In this example, we show how to modify a particular cell value and how to replace all the items of a column.
# Modifying elements in R Data Frame
Id <- 1:6
Name <- c("John", "Rob", "Christy", "Johnson", "Miller", "Zhu")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Salary)
print(employee)

# Modifying Item at 2nd Row and 3rd Column
employee[2, 3] <- 100000
print(employee)

# Modifying All Items of 1st Column
employee[, 1] <- c(10:15)
print(employee)
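Cells can also be modified by column name instead of position, which is often easier to read. The following sketch, on a shortened data frame, updates one salary by matching on the Name column and then replaces a whole column using $.

```r
employee <- data.frame(Id = 1:3,
                       Name = c("John", "Rob", "Christy"),
                       Salary = c(80000, 90000, 75000),
                       stringsAsFactors = FALSE)

# Modify the Salary cell of the row whose Name is "Rob"
employee$Salary[employee$Name == "Rob"] <- 100000

# Replace every value in the Id column
employee$Id <- employee$Id + 100L

print(employee)
```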
Add Elements to Data Frame
In this example, we show how to add new elements to an existing data frame in R programming. Both functions return a new data frame and leave the original unchanged, so assign the result if you want to keep it.
- cbind(Data Frame, Values): This function adds extra columns with values. In general, we pass a vector as the values parameter.
- rbind(Data Frame, Values): This function adds an extra row with values.
# Adding elements in R Data Frame
Id <- 1:6
Name <- c("John", "Rob", "Christy", "Johnson", "Miller", "Zhu")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Salary, stringsAsFactors = FALSE)
print(employee)

# Adding Extra Row
rbind(employee, list(7, "Gateway", 105505))

# Adding Extra Column
Occupation <- c("Management", "Developer", "User", "Programmer", "Clerical", "Admin")
cbind(employee, Occupation)
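Note that rbind and cbind return a new data frame rather than changing the one passed in. A minimal sketch, on a shortened data frame, of keeping the result by assigning it (the variable with_row is an illustrative name):

```r
employee <- data.frame(Id = 1:3,
                       Name = c("John", "Rob", "Christy"),
                       Salary = c(80000, 90000, 75000),
                       stringsAsFactors = FALSE)

# rbind returns a new data frame; employee itself is unchanged
with_row <- rbind(employee, list(4L, "Gateway", 105505))
print(nrow(employee))   # still 3
print(nrow(with_row))   # 4

# Assign the result back to keep the new column
employee <- cbind(employee,
                  Occupation = c("Professional", "Management", "Developer"))
print(names(employee))
```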
Important Functions of Data Frame in R
The following data frame functions are some of the most useful.
- typeof(Data Frame): This function tells you the internal type of the data frame. Since a data frame is a kind of list, it returns "list".
- class(Data Frame): This function tells you the class of the data frame, which is "data.frame".
- length(Data Frame): This function counts the number of items (columns) in a data frame.
- nrow(Data Frame): This function returns the total number of rows present in the data frame.
- ncol(Data Frame): This function returns the total number of columns available in the data frame.
- dim(Data Frame): This function returns the number of rows followed by the number of columns present in the data frame.
# R Data Frame Important Functions
Id <- 1:10
Name <- c("John", "Rob", "Ruben", "Christy", "Johnson",
          "Miller", "Carlson", "Ruiz", "Yang", "Zhu")
Occupation <- c("Professional", "Programmer", "Management", "Clerical", "Developer",
                "Programmer", "Management", "Clerical", "Developer", "Programmer")
Salary <- c(80000, 70000, 90000, 50000, 60000, 75000, 92000, 68000, 55000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Type, class, and column names
typeof(employee)
class(employee)
names(employee)

# Number of Rows and Columns
length(employee)
ncol(employee)
nrow(employee)
dim(employee)
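For reference, here is what these functions return on a small two-column data frame. The data is made up for illustration, and the expected values are shown as comments.

```r
employee <- data.frame(Id = 1:10,
                       Salary = seq(50000, 95000, by = 5000))

print(typeof(employee))   # "list"
print(class(employee))    # "data.frame"
print(length(employee))   # 2  (number of columns)
print(ncol(employee))     # 2
print(nrow(employee))     # 10
print(dim(employee))      # 10 2  (rows, then columns)
```

Because a data frame is a list of columns, length() agrees with ncol() rather than nrow().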
Head and Tail Functions in R Data Frame
The following functions are very useful when working with a data frame. If your data is too big to print in full and you only want to inspect the first or last few records, you can use these functions.
- head(Data Frame, limit): This function returns the first six rows if you omit the limit (the argument is named n in R). If you specify the limit as 2, it returns the first 2 records. It is similar to a TOP or LIMIT clause in SQL.
- tail(Data Frame, limit): This function returns the last six rows if you omit the limit. If you specify the limit as 4, it returns the last four records.
# Head and Tail Function in R Data Frame
Id <- 1:10
Name <- c("John", "Rob", "Ruben", "Christy", "Johnson",
          "Miller", "Carlson", "Ruiz", "Yang", "Zhu")
Occupation <- c("Professional", "Programmer", "Management", "Clerical", "Developer",
                "Programmer", "Management", "Clerical", "Developer", "Programmer")
Salary <- c(80000, 70000, 90000, 50000, 60000, 75000, 92000, 68000, 55000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# No limit - It means Displaying First Six Records
head(employee)

# Limit is 4 - It means Displaying First Four Records
head(employee, 4)

# No limit - It means Displaying Last Six Records
tail(employee)

# Limit is 4 - It means Displaying Last Four Records
tail(employee, 4)
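The rows selected by each call can be verified on a single-column data frame, where the Id values make the selection easy to see. The last call is an extra not covered above: a negative limit in head() returns everything except that many rows from the end.

```r
employee <- data.frame(Id = 1:10)

first_six  <- head(employee)       # rows 1 to 6
first_four <- head(employee, 4)    # rows 1 to 4
last_four  <- tail(employee, 4)    # rows 7 to 10
all_but_last_two <- head(employee, -2)   # rows 1 to 8

print(first_six$Id)
print(first_four$Id)
print(last_four$Id)
print(all_but_last_two$Id)
```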
R Data Frame Special Functions
The following two functions are also very useful. It is always good to check the structure of the data before we start manipulating it or inserting new records.
- str(Data Frame): This function displays the structure of the data present in the data frame: the number of rows and columns, and the type and first few values of each column.
- summary(Data Frame): This function returns the nature of the data and a statistical summary. For numeric columns this includes the Minimum, 1st Quartile, Median, Mean, 3rd Quartile, and Maximum.
# R Data Frame Special Functions
Id <- 1:10
Name <- c("John", "Rob", "Ruben", "Christy", "Johnson",
          "Miller", "Carlson", "Ruiz", "Yang", "Zhu")
Occupation <- c("Professional", "Programmer", "Management", "Clerical", "Developer",
                "Programmer", "Management", "Clerical", "Developer", "Programmer")
Salary <- c(80000, 70000, 90000, 50000, 60000, 75000, 92000, 68000, 55000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# str() prints the structure itself; wrapping it in print() would also print NULL
str(employee)

# summary() returns a table of per-column statistics
print(summary(employee))
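Two details of these functions can be checked in a short sketch on a shortened data frame. str() prints its report as a side effect and returns NULL, while summary() on a numeric column returns a named set of six statistics. The variables str_result and salary_summary are illustrative names.

```r
employee <- data.frame(Id = 1:5,
                       Salary = c(80000, 70000, 90000, 50000, 60000))

# str() prints the structure and invisibly returns NULL
str_result <- str(employee)
print(is.null(str_result))

# summary() of a single numeric column
salary_summary <- summary(employee$Salary)
print(names(salary_summary))
print(salary_summary)
```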