Data Frame in R

The Data Frame in R is a table, or two-dimensional data structure. In R Data Frames, data is stored in rows and columns, and we can access the Data Frame elements using the row index and the column index. The following are some of the characteristics of the R Data Frame:

  • A Data Frame is a list of variables (columns) that all have the same length, so every column contains the same number of rows. Row names must be unique.
  • Column names should not be empty.
  • Although an R Data Frame supports duplicate column names when it is created with check.names = FALSE, it is always preferable to use unique column names.
  • The data stored in a Data Frame can be of character, numeric, or factor type (logical columns are also allowed).
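
The sketch below illustrates the first and third rules using base R behaviour. The columns a, b, and x are made up for illustration and are not part of this article's examples. data.frame() stops with an error when column lengths are incompatible (a shorter vector is only recycled when its length divides the number of rows), and by default it makes duplicate column names unique through make.names().

```r
# Columns of lengths 3 and 2 cannot form one table, so data.frame() fails
res <- tryCatch(data.frame(a = 1:3, b = 1:2),
                error = function(e) "error: differing number of rows")
print(res)

# By default (check.names = TRUE) duplicate names are made unique
df1 <- data.frame(x = 1, x = 2)
print(names(df1))   # "x"   "x.1"

# check.names = FALSE keeps the duplicate names exactly as given
df2 <- data.frame(x = 1, x = 2, check.names = FALSE)
print(names(df2))   # "x" "x"
```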

In this article, we show you how to create a Data Frame, how to access Data Frame columns and rows, how to manipulate individual elements as well as whole rows and columns, and how to create named Data Frames. We also explain some of the important functions that R provides for working with Data Frames.

How to Create Data Frame in R

The most common way to create a Data Frame in R Programming is to build one vector per column and pass those vectors to the data.frame() function, as this example does.

# R Create Data Frame

Id <- c(1:10)
Name <- c("John", "Rob", "Ruben", "Christy","Johnson", "Miller", "Carlson", "Ruiz", "Yang","Zhu")
Occupation <- c("Professional", "Programmer","Management", "Clerical", 
                "Developer", "Programmer", "Management", "Clerical", "Developer","Programmer")
Salary <- c(80000, 70000, 90000, 50000, 60000, 75000, 92000, 68000, 55000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

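First, we created four vectors of different types: Id (integers), Name and Occupation (character strings), and Salary (numbers). Then we created the Data Frame from those four vectors, and each vector became one column.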
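
To confirm that each vector became a column with its own type, you can apply class() to every column. This is a minimal sketch on a shortened three-row version of the example; stringsAsFactors = FALSE is passed explicitly so that the Name column stays character in every R version.

```r
Id <- 1:3
Name <- c("John", "Rob", "Ruben")
Salary <- c(80000, 70000, 90000)

employee <- data.frame(Id, Name, Salary, stringsAsFactors = FALSE)

# Column names come from the vector names, and each column keeps its own type
col_types <- sapply(employee, class)
print(col_types)   # Id: "integer", Name: "character", Salary: "numeric"
```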

Create Named Data Frame in R

To give the columns your own names, pass name = value pairs to data.frame(). The syntax is:

DataFrame_Name <- data.frame("index_Name1" = Item1, "index_Name2" = Item2, …, "index_NameN" = ItemN)

# Create Named Data Frame in R Programming

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

# We are assigning new names to the Columns
employee <- data.frame("Empid" = Id, "Full_Name" = Name, "Profession" = Occupation, "income" = Salary)

print(employee)

# names() returns the column names of the Data Frame
print(names(employee))
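
Column names that are not valid R names, for example names containing a space or starting with a digit, are repaired by make.names() unless you pass check.names = FALSE. The sketch below uses hypothetical column names chosen only to show this behaviour. It then renames a column after creation with names()<-.

```r
# "Full Name" contains a space and "2023 income" starts with a digit
df <- data.frame("Full Name" = c("John", "Rob"),
                 "2023 income" = c(80000, 90000))
print(names(df))       # "Full.Name"    "X2023.income"

# check.names = FALSE keeps the names exactly as written
df_raw <- data.frame("Full Name" = c("John", "Rob"), check.names = FALSE)
print(names(df_raw))   # "Full Name"

# Columns can also be renamed after the Data Frame is created
names(df)[2] <- "Income"
print(names(df))       # "Full.Name" "Income"
```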

Access R Data Frame Elements

In R programming, we can access the Data Frame items in multiple ways. Here, we show you how to access the Data Frame items (columns) using the index position or the column name. The index value starts at 1 and ends at n, where n is the number of columns in the Data Frame.

For example, if we declare a Data Frame that stores 10 items (10 columns), the index starts at 1 and ends at 10. To access the first column, use DataFrame_Name[1], and to access the 10th column, use DataFrame_Name[10].

Single brackets return the selected column as a Data Frame with one column, not as a plain vector.

# Accessing R Data Frame Elements

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Accessing all the Elements (Rows) Present in the Name Item (Column)
employee["Name"]

# Accessing all the Elements (Rows) Present in the 3rd Column (i.e., Occupation)
employee[3] # Index Values: 1 = Id, 2 = Name, 3 = Occupation, 4 = Salary
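
A quick way to see that single brackets keep the Data Frame structure is to check the class and dimensions of the result. This minimal sketch uses a small three-row table rather than the six-row one above.

```r
employee <- data.frame(Id = 1:3,
                       Name = c("John", "Rob", "Christy"),
                       Salary = c(80000, 90000, 75000))

by_name  <- employee["Name"]   # select the column by name
by_index <- employee[2]        # select the same column by position

print(class(by_name))                # "data.frame"
print(dim(by_name))                  # 3 1 (three rows, one column)
print(identical(by_name, by_index))  # TRUE

# A vector of names (or indexes) selects several columns at once
print(names(employee[c("Id", "Salary")]))  # "Id" "Salary"
```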

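Access Elements of an R Data Frame using [[

We can also access the Data Frame elements using [[ double brackets. In this example, we show how to access the Data Frame items using [[. It returns the column as a vector. In R versions before 4.0.0, data.frame() converted character vectors to factors by default (stringsAsFactors = TRUE), so such columns came back as factors with level information.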

# Accessing R Data Frame Elements

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

employee[["Name"]]
employee[[3]]

This selects the same columns as the previous example (Name and the 3rd column, Occupation), but [[ returns a vector rather than a Data Frame.
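
The difference between the two bracket forms, and the level information that older R versions showed, can be checked directly. In this sketch, stringsAsFactors is set explicitly so that both behaviours appear side by side; the data is a shortened version of the example.

```r
employee <- data.frame(Id = 1:3,
                       Name = c("John", "Rob", "Christy"),
                       stringsAsFactors = FALSE)

print(class(employee["Name"]))    # "data.frame": single brackets
print(class(employee[["Name"]]))  # "character": double brackets

# The pre-R 4.0.0 default, stringsAsFactors = TRUE, stores Name as a factor
old_style <- data.frame(Name = c("John", "Rob", "Christy"),
                        stringsAsFactors = TRUE)
print(class(old_style[["Name"]]))   # "factor"
print(levels(old_style[["Name"]]))  # "Christy" "John" "Rob"
```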

Accessing R Data Frame items using $

We can also access the Data Frame elements using the $ dollar symbol. In this example, we show how to access the columns of the Data Frame in R using $. Like [[, it returns the column as a vector (a factor with level information if the column is stored as a factor). The syntax is: <Data Frame>$Column_Name

# Accessing R Data Frame Elements

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Accessing all the Elements (Rows) Present in the Name Item (Column)
employee$Name

# Accessing all the Elements (Rows) Present in the Salary Item (Column)
employee$Salary
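
One practical difference between $ and [[ is that $ takes the column name literally, while [[ evaluates its argument. This matters when the column name is stored in a variable, as in this sketch; the variable col is hypothetical and introduced only for illustration.

```r
employee <- data.frame(Name = c("John", "Rob"),
                       Salary = c(80000, 90000))

col <- "Salary"

# $ looks for a column literally called "col", finds none, and returns NULL
print(employee$col)     # NULL

# [[ uses the value stored in col, so it returns the Salary column
print(employee[[col]])  # 80000 90000
```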

Accessing Low Level elements of R Data Frame

In R programming, we can access the lower-level elements (individual cells) of a Data Frame using the row and column index positions. Row indexes run from 1 to the number of rows, and column indexes run from 1 to the number of columns. The syntax is: <Data Frame>[Row_Number, Column_Number]. Leaving the row position empty selects every row, and leaving the column position empty selects every column.

For example, in a Data Frame with six rows and four columns, use DataFrame_Name[1, 1] to access or alter the first value, DataFrame_Name[2, 3] to access the value in the 2nd row and 3rd column, and DataFrame_Name[6, 4] to access the value in the 6th row and 4th column.

# Accessing Low level elements in R Data Frame

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)
employee <- data.frame(Id, Name, Occupation, Salary)

print(employee)
# Accessing Element at 1st Row and 2nd Column 
employee[1, 2]

# Accessing Element at 4th Row and 3rd Column 
employee[4, 3] 

# Accessing All Elements at 5th Row 
employee[5, ] 

# Accessing All Items of the 4th Column 
employee[, 4]
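
The class of the result depends on what you select: a single cell or a single column is simplified to a vector, while a row stays a Data Frame because its values can have different types. The drop = FALSE argument of [ turns that simplification off. A minimal sketch on a three-row table:

```r
employee <- data.frame(Id = 1:3,
                       Name = c("John", "Rob", "Christy"),
                       Salary = c(80000, 90000, 75000))

print(class(employee[2, 3]))               # "numeric": one cell
print(class(employee[, 3]))                # "numeric": one column becomes a vector
print(class(employee[, 3, drop = FALSE]))  # "data.frame": drop = FALSE keeps a table
print(class(employee[2, ]))                # "data.frame": one row keeps all columns
```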

Accessing Multiple Values

This example shows how to access multiple items from the Data Frame. To do so, pass a vector of indexes, created with c() or the : operator, in the row position, the column position, or both.

# Accessing Subset of elements in R Data Frame

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)
employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Accessing Items at 1st, 2nd Rows and 3rd, 4th Columns 
employee[c(1, 2), c(3, 4)]

# Accessing Items at 2nd, 3rd, 4th Rows and 2nd, 4th Columns 
employee[2:4, c(2, 4)] 

# Accessing All Items at 2nd, 3rd, 4th, 5th Rows 
employee[2:5, ] 

# Accessing All Items of 2nd and 4th Columns 
employee[c(2, 4)]
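
The index vectors do not have to be numbers. This sketch, on a four-row version of the table, uses a character vector of column names and a logical vector built from a condition; only the rows where the condition is TRUE are kept. The 80000 threshold is arbitrary.

```r
employee <- data.frame(Id = 1:4,
                       Name = c("John", "Rob", "Christy", "Johnson"),
                       Salary = c(80000, 90000, 75000, 92000),
                       stringsAsFactors = FALSE)

# Column names can replace column positions
print(employee[c(1, 3), c("Name", "Salary")])

# A logical vector selects the rows where it is TRUE
high_paid <- employee[employee$Salary > 80000, c("Name", "Salary")]
print(high_paid)   # the rows for Rob and Johnson
```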

Access Data Frame Elements at Lower Level using $

We can also access the Data Frame elements at a lower level (individual cells) using the $ dollar symbol. $ first extracts the column as a vector, and the [ ] index that follows then selects individual cells from that vector.

# Accessing elements in R Data Frame

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Occupation <- c("Professional", "Management", "Developer", 
                "Programmer", "Clerical", "Admin")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)
employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# Accessing Items at 2nd, 4th Rows of Name Column 
employee$Name[c(2, 4)] 

# Accessing Item at 2nd, 3rd, 4th, 5th Rows of Occupation Column 
employee$Occupation[2:5] 
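
Because $ returns a plain vector, the same cells can be reached in several ways. This sketch, on a four-row version of the table, checks that three common forms return the same result.

```r
employee <- data.frame(Id = 1:4,
                       Name = c("John", "Rob", "Christy", "Johnson"),
                       stringsAsFactors = FALSE)

a <- employee$Name[c(2, 4)]        # $ then [ ]
b <- employee[["Name"]][c(2, 4)]   # [[ ]] then [ ]
d <- employee[c(2, 4), "Name"]     # [row, column] on the Data Frame

print(a)                                   # "Rob" "Johnson"
print(identical(a, b) && identical(a, d))  # TRUE
```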

Modifying Data Frame Elements

In R programming, we can access the Data Frame elements using the index position, and we can use the same index values to alter individual Data Frame elements. Here, we modify a particular cell value and then replace all the items of an entire column.

# Modifying elements in R Data Frame

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Salary)
print(employee)

# Modifying Item at 2nd Row and 3rd Column 
employee[2, 3] <- 100000
print(employee)

# Modifying All Items of the 1st Column 
employee[, 1] <- c(10:15)
print(employee)
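
Positions are not the only way to choose what to modify. The sketch below, on a three-row table, changes a cell by column name, updates only the rows that match a condition, and recomputes a whole column. The 5% raise is an arbitrary illustrative value.

```r
employee <- data.frame(Id = 1:3,
                       Name = c("John", "Rob", "Christy"),
                       Salary = c(80000, 90000, 75000),
                       stringsAsFactors = FALSE)

# Change one cell, naming the column instead of using its position
employee[2, "Salary"] <- 100000

# Change only the rows where the condition is TRUE
employee$Salary[employee$Name == "Christy"] <- 78000

# Replace a whole column with values computed from it (a 5% raise)
employee$Salary <- employee$Salary * 1.05

print(employee)
```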

Add Elements to Data Frame

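This example adds new elements to an existing Data Frame in R programming. Both functions return a new Data Frame and leave the original unchanged, so assign the result back to employee if you want to keep the change.

  • cbind(Data Frame, Values): Adds extra columns with the given values. The values are usually supplied as a vector with one element per row.
  • rbind(Data Frame, Values): Adds extra rows with the given values, for example a list with one value per column.

# Adding elements in R Data Frame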

Id <- c(1:6)
Name <- c("John", "Rob", "Christy","Johnson", "Miller", "Zhu")
Salary <- c(80000, 90000, 75000, 92000, 68000, 82000)

employee <- data.frame(Id, Name, Salary, stringsAsFactors=FALSE)
print(employee)

# Adding Extra Row 
rbind(employee, list(7, "Gateway", 105505))

# Adding Extra Column 
Occupation <- c("Management", "Developer", "User", "Programmer", "Clerical", "Admin")
cbind(employee, Occupation)
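
Because rbind() and cbind() return a new Data Frame instead of changing employee in place, the result has to be assigned back to keep it. This sketch does that and also adds a column by assigning to a new name with $. The Bonus column and its 10% rate are hypothetical.

```r
employee <- data.frame(Id = 1:2,
                       Name = c("John", "Rob"),
                       Salary = c(80000, 90000),
                       stringsAsFactors = FALSE)

# Without the assignment, employee would still have two rows afterwards
new_row <- data.frame(Id = 3, Name = "Gateway", Salary = 105505,
                      stringsAsFactors = FALSE)
employee <- rbind(employee, new_row)

# cbind() adds a column; naming the argument sets the column name
employee <- cbind(employee, Occupation = c("Management", "Developer", "User"))

# Assigning to a column name that does not exist yet also adds a column
employee$Bonus <- employee$Salary * 0.1

print(dim(employee))   # 3 5
```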

Important Functions of Data Frame in R

The following functions are among the most useful for inspecting a Data Frame.

  • typeof(Data Frame): Returns the internal type of the Data Frame. Because a Data Frame is a kind of list, it returns "list".
  • class(Data Frame): Returns the class of the Data Frame, "data.frame".
  • names(Data Frame): Returns the column names.
  • length(Data Frame): Counts the number of items (columns) in the Data Frame.
  • nrow(Data Frame): Returns the total number of rows present in the Data Frame.
  • ncol(Data Frame): Returns the total number of columns.
  • dim(Data Frame): Returns the number of rows and columns, in that order.

# R Data Frame Important Functions

Id <- c(1:10)
Name <- c("John", "Rob", "Ruben", "Christy","Johnson", "Miller", "Carlson", "Ruiz", "Yang","Zhu")
Occupation <- c("Professional", "Programmer","Management", "Clerical", 
                "Developer", "Programmer", "Management", "Clerical", "Developer","Programmer")
Salary <- c(80000, 70000, 90000, 50000, 60000, 75000, 92000, 68000, 55000, 82000)

#employee <- data.frame("empid" = Id, "name" = Name, "Profession" = Occupation, "income" = Salary)
employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

typeof(employee)
class(employee)
names(employee)

# Number of Rows and Columns
length(employee)
ncol(employee)
nrow(employee)
dim(employee)
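
The difference between length() and nrow() is easy to miss, since both return a count. This sketch, on a made-up table with 10 rows and 2 columns, prints what each function returns.

```r
employee <- data.frame(Id = 1:10,
                       Salary = seq(50000, 95000, by = 5000))

print(typeof(employee))  # "list": a Data Frame is stored as a list of columns
print(class(employee))   # "data.frame"
print(length(employee))  # 2: the number of columns, same as ncol()
print(nrow(employee))    # 10
print(dim(employee))     # 10 2: rows first, then columns
```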

Head and Tail Functions in R Data Frame

When a Data Frame is large, you often want to look at only its first or last few records. These Data Frame functions do that:

  • head(Data Frame, limit): Returns the first six rows if you omit the limit. If you specify the limit as 4, it returns the first four records. The limit is the function's n argument.
  • tail(Data Frame, limit): Returns the last six rows if you omit the limit. If you specify the limit as 4, it returns the last four records.

# Head and Tail Function in R Data Frame

Id <- c(1:10)
Name <- c("John", "Rob", "Ruben", "Christy", "Johnson", "Miller", "Carlson", "Ruiz", "Yang","Zhu")
Occupation <- c("Professional", "Programmer","Management", "Clerical", "Developer", "Programmer", 
                "Management", "Clerical", "Developer","Programmer")
Salary <- c(80000, 70000, 90000, 50000, 60000, 75000, 92000, 68000, 55000, 82000)

employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# No limit - It means Displaying First Six Records 
head(employee)

# Limit is 4 - It means Displaying First Four Records 
head(employee, 4)

# No limit - It means Displaying Last Six Records 
tail(employee)

# Limit is 4 - It means Displaying Last Four Records 
tail(employee, 4)
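
The limit (the n argument of head() and tail()) can also be negative, in which case it counts rows to drop instead of rows to keep. A short sketch on a 10-row table:

```r
employee <- data.frame(Id = 1:10)

print(nrow(head(employee)))      # 6: the default when n is omitted
print(tail(employee, 4)$Id)      # 7 8 9 10
print(nrow(head(employee, -3)))  # 7: every row except the last three
print(tail(employee, -8)$Id)     # 9 10: every row except the first eight
```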

R Data Frame Special Functions

The following two functions are very useful for inspecting a Data Frame. It is always good to check the structure of the data before we start manipulating it or inserting new records.

  • str(Data Frame): Prints the structure of the Data Frame: the number of observations (rows) and variables (columns), plus each column's type and its first few values.
  • summary(Data Frame): Returns a summary of each column. For numeric columns, this is the minimum, first quartile, median, mean, third quartile, and maximum; for character columns, it reports the length, class, and mode.
# R Data Frame str() and summary() Functions

Id <- c(1:10)
Name <- c("John", "Rob", "Ruben", "Christy","Johnson", "Miller", "Carlson", "Ruiz", "Yang","Zhu")
Occupation <- c("Professional", "Programmer","Management", "Clerical", 
                "Developer", "Programmer", "Management", "Clerical", "Developer","Programmer")
Salary <- c(80000, 70000, 90000, 50000, 60000, 75000, 92000, 68000, 55000, 82000)
#employee <- data.frame("empid" = Id, "name" = Name, "Profession" = Occupation, "income" = Salary)
employee <- data.frame(Id, Name, Occupation, Salary)
print(employee)

# str() prints the structure itself and invisibly returns NULL, so print() is not needed
str(employee)
print(summary(employee))
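
For a numeric column, summary() returns six named values. Using the Salary vector from the example above, this sketch extracts some of them by name and shows where the median and mean come from.

```r
Salary <- c(80000, 70000, 90000, 50000, 60000, 75000, 92000, 68000, 55000, 82000)

s <- summary(Salary)
print(names(s))       # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
print(s[["Median"]])  # 72500: the average of the two middle values, 70000 and 75000
print(s[["Mean"]])    # 72200: the total, 722000, divided by 10
```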