R Read CSV Function

The R read.csv function is very useful for importing CSV files from the file system or from URLs and storing the data in a data frame. In this article, we show you how to use the read.csv function and how to manipulate CSV data in R programming, with examples.

R Read CSV Syntax

The basic syntax to read data from a CSV file in R programming is shown below:

read.csv(file, header = TRUE, sep = ",", quote = "\"", dec = ".", fill = TRUE, comment.char = "", ...)

The read.csv function supports many arguments. The following are some of the most useful ones in practice:

  • file: The file name, or the full path along with the name. You can also use the URL of an external (online) file. For example, "sample.csv" or "C:/Users/Suresh/Documents/sample.csv".
  • header: If the file contains the column names in its first row, set header = TRUE (the read.csv default); otherwise, use FALSE.
  • sep: Short for separator. Specify the character that separates the fields; sep = "," (the default) means the data is separated by commas.
  • quote: If your character values (FirstName, Education column, etc.) are enclosed in quotes, specify the quote character. For double quotes, use quote = "\"", which is also the read.csv default.
  • as.is: Controls which character columns are kept as character instead of being converted to factors. Specify a logical vector (recycled to the number of columns if it is shorter), or the indices or names of the columns to keep. For example, if we have two columns (FirstName, Sales), then as.is = c(TRUE, FALSE) keeps FirstName as character rather than converting it to a factor.
  • nrows: An integer giving the maximum number of rows to read. For example, to read only the first 5 records, use nrows = 5.
  • skip: The number of lines to skip at the start of the file before reading begins. skip counts raw lines, including the header line: skip = 2 on our Employee file skips the header row and the first record, and with header = TRUE the next line is then treated as the header.
  • strip.white: When sep is specified (read.csv always specifies it), set this to TRUE to trim extra leading and trailing white space from unquoted character fields.
  • comment.char: If your file contains comments, use this argument to ignore them. Specify the single character that marks a comment; everything after it on a line is ignored. For example, if the comments in your data start with $, use comment.char = "$". read.csv turns comment handling off by default (comment.char = "").
  • stringsAsFactors: A logical value indicating whether the character (text) columns in the CSV file should be converted to factors. Since R 4.0.0 the default is FALSE; in earlier versions it was TRUE.
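Several of these arguments can be combined in a single call. The following is a minimal sketch that uses the text argument (which read.csv passes through to read.table) so it runs without a file on disk; the two preamble lines, the $ comment line, and the padded "  Yang  " field in raw are hypothetical, added only to give skip, comment.char, and strip.white something to do.

```r
# Hypothetical raw lines: two preamble lines, a header, a comment, and records
raw <- c(
  "Hypothetical export - preamble line 1",
  "Hypothetical export - preamble line 2",
  "FirstName,LastName,Sales",
  "$ a comment line that read.csv should ignore",
  "John,  Yang  ,3578.27",
  "Rob,Johnson,3399.99",
  "Ruben,Torres,699.0982"
)

employee <- read.csv(
  text = raw,               # read from a character vector instead of a file
  header = TRUE,            # first line after the skipped ones holds the names
  skip = 2,                 # skip the two preamble lines (raw lines, not records)
  comment.char = "$",       # ignore everything after a $ on any line
  strip.white = TRUE,       # trim spaces around unquoted fields ("  Yang  ")
  nrows = 2,                # read at most two records
  stringsAsFactors = FALSE  # keep text columns as character
)

print(employee)
```

Only John and Rob are read: the comment line does not count as a record, and nrows stops reading before Ruben.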

The following screenshot shows the data inside our Employee file, which we use to demonstrate the read.csv function. As you can see, it has column names, 14 rows, and 7 columns.

[Screenshot: contents of the Employee file]

If you want to use the same data, copy the data below, paste it into Notepad, and save it as Employee.csv.

FirstName,LastName,Education,Occupation,YearlyIncome,Sales,HireDate
John,Yang,Bachelors,Professional,90000,3578.27,28-01-06
Rob,Johnson,Bachelors,Management,80000,3399.99,29-12-10
Ruben,Torres,Partial College,Skilled Manual,50000,699.0982,29-12-11
Christy,Zhu,Bachelors,Professional,80000,3078.27,28-12-12
Rob,Huang,High School,Skilled Manual,60000,2319.99,22-09-08
John,Ruiz,Bachelors,Professional,70000,539.99,06-07-09
John,Miller,Masters Degree,Management,80000,2320.49,12-08-09
Christy,Mehta,Partial High School,Clerical,50000,24.99,05-07-07
Rob,Verhoff,Partial High School,Clerical,45000,24.99,15-09-13
Christy,Carlson,Graduate Degree,Management,70000,2234.99,25-01-14
Gail,Erickson,Education,Professional,90000,4319.99,02-10-06
Barry,Johnson,Education,Management,80000,4968.59,15-05-14
Peter,Krebs,Graduate Degree,Clerical,50000,59.53,14-01-13
Greg,Alderson,Partial High School,Clerical,45000,23.5,05-07-13
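If you would rather not use Notepad, a minimal sketch like the following writes the same 14 records to Employee.csv from R itself. It assumes you want the file in the current working directory (the folder that getwd() reports); change the file name in writeLines() to a full path if not.

```r
# The sample employee data from above, one string per line
employee_lines <- c(
  "FirstName,LastName,Education,Occupation,YearlyIncome,Sales,HireDate",
  "John,Yang,Bachelors,Professional,90000,3578.27,28-01-06",
  "Rob,Johnson,Bachelors,Management,80000,3399.99,29-12-10",
  "Ruben,Torres,Partial College,Skilled Manual,50000,699.0982,29-12-11",
  "Christy,Zhu,Bachelors,Professional,80000,3078.27,28-12-12",
  "Rob,Huang,High School,Skilled Manual,60000,2319.99,22-09-08",
  "John,Ruiz,Bachelors,Professional,70000,539.99,06-07-09",
  "John,Miller,Masters Degree,Management,80000,2320.49,12-08-09",
  "Christy,Mehta,Partial High School,Clerical,50000,24.99,05-07-07",
  "Rob,Verhoff,Partial High School,Clerical,45000,24.99,15-09-13",
  "Christy,Carlson,Graduate Degree,Management,70000,2234.99,25-01-14",
  "Gail,Erickson,Education,Professional,90000,4319.99,02-10-06",
  "Barry,Johnson,Education,Management,80000,4968.59,15-05-14",
  "Peter,Krebs,Graduate Degree,Clerical,50000,59.53,14-01-13",
  "Greg,Alderson,Partial High School,Clerical,45000,23.5,05-07-13"
)

# Writes Employee.csv into the current working directory
writeLines(employee_lines, "Employee.csv")

# Quick check: 14 records and 7 columns
employee <- read.csv("Employee.csv")
dim(employee)   # 14 7
```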

R Read CSV File from Current Working Directory

In this example, we show you how to read data from a CSV (comma-separated values) file that is present in the current working directory.

# From Current Working Directory

# Locate the Current Working Directory
getwd()

employee <- read.csv("Employee.csv", header = TRUE, sep = ",")

print(employee)

R Read CSV File from Custom Directory

In this example, we show you how to read data from a file that is present in a custom directory.

  • getwd(): Returns the current working directory. Often, it is your Documents folder.
  • setwd("path"): Changes the current working directory to the folder you specify.
  • list.files(): Displays the list of files present in the current working directory.
# From Optional Working Directory

# Locate the Current Working Directory
getwd()

setwd("R Programs") # A folder inside the current directory; or use a full path such as "C:/Users/Suresh/Documents"
list.files()
getwd()

employee <- read.csv("Employee.csv", header = TRUE, sep = ",")
print(employee)
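setwd() changes the working directory for the whole R session, which can surprise code that runs later. As an alternative, the following minimal sketch builds the full path with file.path() and passes it straight to read.csv(). tempdir() and the "R Programs" subfolder stand in for your own directory (an assumption so the example runs anywhere), and the small two-record file is written first only to make the sketch self-contained.

```r
# tempdir() stands in for your own folder, e.g. "C:/Users/Suresh/Documents"
data_dir <- file.path(tempdir(), "R Programs")
dir.create(data_dir, showWarnings = FALSE)

# Write a small sample (first two employees) so the sketch runs on its own
csv_path <- file.path(data_dir, "Employee.csv")
writeLines(c(
  "FirstName,LastName,Sales",
  "John,Yang,3578.27",
  "Rob,Johnson,3399.99"
), csv_path)

# Like list.files() after setwd(data_dir), but the working directory is unchanged
list.files(data_dir)

# Read using the full path; no setwd() needed
employee <- read.csv(csv_path, header = TRUE, sep = ",")
print(employee)
```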

Accessing CSV file Data

In R programming, the read.csv function automatically stores the data in a data frame, so every function that works on data frames can also be used on CSV data. Please refer to the Data Frame article to understand these functions in detail.

# Accessing Data
 
# Locate the Current Working Directory
getwd()
employee <- read.csv("Employee.csv", header = TRUE, sep = ",")
print(employee)

# Column positions: 1 = FirstName, 2 = LastName, 3 = Education, 4 = Occupation,
# 5 = YearlyIncome, 6 = Sales, and 7 = HireDate

# Accessing all the elements (rows) present in the 5th column (YearlyIncome)
employee[[5]]

# Accessing all the elements (rows) present in the Occupation column
employee$Occupation

# Accessing the element at the 4th row and 3rd column (Education)
employee[4, 3]

# Accessing the 1st, 2nd, and 4th rows of the 4th to 7th columns
employee[c(1, 2, 4), 4:7]
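Positions silently point at the wrong data if the column order in the file ever changes, so it is often safer to index by column name. The minimal sketch below inlines the first four sample records with read.csv()'s text argument so it runs without Employee.csv, and also shows a logical row index that keeps only rows whose Sales exceed 3000 (an arbitrary threshold chosen for illustration).

```r
# First four sample records, inlined so the sketch runs on its own
employee <- read.csv(text = c(
  "FirstName,LastName,Education,Occupation,YearlyIncome,Sales,HireDate",
  "John,Yang,Bachelors,Professional,90000,3578.27,28-01-06",
  "Rob,Johnson,Bachelors,Management,80000,3399.99,29-12-10",
  "Ruben,Torres,Partial College,Skilled Manual,50000,699.0982,29-12-11",
  "Christy,Zhu,Bachelors,Professional,80000,3078.27,28-12-12"
))

# Same cells as employee[c(1, 2, 4), 4:7], selected by column name
employee[c(1, 2, 4), c("Occupation", "YearlyIncome", "Sales", "HireDate")]

# Logical row index: employees with Sales above 3000, two named columns
employee[employee$Sales > 3000, c("FirstName", "Sales")]
```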

Common Functions

The following functions are commonly used when working with data read from CSV files in R programming.

  • max(): Returns the maximum value within a column.
  • min(): Returns the minimum value within a column.
  • mean(): Returns the average (arithmetic mean) of a column.
  • subset(data, condition): Returns the rows of data that satisfy the condition.
# Common Functions 
# Locate the Current Working Directory
getwd()
employee <- read.csv("Employee.csv", header = TRUE, sep = ",")
print(employee)

# It returns the Maximum Value within the Yearly Income Column
maximum.salary <- max(employee$YearlyIncome)
print(maximum.salary)

# It returns the Minimum Value within the Sales Column
minimum.sales <- min(employee$Sales)
print(minimum.sales)

# It calculates and returns the mean value of the Sales column
mean.sales <- mean(employee$Sales)
print(mean.sales)

# It returns all the records, whose Education is equal to Bachelors
subdata <- subset(employee, Education == "Bachelors")
print(subdata)

# It returns all the records, whose Education is equal to Bachelors and Yearly Income > 70000
partialdata <- subset(employee, Education == "Bachelors" & YearlyIncome > 70000)
print(partialdata)
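One detail the examples above do not show: if a numeric column contains an empty cell, read.csv() stores it as NA, and max(), min(), and mean() then return NA unless you pass na.rm = TRUE. The sketch below inlines five sample records plus one hypothetical record (NewHire) with a blank Sales value to show this, and also uses subset()'s select argument to keep only some columns.

```r
# Five sample records plus a hypothetical record (NewHire) with a blank Sales cell
employee <- read.csv(text = c(
  "FirstName,Education,YearlyIncome,Sales",
  "John,Bachelors,90000,3578.27",
  "Rob,Bachelors,80000,3399.99",
  "Ruben,Partial College,50000,699.0982",
  "Christy,Bachelors,80000,3078.27",
  "Rob,High School,60000,2319.99",
  "NewHire,Masters Degree,65000,"
))

max(employee$Sales)                 # NA, because one Sales value is missing
max(employee$Sales, na.rm = TRUE)   # ignores the missing value
mean(employee$Sales, na.rm = TRUE)  # mean of the five known values

# subset() can also keep only some columns through its select argument
subset(employee, Education == "Bachelors" & YearlyIncome > 70000,
       select = c(FirstName, YearlyIncome))
```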

Important R Read CSV Functions

The following functions are some of the most useful when reading CSV files in R programming.

  • typeof(): Tells you the internal type of a variable. Since a data frame is a kind of list, this function returns "list".
  • class(): Tells you the class of the data read from the CSV file, which is "data.frame".
  • names(): Returns the column names.
  • length(): Counts the number of items in the list, that is, the number of columns.
  • nrow(): Returns the total number of rows.
  • ncol(): Returns the total number of columns.
  • dim(): Returns the number of rows and columns together.
# Important Functions

# Locate the Current Working Directory
getwd()

employee <- read.csv("Employee.csv", header = TRUE, sep = ",")
print(employee)

typeof(employee)
class(employee)
names(employee)

length(employee)
nrow(employee)
ncol(employee)
dim(employee)
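typeof() and class() describe the data frame as a whole. To see how read.csv() typed each individual column, a common idiom is sapply(employee, class), as in this minimal sketch using the first two sample records inlined through the text argument. Because every YearlyIncome value is a whole number, that column is read as integer, while Sales becomes numeric and HireDate stays character.

```r
# First two sample records, inlined so the sketch runs on its own
employee <- read.csv(text = c(
  "FirstName,LastName,Education,Occupation,YearlyIncome,Sales,HireDate",
  "John,Yang,Bachelors,Professional,90000,3578.27,28-01-06",
  "Rob,Johnson,Bachelors,Management,80000,3399.99,29-12-10"
), stringsAsFactors = FALSE)

typeof(employee)                      # "list": a data frame is a list of columns
class(employee)                       # "data.frame"
length(employee) == ncol(employee)    # TRUE: one list element per column

# The class of every column at once
sapply(employee, class)
```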

Head and Tail Functions

In R programming, the following functions are very useful when working with external data (such as CSV files). If your CSV file is very large and you only want to look at the first or last few records, you can use these functions.

  • head(Data, n): Returns the first six rows if you omit n. If you specify n = 3, it returns the first three records, much like selecting the top n records.
  • tail(Data, n): Returns the last six rows if you omit n. If you specify n = 4, it returns the last four records, much like selecting the bottom n records.
# head and Tail

# Locate the Current Working Directory
getwd()

employee <- read.csv("Employee.csv", header = TRUE, sep = ",")
print(employee)

# No n - It will display the top six records
head(employee)

# n is 4 - It will display the top four records
head(employee, 4)

# No n - It will display the bottom six records
tail(employee)

# n is 3 - It will display the bottom three records
tail(employee, 3)
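Both functions also accept a negative n: head(x, -k) returns every row except the last k, and tail(x, -k) returns every row except the first k. The following sketch inlines the FirstName and LastName columns of the 14 sample records to show this; note that tail() keeps the original row numbers.

```r
# FirstName and LastName of the 14 sample records
employee <- read.csv(text = c(
  "FirstName,LastName",
  "John,Yang", "Rob,Johnson", "Ruben,Torres", "Christy,Zhu",
  "Rob,Huang", "John,Ruiz", "John,Miller", "Christy,Mehta",
  "Rob,Verhoff", "Christy,Carlson", "Gail,Erickson", "Barry,Johnson",
  "Peter,Krebs", "Greg,Alderson"
))

head(employee, -10)   # every row except the last 10, i.e. rows 1 to 4
tail(employee, -11)   # every row except the first 11, i.e. rows 12 to 14

# tail() keeps the original row numbers, so you can tell where rows came from
rownames(tail(employee, 3))
```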

Special Functions of R Read CSV

The following two functions are very useful when reading CSV files in R programming. It is good practice to check the structure of external data before you start manipulating it or inserting new records.

  • str(Data): Displays the structure of the data read from the CSV file: the number of rows and columns, and each column's name, type, and first few values.
  • summary(Data): Returns a summary of each column; for numeric columns, this is the minimum, first quartile, median, mean, third quartile, and maximum.
# str and summary functions

# Locate the Current Working Directory
getwd()

employee <- read.csv("Employee.csv", header = TRUE, sep = ",")
print(employee)

# str() prints the structure itself and returns NULL, so print() is not needed
str(employee)
print(summary(employee))
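str() is a quick way to spot columns that were not read with the type you expect. In the sample data, HireDate is read as character, so summary() can only report its length, class, and mode. The following sketch converts it to Date with as.Date(); it assumes the values are day-month-two-digit-year (for example, 28-01-06 as 28 January 2006), which the data itself does not state, so check the format against your own file.

```r
# Four sample records (FirstName and HireDate only), inlined for a runnable sketch
employee <- read.csv(text = c(
  "FirstName,HireDate",
  "John,28-01-06",
  "Rob,29-12-10",
  "Gail,02-10-06",
  "Barry,15-05-14"
), stringsAsFactors = FALSE)

str(employee)   # HireDate is listed as chr

# Assumption: HireDate is day-month-year with a two-digit year
employee$HireDate <- as.Date(employee$HireDate, format = "%d-%m-%y")

str(employee)       # HireDate is now Date
summary(employee)   # HireDate now gets Min, quartiles, Mean, and Max
```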

stringsAsFactors in the R read.csv Function

In R versions before 4.0.0, read.csv converted character columns to factors by default. To prevent this automatic conversion, specify stringsAsFactors = FALSE explicitly. Since R 4.0.0, FALSE is the default, so setting it explicitly mainly matters when your code must behave the same on older versions.

# Factors to String

# Locate the Current Working Directory
getwd()

employee <- read.csv("Employee.csv", header = TRUE, sep = ",", stringsAsFactors = FALSE)
print(employee)

str(employee)

If you observe the str() output in the screenshot below, FirstName is now listed as chr (character) rather than Factor.

[Screenshot: str() output with stringsAsFactors = FALSE]
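To see the difference directly, the following minimal sketch reads the same three inlined sample records with stringsAsFactors = TRUE and with FALSE, and then uses as.is to keep only FirstName as character while Education becomes a factor. Every call sets the behavior explicitly, so the results do not depend on your R version's default.

```r
raw <- c(
  "FirstName,Education,Sales",
  "John,Bachelors,3578.27",
  "Rob,Bachelors,3399.99",
  "Ruben,Partial College,699.0982"
)

as_factors <- read.csv(text = raw, stringsAsFactors = TRUE)
as_strings <- read.csv(text = raw, stringsAsFactors = FALSE)

class(as_factors$Education)    # "factor"
levels(as_factors$Education)   # "Bachelors" "Partial College"
class(as_strings$Education)    # "character"
class(as_factors$Sales)        # "numeric": numeric columns are unaffected

# as.is gives per-column control: TRUE keeps that column as character
mixed <- read.csv(text = raw, as.is = c(TRUE, FALSE, FALSE))
str(mixed)   # FirstName chr, Education Factor, Sales num
```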