Python pandas DataFrame

A pandas DataFrame in Python is a two-dimensional data structure that stores data in a tabular format of rows and columns. In this article, we show how to create a pandas DataFrame, access its data, and alter its rows and columns. We also cover transposing a DataFrame, iterating over its rows, and more.

Creating a DataFrame in Python

In practice, we use a pandas DataFrame to load data from SQL Server, text files, Excel files, or CSV files. Next, we slice and dice that data as per our requirements. Once the data is in the required format, we use it to create reports, charts, or graphs with the matplotlib module.
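
As a minimal sketch of the loading step described above, the following reads CSV text into a DataFrame with pandas.read_csv. The CSV content and the variable names (csv_text, loaded) are made up for illustration; in a real project you would pass a file path instead of an in-memory buffer.

```python
import io
import pandas as pd

# Hypothetical CSV content; normally this would live in a file on disk.
csv_text = """name,Age,Salary
John,25,1000000
Mike,32,1200000
Suresh,30,900000
"""

# read_csv accepts any file-like object, so StringIO stands in for a file.
loaded = pd.read_csv(io.StringIO(csv_text))
print(loaded)

# Column types are inferred from the text while loading.
print(loaded.dtypes)
```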

Pandas Create Empty DataFrame

This is a simple example of creating a DataFrame in Python. Here, we create an empty DataFrame.

import pandas as pd
 
data = pd.DataFrame()
print(data)

Pandas Create DataFrame from List

Here, we create a list of Python integer values. Next, we use the pandas.DataFrame function to convert that list to a DataFrame.

import pandas as pd
 
table = [1, 2, 3, 4, 5]

data = pd.DataFrame(table)
print(data)

Next, we create a pandas DataFrame from a mixed list. Here, the nested list gives us multiple rows and columns.

import pandas as pd
 
table = [[1, 'Suresh'], [2, 'Python'], [3, 'Hello']]

data = pd.DataFrame(table)
print(data)

Use the columns argument to assign names to the columns of a pandas DataFrame.

import pandas as pd
 
table = [[1, 'Suresh'], [2, 'Python'], [3, 'Hello']]
data = pd.DataFrame(table, columns = ['S.No', 'Name'])
print(data)

Python DataFrame of Random Numbers

To create a pandas DataFrame of random numbers, we use the NumPy random.randn function to generate an array of random numbers with 8 rows and 4 columns. Next, we use the DataFrame function to convert that array to a DataFrame.

import numpy as np
import pandas as pd
 
d_frame = pd.DataFrame(np.random.randn(8, 4))
print(d_frame)

Python Pandas DataFrame from dict

Pandas allows you to create a DataFrame from a dict (dictionary), and it is straightforward. All you have to do is declare a dictionary of values and then use the DataFrame function to convert that dictionary to a DataFrame.

import pandas as pd
 
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }
 
data = pd.DataFrame(table)
print(data)

To create a pandas DataFrame from a dict, the values of all keys must have the same length; otherwise, pandas raises an error. Likewise, if you pass index values, their number must match the length of those values; otherwise, it raises an error. If you don’t pass any index values, pandas automatically creates an index for you, running from 0 to n-1.

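
The length rule above can be checked with a short sketch. The dictionaries good and bad and the flag mismatch_raised are names chosen here for illustration only.

```python
import pandas as pd

good = {'name': ['John', 'Mike', 'Suresh'],
        'Salary': [1000, 1200, 900]}

# No index passed: pandas builds the default 0 to n-1 index.
data = pd.DataFrame(good)
print(list(data.index))

# An explicit index must have one label per row.
labelled = pd.DataFrame(good, index = ['a', 'b', 'c'])
print(list(labelled.index))

# Values of unequal length cannot form a table, so pandas raises ValueError.
bad = {'name': ['John', 'Mike', 'Suresh'],
       'Salary': [1000, 1200]}
try:
    pd.DataFrame(bad)
    mismatch_raised = False
except ValueError as err:
    mismatch_raised = True
    print('ValueError:', err)
```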

Let me take another example of creating a pandas DataFrame from a dictionary. This time, we convert a dictionary with four columns of data to a DataFrame.

import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

data = pd.DataFrame(table)
print(data)

Pandas create DataFrame from dict of lists

If placing everything in one statement is confusing, divide it into parts. Here, we declare four lists of items and then assign one list to each column.

import pandas as pd

names = ['John', 'Mike', 'Suresh', 'Tracy']
ages =  [25, 32, 30, 26]
Professions = ['Developer', 'Analyst', 'Admin', 'HR']
Salaries = [1000000, 1200000, 900000, 1100000]
	      
table = {'name': names,
         'Age': ages,
         'Profession': Professions,
         'Salary': Salaries
         }
	      
data = pd.DataFrame(table)
print(data)

Python Pandas DataFrame of Dates

Using the pandas module, you can also create a DataFrame indexed by a series of dates. Let me create a DataFrame of dates from 2019-01-01 to 2019-01-08. By changing the periods value, you can generate a longer or shorter date sequence.

import numpy as np
import pandas as pd

dates = pd.date_range('20190101', periods = 8)
print(dates)
print()

# Column names go in a list: a set has no defined order, and recent
# pandas versions reject it.
d_frame = pd.DataFrame(np.random.randn(8, 4), index = dates,
                       columns = ['apples', 'oranges', 'kiwis', 'bananas'])
print(d_frame)

Pandas DataFrame Columns

This example shows how to reorder the columns in a DataFrame. By default, a DataFrame uses the column order of the source data. However, you can use the columns argument to alter the position of any column. Let me move Age from the 2nd position to the 4th.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

data1 = pd.DataFrame(table)
print(data1)

print('\n---- After Changing the Column Order-----')
data2 = pd.DataFrame(table, columns = ['name', 'Profession', 'Salary', 'Age'])
print(data2)

Please be careful while using the columns argument. If you specify a non-existent column name or make a typo, that column is filled with NaN. Let me use a Qualification column name (which doesn’t exist).

print('\n---- Using Wrong Column -----')
data3 = pd.DataFrame(table, columns = ['name', 'Qualification', 'Salary', 'Age'])
print(data3)

The DataFrame columns attribute returns the column labels of a DataFrame, in order, as an Index object. The printed result also shows the dtype of that Index.

print(data1.columns)
print(data2.columns)
print(data3.columns)

Pandas DataFrame Index

By default, pandas assigns index values from 0 to n-1, where n is the number of rows. However, you can replace those default index values using the index argument. Here, we assign the letters a to d as the index values.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

#Without Index Values - Uses Default Values
data1 = pd.DataFrame(table)
print(data1)

# Index Values are a, b, c, d
data2 = pd.DataFrame(table, index = ['a', 'b', 'c', 'd'])
print('\n----After Setting Index Values----')
print(data2)

You can use the DataFrame set_index function to set a column as the index. Here, we use set_index to set name as the index. Next, we use loc to show that we can extract information using an index label.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }
data = pd.DataFrame(table)
print(data)

print('\n---Setting name as an index---')
new_data = data.set_index('name')
print(new_data)

print('\n---Return Index John Details---')
print(new_data.loc['John'])
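
One detail worth seeing in code: set_index returns a new DataFrame and leaves the original untouched. The sketch below also uses reset_index, which is not covered in this article, to move the index back into an ordinary column. The variable names indexed and restored are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'name': ['John', 'Mike'],
                     'Age': [25, 32]})

# set_index returns a new DataFrame; data itself keeps its name column.
indexed = data.set_index('name')
print(indexed)
print(list(data.columns))

# reset_index turns the index back into a regular column.
restored = indexed.reset_index()
print(restored)
```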

Pandas DataFrame Attributes

The list of available attributes of Python DataFrame

Python DataFrame shape attribute

The pandas DataFrame shape attribute returns a tuple holding the number of rows and the number of columns in a DataFrame.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Salary':[1000000, 1200000, 900000, 1100000]
	    }

data = pd.DataFrame(table)
print(data)
print('\n---Shape or Size of a DataFrame---')
print(data.shape)

Python DataFrame values attribute

The DataFrame values attribute returns the DataFrame data (without column names) as a two-dimensional array.

import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

data2 = pd.DataFrame(table, columns = ['name', 'Profession', 'Salary', 'Age'])

data3 = pd.DataFrame(table, columns = ['name', 'Qualification', 'Salary', 'Age'])

print('---Data2 Values--- ')
print(data2.values)

print('\n---Data3 Values--- ')
print(data3.values)

The above examples return arrays of dtype object, because both DataFrames hold mixed content (integers and strings). When every column has the same type, the array takes that type instead. To show this, we use an all-integer DataFrame.

import pandas as pd

table = {'Age': [25, 32, 30, 26],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

data4 = pd.DataFrame(table)
print(data4.values)

Pandas DataFrame name attribute

Both the index and the columns of a DataFrame have a name attribute, which lets you assign a name to the index or to the column header.

import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

data1 = pd.DataFrame(table)

table = {'Age': [25, 32, 30, 26],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }
data4 = pd.DataFrame(table)

data1.index.name = 'Emp No'
print(data1)
print()

data4.index.name = 'Cust No'
print(data4)

Similarly, we can use the name attribute of columns to assign a name to the column headers.

data1.columns.name = 'Employee Details'
print(data1)
 
data4.columns.name = 'Customers Information'
print(data4)

Python DataFrame dtypes attribute

The DataFrame dtypes attribute returns the data type of each column in a DataFrame.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Sale':[22.55, 12.66, 119.470, 200.190],
         'Salary':[10000, 12000, 9000, 11000]
         }

data = pd.DataFrame(table)
print(data)

print('\n---dtype attribute result---')
print(data.dtypes)

Python DataFrame describe function

Use the DataFrame describe function to get quick summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns of a DataFrame.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[10000, 12000, 9000, 11000]
         }

data1 = pd.DataFrame(table)
print(data1)

print('\n---describe function result---')
print(data1.describe())

Access Python DataFrame Data

The data in Python DataFrame is stored in a tabular format of rows and columns. It means, you can access DataFrame items using columns and rows.

Access Pandas DataFrame Columns

You can access DataFrame columns in two ways: by specifying the column name inside [] or by using dot notation. Both methods return the specified column as a Series.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[10000, 12000, 9000, 11000]
         }

data1 = pd.DataFrame(table)
data2 = pd.DataFrame(table, columns = ['name', 'Profession', 'Salary', 'Age'])

print('-----Accessing DataFrame Columns-----')
print(data1.Age)
print(data1['name'])
print(data2.Salary)

We can also access multiple DataFrame columns by passing a list of column names.

print('-----Accessing Multiple DataFrame Columns-----')
print(data1[['Age', 'Profession']])
print(data2[['name', 'Salary']])

This is another example of accessing pandas DataFrame columns.

import pandas as pd
table = {'name': ['Kane', 'John', 'Suresh', 'Tracy', 'Steve'],
         'Age': [35, 25, 32, 30, 29],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale':[422.19, 22.55, 119.470, 200.190, 44.55],
         'Salary':[12000, 10000, 14000, 11000, 14000]
	    }
data = pd.DataFrame(table)
print(data)

print('\n---Select name column from DataFrame---')
print(data['name'])

print('\n---Select Profession and Sale column from DataFrame---')
print(data[['Profession', 'Sale']])

print('\n---Select Profession column from DataFrame---')
print(data.Profession)

Access Pandas DataFrame Rows

A pandas DataFrame can also be accessed by rows. Here, we use index slicing to return the required rows from a DataFrame. data[1:] returns all the rows from position 1 to the end, and data[1:3] returns the rows at positions 1 and 2, because the end of the slice is excluded.

import pandas as pd
table = {'name': ['Kane', 'John', 'Suresh', 'Tracy', 'Steve'],
         'Age': [35, 25, 32, 30, 29],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale':[422.19, 22.55, 119.470, 200.190, 44.55],
         'Salary':[12000, 10000, 14000, 11000, 14000]
	    }
data = pd.DataFrame(table)
print(data)

print('\n---Select all rows from 1 to N in a DataFrame---')
print(data[1:])

print('\n---Select rows from 1 to 2 in a DataFrame---')
print(data[1:3])

print('\n---Select rows from 0 to 3 in a DataFrame---')
print(data[0:4])

print('\n---Select last row in a DataFrame---')
print(data[-1:])

Pandas DataFrame loc

The pandas DataFrame loc indexer is important to understand. You can use loc[] to select more than one column and more than one row at a time by label, or to select a portion of a DataFrame.

Use this loc to select rows from a DataFrame.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }
data = pd.DataFrame(table, index = ['a', 'b', 'c', 'd'])
print(data)

print('\n---Select b row from a DataFrame---')
print(data.loc['b'])

print('\n---Select c row from a DataFrame---')
print(data.loc['c'])

print('\n---Select b and d rows from a DataFrame---')
print(data.loc[['b', 'd']])

In the following example, the first statement, data.loc[:, ['name', 'Sale']], returns all rows of the name and Sale columns. The last statement, data.loc[1:3, ['name', 'Profession', 'Salary']], returns the rows with index labels 1 through 3 (loc includes both endpoints) for the name, Profession, and Salary columns.

import pandas as pd
table = {'name': ['Kane', 'John', 'Suresh', 'Tracy', 'Steve'],
         'Age': [35, 25, 32, 30, 29],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale':[422.19, 22.55, 119.470, 200.190, 44.55],
         'Salary':[12000, 10000, 14000, 11000, 14000]
	    }
data = pd.DataFrame(table)
print(data)

print('\n---Select name, Sale column in a DataFrame---')
print(data.loc[:, ['name', 'Sale']])

print('\n---Select name, Profession, Salary in a DataFrame---')
print(data.loc[:, ['name', 'Profession', 'Salary']])

print('\n---Select rows from 1 to 3 in a DataFrame---')
print(data.loc[1:3, ['name', 'Profession', 'Salary']])

Pandas DataFrame iloc

Similar to loc[], a pandas DataFrame has iloc[]. However, iloc accepts only integer positions (counted from 0) to return data from a DataFrame.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }
data = pd.DataFrame(table, index = ['a', 'b', 'c', 'd'])
print(data)

print('\n---Select row at position 1 (second row) from a DataFrame---')
print(data.iloc[1])

print('\n---Select row at position 3 (fourth row) from a DataFrame---')
print(data.iloc[3])

print('\n---Select rows at positions 1 and 3 from a DataFrame---')
print(data.iloc[[1, 3]])
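
Slicing is where loc and iloc differ most visibly, so here is a small side-by-side sketch. The names by_label, by_position, and plain_slice are illustrative. With the default integer index the labels and positions look the same, which makes the difference easy to miss.

```python
import pandas as pd

data = pd.DataFrame({'name': ['Kane', 'John', 'Suresh', 'Tracy', 'Steve'],
                     'Age': [35, 25, 32, 30, 29]})

# loc slices by label and includes both endpoints: labels 1, 2 and 3.
by_label = data.loc[1:3]
print(by_label)

# iloc slices by position and excludes the end: positions 1 and 2.
by_position = data.iloc[1:3]
print(by_position)

# A plain slice on the DataFrame behaves like iloc.
plain_slice = data[1:3]
print(plain_slice)
```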

You can use loc, iloc, at and iat to extract or access a single value from a DataFrame. The following example will show you the same.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }
data = pd.DataFrame(table)
print(data)

# Pass the row and the column together instead of chaining two lookups.
print('\nitem at 0, 0 in DataFrame      = ', data.iloc[0, 0])

print('item at 0, Age in DataFrame    = ', data.loc[0, 'Age'])

print('item at 1, Profession is       = ', data.loc[1, 'Profession'])

print('item at 2, 3 in DataFrame      = ', data.iat[2, 3])

print('item at 0, Salary in DataFrame = ', data.at[0, 'Salary'])

Add New Column to Pandas DataFrame

In this example, we show how to add a new column to an existing DataFrame. data['Sale'] = [422.19, 200.190, 44.55] adds a completely new column called Sale. data['Income'] = data['Salary'] + data['basic'] adds a new column, Income, by adding the values in the Salary and basic columns.

import pandas as pd

table = {'name': ['Kane', 'Suresh', 'Tracy'],
         'Age': [35, 25, 29],
         'Profession': ['Manager', 'Developer', 'HR'],
         'Salary': [10000, 14000, 11000],
         'basic': [4000, 6000, 4500]
        }

data = pd.DataFrame(table)
print(data)

# Add New Column to DataFrame
data['Sale'] = [422.19, 200.190, 44.55]
print('\n---After adding New Column DataFrame---')
print(data)

# Add New Column using existing
data['Income'] = data['Salary'] + data['basic']
print('\n---Total Salary in a DataFrame---')
print(data)

# Add New Calculated Column to DataFrame
data['New_Salary'] = data['Salary'] + data['Salary'] * 0.25
print('\n---After adding New Column DataFrame---')
print(data)
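
The assignments above modify the DataFrame in place. As an alternative sketch, the DataFrame assign function (not used elsewhere in this article) returns a new DataFrame with the extra columns and leaves the original as it was. The name with_income is illustrative.

```python
import pandas as pd

data = pd.DataFrame({'name': ['Kane', 'Suresh', 'Tracy'],
                     'Salary': [10000, 14000, 11000],
                     'basic': [4000, 6000, 4500]})

# assign returns a copy with the new columns; data is not modified.
# A callable receives the DataFrame being built, so it can use any column.
with_income = data.assign(
    Income = data['Salary'] + data['basic'],
    New_Salary = lambda d: d['Salary'] * 1.25)

print(with_income)
print(list(data.columns))
```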

Delete Column from a DataFrame in Python

There are several ways to delete a column from a pandas DataFrame: the del statement, the pop function, or the drop function. In this example, we use all three to delete columns from a pandas DataFrame.

Here, del(data['basic']) deletes the basic column (all rows belonging to that column) from the DataFrame. x = data.pop('Age') removes the Age column from the DataFrame and returns it, and we print that popped column as well. Next, we use the pandas DataFrame drop function to delete the Sale column. Unlike del and pop, drop returns a new DataFrame and leaves data unchanged.

import pandas as pd

table = {'name': ['Kane', 'Suresh', 'Tracy'],
         'Age': [35, 25, 29],
         'Profession': ['Manager', 'Developer', 'HR'],
         'Salary': [10000, 14000, 11000],
         'basic': [4000, 6000, 4500],
         'Sale': [422.19, 200.190, 44.55]
        }

data = pd.DataFrame(table)
print(data)

# Delete existing Columns from DataFrame
del(data['basic'])
print('\n---After Deleting basic Column DataFrame---')
print(data)

x = data.pop('Age')
print('\n---After Deleting Age Column DataFrame---')
print(data)
print('\n---pop Column from DataFrame---')
print(x)

y = data.drop(columns = 'Sale')
print('\n---After Deleting Sale Column DataFrame---')
print(y)

Delete DataFrame Row in Python

In this Python example, we are using the Pandas drop function to delete DataFrame row.

import pandas as pd

table = {'name': ['Kane', 'Suresh', 'Tracy'],
         'Profession': ['Manager', 'Developer', 'HR'],
         'Salary': [10000, 14000, 11000],
         'Sale': [422.19, 200.190, 44.55]
        }

data = pd.DataFrame(table, index = ['a', 'b', 'c'])
print(data)

x = data.drop('b')
print('\n---After Deleting b row DataFrame---')
print(x)

y = data.drop('a')
print('\n---After Deleting a row DataFrame---')
print(y)

Rename Pandas DataFrame Column

In Python, use Pandas rename function to rename a column or multiple columns of a DataFrame. Here, we use this Pandas DataFrame rename function to rename Profession column to Qualification and Salary to Income.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }
data = pd.DataFrame(table)
print(data)

# data = data.rename(columns = {'Profession': 'Qualification'})
data.rename(columns = {'Profession': 'Qualification'}, inplace = True)
print('\n---After Renaming Column in a DataFrame---')
print(data)

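# Profession was already renamed above, so pandas silently skips that key
# (rename ignores missing labels by default); only Salary changes here.
data.rename(columns =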
                {'Profession': 'Qualification',
                'Salary': 'Income'},
            inplace = True)
print('\n---After Renaming two Column in a DataFrame---')
print(data)

Python pandas head and tail

If you are coming from R programming, you might be familiar with the head and tail functions. The pandas DataFrame head function accepts an integer as an argument and returns that many records from the top of a DataFrame. For example, head(5) returns the first 5 records. Similarly, the DataFrame tail function returns records from the bottom of a DataFrame. For example, tail(5) returns the last 5 records. When called without an argument, both functions return 5 records.

import pandas as pd
table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy', 'Steve'],
         'Age': [35, 25, 32, 30, 26, 29],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR', 'HOD'],
         'Sale':[422.19, 22.55, 12.66, 119.470, 200.190, 44.55],
         'Salary':[12000, 10000, 8000, 14000, 11000, 14000]
	    }
data = pd.DataFrame(table)
print(data)

print('\n---First Five records DataFrame head()---')
print(data.head())

print('\n---First two records DataFrame head(2)---')
print(data.head(2))

print('\n---Bottom Five records DataFrame tail()---')
print(data.tail())

print('\n---last two records DataFrame tail(2)---')
print(data.tail(2))

Pandas Transpose DataFrame in Python

A pandas DataFrame has built-in functionality to transpose its rows and columns. For this, use DataFrame.T.

import pandas as pd
table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale':[422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary':[12000, 10000, 8000, 14000, 11000]
	    }
data = pd.DataFrame(table)
print(data)

print('\n---Transposed DataFrame---')
print(data.T)

Python DataFrame groupby

The pandas DataFrame groupby function is similar to the GROUP BY clause in SQL Server. You can use it to group data by one or more columns and find the aggregated results of the other columns. This is one of the most important functions when working with real-world data.

In this example, we create a DataFrame with different columns and data types. Next, we use the groupby function on that DataFrame. The first statement, data.groupby('Profession').sum(), groups the DataFrame by the Profession column and calculates the sum of Sale, Salary, and Age. The second statement, data.groupby(['Profession', 'Age']).sum(), groups the DataFrame by the Profession and Age columns and calculates the sum of Sale and Salary. Remember, in recent pandas versions, string columns such as name are concatenated rather than added; pass numeric_only = True to sum() to leave them out.

import pandas as pd
table = {'name': ['Kane', 'Dave', 'Ram', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 25, 35, 25, 35, 35],
         'Profession': ['Analyst', 'HR', 'Analyst', 'Admin', 'HR', 'Admin', 'HR'],
         'Sale':[422, 22, 55, 12, 119, 470, 200],
         'Salary':[12000, 9000, 10000, 8000, 14000, 20000, 11000]
	    }
data = pd.DataFrame(table)
print(data)

print('\n---DataFrame groupby Profession---')
print(data.groupby('Profession').sum())

print('\n---DataFrame groupby Profession and Age---')
print(data.groupby(['Profession', 'Age']).sum())
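
When each column needs a different aggregate, the agg function with named aggregation is one option. The sketch below reuses the table from the example above; the output column names (total_sale, mean_salary, headcount) are chosen here for illustration.

```python
import pandas as pd

table = {'name': ['Kane', 'Dave', 'Ram', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 25, 35, 25, 35, 35],
         'Profession': ['Analyst', 'HR', 'Analyst', 'Admin', 'HR', 'Admin', 'HR'],
         'Sale': [422, 22, 55, 12, 119, 470, 200],
         'Salary': [12000, 9000, 10000, 8000, 14000, 20000, 11000]}
data = pd.DataFrame(table)

# numeric_only = True leaves the string column (name) out of the sum.
numeric_totals = data.groupby('Profession').sum(numeric_only = True)
print(numeric_totals)

# Named aggregation: new_column = (source column, aggregate function).
summary = data.groupby('Profession').agg(
    total_sale = ('Sale', 'sum'),
    mean_salary = ('Salary', 'mean'),
    headcount = ('name', 'count'))
print(summary)
```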

Python DataFrame stack

The pandas DataFrame stack function pivots the column labels into the innermost level of the row index, producing a taller, narrower result. To use it, simply call data_to_stack.stack(). In this example, we use the stack function on grouped data (the groupby function result) to reshape it further.

import pandas as pd
table = {'name': ['Kane', 'Dave', 'Ram', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 25, 35, 25, 35, 35],
         'Profession': ['Analyst', 'HR', 'Analyst', 'Admin', 'HR', 'Admin', 'HR'],
         'Sale':[422, 22, 55, 12, 119, 470, 200],
         'Salary':[12000, 9000, 10000, 8000, 14000, 20000, 11000]
	    }
data = pd.DataFrame(table)

grouped_data1 = data.groupby('Profession').sum()
stacked_data1 = grouped_data1.stack()
print('\n---Stacked DataFrame groupby Profession---')
print(stacked_data1)

grouped_data2 = data.groupby(['Profession', 'Age']).sum()
stacked_data2 = grouped_data2.stack()
print('\n---Stacked DataFrame groupby Profession and Age---')
print(stacked_data2)

Python DataFrame unstack

The unstack function reverses the operation done by the stack function. It pivots the innermost level of the row index of a stacked result (the output of .stack()) back into columns. To use it, simply call stacked_data.unstack().

import pandas as pd
table = {'name': ['Kane', 'Dave', 'Ram', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 25, 35, 25, 35, 35],
         'Profession': ['Analyst', 'HR', 'Analyst', 'Admin', 'HR', 'Admin', 'HR'],
         'Sale':[422, 22, 55, 12, 119, 470, 200],
         'Salary':[12000, 9000, 10000, 8000, 14000, 20000, 11000]
	    }
data = pd.DataFrame(table)

grouped_data1 = data.groupby('Profession').sum()
stacked_data1 = grouped_data1.stack()
unstacked_data1 = stacked_data1.unstack()
# print('\n---Stacked DataFrame groupby Profession---')
# print(stacked_data1)
print('\n---Unstacked DataFrame groupby Profession---')
print(unstacked_data1)

grouped_data2 = data.groupby(['Profession', 'Age']).sum()
stacked_data2 = grouped_data2.stack()
unstacked_data2 = stacked_data2.unstack()
# print('\n---Stacked DataFrame groupby Profession and Age---')
# print(stacked_data2)
print('\n---Unstacked DataFrame groupby Profession and Age---')
print(unstacked_data2)
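
To make the round trip easier to follow, here is a smaller sketch on a hand-built table whose totals match the grouped data above. The name grouped and the way the table is built are for illustration only.

```python
import pandas as pd

# A small table indexed by Profession, like a groupby().sum() result.
grouped = pd.DataFrame(
    {'Sale': [482, 477, 341], 'Salary': [28000, 22000, 34000]},
    index = pd.Index(['Admin', 'Analyst', 'HR'], name = 'Profession'))

# stack moves the column labels into a second index level,
# giving one value per (Profession, column label) pair.
stacked = grouped.stack()
print(stacked)
print(stacked.index.nlevels)

# unstack moves that innermost level back into columns.
unstacked = stacked.unstack()
print(unstacked)
```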

Python DataFrame Concatenation

The pandas concat function is used to combine or concatenate DataFrame objects. First, we declare two DataFrames of random values, each with 4 rows and 6 columns. Next, we use the concat function to concatenate them.

import pandas as pd
import numpy as np

dataframe_one = pd.DataFrame(np.random.randn(4, 6))
print(dataframe_one)

dataframe_two = pd.DataFrame(np.random.randn(4, 6))
print(dataframe_two)

print('\n---DataFrame concatenation---')
print(pd.concat([dataframe_one, dataframe_two]))

In the above example, we concatenate two DataFrame objects of the same size. However, you can also use the pandas concat function to combine more than two DataFrame objects of different sizes; cells that have no value in a narrower DataFrame are filled with NaN. For this, we use three DataFrames of different sizes holding randomly generated numbers. Next, we use the concat function to combine those three objects.

import numpy as np
import pandas as pd

dataframe_one = pd.DataFrame(np.random.randn(4, 6))
print(dataframe_one)

dataframe_two = pd.DataFrame(np.random.randn(4, 5))
print(dataframe_two)

dataframe_three = pd.DataFrame(np.random.randn(3, 4))
print(dataframe_three)

print('\n-----DataFrame concatenation-----')
print(pd.concat([dataframe_one, dataframe_two, dataframe_three]))
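
Two concat options are worth a quick sketch: ignore_index, which renumbers the rows of the result, and axis = 1, which places the DataFrames side by side. The small constant-valued DataFrames one and two are used here only so the output is predictable.

```python
import numpy as np
import pandas as pd

one = pd.DataFrame(np.ones((2, 3)))
two = pd.DataFrame(np.zeros((2, 2)))

# Default: rows are appended and the original index labels are kept,
# so labels repeat. Column 2 is missing from two and becomes NaN.
stacked_rows = pd.concat([one, two])
print(stacked_rows)

# ignore_index = True gives the result a fresh 0 to n-1 index.
renumbered = pd.concat([one, two], ignore_index = True)
print(renumbered)

# axis = 1 joins the DataFrames column-wise instead.
side_by_side = pd.concat([one, two], axis = 1)
print(side_by_side)
```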

Python Math operations on DataFrame

In this example, we use a few of the pandas DataFrame mathematical functions. For demonstration purposes, we find the mean and median of each column and each row. To get the mean or median of each row, pass 1 as the axis to the function.

import pandas as pd
table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale':[422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary':[12000, 10000, 8000, 14000, 11000]
	    }
data = pd.DataFrame(table)
print(data)

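# numeric_only = True skips the string columns (name, Profession),
# which recent pandas versions refuse to average.
print('\n---DataFrame Mean of Columns---')
print(data.mean(numeric_only = True))

print('\n---DataFrame Mean of Rows---')
print(data.mean(axis = 1, numeric_only = True))

print('\n---DataFrame Median of Columns---')
print(data.median(numeric_only = True))

print('\n---DataFrame Median of Rows---')
print(data.median(axis = 1, numeric_only = True))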

Here, we calculate the sum of each column and the sum of each row. Similarly, we find the minimum value in each column, the maximum value in each column, and the maximum value in each row using the sum(), min(), and max() functions.

import pandas as pd
table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale':[422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary':[12000, 10000, 8000, 14000, 11000]
	    }
data = pd.DataFrame(table)
print(data)

print('\n---DataFrame sum of Columns---')
print(data.sum())

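# Row-wise results mix strings and numbers, so restrict them to
# the numeric columns with numeric_only = True.
print('\n---DataFrame sum of Rows---')
print(data.sum(axis = 1, numeric_only = True))

print('\n---DataFrame Minimum of Columns---')
print(data.min())

print('\n---DataFrame Maximum of Columns---')
print(data.max())

print('\n---DataFrame Maximum of Rows---')
print(data.max(axis = 1, numeric_only = True))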

Arithmetic Operations on Python Pandas DataFrame

Here, we perform arithmetic operations on a DataFrame. Each operation is applied to every element.

import pandas as pd
table = {'Age': [25, 32, 30],
         'Sale':[422.19, 119.470, 200.190],
         'Salary':[12000, 14000, 11000]
	    }
data = pd.DataFrame(table)
print(data)

print('\n---Add 20 to DataFrame---')
print(data + 20)

print('\n---Subtract 10 from DataFrame---')
print(data - 10)

print('\n---Multiply DataFrame by 2---')
print(data * 2)

Python Pandas DataFrame Nulls

The isnull function checks each value in a DataFrame and returns True if it is null (missing), otherwise False. The pandas notnull function returns True if the value is not null, otherwise False.

import pandas as pd
import numpy as np

table = {'name': ['Kane', 'Suresh', np.nan],
         'Profession': ['Manager', np.nan, 'HR'],
         'Salary': [np.nan, 14000, 11000],
         'Sale': [422.19, np.nan, 44.55]
        }

data = pd.DataFrame(table)
print(data)

print('\n---Checking Nulls in a DataFrame---')
print(data.isnull())

print('\n---Checking Not Nulls in a DataFrame---')
print(data.notnull())

Replace Nulls in DataFrame

We can also replace those null values with meaningful numbers. For this, use the DataFrame fillna function or the replace function.

import pandas as pd
import numpy as np

table = {'Age': [20, 35, np.nan],
         'Salary': [np.nan, 14000, 11000],
         'Sale': [422.19, np.nan, 44.55]
        }

data = pd.DataFrame(table)
print(data)

print('\n---Fill Missing Values DataFrame---')
print(data.fillna(30))

print('\n---Replace Missing Values DataFrame---')
print(data.replace({np.nan:66}))
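
A single fill value rarely suits every column. As a sketch, fillna also accepts a dictionary that maps column names to fill values, and the dropna function (not covered above) removes rows with missing values instead of filling them. The names per_column and known_age are illustrative.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Age': [20, 35, np.nan],
                     'Salary': [np.nan, 14000, 11000],
                     'Sale': [422.19, np.nan, 44.55]})

# A dict fills each listed column with its own value.
# Sale is not listed, so its missing value stays NaN.
per_column = data.fillna({'Age': 0, 'Salary': data['Salary'].mean()})
print(per_column)

# dropna with subset removes only the rows where Age is missing.
known_age = data.dropna(subset = ['Age'])
print(known_age)
```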

Pandas DataFrame pivot

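The DataFrame pivot function reshapes an existing DataFrame. The unique values of one column become the new row index, the unique values of another column become the new column headers, and the optional values argument selects which column fills the cells. If values is omitted, all remaining columns are used. Combinations that do not occur in the data are filled with NaN.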

import pandas as pd

table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'HR', 'Analyst', 'Manager', 'HR'],
         'Sale':[422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary':[12000, 10000, 8000, 14000, 11000]
        }
data = pd.DataFrame(table)
print(data)
print('\n--- After DataFrame Pivot (Salary values) ---')
data2 = data.pivot(index = 'name', columns = 'Profession', values = 'Salary')
print(data2)

print('\n--- After DataFrame Pivot (all remaining columns) ---')
data3 = data.pivot(index = 'name', columns = 'Profession')
print(data3)
Pandas DataFrame pivot
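
The pivot function requires each index and column pair to be unique, and it raises a ValueError otherwise. When pairs repeat, the related pivot_table function can aggregate them instead. The sketch below is an illustration under assumptions: the Region column is hypothetical and the mean is just one possible aggregate.

```python
import pandas as pd

# Region is a hypothetical column; (Manager, East) occurs twice
data = pd.DataFrame({'Profession': ['Manager', 'HR', 'Manager', 'HR'],
                     'Region': ['East', 'East', 'East', 'West'],
                     'Salary': [12000, 10000, 14000, 11000]})

# pivot cannot reshape duplicate index/column pairs
try:
    data.pivot(index='Profession', columns='Region', values='Salary')
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table aggregates the duplicates, here with the mean
summary = data.pivot_table(index='Profession', columns='Region',
                           values='Salary', aggfunc='mean')

print(pivot_failed)
print(summary)
```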

Save DataFrame to CSV and Text File

To save data from a Pandas DataFrame to a CSV file or text file, use the Pandas to_csv function. By default, it writes comma-separated values along with the row index. The sep argument changes the delimiter.

import pandas as pd

table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'HR', 'Analyst', 'Manager', 'HR'],
         'Sale':[422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary':[12000, 10000, 8000, 14000, 11000]
        }
data = pd.DataFrame(table)
print(data)
# Write the DataFrame to a text file (comma-separated by default)
data.to_csv('user_info.txt')
# Write the DataFrame to a CSV file with the default comma separator
data.to_csv('user_info.csv')
# Write the DataFrame to a file with a Tab separator
data.to_csv('user_info_new.csv', sep = '\t')
Pandas Save DataFrame to CSV and Text File
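
By default, to_csv also writes the row index as an unnamed first column. A minimal sketch of leaving it out with index=False and reading the result back follows. It uses an in-memory string rather than a file so that it runs without touching the disk, and the variable names are assumptions.

```python
import io
import pandas as pd

data = pd.DataFrame({'name': ['Kane', 'John'], 'Age': [35, 25]})

# With no path, to_csv returns the CSV text instead of writing a file
csv_text = data.to_csv(index=False)
print(csv_text)

# read_csv accepts any file-like object, so wrap the text in StringIO
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```

Passing a file name instead, for example data.to_csv('user_info.csv', index=False), writes the same content to disk.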

Iterate over DataFrame Rows

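Pandas provides three functions for iterating over a DataFrame. The iterrows function yields each row as an (index, Series) pair, and itertuples yields each row as a named tuple. The items function (called iteritems before pandas 2.0) iterates over columns rather than rows.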

import pandas as pd

table = {'name': ['Kane', 'John', 'Mike'],
         'Age': [35, 25, 32],
         'Profession': ['Manager', 'HR', 'Analyst'],
         'Sale':[422.19, 119.470, 200.190],
         'Salary':[12000, 14000, 11000]
        }
data = pd.DataFrame(table)
print(data)
print('\n---Iterating Rows---')
# iterrows yields (index label, row as a Series) pairs
for index, row in data.iterrows():
    print(index, row)
    print()
Pandas Iterate over DataFrame Rows
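
For comparison, here is a minimal sketch of the other two iteration functions on a reduced table. The list names are assumptions for illustration.

```python
import pandas as pd

data = pd.DataFrame({'name': ['Kane', 'John'], 'Age': [35, 25]})

# itertuples yields one named tuple per row; columns become attributes
names = []
for row in data.itertuples(index=False):
    print(row.name, row.Age)
    names.append(row.name)

# items yields (column name, column as a Series) pairs, one per column
column_names = []
for column_name, values in data.items():
    print(column_name, values.tolist())
    column_names.append(column_name)
```

itertuples is generally faster than iterrows because it does not build a Series for every row.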


Copyright © 2021· All Rights Reserved by Suresh.