A Pandas DataFrame in Python is a two-dimensional data structure that stores data in a tabular format of rows and columns. In this article, we show how to create a Pandas DataFrame, access its data, and alter its rows and columns. We then cover transposing a DataFrame, iterating over its rows, and related operations.
Creating a DataFrame in Python
In practice, we use a Pandas DataFrame to load data from SQL Server, text files, Excel files, or CSV files. Next, we slice and dice that data as per our requirements. Once the data is in the required format, we use it to create reports, charts, or graphs with the matplotlib module.
Pandas Create Empty DataFrame
This is a simple example of creating a DataFrame in Python. Here, we create an empty DataFrame.
import pandas as pd
data = pd.DataFrame()
print(data)

Pandas Create DataFrame from List
Here, we create a list of Python integer values. Next, we use the pandas.DataFrame function to convert that list to a DataFrame.
import pandas as pd
table = [1, 2, 3, 4, 5]
data = pd.DataFrame(table)
print(data)

Creating a pandas DataFrame from a mixed list. Here, each inner list becomes one row, so the result has multiple rows and columns.
import pandas as pd
table = [[1, 'Suresh'], [2, 'Python'], [3, 'Hello']]
data = pd.DataFrame(table)
print(data)

Use the columns argument to assign names to the columns of a Pandas DataFrame.
import pandas as pd
table = [[1, 'Suresh'], [2, 'Python'], [3, 'Hello']]
data = pd.DataFrame(table, columns=['S.No', 'Name'])
print(data)

Python DataFrame of Random Numbers
To create a Pandas DataFrame of random numbers, we use the numpy random.randn function to generate an 8 x 4 array of random numbers. Next, we use the DataFrame function to convert that array to a DataFrame.
import numpy as np
import pandas as pd
d_frame = pd.DataFrame(np.random.randn(8, 4))
print(d_frame)

Python Pandas DataFrame from dict
Pandas allows you to create a DataFrame from a dict (dictionary), and it is straightforward. Declare a dictionary whose keys are the column names and whose values are the column data, and then use the DataFrame function to convert that dictionary to a DataFrame.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Salary': [1000000, 1200000, 900000, 1100000]}
data = pd.DataFrame(table)
print(data)
When you create a Pandas DataFrame from a dict, the values for every key must have the same length; otherwise, pandas raises an error. If you pass index values, their count must match that length, or pandas raises an error as well. If you do not pass any index values, pandas creates an index for you automatically, running from 0 to n-1.
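The length rules described above can be checked with a small sketch. The names bad_table, good_table, and mismatch_error are illustrative only, and the exact wording of the error message may differ between pandas versions.

```python
import pandas as pd

# Columns of unequal length: pandas cannot line them up and raises ValueError.
bad_table = {'name': ['John', 'Mike', 'Suresh'], 'Salary': [1000000, 1200000]}
try:
    pd.DataFrame(bad_table)
    mismatch_error = None
except ValueError as exc:
    mismatch_error = str(exc)
print(mismatch_error)

# An explicit index needs exactly one label per row.
good_table = {'name': ['John', 'Mike'], 'Salary': [1000000, 1200000]}
data = pd.DataFrame(good_table, index=['a', 'b'])
print(data)

# Without an index argument, pandas builds the default 0..n-1 index.
default_data = pd.DataFrame(good_table)
print(list(default_data.index))
```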

Let me take another example of a pandas DataFrame from a dictionary. This time, we convert a dictionary with four columns of data to a DataFrame.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary': [1000000, 1200000, 900000, 1100000]}
data = pd.DataFrame(table)
print(data)

Pandas create DataFrame from dict of lists
If placing everything in one statement is confusing, divide it into parts. Here, we declare four lists of items and then assign one list to each column.
import pandas as pd
names = ['John', 'Mike', 'Suresh', 'Tracy']
ages = [25, 32, 30, 26]
professions = ['Developer', 'Analyst', 'Admin', 'HR']
salaries = [1000000, 1200000, 900000, 1100000]
table = {'name': names, 'Age': ages,
         'Profession': professions, 'Salary': salaries}
data = pd.DataFrame(table)
print(data)

Python Pandas DataFrame of Dates
Using the pandas module, you can also create a DataFrame indexed by a series of dates. Let me create a DataFrame of dates from 2019-01-01 to 2019-01-08. By changing the periods value, you can generate a longer or shorter date sequence.
import numpy as np
import pandas as pd
dates = pd.date_range('20190101', periods=8)
print(dates)
print()
# Column names are passed as a list so that their order is fixed
d_frame = pd.DataFrame(np.random.randn(8, 4), index=dates,
                       columns=['apples', 'oranges', 'kiwis', 'bananas'])
print(d_frame)

Pandas DataFrame Columns
This example shows how to reorder the columns in a DataFrame. By default, a DataFrame uses the column order of the source data. However, you can use the columns argument to alter the position of any column. Let me move Age from the 2nd position to the 4th.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary': [1000000, 1200000, 900000, 1100000]}
data1 = pd.DataFrame(table)
print(data1)
print('\n---- After Changing the Column Order-----')
data2 = pd.DataFrame(table, columns=['name', 'Profession', 'Salary', 'Age'])
print(data2)
Please be careful while using the columns argument. If you specify a column name that does not exist in the dictionary (for example, because of a typo), pandas does not raise an error; it creates that column and fills it with NaN. Let me use the Qualification column name, which doesn't exist.
print('\n---- Using Wrong Column -----')
data3 = pd.DataFrame(table, columns=['name', 'Qualification', 'Salary', 'Age'])
print(data3)

The DataFrame columns attribute returns an Index object that holds the column labels in their current order, along with the dtype of that Index.
print(data1.columns)
print(data2.columns)
print(data3.columns)

Pandas DataFrame Index
By default, pandas assigns index values from 0 to n-1, where n is the number of rows. However, you can replace those default values using the index argument. Here, we assign the letters a to d as the index values.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary': [1000000, 1200000, 900000, 1100000]}
# Without index values - uses the default values
data1 = pd.DataFrame(table)
print(data1)
# Index values are a, b, c, d
data2 = pd.DataFrame(table, index=['a', 'b', 'c', 'd'])
print('\n----After Setting Index Values----')
print(data2)

You can use the DataFrame set_index function to set a column as the index. Here, we use set_index to make name the index. Next, we use loc to show that we can extract a row by its index label.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary': [1000000, 1200000, 900000, 1100000]}
data = pd.DataFrame(table)
print(data)
print('\n---Setting name as an index---')
new_data = data.set_index('name')
print(new_data)
print('\n---Return Index John Details---')
print(new_data.loc['John'])

Pandas DataFrame Attributes
The following sections cover some commonly used attributes of a DataFrame.
Python DataFrame shape attribute
The Pandas DataFrame shape attribute returns a tuple holding the number of rows and the number of columns in a DataFrame.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Salary': [1000000, 1200000, 900000, 1100000]}
data = pd.DataFrame(table)
print(data)
print('\n---Shape or Size of a DataFrame---')
print(data.shape)

Python DataFrame values attribute
The DataFrame values attribute returns the DataFrame data (without the column names) as a two-dimensional array.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary': [1000000, 1200000, 900000, 1100000]}
data2 = pd.DataFrame(table, columns=['name', 'Profession', 'Salary', 'Age'])
data3 = pd.DataFrame(table, columns=['name', 'Qualification', 'Salary', 'Age'])
print('---Data2 Values--- ')
print(data2.values)
print('\n---Data3 Values--- ')
print(data3.values)

Both of the above examples return an array of dtype object, because both DataFrames have mixed content (integers and strings). When all columns share a numeric type, the printed array does not show a dtype. To demonstrate, we use a DataFrame of integers.
import pandas as pd
table = {'Age': [25, 32, 30, 26],
         'Salary': [1000000, 1200000, 900000, 1100000]}
data4 = pd.DataFrame(table)
print(data4.values)

Pandas DataFrame name attribute
Both the index and the columns of a DataFrame have a name attribute, which lets you assign a name to the index or to the column headers.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary': [1000000, 1200000, 900000, 1100000]}
data1 = pd.DataFrame(table)
table = {'Age': [25, 32, 30, 26],
         'Salary': [1000000, 1200000, 900000, 1100000]}
data4 = pd.DataFrame(table)
data1.index.name = 'Emp No'
print(data1)
print()
data4.index.name = 'Cust No'
print(data4)

Similarly, we can use the name attribute of columns to assign a name to the column headers.
data1.columns.name = 'Employee Details'
print(data1)
data4.columns.name = 'Customers Information'
print(data4)

Python DataFrame dtypes attribute
The DataFrame dtypes attribute returns the data type of each column in a DataFrame.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Sale': [22.55, 12.66, 119.470, 200.190],
         'Salary': [10000, 12000, 9000, 11000]}
data = pd.DataFrame(table)
print(data)
print('\n---dtypes attribute result---')
print(data.dtypes)

Python DataFrame describe function
Use the DataFrame describe function to get quick summary statistics (count, mean, standard deviation, minimum, quartiles, and maximum) for the numeric columns of a DataFrame.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary': [10000, 12000, 9000, 11000]}
data1 = pd.DataFrame(table)
print(data1)
print('\n---describe function result---')
print(data1.describe())

Access Python DataFrame Data
The data in a DataFrame is stored in a tabular format of rows and columns, so you can access DataFrame items by column and by row.
Access Pandas DataFrame Columns
You can access a DataFrame column in two ways: by specifying the column name inside [], or by using dot notation. Both methods return the specified column as a Series. To access multiple columns, pass a list of column names inside []; this returns a DataFrame.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary': [10000, 12000, 9000, 11000]}
data1 = pd.DataFrame(table)
data2 = pd.DataFrame(table, columns=['name', 'Profession', 'Salary', 'Age'])
print('-----Accessing DataFrame Columns-----')
print(data1.Age)
print(data1['name'])
print(data2.Salary)
# We can also access multiple DataFrame columns
print('-----Accessing Multiple DataFrame Columns-----')
print(data1[['Age', 'Profession']])
print(data2[['name', 'Salary']])

This is another example of accessing pandas DataFrame columns.
import pandas as pd
table = {'name': ['Kane', 'John', 'Suresh', 'Tracy', 'Steve'],
         'Age': [35, 25, 32, 30, 29],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale': [422.19, 22.55, 119.470, 200.190, 44.55],
         'Salary': [12000, 10000, 14000, 11000, 14000]}
data = pd.DataFrame(table)
print(data)
print('\n---Select name column from DataFrame---')
print(data['name'])
print('\n---Select Profession and Sale column from DataFrame---')
print(data[['Profession', 'Sale']])
print('\n---Select Profession column from DataFrame---')
print(data.Profession)

Access Pandas DataFrame Rows
A Pandas DataFrame can also be accessed by rows. Here, we use slicing to return the required rows from a DataFrame. data[1:] returns all the rows from position 1 to the end, and data[1:3] returns the rows at positions 1 and 2, because the stop value of the slice is excluded.
import pandas as pd
table = {'name': ['Kane', 'John', 'Suresh', 'Tracy', 'Steve'],
         'Age': [35, 25, 32, 30, 29],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale': [422.19, 22.55, 119.470, 200.190, 44.55],
         'Salary': [12000, 10000, 14000, 11000, 14000]}
data = pd.DataFrame(table)
print(data)
print('\n---Select all rows from 1 to N in a DataFrame---')
print(data[1:])
print('\n---Select rows from 1 to 2 in a DataFrame---')
print(data[1:3])
print('\n---Select rows from 0 to 3 in a DataFrame---')
print(data[0:4])
print('\n---Select last row in a DataFrame---')
print(data[-1:])

Pandas DataFrame loc
The DataFrame loc indexer is one of the most important things to understand. It selects data by label: you can use loc[] to select more than one row and more than one column at a time, or to select any portion of a DataFrame.
Use loc to select rows from a DataFrame by their index labels.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary': [1000000, 1200000, 900000, 1100000]}
data = pd.DataFrame(table, index=['a', 'b', 'c', 'd'])
print(data)
print('\n---Select b row from a DataFrame---')
print(data.loc['b'])
print('\n---Select c row from a DataFrame---')
print(data.loc['c'])
print('\n---Select b and d rows from a DataFrame---')
print(data.loc[['b', 'd']])

In the next example, the first statement, data.loc[:, ['name', 'Sale']], returns all the rows of the name and Sale columns. The last statement, data.loc[1:3, ['name', 'Profession', 'Salary']], returns the rows with index labels 1 to 3 for the name, Profession, and Salary columns. Unlike positional slicing, a loc slice includes the stop label.
import pandas as pd
table = {'name': ['Kane', 'John', 'Suresh', 'Tracy', 'Steve'],
         'Age': [35, 25, 32, 30, 29],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale': [422.19, 22.55, 119.470, 200.190, 44.55],
         'Salary': [12000, 10000, 14000, 11000, 14000]}
data = pd.DataFrame(table)
print(data)
print('\n---Select name, Sale column in a DataFrame---')
print(data.loc[:, ['name', 'Sale']])
print('\n---Select name, Profession, Salary in a DataFrame---')
print(data.loc[:, ['name', 'Profession', 'Salary']])
print('\n---Select rows from 1 to 3 in a DataFrame---')
print(data.loc[1:3, ['name', 'Profession', 'Salary']])
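Because plain slicing and loc treat the stop value differently, it can help to compare them side by side. The following is a minimal sketch on a reduced version of the table above; the variable names by_position, by_iloc, and by_label are illustrative only.

```python
import pandas as pd

data = pd.DataFrame({'name': ['Kane', 'John', 'Suresh', 'Tracy', 'Steve'],
                     'Salary': [12000, 10000, 14000, 11000, 14000]})

by_position = data[1:3]       # positions 1 and 2; the stop value is excluded
by_iloc = data.iloc[1:3]      # same rows, selected explicitly by position
by_label = data.loc[1:3]      # labels 1, 2 and 3; the stop label is included

print(len(by_position), len(by_iloc), len(by_label))
```

With the default 0 to n-1 index, labels and positions happen to coincide, which is why the two styles are easy to confuse.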

Pandas DataFrame iloc
Similar to loc[], a Pandas DataFrame has iloc[]. However, iloc accepts only integer positions, counted from 0, to return data from a DataFrame. So iloc[1] returns the second row, and iloc[3] returns the fourth.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary': [1000000, 1200000, 900000, 1100000]}
data = pd.DataFrame(table, index=['a', 'b', 'c', 'd'])
print(data)
print('\n---Select row at position 1 from a DataFrame---')
print(data.iloc[1])
print('\n---Select row at position 3 from a DataFrame---')
print(data.iloc[3])
print('\n---Select rows at positions 1 and 3 from a DataFrame---')
print(data.iloc[[1, 3]])

You can use loc, iloc, at, and iat to access a single value from a DataFrame. Pass the row and the column together in one pair of brackets; loc and at take labels, whereas iloc and iat take integer positions.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary': [1000000, 1200000, 900000, 1100000]}
data = pd.DataFrame(table)
print(data)
print('\nitem at 0, 0 in DataFrame = ', data.iloc[0, 0])
print('item at 0, 1 in DataFrame = ', data.iloc[0, 1])
print('item at 1, Profession is = ', data.loc[1, 'Profession'])
print('item at 2, 3 in DataFrame = ', data.iat[2, 3])
print('item at 0, Salary in DataFrame = ', data.at[0, 'Salary'])

Add New Column to Pandas DataFrame
In this example, we show how to add a new column to an existing DataFrame. data['Sale'] = [422.19, 200.190, 44.55] adds a completely new column called Sale. data['Income'] = data['Salary'] + data['basic'] adds a new column Income by adding the values in the Salary and basic columns.
import pandas as pd
table = {'name': ['Kane', 'Suresh', 'Tracy'],
         'Age': [35, 25, 29],
         'Profession': ['Manager', 'Developer', 'HR'],
         'Salary': [10000, 14000, 11000],
         'basic': [4000, 6000, 4500]}
data = pd.DataFrame(table)
print(data)
# Add new column to DataFrame
data['Sale'] = [422.19, 200.190, 44.55]
print('\n---After adding New Column DataFrame---')
print(data)
# Add new column using existing columns
data['Income'] = data['Salary'] + data['basic']
print('\n---Total Salary in a DataFrame---')
print(data)
# Add new calculated column to DataFrame
data['New_Salary'] = data['Salary'] + data['Salary'] * 0.25
print('\n---After adding New Column DataFrame---')
print(data)

Delete Column from a DataFrame in Python
You can delete a column from a Pandas DataFrame with the del statement, the pop function, or the drop function. In this example, we use all three.
Here, del data['basic'] deletes the basic column (all the rows belonging to that column) from the DataFrame. x = data.pop('Age') removes the Age column from the DataFrame and returns it, and we print that popped column as well. Next, we use the DataFrame drop function to delete the Sale column; drop returns a new DataFrame and leaves the original unchanged.
import pandas as pd
table = {'name': ['Kane', 'Suresh', 'Tracy'],
         'Age': [35, 25, 29],
         'Profession': ['Manager', 'Developer', 'HR'],
         'Salary': [10000, 14000, 11000],
         'basic': [4000, 6000, 4500],
         'Sale': [422.19, 200.190, 44.55]}
data = pd.DataFrame(table)
print(data)
# Delete existing columns from DataFrame
del data['basic']
print('\n---After Deleting basic Column DataFrame---')
print(data)
x = data.pop('Age')
print('\n---After Deleting Age Column DataFrame---')
print(data)
print('\n---pop Column from DataFrame---')
print(x)
y = data.drop(columns='Sale')
print('\n---After Deleting Sale Column DataFrame---')
print(y)

Delete DataFrame Row in Python
In this example, we use the Pandas drop function to delete a DataFrame row by its index label. Each call returns a new DataFrame, so data itself is unchanged.
import pandas as pd
table = {'name': ['Kane', 'Suresh', 'Tracy'],
         'Profession': ['Manager', 'Developer', 'HR'],
         'Salary': [10000, 14000, 11000],
         'Sale': [422.19, 200.190, 44.55]}
data = pd.DataFrame(table, index=['a', 'b', 'c'])
print(data)
x = data.drop('b')
print('\n---After Deleting b row DataFrame---')
print(x)
y = data.drop('a')
print('\n---After Deleting a row DataFrame---')
print(y)

Rename Pandas DataFrame Column
Use the Pandas rename function to rename one or more columns of a DataFrame. Here, we use rename to change the Profession column to Qualification and Salary to Income.
import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary': [1000000, 1200000, 900000, 1100000]}
data = pd.DataFrame(table)
print(data)
# Equivalent: data = data.rename(columns={'Profession': 'Qualification'})
data.rename(columns={'Profession': 'Qualification'}, inplace=True)
print('\n---After Renaming Column in a DataFrame---')
print(data)
# Profession was already renamed above, so only Salary changes here
data.rename(columns={'Profession': 'Qualification', 'Salary': 'Income'}, inplace=True)
print('\n---After Renaming two Column in a DataFrame---')
print(data)
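As a complement to the inplace form used above, here is a small sketch of rename returning a new DataFrame, and of how a misspelled column name can be caught. The misspelling 'Proffesion' and the variable typo_detected are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({'name': ['John', 'Mike'],
                     'Profession': ['Developer', 'Analyst']})

# Without inplace=True, rename returns a new DataFrame.
renamed = data.rename(columns={'Profession': 'Qualification'})
print(list(data.columns))      # the original is unchanged
print(list(renamed.columns))

# A key that matches no column is ignored by default;
# errors='raise' turns that situation into a KeyError.
try:
    data.rename(columns={'Proffesion': 'Qualification'}, errors='raise')
    typo_detected = False
except KeyError:
    typo_detected = True
print(typo_detected)
```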

Python pandas head and tail
If you are coming from R programming, you might be familiar with the head and tail functions. The Pandas DataFrame head function accepts an integer argument and returns that many records from the top of a DataFrame. For example, head(5) returns the first 5 records. Similarly, the tail function returns records from the bottom of a DataFrame. For example, tail(5) returns the last 5 records. When called without an argument, both functions return 5 records.
import pandas as pd
table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy', 'Steve'],
         'Age': [35, 25, 32, 30, 26, 29],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR', 'HOD'],
         'Sale': [422.19, 22.55, 12.66, 119.470, 200.190, 44.55],
         'Salary': [12000, 10000, 8000, 14000, 11000, 14000]}
data = pd.DataFrame(table)
print(data)
print('\n---First Five records DataFrame head()---')
print(data.head())
print('\n---First two records DataFrame head(2)---')
print(data.head(2))
print('\n---Bottom Five records DataFrame tail()---')
print(data.tail())
print('\n---last two records DataFrame tail(2)---')
print(data.tail(2))

Pandas Transpose DataFrame in Python
A DataFrame has built-in functionality to transpose its data, swapping rows and columns. For this, use DataFrame.T.
import pandas as pd
table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale': [422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary': [12000, 10000, 8000, 14000, 11000]}
data = pd.DataFrame(table)
print(data)
print('\n---Transposed DataFrame---')
print(data.T)

Python DataFrame groupby
The DataFrame groupby function is similar to the GROUP BY clause in SQL. You can use groupby to group data by one or more columns and compute aggregated results for the other columns. This is one of the most important functions when working with real-world data.
In this example, we create a DataFrame with columns of different data types and then use groupby on it. The first statement, data.groupby('Profession').sum(), groups the DataFrame by the Profession column and calculates the sum of Sale, Salary, and Age. The second statement, data.groupby(['Profession', 'Age']).sum(), groups the DataFrame by the Profession and Age columns and calculates the sum of Sale and Salary. Remember, in pandas 2.0 and later, string columns such as name are concatenated within each group; pass numeric_only=True to sum() to leave them out.
import pandas as pd
table = {'name': ['Kane', 'Dave', 'Ram', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 25, 35, 25, 35, 35],
         'Profession': ['Analyst', 'HR', 'Analyst', 'Admin', 'HR', 'Admin', 'HR'],
         'Sale': [422, 22, 55, 12, 119, 470, 200],
         'Salary': [12000, 9000, 10000, 8000, 14000, 20000, 11000]}
data = pd.DataFrame(table)
print(data)
print('\n---DataFrame groupby Profession---')
print(data.groupby('Profession').sum())
print('\n---DataFrame groupby Profession and Age---')
print(data.groupby(['Profession', 'Age']).sum())
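When only some columns should be aggregated, one option is to select them before calling the aggregation, or to name each aggregate explicitly with agg. This is a minimal sketch on the same table; the output column names total_sale and mean_salary are made up for illustration.

```python
import pandas as pd

data = pd.DataFrame({
    'name': ['Kane', 'Dave', 'Ram', 'John', 'Mike', 'Suresh', 'Tracy'],
    'Age': [35, 25, 25, 35, 25, 35, 35],
    'Profession': ['Analyst', 'HR', 'Analyst', 'Admin', 'HR', 'Admin', 'HR'],
    'Sale': [422, 22, 55, 12, 119, 470, 200],
    'Salary': [12000, 9000, 10000, 8000, 14000, 20000, 11000],
})

# Sum only the chosen numeric columns for each Profession.
totals = data.groupby('Profession')[['Sale', 'Salary']].sum()
print(totals)

# Named aggregation: a different function for each output column.
summary = data.groupby('Profession').agg(
    total_sale=('Sale', 'sum'),
    mean_salary=('Salary', 'mean'),
)
print(summary)
```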

Python DataFrame stack
The Pandas DataFrame stack function reshapes a DataFrame by moving its column labels into the innermost level of the row index, which produces a longer, narrower result. To use it, simply call data_to_stack.stack(). In this example, we use stack on grouped data (the groupby function result) to reshape it further.
import pandas as pd
table = {'name': ['Kane', 'Dave', 'Ram', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 25, 35, 25, 35, 35],
         'Profession': ['Analyst', 'HR', 'Analyst', 'Admin', 'HR', 'Admin', 'HR'],
         'Sale': [422, 22, 55, 12, 119, 470, 200],
         'Salary': [12000, 9000, 10000, 8000, 14000, 20000, 11000]}
data = pd.DataFrame(table)
grouped_data1 = data.groupby('Profession').sum()
stacked_data1 = grouped_data1.stack()
print('\n---Stacked DataFrame groupby Profession---')
print(stacked_data1)
grouped_data2 = data.groupby(['Profession', 'Age']).sum()
stacked_data2 = grouped_data2.stack()
print('\n---Stacked DataFrame groupby Profession and Age---')
print(stacked_data2)

Python DataFrame unstack
The unstack function undoes the operation done by the stack function. By default, it moves the innermost level of the row index of a stacked result (the output of .stack()) back into columns. To use this function, simply call stacked_data.unstack().
import pandas as pd
table = {'name': ['Kane', 'Dave', 'Ram', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 25, 35, 25, 35, 35],
         'Profession': ['Analyst', 'HR', 'Analyst', 'Admin', 'HR', 'Admin', 'HR'],
         'Sale': [422, 22, 55, 12, 119, 470, 200],
         'Salary': [12000, 9000, 10000, 8000, 14000, 20000, 11000]}
data = pd.DataFrame(table)
grouped_data1 = data.groupby('Profession').sum()
stacked_data1 = grouped_data1.stack()
unstacked_data1 = stacked_data1.unstack()
# print('\n---Stacked DataFrame groupby Profession---')
# print(stacked_data1)
print('\n---Unstacked DataFrame groupby Profession---')
print(unstacked_data1)
grouped_data2 = data.groupby(['Profession', 'Age']).sum()
stacked_data2 = grouped_data2.stack()
unstacked_data2 = stacked_data2.unstack()
# print('\n---Stacked DataFrame groupby Profession and Age---')
# print(stacked_data2)
print('\n---Unstacked DataFrame groupby Profession and Age---')
print(unstacked_data2)
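The relationship between stack and unstack is easier to see on a small table. The sketch below builds a hypothetical two-column table by hand (the names wide, long_form, and back are illustrative) and reshapes it there and back.

```python
import pandas as pd

wide = pd.DataFrame(
    {'Sale': [482, 477, 341], 'Salary': [28000, 22000, 34000]},
    index=pd.Index(['Admin', 'Analyst', 'HR'], name='Profession'),
)

# stack: the column labels become the inner level of a two-level row index.
long_form = wide.stack()
print(long_form)

# unstack: the inner index level goes back to being the column labels.
back = long_form.unstack()
print(back)
```

Here long_form holds one value per (Profession, column) pair, so three rows and two columns become six entries.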

Python DataFrame Concatenation
The Pandas concat function is used to combine or concatenate DataFrame objects. First, we declare two DataFrames of random values, each with 4 rows and 6 columns. Next, we use the concat function to concatenate them.
import pandas as pd
import numpy as np
dataframe_one = pd.DataFrame(np.random.randn(4, 6))
print(dataframe_one)
dataframe_two = pd.DataFrame(np.random.randn(4, 6))
print(dataframe_two)
print('\n---DataFrame concatenation---')
print(pd.concat([dataframe_one, dataframe_two]))

In the above example, we concatenated two DataFrame objects of the same size. However, you can use the concat function to combine more than two DataFrame objects, and they can be of different sizes. Pandas aligns the DataFrames on their column labels and fills the cells that have no data with NaN. For this, we use three DataFrames of randomly generated numbers with different sizes and then concatenate them.
import numpy as np
import pandas as pd
dataframe_one = pd.DataFrame(np.random.randn(4, 6))
print(dataframe_one)
dataframe_two = pd.DataFrame(np.random.randn(4, 5))
print(dataframe_two)
dataframe_three = pd.DataFrame(np.random.randn(3, 4))
print(dataframe_three)
print('\n-----DataFrame concatenation-----')
print(pd.concat([dataframe_one, dataframe_two, dataframe_three]))
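Two details of concatenating DataFrames of different sizes are worth checking on fixed values rather than random ones: columns missing from one input are filled with NaN, and the original row labels repeat unless you ask for a fresh index. This is a minimal sketch; the names wide, narrow, combined, and renumbered are illustrative.

```python
import numpy as np
import pandas as pd

wide = pd.DataFrame(np.ones((2, 3)))      # columns 0, 1, 2
narrow = pd.DataFrame(np.zeros((2, 2)))   # columns 0, 1

combined = pd.concat([wide, narrow])
print(combined)                 # column 2 is NaN for the rows from narrow
print(list(combined.index))     # the row labels of each input are kept

# ignore_index=True discards the old labels and renumbers the rows.
renumbered = pd.concat([wide, narrow], ignore_index=True)
print(list(renumbered.index))
```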

Python Math operations on DataFrame
In this example, we use a few of the Pandas DataFrame mathematical functions. For demo purposes, we find the mean and median of each column and of each row. To get the mean or median of each row, pass axis=1 to the function. Because this DataFrame also contains string columns, we pass numeric_only=True so that only the numeric columns are used.
import pandas as pd
table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale': [422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary': [12000, 10000, 8000, 14000, 11000]}
data = pd.DataFrame(table)
print(data)
print('\n---DataFrame Mean of Columns---')
print(data.mean(numeric_only=True))
print('\n---DataFrame Mean of Rows---')
print(data.mean(axis=1, numeric_only=True))
print('\n---DataFrame Median of Columns---')
print(data.median(numeric_only=True))
print('\n---DataFrame Median of Rows---')
print(data.median(axis=1, numeric_only=True))
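Another way to keep string columns out of a calculation is to select the numeric columns first. The sketch below uses select_dtypes on a two-row table; the variable names numeric, column_means, and row_means are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'name': ['Kane', 'John'],
                     'Age': [35, 25],
                     'Salary': [12000, 10000]})

# Keep only the numeric columns (Age and Salary).
numeric = data.select_dtypes(include='number')

column_means = numeric.mean()        # one value per column
row_means = numeric.mean(axis=1)     # one value per row

print(column_means)
print(row_means)
```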

Here, we calculate the sum of each column and the sum of each row. Similarly, we find the minimum value in each column, the maximum value in each column, and the maximum value in each row using the sum(), min(), and max() functions. The row-wise calls use numeric_only=True, because numbers and strings in the same row cannot be added or compared.
import pandas as pd
table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale': [422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary': [12000, 10000, 8000, 14000, 11000]}
data = pd.DataFrame(table)
print(data)
print('\n---DataFrame sum of Columns---')
print(data.sum())
print('\n---DataFrame sum of Rows---')
print(data.sum(axis=1, numeric_only=True))
print('\n---DataFrame Minimum of Columns---')
print(data.min())
print('\n---DataFrame Maximum of Columns---')
print(data.max())
print('\n---DataFrame Maximum of Rows---')
print(data.max(axis=1, numeric_only=True))

Arithmetic Operations on Python Pandas DataFrame
We can perform arithmetic operations on a DataFrame. An operation with a single number is applied to every value in the DataFrame.
import pandas as pd
table = {'Age': [25, 32, 30],
         'Sale': [422.19, 119.470, 200.190],
         'Salary': [12000, 14000, 11000]}
data = pd.DataFrame(table)
print(data)
print('\n---Add 20 to DataFrame---')
print(data + 20)
print('\n---Subtract 10 from DataFrame---')
print(data - 10)
print('\n---Multiply DataFrame by 2---')
print(data * 2)

Python Pandas DataFrame Nulls
The isnull function checks each value in a DataFrame and returns True if the value is null (missing), otherwise False. The notnull function does the opposite: it returns True if the value is not null, otherwise False.
import pandas as pd
import numpy as np
table = {'name': ['Kane', 'Suresh', np.nan],
         'Profession': ['Manager', np.nan, 'HR'],
         'Salary': [np.nan, 14000, 11000],
         'Sale': [422.19, np.nan, 44.55]}
data = pd.DataFrame(table)
print(data)
print('\n---Checking Nulls in a DataFrame---')
print(data.isnull())
print('\n---Checking Not Nulls in a DataFrame---')
print(data.notnull())

Replace Nulls in DataFrame
We can also replace those null values with meaningful numbers. For this, use the DataFrame fillna function or the replace function.
import pandas as pd
import numpy as np
table = {'Age': [20, 35, np.nan],
         'Salary': [np.nan, 14000, 11000],
         'Sale': [422.19, np.nan, 44.55]}
data = pd.DataFrame(table)
print(data)
print('\n---Fill Missing Values DataFrame---')
print(data.fillna(30))
print('\n---Replace Missing Values DataFrame---')
print(data.replace({np.nan: 66}))
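A single fill value is rarely meaningful for every column. As a minimal sketch, fillna also accepts a dictionary that maps column names to fill values; filling Age with its mean and Salary with 0 is only an example choice, not a recommendation.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Age': [20, 35, np.nan],
                     'Salary': [np.nan, 14000, 11000],
                     'Sale': [422.19, np.nan, 44.55]})

# A different fill value for each column; columns that are
# not listed (Sale here) keep their missing values.
filled = data.fillna({'Age': data['Age'].mean(), 'Salary': 0})
print(filled)
```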

Pandas DataFrame pivot
A DataFrame has a pivot function, which reshapes an existing DataFrame: the unique values of one column become the new column headers, and another column supplies the row labels.
import pandas as pd
table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'HR', 'Analyst', 'Manager', 'HR'],
         'Sale': [422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary': [12000, 10000, 8000, 14000, 11000]}
data = pd.DataFrame(table)
print(data)
print('\n--- After DataFrame Pivot---')
data2 = data.pivot(index='name', columns='Profession', values='Salary')
print(data2)
print('\n--- After DataFrame Pivot---')
data3 = data.pivot(index='name', columns='Profession')
print(data3)

Save DataFrame to CSV and Text File
To save the data in a Pandas DataFrame to a CSV file or a text file, use the Pandas to_csv function.
import pandas as pd
table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'HR', 'Analyst', 'Manager', 'HR'],
         'Sale': [422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary': [12000, 10000, 8000, 14000, 11000]}
data = pd.DataFrame(table)
print(data)
# Write DataFrame to a text file
data.to_csv('user_info.txt')
# Write DataFrame to a csv file with the comma separator
data.to_csv('user_info.csv')
# Write DataFrame to a csv file with the Tab separator
data.to_csv('user_info_new.csv', sep='\t')
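To check what to_csv actually writes, one option is to call it without a file name, in which case it returns the CSV text as a string. The sketch below also drops the index column with index=False and reads the text back with read_csv; io.StringIO stands in for a file here so that nothing is written to disk.

```python
import io
import pandas as pd

data = pd.DataFrame({'name': ['Kane', 'John'], 'Salary': [12000, 10000]})

# No path given: to_csv returns the CSV content as a string.
# index=False leaves the 0..n-1 row labels out of the output.
csv_text = data.to_csv(index=False)
print(csv_text)

# Read the same text back into a DataFrame.
restored = pd.read_csv(io.StringIO(csv_text))
print(restored)
```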

Iterate over DataFrame Rows
Use the iterrows or itertuples function to iterate over the rows of a DataFrame. iterrows returns each row as a pair of its index label and a Series of its values. The items function (named iteritems before pandas 2.0) iterates over the columns instead of the rows.
import pandas as pd
table = {'name': ['Kane', 'John', 'Mike'],
         'Age': [35, 25, 32],
         'Profession': ['Manager', 'HR', 'Analyst'],
         'Sale': [422.19, 119.470, 200.190],
         'Salary': [12000, 14000, 11000]}
data = pd.DataFrame(table)
print(data)
print('\n---Iterating Rows---')
for index, row in data.iterrows():
    print(index, row)
    print()
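For comparison with iterrows, here is a small sketch of the other two iteration functions. It uses a reduced table, and the list names names_by_row and column_names are illustrative.

```python
import pandas as pd

data = pd.DataFrame({'name': ['Kane', 'John', 'Mike'],
                     'Age': [35, 25, 32]})

# itertuples yields one namedtuple per row; its fields are
# Index followed by the column names.
names_by_row = [row.name for row in data.itertuples()]
print(names_by_row)

# items yields (column label, Series) pairs, so it walks
# the columns rather than the rows.
column_names = [label for label, column in data.items()]
print(column_names)
```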
