Python pandas DataFrame

A pandas DataFrame in Python is a two-dimensional data structure that stores data in a tabular format, i.e., rows and columns. In this article, we show how to create a pandas DataFrame and how to access and alter its rows and columns. Next, we discuss transposing a DataFrame, iterating over its rows, and so on.

How to create a DataFrame in Python?

In real-world projects, we use a pandas DataFrame to load data from SQL Server, text files, Excel files, or CSV files. Next, we slice and dice that data as per our requirements. Once the data is in the required format, we use it to create reports, charts, or graphs using the matplotlib module.
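As a minimal sketch of the loading step described above, the following reads CSV data with pd.read_csv. The CSV text is made up for illustration, and io.StringIO stands in for a real file path so that the example runs without any file on disk.

```python
import io
import pandas as pd

# Hypothetical CSV content; io.StringIO stands in for a real file path.
csv_text = """name,Age,Salary
John,25,1000000
Mike,32,1200000
"""

data = pd.read_csv(io.StringIO(csv_text))
print(data.shape)    # (number of rows, number of columns)
print(data.dtypes)   # data type that pandas inferred for each column
```

With a real file, you would pass its path to pd.read_csv in place of the StringIO object.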

Create an Empty DataFrame in pandas

This simple example creates an empty DataFrame in Python by calling pd.DataFrame() without any arguments.

import pandas as pd
 
data = pd.DataFrame()
print(data)

Empty DataFrame

Columns: []
Index: []

Create pandas DataFrame from List

Here, we create a list of Python integer values. Next, we use the pandas.DataFrame function to convert that list to a DataFrame.

table = [1, 2, 3, 4, 5]

data = pd.DataFrame(table)
print(data)
   0
0  1
1  2
2  3
3  4
4  5

Next, we create a DataFrame from a mixed list, i.e., a list of lists holding integers and strings. Each inner list becomes a row, so the result has multiple rows and columns.

table = [[1, 'Suresh'], [2, 'Python'], [3, 'Hello']]

data = pd.DataFrame(table)
print(data)
   0       1
0  1  Suresh
1  2  Python
2  3   Hello

Use the columns argument to assign names to the columns.

table = [[1, 'Suresh'], [2, 'Python'], [3, 'Hello']]
data = pd.DataFrame(table, columns = ['S.No', 'Name'])
print(data)
   S.No    Name
0     1  Suresh
1     2  Python
2     3   Hello

Python DataFrame of Random Numbers

To create a pandas DataFrame of random numbers, we use the numpy random.randn function to generate an 8 * 4 array of random numbers. Next, we pass that array to the DataFrame function.

import numpy as np
import pandas as pd
 
d_frame = pd.DataFrame(np.random.randn(8, 4))
print(d_frame)
          0         1         2         3
0 -0.492116 -0.824771 -0.869890 -1.753722
1 -0.733930  0.947616  0.089861  0.888474
2 -0.948483 -1.002449 -0.283761 -0.207897
3  0.013346  2.059951  1.064830  0.830474
4  0.289157 -0.418271 -0.770464  0.223895
5 -0.781827 -0.396441  0.123848 -0.824002
6  0.667090  0.183589  1.212163  0.231251
7  1.067570 -0.615639  0.461147 -1.365541
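The numbers above change on every run. If you need the same random DataFrame each time, one option is a seeded NumPy generator. This is a sketch only; the seed value 42 is an arbitrary choice, not something from the example above.

```python
import numpy as np
import pandas as pd

# A seeded generator produces the same sequence on every run.
rng = np.random.default_rng(seed=42)
d_frame = pd.DataFrame(rng.standard_normal((8, 4)))

# Building a second frame from a generator with the same seed gives equal data.
same = pd.DataFrame(np.random.default_rng(seed=42).standard_normal((8, 4)))

print(d_frame.shape)
print(d_frame.equals(same))
```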

Python Pandas DataFrame from dict

Pandas allows you to create a DataFrame from a dict (dictionary), and it is pretty straightforward. All you have to do is declare a dictionary of values and then use the DataFrame function to convert that dictionary to a DataFrame.

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }
 
data = pd.DataFrame(table)
print(data)

To create a DataFrame from a dict, the value lists of all keys must have the same length; otherwise, it throws an error. Similarly, if you pass index values, their number must match the length of those lists; otherwise, it raises an error. If you don't pass any index values, pandas automatically creates an index for you that runs from 0 to n-1.

     name   Salary
0    John  1000000
1    Mike  1200000
2  Suresh   900000
3   Tracy  1100000
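The length rule described above can be checked with a short sketch. The bad_table dictionary is a made-up example whose two lists differ in length on purpose.

```python
import pandas as pd

# Hypothetical data: the two lists have different lengths on purpose.
bad_table = {'name': ['John', 'Mike', 'Suresh'],   # 3 values
             'Salary': [1000000, 1200000]}         # 2 values

try:
    pd.DataFrame(bad_table)
    raised = False
except ValueError as err:
    # pandas refuses to build a DataFrame from lists of unequal length
    raised = True
    print('ValueError:', err)
```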

Let me take another example of creating a pandas DataFrame from a dictionary. This time, the dictionary has four keys, which become four columns of data.

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

data = pd.DataFrame(table)
print(data)

How to create pandas DataFrame from dict of lists

If placing everything in one expression is confusing, divide it into parts. Here, we declare four lists of items and then assign one to each column.

names = ['John', 'Mike', 'Suresh', 'Tracy']
ages =  [25, 32, 30, 26]
Professions = ['Developer', 'Analyst', 'Admin', 'HR']
Salaries = [1000000, 1200000, 900000, 1100000]
	      
table = {'name': names,
         'Age': ages,
         'Profession': Professions,
         'Salary': Salaries
         }
	      
data = pd.DataFrame(table)
print(data)
     name  Age Profession   Salary
0    John   25  Developer  1000000
1    Mike   32    Analyst  1200000
2  Suresh   30      Admin   900000
3   Tracy   26         HR  1100000

Python Pandas DataFrame of Dates

Using pandas, you can also create a DataFrame indexed by a series of dates. Let me create one with dates from 2019-01-01 to 2019-01-08. By changing the periods value, you can generate a longer date sequence.

import numpy as np
import pandas as pd

dates = pd.date_range('20190101', periods = 8)
print(dates)
print()

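# Pass the column names as a list: a set has no defined order,
# and recent pandas versions reject a set here
d_frame = pd.DataFrame(np.random.randn(8, 4), index = dates,
                       columns = ['kiwis', 'apples', 'oranges', 'bananas'])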
print(d_frame)
DatetimeIndex(['2019-01-01', '2019-01-02', '2019-01-03', '2019-01-04',
               '2019-01-05', '2019-01-06', '2019-01-07', '2019-01-08'],
              dtype='datetime64[ns]', freq='D')

               kiwis    apples   oranges   bananas
2019-01-01 -0.393538 -0.406943  1.612431  1.089230
2019-01-02  1.070080 -1.565538  0.727056  1.677534
2019-01-03 -1.324169  0.256827  1.332544 -2.952971
2019-01-04  0.419778 -0.562119  0.507846 -0.223730
2019-01-05  0.175785  1.566511 -1.832633  2.035536
2019-01-06  0.541516 -0.113477  0.444046  0.387718
2019-01-07  0.247760 -1.143530  0.615681  0.400743
2019-01-08 -0.242328  0.913758 -0.088591 -0.533690

Pandas DataFrame Columns

This example shows how to reorder the columns. By default, a DataFrame uses the column order of the actual data. However, you can use the columns argument to alter the position of any column. Let me move Age from the 2nd position to the 4th.

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

data1 = pd.DataFrame(table)
print(data1)

print('\n---- After Changing the Order-----')
data2 = pd.DataFrame(table, columns = ['name', 'Profession', 'Salary', 'Age'])
print(data2)

Please be careful while using this columns argument. If you specify a non-existing column name (for example, through a typo), pandas creates that column and fills it with NaN. Let me use the Qualification column name, which doesn't exist.

print('\n---- Using Wrong Col -----')
data3 = pd.DataFrame(table, columns = ['name', 'Qualification', 'Salary', 'Age'])
print(data3)
     name  Age Profession   Salary
0    John   25  Developer  1000000
1    Mike   32    Analyst  1200000
2  Suresh   30      Admin   900000
3   Tracy   26         HR  1100000

---- After Changing the Order-----
     name Profession   Salary  Age
0    John  Developer  1000000   25
1    Mike    Analyst  1200000   32
2  Suresh      Admin   900000   30
3   Tracy         HR  1100000   26

---- Using Wrong Col -----
     name Qualification   Salary  Age
0    John           NaN  1000000   25
1    Mike           NaN  1200000   32
2  Suresh           NaN   900000   30
3   Tracy           NaN  1100000   26
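The columns argument applies while building a DataFrame from a dict. To reorder a DataFrame that already exists, one common approach is to select its columns with a list of names. The sketch below uses a smaller, made-up table. It also shows that a misspelled name raises a KeyError with this approach instead of producing NaN.

```python
import pandas as pd

data = pd.DataFrame({'name': ['John', 'Mike'],
                     'Age': [25, 32],
                     'Salary': [1000000, 1200000]})

# Selecting with a list of names returns the columns in that order.
reordered = data[['name', 'Salary', 'Age']]
print(reordered)

# A name that does not exist raises KeyError rather than creating NaN values.
try:
    data[['name', 'Qualification']]
except KeyError as err:
    print('KeyError:', err)
```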

The columns attribute returns the column labels of a DataFrame as an Index, in the same order, along with the dtype of those labels.

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

data1 = pd.DataFrame(table)
print(data1)

data2 = pd.DataFrame(table, columns = ['name', 'Profession', 'Salary', 'Age'])

data3 = pd.DataFrame(table, columns = ['name', 'Qualification', 'Salary', 'Age'])

print(data1.columns)
print(data2.columns)
print(data3.columns)
     name  Age Profession   Salary
0    John   25  Developer  1000000
1    Mike   32    Analyst  1200000
2  Suresh   30      Admin   900000
3   Tracy   26         HR  1100000
Index(['name', 'Age', 'Profession', 'Salary'], dtype='object')
Index(['name', 'Profession', 'Salary', 'Age'], dtype='object')
Index(['name', 'Qualification', 'Salary', 'Age'], dtype='object')

Pandas DataFrame Index

By default, pandas assigns index values from 0 to n-1, where n is the number of rows. However, you can alter those default index values using the index argument. Here, we use it to assign the letters a to d as the index values.

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

#Without Index Values - Uses Default Values
data1 = pd.DataFrame(table)
print(data1)

# Index Values are a, b, c, d
data2 = pd.DataFrame(table, index = ['a', 'b', 'c', 'd'])
print('\n----After Setting Index Values----')
print(data2)
     name  Age Profession   Salary
0    John   25  Developer  1000000
1    Mike   32    Analyst  1200000
2  Suresh   30      Admin   900000
3   Tracy   26         HR  1100000

----After Setting Index Values----
     name  Age Profession   Salary
a    John   25  Developer  1000000
b    Mike   32    Analyst  1200000
c  Suresh   30      Admin   900000
d   Tracy   26         HR  1100000

You can use the DataFrame set_index function to set a column as the index. Here, we use set_index to set name as the index. Next, we use loc to show that we can extract information using an index label.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }
data = pd.DataFrame(table)

print('\n---Setting name as an index---')
new_data = data.set_index('name')
print(new_data)

print('\n---Return Index John Details---')
print(new_data.loc['John'])

---Setting name as an index---
        Age Profession   Salary
name                           
John     25  Developer  1000000
Mike     32    Analyst  1200000
Suresh   30      Admin   900000
Tracy    26         HR  1100000

---Return Index John Details---
Age                  25
Profession    Developer
Salary          1000000
Name: John, dtype: object
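set_index returns a new DataFrame, and reset_index reverses it by moving the index back into a regular column. The following is a small sketch with made-up data that round-trips a DataFrame through both calls.

```python
import pandas as pd

data = pd.DataFrame({'name': ['John', 'Mike'], 'Age': [25, 32]})

indexed = data.set_index('name')    # name becomes the index
restored = indexed.reset_index()    # name becomes a column again

print(indexed)
print(restored)
print('name' in data.columns)       # the original DataFrame is unchanged
```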

Pandas DataFrame Attributes

The following are some of the commonly used attributes of a pandas DataFrame.

shape attribute

The pandas shape attribute returns a tuple holding the number of rows and columns.

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Salary':[1000000, 1200000, 900000, 1100000]
	    }

data = pd.DataFrame(table)

print('\n---Shape or Size ---')
print(data.shape)

---Shape or Size ---
(4, 3)

values attribute

The values attribute returns the data (without column names) as a two-dimensional array.

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

data2 = pd.DataFrame(table, columns = ['name', 'Profession', 'Salary', 'Age'])

data3 = pd.DataFrame(table, columns = ['name', 'Qualification', 'Salary', 'Age'])

print('---Data2 Values--- ')
print(data2.values)

print('\n---Data3 Values--- ')
print(data3.values)
---Data2 Values--- 
[['John' 'Developer' 1000000 25]
 ['Mike' 'Analyst' 1200000 32]
 ['Suresh' 'Admin' 900000 30]
 ['Tracy' 'HR' 1100000 26]]

---Data3 Values--- 
[['John' nan 1000000 25]
 ['Mike' nan 1200000 32]
 ['Suresh' nan 900000 30]
 ['Tracy' nan 1100000 26]]

Both examples above return an array of dtype object, because both DataFrames have mixed content (int and string). When all the columns share one type, the array takes that type instead. To show this, we use an all-integer DataFrame.

import pandas as pd

table = {'Age': [25, 32, 30, 26],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

data4 = pd.DataFrame(table)
print(data4.values)
[[     25 1000000]
 [     32 1200000]
 [     30  900000]
 [     26 1100000]]
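The pandas documentation recommends the to_numpy method over the values attribute for getting the underlying array. The sketch below, using shortened made-up tables, prints the array dtype so that the difference between an all-integer and a mixed DataFrame is visible.

```python
import pandas as pd

data4 = pd.DataFrame({'Age': [25, 32], 'Salary': [1000000, 1200000]})
arr = data4.to_numpy()
print(arr.dtype)                # an integer dtype: every column holds integers

mixed = pd.DataFrame({'name': ['John', 'Mike'], 'Age': [25, 32]})
mixed_arr = mixed.to_numpy()
print(mixed_arr.dtype)          # object: strings and integers share one array
```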

Pandas DataFrame name attribute

The index and the columns of a DataFrame each have a name attribute, which allows you to assign a name to the index or to the column headers.

import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

data1 = pd.DataFrame(table)

table = {'Age': [25, 32, 30, 26],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }
data4 = pd.DataFrame(table)

data1.index.name = 'Emp No'
print(data1)
print()

data4.index.name = 'Cust No'
print(data4)
          name  Age Profession   Salary
Emp No                                 
0         John   25  Developer  1000000
1         Mike   32    Analyst  1200000
2       Suresh   30      Admin   900000
3        Tracy   26         HR  1100000

         Age   Salary
Cust No              
0         25  1000000
1         32  1200000
2         30   900000
3         26  1100000

Similarly, we can use the columns.name attribute of a pandas DataFrame to assign a name to the column headers.

import pandas as pd
table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

data1 = pd.DataFrame(table)

table = {'Age': [25, 32, 30, 26],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }

data4 = pd.DataFrame(table)

data1.columns.name = 'Employee Details'
print(data1)
 
data4.columns.name = 'Customers Information'
print(data4)
Employee Details    name  Age Profession   Salary
0                   John   25  Developer  1000000
1                   Mike   32    Analyst  1200000
2                 Suresh   30      Admin   900000
3                  Tracy   26         HR  1100000
Customers Information  Age   Salary
0                       25  1000000
1                       32  1200000
2                       30   900000
3                       26  1100000

dtypes attribute

The dtypes attribute returns the data type of each column in the DataFrame.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Sale':[22.55, 12.66, 119.470, 200.190],
         'Salary':[10000, 12000, 9000, 11000]
         }

data = pd.DataFrame(table)

print('\n---dtype attribute result---')
print(data.dtypes)

---dtype attribute result---
name           object
Age             int64
Profession     object
Sale          float64
Salary          int64
dtype: object

Python DataFrame describe function

Use the DataFrame describe function to get quick summary statistics. By default, it summarizes only the numeric columns.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[10000, 12000, 9000, 11000]
         }

data1 = pd.DataFrame(table)

print('\n---describe function result---')
print(data1.describe())

---describe function result---
             Age        Salary
count   4.000000      4.000000
mean   28.250000  10500.000000
std     3.304038   1290.994449
min    25.000000   9000.000000
25%    25.750000   9750.000000
50%    28.000000  10500.000000
75%    30.500000  11250.000000
max    32.000000  12000.000000
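The result above leaves out the name and Profession columns because they hold strings. As a sketch, the include argument of describe can bring them in. The table below is a shortened version of the one used above.

```python
import pandas as pd

data1 = pd.DataFrame({'name': ['John', 'Mike', 'Suresh', 'Tracy'],
                      'Age': [25, 32, 30, 26],
                      'Profession': ['Developer', 'Analyst', 'Admin', 'HR']})

# include='all' summarizes every column; statistics that do not apply
# to a column (such as the mean of names) are shown as NaN.
summary = data1.describe(include='all')
print(summary)
```

For string columns, the summary reports count, unique, top and freq instead of the numeric statistics.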

How to access Python DataFrame Data?

The data in a DataFrame is stored in a tabular format of rows and columns. It means you can access items using columns and rows.

Accessing Pandas Data Frame Columns

You can access a column in two ways: by specifying the column name inside [] or by using dot notation. Both methods return the specified column as a Series. Passing a list of names inside [] returns multiple columns as a DataFrame.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[10000, 12000, 9000, 11000]
         }

data1 = pd.DataFrame(table)
data2 = pd.DataFrame(table, columns = ['name', 'Profession', 'Salary', 'Age'])

print('-----Accessing Columns-----')
print(data1.Age)
print(data1['name'])
print(data2.Salary)

# We can also access multiple columns
print('-----Accessing Multiple Cols-----')
print(data1[['Age', 'Profession']])
print(data2[['name', 'Salary']])
-----Accessing Columns-----
0    25
1    32
2    30
3    26
Name: Age, dtype: int64
0      John
1      Mike
2    Suresh
3     Tracy
Name: name, dtype: object
0    10000
1    12000
2     9000
3    11000
Name: Salary, dtype: int64
-----Accessing Multiple Cols-----
   Age Profession
0   25  Developer
1   32    Analyst
2   30      Admin
3   26         HR
     name  Salary
0    John   10000
1    Mike   12000
2  Suresh    9000
3   Tracy   11000

This is another example of accessing pandas DataFrame columns.

import pandas as pd
table = {'name': ['Kane', 'John', 'Suresh', 'Tracy', 'Steve'],
         'Age': [35, 25, 32, 30, 29],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale':[422.19, 22.55, 119.470, 200.190, 44.55],
         'Salary':[12000, 10000, 14000, 11000, 14000]
	    }
data = pd.DataFrame(table)

print('\n---Select name column ---')
print(data['name'])

print('\n---Select Profession and Sale column ---')
print(data[['Profession', 'Sale']])

print('\n---Select Profession column ---')
print(data.Profession)

---Select name column ---
0      Kane
1      John
2    Suresh
3     Tracy
4     Steve
Name: name, dtype: object

---Select Profession and Sale column ---
  Profession    Sale
0    Manager  422.19
1  Developer   22.55
2    Analyst  119.47
3      Admin  200.19
4         HR   44.55

---Select Profession column ---
0      Manager
1    Developer
2      Analyst
3        Admin
4           HR
Name: Profession, dtype: object

Access Pandas DataFrame Rows

A pandas DataFrame can also be accessed by rows. Here, we use index slicing to return the required rows. data[1:] returns all the records from index 1 to n-1, and data[1:3] returns the rows at index 1 and 2, because the end of the slice is excluded.

import pandas as pd
table = {'Fullname': ['Kane', 'John', 'Suresh', 'Tracy', 'Steve'],
         'Age': [35, 25, 32, 30, 29],
         'Designation': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'SaleAmount':[422.19, 22.55, 119.470, 200.190, 44.55],
         'Income':[12000, 10000, 14000, 11000, 14000]
	    }
data = pd.DataFrame(table)
#print(data)

print('\n---Select all rows from 1 to N ---')
print(data[1:])

print('\n---Select rows from 1 to 2 ---')
print(data[1:3])

print('\n---Select rows from 0 to 3 ---')
print(data[0:4])

print('\n---Select last row ---')
print(data[-1:])

---Select all rows from 1 to N ---
  Fullname  Age Designation  SaleAmount  Income
1     John   25   Developer       22.55   10000
2   Suresh   32     Analyst      119.47   14000
3    Tracy   30       Admin      200.19   11000
4    Steve   29          HR       44.55   14000

---Select rows from 1 to 2 ---
  Fullname  Age Designation  SaleAmount  Income
1     John   25   Developer       22.55   10000
2   Suresh   32     Analyst      119.47   14000

---Select rows from 0 to 3 ---
  Fullname  Age Designation  SaleAmount  Income
0     Kane   35     Manager      422.19   12000
1     John   25   Developer       22.55   10000
2   Suresh   32     Analyst      119.47   14000
3    Tracy   30       Admin      200.19   11000

---Select last row ---
  Fullname  Age Designation  SaleAmount  Income
4    Steve   29          HR       44.55   14000

Pandas DataFrame loc method Example

The pandas loc indexer is one of the important things to understand. You can use loc[] to select more than one row and more than one column at a time. loc selects by label, so pass it index labels and column names; integers work only when the index labels themselves are integers. Use loc with square brackets, not parentheses, to select rows from large datasets.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }
data = pd.DataFrame(table, index = ['a', 'b', 'c', 'd'])
#print(data)

print('\n---Select b row ---')
print(data.loc['b'])

print('\n---Select c row ---')
print(data.loc['c'])

print('\n---Select b and d rows ---')
print(data.loc[['b', 'd']])

---Select b row ---
name             Mike
Age                32
Profession    Analyst
Salary        1200000
Name: b, dtype: object

---Select c row ---
name          Suresh
Age               30
Profession     Admin
Salary        900000
Name: c, dtype: object

---Select b and d rows ---
    name  Age Profession   Salary
b   Mike   32    Analyst  1200000
d  Tracy   26         HR  1100000

In the next example, the first statement, data.loc[:, ['name', 'Sale']], returns all the rows of the name and Sale columns. The last statement, data.loc[1:3, ['name', 'Profession', 'Salary']], returns the rows with index labels 1 to 3 for the name, Profession, and Salary columns. Unlike data[1:3], a loc slice includes its end label.

import pandas as pd
table = {'name': ['Kane', 'John', 'Suresh', 'Tracy', 'Steve'],
         'Age': [35, 25, 32, 30, 29],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale':[422.19, 22.55, 119.470, 200.190, 44.55],
         'Salary':[12000, 10000, 14000, 11000, 14000]
	    }
data = pd.DataFrame(table)
#print(data)

print('\n---Select name, Sale column ---')
print(data.loc[:, ['name', 'Sale']])

print('\n---Select name, Profession, Salary ---')
print(data.loc[:, ['name', 'Profession', 'Salary']])

print('\n---Select rows from 1 to 3 ---')
print(data.loc[1:3, ['name', 'Profession', 'Salary']])

---Select name, Sale column ---
     name    Sale
0    Kane  422.19
1    John   22.55
2  Suresh  119.47
3   Tracy  200.19
4   Steve   44.55

---Select name, Profession, Salary ---
     name Profession  Salary
0    Kane    Manager   12000
1    John  Developer   10000
2  Suresh    Analyst   14000
3   Tracy      Admin   11000
4   Steve         HR   14000

---Select rows from 1 to 3 ---
     name Profession  Salary
1    John  Developer   10000
2  Suresh    Analyst   14000
3   Tracy      Admin   11000
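The difference between slicing by label and slicing by position is easy to miss, so here is a small sketch that compares the three forms on the same made-up data. It assumes the default integer index.

```python
import pandas as pd

data = pd.DataFrame({'name': ['Kane', 'John', 'Suresh', 'Tracy', 'Steve']})

by_label = data.loc[1:3]       # labels 1, 2 and 3: the end label is included
by_position = data.iloc[1:3]   # positions 1 and 2: the end position is excluded
plain_slice = data[1:3]        # a plain row slice is positional, like iloc

print(len(by_label), len(by_position), len(plain_slice))
```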

iloc Example

Similar to loc[], a pandas DataFrame has iloc[]. However, iloc accepts only integer positions to return data. Positions start at 0, so iloc[1] returns the second row (label b) and iloc[3] returns the fourth row (label d).

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }
data = pd.DataFrame(table, index = ['a', 'b', 'c', 'd'])
#print(data)

print('\n---Select 1st row ---')
print(data.iloc[1])

print('\n---Select 3rd row ---')
print(data.iloc[3])

print('\n---Select 1 and 3 rows ---')
print(data.iloc[[1, 3]])

---Select 1st row ---
name             Mike
Age                32
Profession    Analyst
Salary        1200000
Name: b, dtype: object

---Select 3rd row ---
name            Tracy
Age                26
Profession         HR
Salary        1100000
Name: d, dtype: object

---Select 1 and 3 rows ---
    name  Age Profession   Salary
b   Mike   32    Analyst  1200000
d  Tracy   26         HR  1100000

You can use loc, iloc, at and iat to extract or access a single value. The following example will show you the same.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }
data = pd.DataFrame(table)
#print(data)

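# iloc and iat take integer positions; loc and at take labels.
# Passing row and column together avoids chained indexing such as data.iloc[0][0].
print('\nitem at 0, 0           = ', data.iloc[0, 0])

print('item at 0, 1           = ', data.iloc[0, 1])

print('item at 1, Profession  = ', data.loc[1, 'Profession'])

print('item at 2, 3           = ', data.iat[2, 3])

print('item at 0, Salary      = ', data.at[0, 'Salary'])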

item at 0, 0           =  John
item at 0, 1           =  25
item at 1, Profession  =  Analyst
item at 2, 3           =  900000
item at 0, Salary      =  1000000

How to add a New Column to Pandas DataFrame?

In this example, we show how to add a new column to an existing DataFrame. data['Sale'] = [422.19, 200.190, 44.55] adds a completely new column called Sale. data['Income'] = data['Salary'] + data['basic'] adds a new column, Income, by adding the values in the Salary and basic columns. Finally, data['New_Salary'] holds Salary increased by 25 percent.

import pandas as pd

table = {'name': ['Kane', 'Suresh', 'Tracy'],
         'Age': [35, 25, 29],
         'Profession': ['Manager', 'Developer', 'HR'],
         'Salary': [10000, 14000, 11000],
         'basic': [4000, 6000, 4500]
        }

data = pd.DataFrame(table)

# Add New Column
data['Sale'] = [422.19, 200.190, 44.55]
print('\n---After adding New Column ---')
print(data)

# Add New Column using existing
data['Income'] = data['Salary'] + data['basic']
print('\n---Total Salary ---')
print(data)

# Add New Calculated Column
data['New_Salary'] = data['Salary'] + data['Salary'] * 0.25
print('\n---After adding New Column ---')
print(data)

---After adding New Column ---
     name  Age Profession  Salary  basic    Sale
0    Kane   35    Manager   10000   4000  422.19
1  Suresh   25  Developer   14000   6000  200.19
2   Tracy   29         HR   11000   4500   44.55

---Total Salary ---
     name  Age Profession  Salary  basic    Sale  Income
0    Kane   35    Manager   10000   4000  422.19   14000
1  Suresh   25  Developer   14000   6000  200.19   20000
2   Tracy   29         HR   11000   4500   44.55   15500

---After adding New Column ---
     name  Age Profession  Salary  basic    Sale  Income  New_Salary
0    Kane   35    Manager   10000   4000  422.19   14000     12500.0
1  Suresh   25  Developer   14000   6000  200.19   20000     17500.0
2   Tracy   29         HR   11000   4500   44.55   15500     13750.0
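Assigning with data['col'] = ... changes the DataFrame in place. If you would rather keep the original untouched, the assign method is one alternative: it returns a new DataFrame with the extra column. The sketch below uses only the Salary and basic columns from the example above.

```python
import pandas as pd

data = pd.DataFrame({'Salary': [10000, 14000, 11000],
                     'basic': [4000, 6000, 4500]})

# assign returns a copy with the new column; data itself is not modified.
with_income = data.assign(Income=data['Salary'] + data['basic'])

print(with_income)
print(list(data.columns))   # the original still has only Salary and basic
```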

Delete a Column from a DataFrame in Python

In Python, there are several ways to delete a column from a pandas DataFrame: the del statement, the pop function, and the drop function. In this example, we use all three.

Here, del(data['basic']) deletes the basic column in place. x = data.pop('Age') removes the Age column and returns it, and we print that popped column as well. Next, we use the drop function to delete the Sale column; drop returns a new DataFrame (y) and leaves data unchanged.

import pandas as pd

table = {'name': ['Kane', 'Suresh', 'Tracy'],
         'Age': [35, 25, 29],
         'Profession': ['Manager', 'Developer', 'HR'],
         'Salary': [10000, 14000, 11000],
         'basic': [4000, 6000, 4500],
         'Sale': [422.19, 200.190, 44.55]
        }

data = pd.DataFrame(table)
#print(data)

# Delete existing Columns
del(data['basic'])
print('\n---After Deleting basic Column ---')
print(data)

x = data.pop('Age')
print('\n---After Deleting Age Column ---')
print(data)
print('\n---pop Column ---')
print(x)

y = data.drop(columns = 'Sale')
print('\n---After Deleting Sale Column ---')
print(y)

---After Deleting basic Column ---
     name  Age Profession  Salary    Sale
0    Kane   35    Manager   10000  422.19
1  Suresh   25  Developer   14000  200.19
2   Tracy   29         HR   11000   44.55

---After Deleting Age Column ---
     name Profession  Salary    Sale
0    Kane    Manager   10000  422.19
1  Suresh  Developer   14000  200.19
2   Tracy         HR   11000   44.55

---pop Column ---
0    35
1    25
2    29
Name: Age, dtype: int64

---After Deleting Sale Column ---
     name Profession  Salary
0    Kane    Manager   10000
1  Suresh  Developer   14000
2   Tracy         HR   11000

How to delete DataFrame Row in Python?

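In this example, we use the pandas drop function to delete rows by their index label. drop returns a new DataFrame and leaves the original unchanged, so both calls below start from the full data.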

import pandas as pd

table = {'name': ['Kane', 'Suresh', 'Tracy'],
         'Profession': ['Manager', 'Developer', 'HR'],
         'Salary': [10000, 14000, 11000],
         'Sale': [422.19, 200.190, 44.55]
        }

data = pd.DataFrame(table, index = ['a', 'b', 'c'])
#print(data)

x = data.drop('b')
print('\n---After Deleting b row---')
print(x)

y = data.drop('a')
print('\n---After Deleting a row---')
print(y)

---After Deleting b row---
    name Profession  Salary    Sale
a   Kane    Manager   10000  422.19
c  Tracy         HR   11000   44.55

---After Deleting a row---
     name Profession  Salary    Sale
b  Suresh  Developer   14000  200.19
c   Tracy         HR   11000   44.55
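drop also accepts a list of labels, and a row can be dropped by position by looking up its label through data.index first. The following sketch uses a shortened, made-up version of the table above.

```python
import pandas as pd

data = pd.DataFrame({'name': ['Kane', 'Suresh', 'Tracy'],
                     'Salary': [10000, 14000, 11000]},
                    index=['a', 'b', 'c'])

# Drop several rows at once by passing a list of labels.
remaining = data.drop(['a', 'c'])
print(remaining)

# Drop the first row by position: data.index[0] is its label ('a').
without_first = data.drop(data.index[0])
print(without_first)

print(len(data))   # data itself still has all three rows
```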

How to rename Pandas DataFrame Column?

Use the pandas rename function to rename one or more columns. Here, we use it to rename the Profession column to Qualification and Salary to Income.

import pandas as pd

table = {'name': ['John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [25, 32, 30, 26],
         'Profession': ['Developer', 'Analyst', 'Admin', 'HR'],
         'Salary':[1000000, 1200000, 900000, 1100000]
         }
data = pd.DataFrame(table)

# data = data.rename(columns = {'Profession': 'Qualification'})
data.rename(columns = {'Profession': 'Qualification'}, inplace = True)
print('\n---After Renaming Column ---')
print(data)

# Profession was already renamed above; rename ignores missing labels by default
data.rename(columns =
                {'Profession': 'Qualification',
                'Salary': 'Income'},
            inplace = True)
print('\n---After Renaming two Column ---')
print(data)

---After Renaming Column ---
     name  Age Qualification   Salary
0    John   25     Developer  1000000
1    Mike   32       Analyst  1200000
2  Suresh   30         Admin   900000
3   Tracy   26            HR  1100000

---After Renaming two Column ---
     name  Age Qualification   Income
0    John   25     Developer  1000000
1    Mike   32       Analyst  1200000
2  Suresh   30         Admin   900000
3   Tracy   26            HR  1100000
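Besides a dictionary, rename accepts a function that is applied to every label. Because rename skips labels it cannot find by default, a typo goes unnoticed; passing errors='raise' makes it fail instead. The sketch below uses a small made-up table, and the misspelled label Profesion is deliberate.

```python
import pandas as pd

data = pd.DataFrame({'name': ['John', 'Mike'], 'Age': [25, 32]})

# Apply a function to every column label.
upper = data.rename(columns=str.upper)
print(list(upper.columns))

# With errors='raise', a label that does not exist raises KeyError.
try:
    data.rename(columns={'Profesion': 'Qualification'}, errors='raise')
    raised = False
except KeyError as err:
    raised = True
    print('KeyError:', err)
```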

Python pandas head and tail

If you are coming from R programming, you might be familiar with the head and tail functions. The head function accepts an integer value as an argument and returns that many records from the top. If you omit the argument, it returns the first five.

For instance, head(5) returns the top 5 records. Similarly, the DataFrame tail function returns the bottom or last records. For example, tail(5) returns the last 5 records.

import pandas as pd
table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy', 'Steve'],
         'Age': [35, 25, 32, 30, 26, 29],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR', 'HOD'],
         'Sale':[422.19, 22.55, 12.66, 119.470, 200.190, 44.55],
         'Salary':[12000, 10000, 8000, 14000, 11000, 14000]
	    }
data = pd.DataFrame(table)

print('\n---First Five records head()---')
print(data.head())

print('\n---First two records head(2)---')
print(data.head(2))

print('\n---Bottom Five records tail()---')
print(data.tail())

print('\n---last two records tail(2)---')
print(data.tail(2))

---First Five records head()---
     name  Age Profession    Sale  Salary
0    Kane   35    Manager  422.19   12000
1    John   25  Developer   22.55   10000
2    Mike   32    Analyst   12.66    8000
3  Suresh   30      Admin  119.47   14000
4   Tracy   26         HR  200.19   11000

---First two records head(2)---
   name  Age Profession    Sale  Salary
0  Kane   35    Manager  422.19   12000
1  John   25  Developer   22.55   10000

---Bottom Five records tail()---
     name  Age Profession    Sale  Salary
1    John   25  Developer   22.55   10000
2    Mike   32    Analyst   12.66    8000
3  Suresh   30      Admin  119.47   14000
4   Tracy   26         HR  200.19   11000
5   Steve   29        HOD   44.55   14000

---last two records tail(2)---
    name  Age Profession    Sale  Salary
4  Tracy   26         HR  200.19   11000
5  Steve   29        HOD   44.55   14000
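Both functions also accept a negative number, which is handy when you want everything except a few records. The short sketch below uses a small made-up table (the names scores, all_but_last_two and all_but_first_two are only for illustration). Here head(-2) returns all records except the last two, and tail(-2) returns all records except the first two.

```python
import pandas as pd

# Hypothetical sample table, used only for this illustration
scores = pd.DataFrame({'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
                       'Age': [35, 25, 32, 30, 26]})

# A negative n means "all records except n of them"
all_but_last_two = scores.head(-2)    # drops the last two records
all_but_first_two = scores.tail(-2)   # drops the first two records

print(all_but_last_two)
print(all_but_first_two)
```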

Transpose pandas DataFrame in Python

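A pandas DataFrame has an inbuilt property to transpose it, that is, to swap its rows and columns. For this, you have to use df.T: the column names become the row index, and the row index becomes the column names.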

import pandas as pd
table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale':[422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary':[12000, 10000, 8000, 14000, 11000]
	    }
data = pd.DataFrame(table)
print(data)

print('\n---Transposed ---')
print(data.T)
     name  Age Profession    Sale  Salary
0    Kane   35    Manager  422.19   12000
1    John   25  Developer   22.55   10000
2    Mike   32    Analyst   12.66    8000
3  Suresh   30      Admin  119.47   14000
4   Tracy   26         HR  200.19   11000

---Transposed ---
                  0          1        2       3       4
name           Kane       John     Mike  Suresh   Tracy
Age              35         25       32      30      26
Profession  Manager  Developer  Analyst   Admin      HR
Sale         422.19      22.55    12.66  119.47  200.19
Salary        12000      10000     8000   14000   11000
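As a quick check of what transposing does to the shape and labels, here is a minimal sketch on a made-up two-column table (the name people is only for illustration). Because every transposed column now mixes text and numbers, pandas will usually store it with a generic dtype, so it is often better to transpose for display than for further calculations.

```python
import pandas as pd

# Hypothetical sample table, used only for this illustration
people = pd.DataFrame({'name': ['Kane', 'John', 'Mike'],
                       'Age': [35, 25, 32]})

transposed = people.T

print(people.shape)               # 3 rows, 2 columns
print(transposed.shape)           # 2 rows, 3 columns
print(transposed.index.tolist())  # the old column names are now the row index
print(transposed.T.shape)         # transposing twice restores the original shape
```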

Python DataFrame groupby

The DataFrame groupby function is similar to the Group By clause in Sql Server. You can use this Pandas groupby function to group data by one or more columns and find the aggregated results of the other columns. This is one of the most important functions while working with real-time data.

In this example, we created a table with different columns and data types. Next, we used the groupby function. The first statement groups the Data Frame by the Profession column and calculates the sum of Age, Sale and Salary.

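The second statement groups the Data Frame by the Profession and Age columns and calculates the sum of Sale and Salary. Remember, the treatment of string columns depends on the pandas version. Older releases silently drop string columns such as name (as in the output below), whereas pandas 2.0 and later concatenate the strings unless you pass numeric_only=True to sum().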

import pandas as pd
table = {'name': ['Kane', 'Dave', 'Ram', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 25, 35, 25, 35, 35],
         'Profession': ['Analyst', 'HR', 'Analyst', 'Admin', 'HR', 'Admin', 'HR'],
         'Sale':[422, 22, 55, 12, 119, 470, 200],
         'Salary':[12000, 9000, 10000, 8000, 14000, 20000, 11000]
	    }
data = pd.DataFrame(table)
#print(data)

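# numeric_only=True skips the string column (name), so the result
# matches the output below on recent pandas versions as well
print('\n--- groupby Profession---')
print(data.groupby('Profession').sum(numeric_only=True))

print('\n--- groupby Profession and Age---')
print(data.groupby(['Profession', 'Age']).sum(numeric_only=True))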

--- groupby Profession---
            Age  Sale  Salary
Profession                   
Admin        70   482   28000
Analyst      60   477   22000
HR           85   341   34000

--- groupby Profession and Age---
                Sale  Salary
Profession Age              
Admin      35    482   28000
Analyst    25     55   10000
           35    422   12000
HR         25    141   23000
           35    200   11000
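The groupby function is not limited to sum. As a minimal sketch, assuming you want a different aggregate for each column, you can pass a dictionary to agg. The name summary is only for illustration; the data is the same Profession, Sale and Salary data used above.

```python
import pandas as pd

staff = pd.DataFrame({
    'Profession': ['Analyst', 'HR', 'Analyst', 'Admin', 'HR', 'Admin', 'HR'],
    'Sale': [422, 22, 55, 12, 119, 470, 200],
    'Salary': [12000, 9000, 10000, 8000, 14000, 20000, 11000],
})

# Total Sale but average Salary for each Profession
summary = staff.groupby('Profession').agg({'Sale': 'sum', 'Salary': 'mean'})
print(summary)
```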

Python DataFrame stack

The Pandas stack function moves the column labels into the innermost level of the row index, so a wide DataFrame becomes a longer Series with a MultiIndex. To use it, simply call data_to_stack.stack(). In this example, we use the stack function on grouped data (the groupby function result) to reshape it further.

import pandas as pd
table = {'name': ['Kane', 'Dave', 'Ram', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 25, 35, 25, 35, 35],
         'Profession': ['Analyst', 'HR', 'Analyst', 'Admin', 'HR', 'Admin', 'HR'],
         'Sale':[422, 22, 55, 12, 119, 470, 200],
         'Salary':[12000, 9000, 10000, 8000, 14000, 20000, 11000]
	    }
data = pd.DataFrame(table)

grouped_data1 = data.groupby('Profession').sum(numeric_only=True)  # skip the string column (name)
stacked_data1 = grouped_data1.stack()
print('\n---Stacked groupby Profession---')
print(stacked_data1)

grouped_data2 = data.groupby(['Profession', 'Age']).sum(numeric_only=True)  # skip the string column (name)
stacked_data2 = grouped_data2.stack()
print('\n---Stacked groupby Profession and Age---')
print(stacked_data2)

---Stacked groupby Profession---
Profession        
Admin       Age          70
            Sale        482
            Salary    28000
Analyst     Age          60
            Sale        477
            Salary    22000
HR          Age          85
            Sale        341
            Salary    34000
dtype: int64

---Stacked groupby Profession and Age---
Profession  Age        
Admin       35   Sale        482
                 Salary    28000
Analyst     25   Sale         55
                 Salary    10000
            35   Sale        422
                 Salary    12000
HR          25   Sale        141
                 Salary    23000
            35   Sale        200
                 Salary    11000
dtype: int64
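To see how a stacked result is addressed, here is a small sketch on a hand-built table of totals (the names totals and stacked are only for illustration). After stacking, each value is identified by a pair of labels: the original row label and the original column label.

```python
import pandas as pd

# Hypothetical totals table, used only for this illustration
totals = pd.DataFrame({'Sale': [482, 477], 'Salary': [28000, 22000]},
                      index=['Admin', 'Analyst'])

stacked = totals.stack()
print(stacked)

# Look up one value with a (row label, column label) pair
print(stacked.loc[('Admin', 'Salary')])
```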

Python DataFrame unstack

The unstack function reverses the operation done by the stack function. It moves the innermost level of the row index of a stacked Data Frame (the result of .stack()) back to the column labels. To use this function, simply call stacked_data.unstack().

import pandas as pd
table = {'name': ['Kane', 'Dave', 'Ram', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 25, 35, 25, 35, 35],
         'Profession': ['Analyst', 'HR', 'Analyst', 'Admin', 'HR', 'Admin', 'HR'],
         'Sale':[422, 22, 55, 12, 119, 470, 200],
         'Salary':[12000, 9000, 10000, 8000, 14000, 20000, 11000]
	    }
data = pd.DataFrame(table)

grouped_data1 = data.groupby('Profession').sum(numeric_only=True)  # skip the string column (name)
stacked_data1 = grouped_data1.stack()
unstacked_data1 = stacked_data1.unstack()
# print('\n---Stacked groupby Profession---')
# print(stacked_data1)
print('\n---Unstacked groupby Profession---')
print(unstacked_data1)

grouped_data2 = data.groupby(['Profession', 'Age']).sum(numeric_only=True)  # skip the string column (name)
stacked_data2 = grouped_data2.stack()
unstacked_data2 = stacked_data2.unstack()
# print('\n---Stacked groupby Profession and Age---')
# print(stacked_data2)
print('\n---Unstacked groupby Profession and Age---')
print(unstacked_data2)

---Unstacked groupby Profession---
            Age  Sale  Salary
Profession                   
Admin        70   482   28000
Analyst      60   477   22000
HR           85   341   34000

---Unstacked groupby Profession and Age---
                Sale  Salary
Profession Age              
Admin      35    482   28000
Analyst    25     55   10000
           35    422   12000
HR         25    141   23000
           35    200   11000
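By default, unstack works on the innermost index level, but you can choose another level with the level argument. The sketch below, on a hand-built table of totals (the names restored and flipped are only for illustration), compares the default call with unstack(level=0), which moves the outer level to the columns instead.

```python
import pandas as pd

# Hypothetical totals table, used only for this illustration
totals = pd.DataFrame({'Sale': [482, 477], 'Salary': [28000, 22000]},
                      index=['Admin', 'Analyst'])
stacked = totals.stack()

restored = stacked.unstack()         # innermost level (Sale, Salary) back to columns
flipped = stacked.unstack(level=0)   # outer level (Admin, Analyst) to columns

print(restored)
print(flipped)
```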

Python DataFrame Concatenation

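The Pandas concat function is used to combine or concatenate DataFrame objects. First, we declared two DataFrames of random values, each of size 4 * 6 (4 rows and 6 columns). Next, we used the concat function to place the second one below the first. Note that each one keeps its own row index, so the index values 0 to 3 appear twice in the result.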

import pandas as pd
import numpy as np

df1 = pd.DataFrame(np.random.randn(4, 6))
print(df1)

df2 = pd.DataFrame(np.random.randn(4, 6))
print(df2)

print('\n--- concatenation---')
print(pd.concat([df1, df2]))
          0         1         2         3         4         5
0  0.170510 -0.549890 -0.076595 -1.666645 -0.500168 -0.837365
1 -1.056680 -0.296667 -1.418145 -0.357668 -0.319350  2.131726
2  1.359241  0.913525 -0.590698 -0.460282  1.198779 -0.900188
3  0.550750 -0.186552  0.543404  1.520353  0.288910  0.563674
          0         1         2         3         4         5
0  0.748928 -0.095618 -0.490589  0.950306 -0.786737  0.968456
1 -0.561079  0.204682  1.356939 -1.907207 -0.625462  0.163865
2  0.391494  0.881150  0.871912 -0.448490  0.589685  0.271900
3  0.179141 -0.589593 -0.335848 -0.348342  0.516758  0.691327

--- concatenation---
          0         1         2         3         4         5
0  0.170510 -0.549890 -0.076595 -1.666645 -0.500168 -0.837365
1 -1.056680 -0.296667 -1.418145 -0.357668 -0.319350  2.131726
2  1.359241  0.913525 -0.590698 -0.460282  1.198779 -0.900188
3  0.550750 -0.186552  0.543404  1.520353  0.288910  0.563674
0  0.748928 -0.095618 -0.490589  0.950306 -0.786737  0.968456
1 -0.561079  0.204682  1.356939 -1.907207 -0.625462  0.163865
2  0.391494  0.881150  0.871912 -0.448490  0.589685  0.271900
3  0.179141 -0.589593 -0.335848 -0.348342  0.516758  0.691327

In the above example, we concatenated two DataFrame objects of the same size. However, you can use the Pandas concat function to combine more than two objects, and they can have different sizes.

For this, we used three Data Frames of different sizes filled with randomly generated numbers. Next, we used the concat function to combine those three objects. Wherever an object does not have a column, the result contains NaN.

import numpy as np
import pandas as pd

dfA = pd.DataFrame(np.random.randn(4, 6))
print(dfA)

dfB = pd.DataFrame(np.random.randn(4, 5))
print(dfB)

dfC = pd.DataFrame(np.random.randn(3, 4))
print(dfC)

print('\n-----concatenation-----')
print(pd.concat([dfA, dfB, dfC]))
          0         1         2         3         4         5
0 -0.071220  0.286829  0.726730 -1.046570  1.114306 -0.622870
1 -0.137455 -1.237104 -2.567032 -0.773737  0.446680  1.241036
2  0.417368 -0.544948 -1.368237 -0.409373 -1.757377  1.481192
3 -0.958583  0.116646  0.491579  1.018028  0.591651  1.072710
          0         1         2         3         4
0  2.525100 -0.172472 -2.364648 -2.312990  0.264522
1  0.041258  0.688158  1.192806  1.590377 -0.549352
2  0.723508 -1.246208 -0.497221  0.174042 -0.634088
3 -0.394750  1.186304  0.575888 -1.201602  0.851508
          0         1         2         3
0  0.038201 -0.987624 -1.347281  0.968429
1 -0.268102 -0.981864  0.378091  0.193392
2  2.287503  0.834575 -0.774165  1.244232

-----concatenation-----
          0         1         2         3         4         5
0 -0.071220  0.286829  0.726730 -1.046570  1.114306 -0.622870
1 -0.137455 -1.237104 -2.567032 -0.773737  0.446680  1.241036
2  0.417368 -0.544948 -1.368237 -0.409373 -1.757377  1.481192
3 -0.958583  0.116646  0.491579  1.018028  0.591651  1.072710
0  2.525100 -0.172472 -2.364648 -2.312990  0.264522       NaN
1  0.041258  0.688158  1.192806  1.590377 -0.549352       NaN
2  0.723508 -1.246208 -0.497221  0.174042 -0.634088       NaN
3 -0.394750  1.186304  0.575888 -1.201602  0.851508       NaN
0  0.038201 -0.987624 -1.347281  0.968429       NaN       NaN
1 -0.268102 -0.981864  0.378091  0.193392       NaN       NaN
2  2.287503  0.834575 -0.774165  1.244232       NaN       NaN
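Two commonly used options of concat are ignore_index and axis. As a minimal sketch on two tiny hand-made tables (the names top, bottom, stacked_rows and side_by_side are only for illustration), ignore_index=True renumbers the rows from 0, and axis=1 places the objects side by side instead of one below the other.

```python
import pandas as pd

# Hypothetical sample tables, used only for this illustration
top = pd.DataFrame({'A': [1, 2], 'B': [3, 4]})
bottom = pd.DataFrame({'A': [5, 6], 'B': [7, 8]})

# Rows of bottom go below top, and the index is renumbered 0..3
stacked_rows = pd.concat([top, bottom], ignore_index=True)
print(stacked_rows)

# Columns of bottom go to the right of top, matched on the row index
side_by_side = pd.concat([top, bottom], axis=1)
print(side_by_side)
```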

Python DataFrame Math Operations

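In this example, we use a few of the Pandas DataFrame mathematical functions. For this demo, we find the mean and median of each column and each row. By default, these functions work down each column. To get the mean or median of each row, pass 1 as the axis argument (that is, axis=1). On recent pandas versions, also pass numeric_only=True so that the string columns (name and Profession) are skipped instead of raising an error.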

import pandas as pd
table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale':[422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary':[12000, 10000, 8000, 14000, 11000]
	    }
data = pd.DataFrame(table)
#print(data)

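# numeric_only=True skips the string columns (name, Profession);
# recent pandas versions raise a TypeError without it
print('\n--- Mean of Columns---')
print(data.mean(numeric_only=True))

# axis=1 calculates across the columns of each row
print('\n--- Mean of Rows---')
print(data.mean(axis=1, numeric_only=True))

print('\n--- Median of Columns---')
print(data.median(numeric_only=True))

print('\n--- Median of Rows---')
print(data.median(axis=1, numeric_only=True))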

--- Mean of Columns---
Age          29.600
Sale        155.412
Salary    11000.000
dtype: float64

--- Mean of Rows---
0    4152.396667
1    3349.183333
2    2681.553333
3    4716.490000
4    3742.063333
dtype: float64

--- Median of Columns---
Age          30.00
Sale        119.47
Salary    11000.00
dtype: float64

--- Median of Rows---
0    422.19
1     25.00
2     32.00
3    119.47
4    200.19
dtype: float64

Next, we calculate the sum of each column and the sum of each row. Similarly, we find the minimum value in each column, the maximum value in each column, and the maximum value in each row using the sum(), min() and max() functions.

import pandas as pd
table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'Developer', 'Analyst', 'Admin', 'HR'],
         'Sale':[422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary':[12000, 10000, 8000, 14000, 11000]
	    }
data = pd.DataFrame(table)
#print(data)

print('\n--- sum of Columns---')
print(data.sum())

print('\n--- sum of Rows---')
print(data.sum(axis=1, numeric_only=True))  # skip string columns when adding across a row

print('\n--- Minimum of Columns---')
print(data.min())

print('\n--- Maximum of Columns---')
print(data.max())

print('\n--- Maximum of Rows---')
print(data.max(axis=1, numeric_only=True))  # strings and numbers cannot be compared in a row

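In the output, the column-wise results that mix strings and numbers have dtype object, while the row-wise results over the numeric columns have dtype float64.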


--- sum of Columns---
name                 KaneJohnMikeSureshTracy
Age                                      148
Profession    ManagerDeveloperAnalystAdminHR
Sale                                  777.06
Salary                                 55000
dtype: object

--- sum of Rows---
0    12457.19
1    10047.55
2     8044.66
3    14149.47
4    11226.19
dtype: float64

--- Minimum of Columns---
name           John
Age              25
Profession    Admin
Sale          12.66
Salary         8000
dtype: object

--- Maximum of Columns---
name            Tracy
Age                35
Profession    Manager
Sale           422.19
Salary          14000
dtype: object

--- Maximum of Rows---
0    12000.0
1    10000.0
2     8000.0
3    14000.0
4    11000.0
dtype: float64
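If you want several of these results in one table, one possible approach is to select the numeric columns first and then pass a list of function names to agg. This is only a sketch; the names numeric and summary are for illustration.

```python
import pandas as pd

data = pd.DataFrame({'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
                     'Age': [35, 25, 32, 30, 26],
                     'Sale': [422.19, 22.55, 12.66, 119.47, 200.19],
                     'Salary': [12000, 10000, 8000, 14000, 11000]})

# Keep only the numeric columns, then compute three aggregates at once
numeric = data.select_dtypes(include='number')
summary = numeric.agg(['sum', 'min', 'max'])
print(summary)
```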

Arithmetic Operations

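We can perform arithmetic operations directly on a DataFrame. When you add, subtract, or multiply by a single number, pandas applies the operation to every value in every column. The table in this example holds only numeric columns, because adding a number to a string column raises an error.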

import pandas as pd
table = {'Age': [25, 32, 30],
         'Sale':[422.19, 119.470, 200.190],
         'Salary':[12000, 14000, 11000]
	    }
data = pd.DataFrame(table)
print(data)

print('\n---Add 20 ---')
print(data + 20)

print('\n---Subtract 10 ---')
print(data - 10)

print('\n---Multiply by 2---')
print(data * 2)
   Age    Sale  Salary
0   25  422.19   12000
1   32  119.47   14000
2   30  200.19   11000

---Add 20 ---
   Age    Sale  Salary
0   45  442.19   12020
1   52  139.47   14020
2   50  220.19   11020

---Subtract 10 ---
   Age    Sale  Salary
0   15  412.19   11990
1   22  109.47   13990
2   20  190.19   10990

---Multiply by 2---
   Age    Sale  Salary
0   50  844.38   24000
1   64  238.94   28000
2   60  400.38   22000
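The operators also have method equivalents such as add, and they work between columns as well as with a single number. The following is a minimal sketch (the names plus_twenty and total are only for illustration).

```python
import pandas as pd

data = pd.DataFrame({'Age': [25, 32, 30],
                     'Sale': [422.19, 119.47, 200.19],
                     'Salary': [12000, 14000, 11000]})

# data.add(20) gives the same result as data + 20
plus_twenty = data.add(20)
print(plus_twenty)

# Arithmetic between two columns is applied row by row
total = data['Sale'] + data['Salary']
print(total)
```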

Python Pandas DataFrame Nulls

The isnull function checks every value and returns True if the value in the data frame is Null (missing, such as NaN); otherwise, it returns False. The Pandas notnull function does the opposite: it returns True if the value is not Null; otherwise, False is returned.

import pandas as pd
import numpy as np

table = {'name': ['Kane', 'Suresh', np.nan],
         'Profession': ['Manager', np.nan, 'HR'],
         'Salary': [np.nan, 14000, 11000],
         'Sale': [422.19, np.nan, 44.55]
        }

data = pd.DataFrame(table)

print('\n---Checking Nulls ---')
print(data.isnull())

print('\n---Checking Not Nulls ---')
print(data.notnull())

---Checking Nulls ---
    name  Profession  Salary   Sale
0  False       False    True  False
1  False        True   False   True
2   True       False   False  False

---Checking Not Nulls ---
    name  Profession  Salary   Sale
0   True        True   False   True
1   True       False    True  False
2  False        True    True   True
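Because True counts as 1 and False as 0, the result of isnull can be summed to count the missing values. As a small sketch on a made-up table (the names missing_per_column and rows_with_missing are only for illustration):

```python
import numpy as np
import pandas as pd

# Hypothetical sample table, used only for this illustration
data = pd.DataFrame({'name': ['Kane', 'Suresh', np.nan],
                     'Salary': [np.nan, 14000, 11000],
                     'Sale': [422.19, 119.47, 44.55]})

# Number of Nulls in each column
missing_per_column = data.isnull().sum()
print(missing_per_column)

# True for every row that has at least one Null
rows_with_missing = data.isnull().any(axis=1)
print(rows_with_missing)
```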

Replace Nulls

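We can also replace those Null values with meaningful values. To replace nulls in pandas, use the DataFrame fillna function or the replace function. In the example below, fillna(30) puts 30 in place of every Null, and replace({np.nan:66}) puts 66 in place of every NaN.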

import pandas as pd
import numpy as np

table = {'Age': [20, 35, np.nan],
         'Salary': [np.nan, 14000, 11000],
         'Sale': [422.19, np.nan, 44.55]
        }

data = pd.DataFrame(table)

print('\n---Fill Missing Values ---')
print(data.fillna(30))

print('\n---Replace Missing Values ---')
print(data.replace({np.nan:66}))

---Fill Missing Values ---
    Age   Salary    Sale
0  20.0     30.0  422.19
1  35.0  14000.0   30.00
2  30.0  11000.0   44.55

---Replace Missing Values ---
    Age   Salary    Sale
0  20.0     66.0  422.19
1  35.0  14000.0   66.00
2  66.0  11000.0   44.55
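A single number is rarely a sensible replacement for every column. As one possible alternative, fillna also accepts one value per column, for example the column means or a dictionary. The sketch below assumes that filling with the mean is acceptable for your data; the names filled_with_mean and filled_per_column are only for illustration.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({'Age': [20, 35, np.nan],
                     'Salary': [np.nan, 14000, 11000],
                     'Sale': [422.19, np.nan, 44.55]})

# Fill each column with its own mean (mean ignores the Nulls)
filled_with_mean = data.fillna(data.mean())
print(filled_with_mean)

# Fill each column with a value chosen for that column
filled_per_column = data.fillna({'Age': 0, 'Salary': 10000, 'Sale': 0.0})
print(filled_per_column)
```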

Pandas DataFrame pivot

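The Data Frame has a pivot function, which reshapes the existing one. The index argument sets the column whose values become the row labels, the columns argument sets the column whose values become the new column names, and the values argument sets the column that fills the cells. If you omit values, all the remaining columns are used. Combinations that do not exist in the data are filled with NaN.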

import pandas as pd

table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'HR', 'Analyst', 'Manager', 'HR'],
         'Sale':[422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary':[12000, 10000, 8000, 14000, 11000]
	    }
data = pd.DataFrame(table)

print('\n--- After Pivot---')
data2 = data.pivot(index = 'name', columns = 'Profession', values = 'Salary')
print(data2)

print('\n--- After Pivot---')
data3 = data.pivot(index = 'name', columns = 'Profession')
print(data3)

--- After Pivot---
Profession  Analyst       HR  Manager
name                                 
John            NaN  10000.0      NaN
Kane            NaN      NaN  12000.0
Mike         8000.0      NaN      NaN
Suresh          NaN      NaN  14000.0
Tracy           NaN  11000.0      NaN

--- After Pivot---
               Age                  Sale  ...          Salary                  
Profession Analyst    HR Manager Analyst  ... Manager Analyst       HR  Manager
name                                      ...                                  
John           NaN  25.0     NaN     NaN  ...     NaN     NaN  10000.0      NaN
Kane           NaN   NaN    35.0     NaN  ...  422.19     NaN      NaN  12000.0
Mike          32.0   NaN     NaN   12.66  ...     NaN  8000.0      NaN      NaN
Suresh         NaN   NaN    30.0     NaN  ...  119.47     NaN      NaN  14000.0
Tracy          NaN  26.0     NaN     NaN  ...     NaN     NaN  11000.0      NaN

[5 rows x 9 columns]
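The pivot function only reshapes data; it does not aggregate. When several rows share the same label and you want one summarised value per label, the pivot_table function is a common alternative. The following is a minimal sketch (the name avg_salary is only for illustration) that finds the average Salary for each Profession.

```python
import pandas as pd

data = pd.DataFrame({'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
                     'Profession': ['Manager', 'HR', 'Analyst', 'Manager', 'HR'],
                     'Salary': [12000, 10000, 8000, 14000, 11000]})

# Manager and HR appear twice, so their salaries are averaged
avg_salary = data.pivot_table(index='Profession', values='Salary', aggfunc='mean')
print(avg_salary)
```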

How to save DataFrame to CSV and Text File?

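To save data from a Pandas Data Frame to a csv file or text file, you have to use the to_csv function. By default, it separates the values with a comma and writes the row index as the first column. Use the sep argument to choose a different separator.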

import pandas as pd

table = {'name': ['Kane', 'John', 'Mike', 'Suresh', 'Tracy'],
         'Age': [35, 25, 32, 30, 26],
         'Profession': ['Manager', 'HR', 'Analyst', 'Manager', 'HR'],
         'Sale':[422.19, 22.55, 12.66, 119.470, 200.190],
         'Salary':[12000, 10000, 8000, 14000, 11000]
	    }
data = pd.DataFrame(table)
print(data)
# save to a text file (comma separated by default)
data.to_csv('user_info.txt')
# save to a csv file with comma separator
data.to_csv('user_info.csv')
# save to a csv file with Tab separator
data.to_csv('user_info_new.csv', sep = '\t')
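If you do not want the row index in the file, pass index=False. The sketch below writes to an in-memory text buffer instead of a file on disk, so that it can run anywhere, and then reads the text back with read_csv. The names buffer, csv_text and loaded are only for illustration.

```python
import io
import pandas as pd

data = pd.DataFrame({'name': ['Kane', 'John'],
                     'Age': [35, 25],
                     'Salary': [12000, 10000]})

# Write the csv text to an in-memory buffer, without the row index
buffer = io.StringIO()
data.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
print(csv_text)

# Read the same text back into a new DataFrame
loaded = pd.read_csv(io.StringIO(csv_text))
print(loaded)
```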

Iterate over Python DataFrame Rows

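Use the iterrows or itertuples function to iterate over rows; each one returns a single row at a time. The iterrows function yields pairs of the row index and the row as a Series, and itertuples yields each row as a namedtuple. The items function (called iteritems in older pandas versions) iterates over columns instead of rows. For more information, please refer to the pandas module.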

import pandas as pd

table = {'name': ['Kane', 'John', 'Mike'],
         'Age': [35, 25, 32],
         'Profession': ['Manager', 'HR', 'Analyst'],
         'Sale':[422.19, 119.470, 200.190],
         'Salary':[12000, 14000, 11000]
	    }
data = pd.DataFrame(table)

print('\n---Iterating Rows---')
for index, row in data.iterrows():  # index is the row label, row is a Series
    print(index, row)
    print()

---Iterating Rows---
0 name             Kane
Age                35
Profession    Manager
Sale           422.19
Salary          12000
Name: 0, dtype: object

1 name            John
Age               25
Profession        HR
Sale          119.47
Salary         14000
Name: 1, dtype: object

2 name             Mike
Age                32
Profession    Analyst
Sale           200.19
Salary          11000
Name: 2, dtype: object
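For comparison, here is a minimal sketch of the itertuples and items functions on a similar table (the names summaries and column_names are only for illustration). With itertuples, each row is a namedtuple, so you can read a value as an attribute such as row.Salary. With items, you get the name and the values of one column at a time.

```python
import pandas as pd

data = pd.DataFrame({'name': ['Kane', 'John', 'Mike'],
                     'Profession': ['Manager', 'HR', 'Analyst'],
                     'Salary': [12000, 14000, 11000]})

# Iterate over rows; index=False leaves the row index out of the tuple
summaries = []
for row in data.itertuples(index=False):
    summaries.append(f'{row.name} ({row.Profession}): {row.Salary}')
print(summaries)

# Iterate over columns; each step gives the column name and its Series
column_names = []
for column_name, column in data.items():
    column_names.append(column_name)
print(column_names)
```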
