Pandas DataFrame plot

The pandas DataFrame plot function in Python is used to draw charts in the same way we generate them in matplotlib. You can use this Python pandas plot function on both the Series and DataFrame. The charts that you can draw using this pandas DataFrame plot function are area, bar, barh, box, density, hexbin, hist, kde, line, pie, and scatter (hexbin and scatter are available for DataFrames only). The following is the list of parameters accepted by the Python pandas DataFrame plot function.

  • x: The default value is None. If data is a DataFrame, the column label or position to use for the x-axis.
  • y (default = None): The column label or position to use for the y-axis. It allows plotting one column vs. another.
  • kind: It accepts a string value specifying the plot or chart you want. They are area, bar, barh, box, density, hexbin, hist, kde, line, pie, scatter. The default is line.
  • figsize: A (width, height) tuple in inches.
  • use_index (default = True): It accepts a boolean value. Use the index as ticks for the x-axis.
  • title: Assign a title to the chart.
  • grid: Gridlines for the axis. The default value is None, which follows the matplotlib style settings.
  • legend: It accepts True, False, or ‘reverse’.
  • style: Accepts a list or dictionary. The matplotlib line style per column.
  • logx, logy, loglog: Use logx for log scaling on the x-axis, logy for log scaling on the y-axis, and loglog for log scaling on both the x and y-axis.
  • xticks: Sequence of values to use for the xticks.
  • yticks: Sequence of values to use for the yticks.
  • xlim, ylim: 2-tuple or list that sets the limits of the x-axis and y-axis.
  • rot: Rotation for the ticks. It applies to xticks for vertical plots and yticks for horizontal plots.
  • fontsize: Specify an integer value to decide the font size for both xticks and yticks.
  • colormap: A str or matplotlib colormap object. Use this to select the colors from a colormap.
  • colorbar: If True, it draws a colorbar. It is only relevant for scatter and hexbin plots.
  • position: Specify the alignment of the bar plot layout. You can specify any value between 0 and 1, and the default value is 0.5 (center). Here, 0 means the left or bottom end, and 1 means the right or top end.
  • table: The default value is False. If you set this to True, it draws a table from the DataFrame data with the matplotlib default layout. It also accepts a DataFrame or Series to draw as the table.
  • yerr: Series, DataFrame, dictionary, array-like, or str. The error bar values for the y-axis.
  • xerr: Series, DataFrame, dictionary, array-like, or str. The error bar values for the x-axis.
  • mark_right: By default, it is set to True. When we are using a secondary y-axis, it automatically marks the column labels with “(right)” in the legend.
  • **kwds: Additional keyword arguments that are passed on to the underlying matplotlib plotting method.
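
To see how several of these parameters combine in a single call, here is a minimal sketch. The Year and Sales values are made up for illustration and are not part of the data used later in this article.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample values, only for demonstrating the parameters
df = pd.DataFrame({
    "Year": [2016, 2017, 2018, 2019],
    "Sales": [120.0, 450.0, 1300.0, 5200.0],
})

ax = df.plot(
    x="Year", y="Sales", kind="line",
    figsize=(6, 4),              # width and height in inches
    title="Sales by Year",
    grid=True,                   # draw gridlines
    logy=True,                   # log scale on the y-axis
    xticks=list(df["Year"]),     # one tick per year
    rot=45,                      # rotate the x tick labels
    fontsize=8,
    legend=False,
)
print(ax.get_title(), ax.get_yscale())
```

The plot function returns a matplotlib Axes object, so anything not covered by these parameters can still be adjusted on ax afterwards.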

Python Pandas DataFrame Plot Function Examples

The following list of examples helps you to use this Python Pandas DataFrame plot function to create or generate area, bar, barh, box, density, hexbin, hist, KDE, line, pie, scatter plots.

Let me show you the SQL Server data that we use for these examples. Please refer to the Data for Python Charts article in Python.

import pyodbc
import pandas as pd
import matplotlib.pyplot as plt

conn = pyodbc.connect('''Driver={SQL Server Native Client 11.0}; Server=PRASAD;
                      Database=SQL Tutorial ; Trusted_Connection=yes;''')

string = ( ''' SELECT EmpID ,FirstName, LastName ,Education, Occupation, YearlyIncome, Sales2019, 
                    Sales2018, Sales2017, Orders FROM EmployeeSales''')

query = pd.read_sql_query(string, conn)
df = pd.DataFrame(query)
print(df)
Pandas DataFrame Plot 1
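
If you don't have access to SQL Server, a small in-memory DataFrame with the same column names can stand in for the EmployeeSales table. The rows below are hypothetical values made up for this sketch; they are not the data shown in the article's output.

```python
import pandas as pd

# Hypothetical rows that mimic some of the EmployeeSales columns
df = pd.DataFrame({
    "EmpID": [1, 2, 3, 4, 5, 6],
    "FirstName": ["John", "Rob", "Ruben", "Christy", "Tracy", "Lisa"],
    "Education": ["Bachelors", "Bachelors", "High School",
                  "Graduate Degree", "High School", "Graduate Degree"],
    "Occupation": ["Professional", "Management", "Clerical",
                   "Professional", "Clerical", "Management"],
    "YearlyIncome": [90000, 80000, 50000, 95000, 45000, 85000],
    "Sales2019": [100, 200, 50, 300, 150, 250],
    "Sales2018": [90, 180, 60, 280, 120, 230],
    "Sales2017": [80, 150, 55, 260, 110, 200],
    "Orders": [12, 20, 6, 30, 15, 25],
})
print(df.dtypes)
```

Any of the df.plot calls in this article can then run against this df instead of the SQL query result.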

We will use the above-specified DataFrame inside the Python Pandas plot function. As you can see, we are using Occupation as the X-axis value and Sales2019 as the Y-axis value, but we haven’t specified any kind. In this situation, the DataFrame plot function falls back to its default kind, which is line, and draws a line chart.

import pyodbc
import pandas as pd
import matplotlib.pyplot as plt

conn = pyodbc.connect('''Driver={SQL Server Native Client 11.0}; Server=PRASAD;
                      Database=SQL Tutorial ; Trusted_Connection=yes;''')

string = ( ''' SELECT EmpID ,FirstName, LastName ,Education, Occupation, YearlyIncome, Sales2019, 
                    Sales2018, Sales2017, Orders FROM EmployeeSales''')

query = pd.read_sql_query(string, conn)
df = pd.DataFrame(query)

df.plot(x = 'Occupation', y = 'Sales2019')
plt.show()
Pandas DataFrame Plot 2
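
As a quick check of that default, the sketch below calls plot once without kind and once with kind = 'line' and compares the two results. The DataFrame is a hypothetical stand-in for the EmployeeSales data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows, not the article's SQL Server data
df = pd.DataFrame({
    "Occupation": ["Professional", "Management", "Clerical"],
    "Sales2019": [400, 450, 200],
})

ax_default = df.plot(x="Occupation", y="Sales2019")             # no kind given
ax_line = df.plot(x="Occupation", y="Sales2019", kind="line")  # explicit line

# Both Axes hold a single line with the same y values
print(len(ax_default.lines), len(ax_line.lines))
```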

In the coming Python Pandas DataFrame plot function examples, we only mention the code that we changed (this saves us some space). However, you can see the complete code in the output image. Hope you don’t mind :)

Python Pandas DataFrame Bar plot

The Python Pandas Bar plot is used to visualize categorical data using rectangular bars. You can also use this to compare one bar against the other. To generate the DataFrame bar plot, we have specified the kind parameter value as ‘bar’. To demonstrate the bar plot, we assigned Occupation as the X-axis value and Sales2019 as the Y-axis value.

import pyodbc
import pandas as pd
import matplotlib.pyplot as plt

conn = pyodbc.connect('''Driver={SQL Server Native Client 11.0}; Server=PRASAD;
                      Database=SQL Tutorial ; Trusted_Connection=yes;''')

string = ( ''' SELECT EmpID ,FirstName, LastName ,Education, Occupation, YearlyIncome,
                    Sales2019, Sales2018, Sales2017, Orders FROM EmployeeSales''')

query = pd.read_sql_query(string, conn)
df = pd.DataFrame(query)

df.plot(x = 'Occupation', y = 'Sales2019', kind = 'bar')
plt.show()
Pandas DataFrame Bar Plot 1

If we are generating a Python Pandas DataFrame bar chart or plot with unique names in the X-axis (something like Sales2019 against Full Name, which is unique), then the above code probably works for you. Here, we want to display the Sales based on the Employee Occupation so, we need to group those Occupations. For this, we have to use the DataFrame groupby function.

groupby_Occupation = df.groupby('Occupation')['Sales2019'].sum()
groupby_Occupation.plot(kind = 'bar')
plt.show()
Pandas Bar Plot 2
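
To make the effect of groupby visible, this sketch sums hypothetical Sales2019 values per Occupation and then plots the result. The grouped Series uses Occupation as its index, so each bar corresponds to one occupation.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows, not the article's SQL Server data
df = pd.DataFrame({
    "Occupation": ["Professional", "Management", "Clerical",
                   "Professional", "Clerical", "Management"],
    "Sales2019": [100, 200, 50, 300, 150, 250],
})

# One row per occupation, holding the sum of its Sales2019 values
groupby_Occupation = df.groupby("Occupation")["Sales2019"].sum()
print(groupby_Occupation)

ax = groupby_Occupation.plot(kind="bar")
heights = [bar.get_height() for bar in ax.patches]  # one bar per occupation
```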

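If you want to use multiple measures or compare Sales this year vs. last year, then try the below-shown way. Note that selecting more than one column from a groupby requires a list, so the column names go inside double square brackets.

groupby_Occupation = df.groupby('Occupation')[['Sales2019', 'Sales2018']].sum()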
groupby_Occupation.plot(kind = 'bar')
plt.show()
Pandas Bar Plot 3
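
Besides side-by-side bars, the bar plot also accepts stacked = True, which places the measures on top of each other. The following is a small sketch on hypothetical data that draws both variants.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows, not the article's SQL Server data
df = pd.DataFrame({
    "Occupation": ["Professional", "Management", "Clerical",
                   "Professional", "Clerical", "Management"],
    "Sales2019": [100, 200, 50, 300, 150, 250],
    "Sales2018": [90, 180, 60, 280, 120, 230],
})

# A list of column names (double brackets) selects both measures
grouped = df.groupby("Occupation")[["Sales2019", "Sales2018"]].sum()

ax_grouped = grouped.plot(kind="bar")                # bars side by side
ax_stacked = grouped.plot(kind="bar", stacked=True)  # bars on top of each other
```

In the stacked version, the total bar height per occupation equals Sales2019 plus Sales2018.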

By using the subplots parameter, you can divide the bar chart into two subplots, one per column. For this, you have to specify subplots = True. Here, we used df.plot.bar instead of df.plot(kind = ‘bar’); both return the same result.

groupby_Occupation = df.groupby('Occupation')[['Sales2019', 'Sales2018']].sum()
groupby_Occupation.plot.bar(subplots = True)
plt.show()
Pandas Bar Plot 4
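
With subplots = True, the call returns one matplotlib Axes per column rather than a single Axes. The sketch below, on hypothetical data, shows this for both spellings of the bar plot.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows, not the article's SQL Server data
df = pd.DataFrame({
    "Occupation": ["Professional", "Management", "Clerical"],
    "Sales2019": [400, 450, 200],
    "Sales2018": [370, 410, 180],
})
grouped = df.groupby("Occupation")[["Sales2019", "Sales2018"]].sum()

axes_accessor = grouped.plot.bar(subplots=True)          # accessor style
axes_kind = grouped.plot(kind="bar", subplots=True)      # kind style

# Each call returns an array with one Axes per plotted column
print(len(axes_accessor), len(axes_kind))
```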

Let me do some quick formatting using the above-specified parameters.

groupby_Occupation = df.groupby('Occupation')[['Sales2019', 'Sales2018']].sum()
groupby_Occupation.plot.bar(title = 'Bar Plot', grid = True, fontsize = 7, position = 1)
plt.show()
Pandas DataFrame Bar Plot 5

Python Pandas DataFrame Horizontal plot

The Pandas DataFrame plot barh function allows you to plot a horizontal bar chart. You can use these kinds of DataFrame Horizontal Bar plots to visualize quantitative data in rectangular bars.

import pyodbc
import pandas as pd
import matplotlib.pyplot as plt

conn = pyodbc.connect('''Driver={SQL Server Native Client 11.0}; Server=PRASAD;
                      Database=SQL Tutorial ; Trusted_Connection=yes;''')

string = ( ''' SELECT EmpID ,FirstName, LastName ,Education, Occupation, YearlyIncome,
                    Sales2019, Sales2018, Sales2017, Orders FROM EmployeeSales''')

query = pd.read_sql_query(string, conn)
df = pd.DataFrame(query)

groupby_Occ = df.groupby('Occupation')['Sales2019'].sum()
# The grouped result is a Series indexed by Occupation, so x and y are not needed
groupby_Occ.plot.barh()
plt.show()
Pandas DataFrame Barh Plot 1

Let me use multiple Numeric values as Horizontal bar columns. Here, we have also separated the columns using subplots.

groupby_Occ = df.groupby('Occupation')[['Sales2019', 'Sales2018']].sum()
groupby_Occ.plot.barh(title = ['Sales 2019 Bar Plot', 'Sales 2018 Bar Plot'], subplots = True, legend = False)
plt.show()
Pandas Horizontal Bar Plot 2
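
Horizontal bars are often easier to read when they are ordered by value. As a small sketch on hypothetical data, sorting the grouped Series before calling barh is enough to achieve that.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows, not the article's SQL Server data
df = pd.DataFrame({
    "Occupation": ["Professional", "Management", "Clerical",
                   "Professional", "Clerical", "Management"],
    "Sales2019": [100, 200, 50, 300, 150, 250],
})

groupby_Occ = df.groupby("Occupation")["Sales2019"].sum()

# sort_values() orders the bars from the smallest to the largest total
ax = groupby_Occ.sort_values().plot.barh(title="Sales 2019 by Occupation")
widths = [bar.get_width() for bar in ax.patches]
print(widths)
```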

Python Pandas DataFrame Area plot

The Pandas Area plot is used to visualize quantitative data. It’s a kind of line chart. However, it fills the area between the line and the axis.

import pyodbc
import pandas as pd
import matplotlib.pyplot as plt
conn = pyodbc.connect('''Driver={SQL Server Native Client 11.0}; Server=PRASAD;
                      Database=SQL Tutorial ; Trusted_Connection=yes;''')
string = ( ''' SELECT EmpID ,FirstName, LastName ,Education, Occupation, YearlyIncome,
                    Sales2019, Sales2018, Sales2017, Orders FROM EmployeeSales''')
query = pd.read_sql_query(string, conn)
df = pd.DataFrame(query)

groupby_Occ = df.groupby('Occupation')['Orders'].sum()
# The grouped result is a Series indexed by Occupation, so x and y are not needed
groupby_Occ.plot(kind = 'area')
plt.show()
Pandas DataFrame Area Plot 1

This time, we are grouping Sales 2019, 2018, and 2017 by Occupation. Next, we are drawing the Pandas area chart using the DataFrame plot area function.

groupby_Occ = df.groupby('Occupation')[['Sales2019', 'Sales2018', 'Sales2017']].sum()
groupby_Occ.plot.area(title = 'Occupation Vs Sales Area Plot', legend = True, color = ['r', 'b', 'g'])
plt.show()
Pandas Area Plot 2
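
By default, the area function stacks the columns on top of each other. Passing stacked = False overlays them instead. The sketch below compares the two on hypothetical data.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows, not the article's SQL Server data
df = pd.DataFrame({
    "Occupation": ["Professional", "Management", "Clerical",
                   "Professional", "Clerical", "Management"],
    "Sales2019": [100, 200, 50, 300, 150, 250],
    "Sales2018": [90, 180, 60, 280, 120, 230],
})
groupby_Occ = df.groupby("Occupation")[["Sales2019", "Sales2018"]].sum()

ax_stacked = groupby_Occ.plot.area(title="Stacked")                    # default
ax_overlay = groupby_Occ.plot.area(stacked=False, title="Unstacked")   # overlaid

# Largest combined total across the two years
largest_total = groupby_Occ.sum(axis=1).max()
print(largest_total, ax_stacked.get_ylim(), ax_overlay.get_ylim())
```

The stacked chart has to fit the combined totals on its y-axis, whereas the unstacked chart only has to fit the largest single column value.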

Python Pandas DataFrame Box plot

The Pandas Box plot creates a box plot from a given DataFrame. Use this DataFrame box plot to visualize the data using their quartiles. In this example, we created a DataFrame of 50 rows and 5 columns of random values and assigned column names from A to E. By using those values, we generated a Pandas boxplot with the help of the plot function along with kind = ‘box’.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

table = np.random.randn(50, 5)
data = pd.DataFrame(table, columns = ['A', 'B', 'C', 'D', 'E'])

data.plot(kind = 'box')
plt.show()
Pandas DataFrame Box Plot 1

Here, we selected the FirstName, Sales2019, Sales2018, and Sales2017 columns from the EmployeeSales table to draw a box plot. For this, we used the pandas box function. It draws only the numeric columns, so FirstName does not appear in the chart.

import pyodbc
import pandas as pd
import matplotlib.pyplot as plt
conn = pyodbc.connect('''Driver={SQL Server Native Client 11.0}; Server=PRASAD;
                      Database=SQL Tutorial ; Trusted_Connection=yes;''')
string = ( ''' SELECT FirstName, Sales2019, Sales2018, Sales2017 FROM EmployeeSales''')
query = pd.read_sql_query(string, conn)
df = pd.DataFrame(query)

df.plot.box(title = 'Box Plot')
plt.show()
Pandas Box Plot 2
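
The quartiles that a box plot visualizes can also be computed directly with the quantile function, which helps when reading the chart. The following sketch uses hypothetical rows and includes a text column to show that the box function skips it.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows, not the article's SQL Server data
df = pd.DataFrame({
    "FirstName": ["John", "Rob", "Ruben", "Christy", "Tracy", "Lisa"],
    "Sales2019": [100, 200, 50, 300, 150, 250],
    "Sales2018": [90, 180, 60, 280, 120, 230],
})

# 25%, 50% (median), and 75% values of each numeric column
quartiles = df[["Sales2019", "Sales2018"]].quantile([0.25, 0.5, 0.75])
print(quartiles)

# FirstName is not numeric, so only the two Sales columns are drawn
ax = df.plot.box(title="Box Plot")
```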

Python Pandas DataFrame hexbin plot

The Pandas hexbin plot generates a hexagonal binning plot. It groups the points into hexagonal cells and colors each cell by the number of points inside it.

First, we used the Numpy random randn function to generate random numbers of size 1000 * 2. Next, we used the DataFrame function to convert that to a DataFrame with column names A and B. data.plot(x = ‘A’, y = ‘B’, kind = ‘hexbin’, gridsize = 20) creates a hexbin, or hexagonal bin, plot using those random values.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

table = np.random.randn(1000, 2)
data = pd.DataFrame(table, columns = ['A', 'B'])

data.plot(x = 'A', y = 'B', kind = 'hexbin', gridsize = 20)
plt.show()
Pandas DataFrame Hexbin Plot 1

Currently, we don’t have better data in our table to display the Pandas hexagonal bin plot. So, we used Sales 2018 vs. 2017 with grid size = 25.

import pyodbc
import pandas as pd
import matplotlib.pyplot as plt

conn = pyodbc.connect('''Driver={SQL Server Native Client 11.0}; Server=PRASAD; Database=SQL Tutorial ; Trusted_Connection=yes;''')

string = ( ''' SELECT Sales2018, Sales2017, Orders FROM EmployeeSales''')
query = pd.read_sql_query(string, conn)
df = pd.DataFrame(query)

df.plot.hexbin(y = 'Sales2018', x = 'Sales2017',gridsize = 25,  title = 'Hexbin Plot')
plt.show()
Pandas Hexbin Plot 2
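
The gridsize parameter controls how many hexagons are drawn along the x direction, so a larger value gives smaller cells. As a rough sketch on random data (seeded here only to make the run repeatable), the two calls below differ just in gridsize.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice for this sketch
data = pd.DataFrame(rng.standard_normal((1000, 2)), columns=["A", "B"])

ax_coarse = data.plot.hexbin(x="A", y="B", gridsize=10, title="gridsize = 10")
ax_fine = data.plot.hexbin(x="A", y="B", gridsize=30, title="gridsize = 30")

# The first collection on each Axes holds the hexagons; its array holds the counts
counts_coarse = ax_coarse.collections[0].get_array()
counts_fine = ax_fine.collections[0].get_array()
print(len(counts_coarse), len(counts_fine))
```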

Python Pandas DataFrame hist plot

The Pandas hist plot draws a histogram of the data distribution. In this example, we generated random values for the x and y columns using the Numpy random randn function. Next, we used the Pandas hist function to generate a histogram in Python.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

A= np.random.randn(1000)
B= np.random.randn(1000) + 1
data = pd.DataFrame({'x':A, 'y': B}, columns = ['x', 'y'])

data.plot.hist()
plt.show()
Pandas DataFrame Histogram Plot 1

This time we added one more column to the DataFrame and set the number of bins explicitly to 10, which is also the default value of the bins parameter.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

A= np.random.randn(1000)
B= np.random.randn(1000) + 1
C= np.random.randn(1000) - 2
data = pd.DataFrame({'x':A, 'y': B, 'z': C}, columns = ['x', 'y', 'z'])

data.plot.hist(bins = 10)
plt.show()
Pandas Hist Plot 2
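
When several columns share one histogram, the bars overlap, so a bins value other than the default together with some transparency can make the shapes easier to compare. This is a minimal sketch with a seeded random generator; alpha is a matplotlib keyword that the hist function passes through.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, an arbitrary choice for this sketch
data = pd.DataFrame({
    "x": rng.standard_normal(1000),
    "y": rng.standard_normal(1000) + 1,
})

# 20 bins instead of the default 10, drawn half transparent
ax = data.plot.hist(bins=20, alpha=0.5, title="Histogram with 20 bins")
print(len(ax.patches))  # number of bars drawn on the Axes
```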

Pandas DataFrame kde plot

The Pandas kde plot draws the Kernel Density Estimate plot (in short, kde) using Gaussian kernels. It relies on SciPy, so that library must be installed. First, we used the Numpy random randn function to generate 10 random numbers. Next, we used the Pandas Series function to create a Series from those numbers. Finally, data.plot(kind = ‘kde’) generates a kde or density plot from those numbers.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

table = np.random.randn(10)
data = pd.Series(table)
print(data)

data.plot(kind = 'kde')
plt.show()
Pandas DataFrame kde Plot 1

Let me draw a Python Pandas density plot for the last three years of Sales in the EmployeeSales table DataFrame.

import pyodbc
import pandas as pd
import matplotlib.pyplot as plt

conn = pyodbc.connect('''Driver={SQL Server Native Client 11.0}; Server=PRASAD;
                      Database=SQL Tutorial ; Trusted_Connection=yes;''')

string = ( ''' SELECT Sales2019, Sales2018, Sales2017 FROM EmployeeSales''')
query = pd.read_sql_query(string, conn)

df = pd.DataFrame(query)
df.plot.kde(title = 'Density Plot')
plt.show()
Pandas DataFrame Density Plot 2

Python Pandas DataFrame Line plot

The Pandas Line plot draws lines from the given data. You can use it to draw one dimension against a single measure or against multiple measures. Here, we drew the Pandas line for the employees’ Education against the Orders.

import pyodbc
import pandas as pd
import matplotlib.pyplot as plt
conn = pyodbc.connect('''Driver={SQL Server Native Client 11.0}; Server=PRASAD;
                      Database=SQL Tutorial ; Trusted_Connection=yes;''')
string = ( ''' SELECT EmpID ,FirstName, LastName ,Education, Occupation, YearlyIncome,
                    Sales2019, Sales2018, Sales2017, Orders FROM EmployeeSales''')
query = pd.read_sql_query(string, conn)
df = pd.DataFrame(query)

groupby_Occ = df.groupby('Education')['Orders'].sum()
# The grouped result is a Series indexed by Education, so x and y are not needed
groupby_Occ.plot(kind = 'line', title = 'Orders Vs Education  Line Plot', legend = False)
plt.show()
Pandas DataFrame Line Plot 1

We are grouping Sales 2019, 2018, and 2017 by Education. Next, we have drawn a line chart using the DataFrame plot line function. You can also use subplots = True inside the line function to separate those Sales lines.

groupby_Occ = df.groupby('Education')[['Sales2019', 'Sales2018', 'Sales2017']].sum()
groupby_Occ.plot.line(title = 'Sales Vs Education  Line Plot')
plt.show()
Pandas Line Plot 2
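
The subplots = True option mentioned above can be sketched as follows. The rows are hypothetical, and each Sales column ends up on its own Axes with a single line.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows, not the article's SQL Server data
df = pd.DataFrame({
    "Education": ["Bachelors", "Bachelors", "High School",
                  "Graduate Degree", "High School", "Graduate Degree"],
    "Sales2019": [100, 200, 50, 300, 150, 250],
    "Sales2018": [90, 180, 60, 280, 120, 230],
    "Sales2017": [80, 150, 55, 260, 110, 200],
})
groupby_Edu = df.groupby("Education")[["Sales2019", "Sales2018", "Sales2017"]].sum()

# One Axes per Sales column instead of three lines on a shared Axes
axes = groupby_Edu.plot.line(subplots=True, title="Sales Vs Education Line Plot")
print(len(axes))
```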

Pandas DataFrame Pie plot

The Pandas Pie plot draws a pie chart. It slices a pie based on the numeric data column passed to it. Here, we generated a Pandas Pie chart using the plot function on the Sales2019 values grouped by Occupation, so each slice represents one Occupation.

import pyodbc
import pandas as pd
import matplotlib.pyplot as plt
conn = pyodbc.connect('''Driver={SQL Server Native Client 11.0}; Server=PRASAD;
                      Database=SQL Tutorial ; Trusted_Connection=yes;''')
string = ( ''' SELECT EmpID ,FirstName, LastName ,Education, Occupation, YearlyIncome,
                    Sales2019, Sales2018, Sales2017, Orders FROM EmployeeSales''')
query = pd.read_sql_query(string, conn)
df = pd.DataFrame(query)

groupby_Occ = df.groupby('Occupation')['Sales2019'].sum()
# The grouped result is a Series indexed by Occupation, so x and y are not needed
groupby_Occ.plot(kind = 'pie', title = 'Sales Vs Occupation Pie Plot', legend = True)
plt.show()
Pandas DataFrame Pie Plot 1
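
To print the percentage on each slice, you can pass the matplotlib autopct keyword through the plot function. The sketch below uses hypothetical rows and also computes the same shares by hand for comparison.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows, not the article's SQL Server data
df = pd.DataFrame({
    "Occupation": ["Professional", "Management", "Clerical",
                   "Professional", "Clerical", "Management"],
    "Sales2019": [100, 200, 50, 300, 150, 250],
})
groupby_Occ = df.groupby("Occupation")["Sales2019"].sum()

# Share of each occupation in percent, the numbers autopct will display
share = groupby_Occ / groupby_Occ.sum() * 100
print(share.round(1))

ax = groupby_Occ.plot(kind="pie", autopct="%1.1f%%",
                      title="Sales Vs Occupation Pie Plot", legend=True)
ax.set_ylabel("")  # hide the Series name that is used as the y label
```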

Python Pandas DataFrame Scatter plot

The Python Pandas DataFrame Scatter plot draws marks based on the given data. Each mark is placed at the X and Y-axis coordinates taken from two columns of a DataFrame.

import pyodbc
import pandas as pd
import matplotlib.pyplot as plt
conn = pyodbc.connect('''Driver={SQL Server Native Client 11.0}; Server=PRASAD;
                      Database=SQL Tutorial ; Trusted_Connection=yes;''')
string = ( ''' SELECT EmpID ,FirstName, LastName ,Education, Occupation, YearlyIncome,
                    Sales2019, Sales2018, Sales2017, Orders FROM EmployeeSales''')
query = pd.read_sql_query(string, conn)
df = pd.DataFrame(query)

df.plot(x = 'EmpID', y = 'Sales2017', kind = 'scatter')
plt.show()
Pandas DataFrame Scatter Plot 1

Here, we used EmpID as the X-axis value and Sales2019 as the Y-axis value. Next, we changed the color of the dots to green (c = ‘green’) and their size (s = 24).

df.plot.scatter(x = 'EmpID', y = 'Sales2019', title = 'Scatter Plot',c = 'green', s = 24)
plt.show()
Pandas Scatter Plot 2
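
Instead of a fixed color, the c argument also accepts a column name, in which case each dot is colored by that column's value and a colorbar is added. This is a minimal sketch on hypothetical rows; the viridis colormap is an arbitrary choice.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line to open a window with plt.show()
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows, not the article's SQL Server data
df = pd.DataFrame({
    "EmpID": [1, 2, 3, 4, 5, 6],
    "Sales2019": [100, 200, 50, 300, 150, 250],
    "Orders": [12, 20, 6, 30, 15, 25],
})

# Color each dot by its Orders value using the viridis colormap
ax = df.plot.scatter(x="EmpID", y="Sales2019", c="Orders",
                     colormap="viridis", s=24, title="Scatter Plot")
points = ax.collections[0].get_offsets()  # (x, y) position of every dot
print(len(points), len(ax.figure.axes))
```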