Python matplotlib Histogram

The Python matplotlib histogram looks similar to the pyplot bar chart. However, a histogram groups numeric data into bins of equal width. Each bin represents an interval of values, and the matplotlib histogram shows how many data points (the frequency) fall into each bin.

In Python, you can use the Matplotlib library to plot a histogram with the help of the pyplot hist function. The basic hist syntax to draw a matplotlib pyplot histogram in Python is

matplotlib.pyplot.hist(x, bins)

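In the above Python matplotlib histogram syntax, x represents the numeric data that you want to group, and bins defines the number of bins (or the bin edges) placed along the X-Axis. The Y-Axis shows the frequency, that is, the count of values that fall into each bin.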

Simple matplotlib Histogram Example

In this pyplot histogram example, we generate a random array and assign it to x. Next, we draw a Python histogram using the hist function. Notice that we haven't used the bins argument.

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randn(1000)
print(x)

plt.hist(x)

plt.show()

Since we are using a random array, the plot you get might not look exactly the same as ours.

The first step in plotting a histogram is creating equal-width bins from the lower and upper range of the values. However, in the above Python example, we haven't used the bins argument, so the hist function automatically creates and uses the default bins (10, unless the hist.bins setting is changed).
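
To see which bins hist picked, you can inspect its return values. The following is a minimal sketch, assuming a seeded NumPy generator and the non-interactive Agg backend (both are choices made for this illustration, not part of the example above). The hist function returns the per-bin counts, the bin edges, and the drawn patches.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)       # seeded so the run is repeatable
x = rng.standard_normal(1000)

# No bins argument, so hist falls back to its default number of bins
counts, edges, patches = plt.hist(x)

n_bins = len(counts)                 # number of bins that were created
widths = np.diff(edges)              # width of every bin
print(n_bins, len(edges))
print(np.allclose(widths, widths[0]))  # True when all bins are equally wide
```

The edges array always has one more entry than the counts array, because every bin needs a left and a right edge.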

Here, we set the bins argument explicitly by assigning 20 to it. It means the code below draws a hist of random numbers, with the data grouped into 20 equal-width bins.

x = np.random.randn(1000)
print(x)

plt.hist(x, bins = 20)

plt.show()

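It is another example of the Python matplotlib pyplot histogram. This time, we draw the samples with np.random.normal (mean 0, standard deviation 1) and use 50 bins.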

x = np.random.normal(0, 1, 1000)
print(x)

plt.hist(x, bins = 50)

plt.show()

Python matplotlib Histogram using CSV File

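In this example, we load the data from a file to plot a hist. Note that the code below reads an Excel workbook with the pandas read_excel function; for a real CSV file, you would use read_csv instead. As you can see from the code, we use the Orders Quantity column as the data to bin, so the quantities appear on the X-Axis and their frequencies on the Y-Axis.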

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

data = df['Quantity']
bins=np.arange(min(data), max(data) + 1, 1)

print(df['Quantity'].count())
plt.hist(df['Quantity'], bins)

plt.show()
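
One detail worth checking with integer data such as quantities: the last bin is closed on both sides, so edges built with np.arange(min, max + 1, 1) count the two largest values together. The sketch below uses a small hypothetical array (not the Superstore data) and np.histogram, which is the function hist relies on for binning. It compares those edges with edges shifted by 0.5, so that every integer gets its own bin.

```python
import numpy as np

# Hypothetical order quantities, made up for this illustration
quantity = np.array([1, 1, 2, 3, 3, 3, 4, 5, 5])

# Edges as built in the example above: 1, 2, 3, 4, 5 (four bins)
edges_article = np.arange(quantity.min(), quantity.max() + 1, 1)
counts_article, _ = np.histogram(quantity, bins=edges_article)

# Edges centered on the integers: 0.5, 1.5, ..., 5.5 (five bins)
edges_centered = np.arange(quantity.min() - 0.5, quantity.max() + 1.5, 1)
counts_centered, _ = np.histogram(quantity, bins=edges_centered)

print(counts_article)   # [2 1 3 3]   -> the values 4 and 5 share the last bin
print(counts_centered)  # [2 1 3 1 2] -> one bin per integer value
```

Whether this matters depends on the data; for a quick look at the distribution, the original edges are often good enough.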

Python matplotlib Histogram titles

This pyplot histogram example shows how to add the title, X-Axis, and Y-Axis labels.

In this example, we also format the font size and color of the X and Y labels, and the font size of the X and Y ticks. If you notice the code below, we use only the rows whose Segment is equal to Consumer.

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

x = df['Segment'] == 'Consumer'
df.where(x, inplace = True)
print(df)

data = df['Quantity']
bins=np.arange(min(data), max(data) + 1, 1)

print(df['Quantity'].count())
plt.hist(df['Quantity'], bins)

plt.title('Example')
plt.xlabel('Bins', fontsize = 15, color = 'b')
plt.ylabel('Order Quantity', fontsize = 15, color = 'b')
plt.xticks(fontsize = 12)
plt.yticks(fontsize = 12)

plt.show()
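
The filtering step above relies on DataFrame.where, which keeps every row and replaces the non-matching ones with NaN. A minimal sketch with a hypothetical five-row DataFrame (the column names mirror the example, the values are made up) compares it with boolean indexing through loc, which drops the non-matching rows instead.

```python
import pandas as pd

# Hypothetical stand-in for the Superstore data
df = pd.DataFrame({
    "Segment": ["Consumer", "Corporate", "Consumer", "Home Office", "Consumer"],
    "Quantity": [2, 5, 3, 1, 7],
})

mask = df["Segment"] == "Consumer"

masked = df.where(mask)   # same number of rows, non-matching rows become NaN
filtered = df.loc[mask]   # only the matching rows remain

print(len(masked), masked["Quantity"].count())  # count() skips the NaN rows
print(len(filtered))
```

Both routes leave the same Consumer quantities to plot; loc simply avoids carrying the NaN rows around.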

Multiple Histograms

In this matplotlib example, we plot multiple histograms in Python, one for each Segment, on the same axes.

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

dat = df['Quantity']
bins = np.arange(min(dat), max(dat) + 1, 1)

x = df.loc[df['Segment'] == 'Consumer']
y = df.loc[df['Segment'] == 'Corporate']
z = df.loc[df['Segment'] == 'Home Office']

plt.hist(x['Quantity'], bins)
plt.hist(y['Quantity'], bins)
plt.hist(z['Quantity'], bins)

plt.show()
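
Because the three hist calls above draw on the same axes, later bars can hide earlier ones. One alternative, sketched here with hypothetical random quantities in place of the Superstore segments (and the Agg backend as an assumption), is to pass all the datasets to a single hist call, which places the bars side by side within each bin.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(1)
# Hypothetical quantities between 1 and 14 for each segment
consumer = rng.integers(1, 15, size=500)
corporate = rng.integers(1, 15, size=300)
home_office = rng.integers(1, 15, size=200)

bins = np.arange(1, 16)  # shared bin edges 1, 2, ..., 15

# A list of arrays gives one group of side-by-side bars per bin
counts, edges, _ = plt.hist(
    [consumer, corporate, home_office],
    bins,
    label=["Consumer", "Corporate", "Home Office"],
)
plt.legend()

print(len(counts))  # one row of counts per dataset
```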

Controlling pyplot hist bin size

It is the same matplotlib pyplot Histogram example that we showed above. However, this time, we set bins to the static value 5 instead of an array of edges. It means the Quantity data of each segment is grouped into five bins.

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

data = df['Quantity']
bn = 5

x = df.loc[df.Segment == 'Consumer', 'Quantity']
y = df.loc[df.Segment == 'Corporate', 'Quantity']
z = df.loc[df.Segment == 'Home Office', 'Quantity']

plt.hist(x,bn)
plt.hist(y, bn)
plt.hist(z, bn)

plt.show()

As you can notice, the two plots differ. It is because of the change in the bins.
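
When bins is an integer, each hist call derives its own bin edges from the minimum and maximum of the data it receives, so separate calls may not share edges. A minimal sketch, using two hypothetical arrays and np.histogram (the function hist relies on for binning), shows the effect and how the range argument gives both datasets the same edges.

```python
import numpy as np

# Two hypothetical datasets with different maximum values
a = np.array([1, 2, 3, 4, 5, 6])
b = np.array([1, 2, 3, 9, 12, 14])

# Five bins each, but the edges follow each dataset's own min and max
_, edges_a = np.histogram(a, bins=5)
_, edges_b = np.histogram(b, bins=5)
same_by_default = np.allclose(edges_a, edges_b)

# A shared range forces identical edges for both datasets
shared = (min(a.min(), b.min()), max(a.max(), b.max()))
_, shared_a = np.histogram(a, bins=5, range=shared)
_, shared_b = np.histogram(b, bins=5, range=shared)
same_with_range = np.allclose(shared_a, shared_b)

print(same_by_default, same_with_range)
```

The hist function accepts the same range argument, so the idea carries over to the plotting calls.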

Python pyplot Histogram legend

While working with multiple values, it is necessary to identify which one belongs to which category. Otherwise, users will get confused. To solve this issue, you have to enable the legend by using the pyplot legend function. Next, use the label argument of the Python hist function to add a label to each one.

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

dat = df['Quantity']
bs = np.arange(min(dat), max(dat) + 1, 1)

x = df.loc[df['Segment'] == 'Consumer']
y = df.loc[df['Segment'] == 'Corporate']
z = df.loc[df['Segment'] == 'Home Office']

plt.hist(x['Quantity'], bs, label = 'Consumer')
plt.hist(y['Quantity'], bs, label = 'Corporate')
plt.hist(z['Quantity'], bs, label = 'Home Office')

plt.legend()
plt.show()

Format Python matplotlib Histogram Colors

Whether you draw one histogram or more, Python matplotlib automatically assigns default colors to them. However, you can use the color argument of the pyplot hist function to alter the color. In this example, we are assigning maroon to the first, blue to the second, and green to the third one.

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

da = df['Quantity']
bs = np.arange(min(da), max(da) + 1, 1)

x = df.loc[df['Segment'] == 'Consumer']
y = df.loc[df['Segment'] == 'Corporate']
z = df.loc[df['Segment'] == 'Home Office']

plt.hist(x['Quantity'], bs, label = 'Consumer', color = 'maroon')
plt.hist(y['Quantity'], bs, label = 'Corporate', color = 'blue')
plt.hist(z['Quantity'], bs, label = 'Home Office', color = 'green')

plt.legend()
plt.show()

Similarly, you can alter the opacity and the bar edge color using the alpha and edgecolor arguments.

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')
da = df['Quantity']
bins=np.arange(min(da), max(da) + 1, 1)

plt.hist(df['Quantity'], bins, color = 'red', alpha = 0.8, edgecolor = 'g')

plt.title('Example')
plt.xlabel('Bins', fontsize = 15, color = 'b')
plt.ylabel('Order Quantity', fontsize = 15, color = 'b')
plt.xticks(fontsize = 12)
plt.yticks(fontsize = 12)
plt.show()

Python matplotlib Horizontal Histogram

The pyplot hist function has an orientation argument with two options: horizontal and vertical (default). If you set orientation to horizontal, the hist is drawn horizontally.

import numpy as np
import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

data = df['Quantity']
bins=np.arange(min(data), max(data) + 1, 1)

plt.hist(df['Quantity'], bins, color = 'red',
         alpha = 0.8, orientation = 'horizontal')

plt.title('Horizontal Example')
plt.show()

histtype

The pyplot hist function has a histtype argument, which changes the drawing style of the histogram. There are four types available, and they are

  • bar: This is the traditional bar type (default). If you pass multiple datasets along with histtype as bar, the bars are arranged side by side.
  • barstacked: When you pass multiple datasets, the bars are stacked on top of each other.
  • step: Draws only the outline of the hist, without filling the bars.
  • stepfilled: Same as step, but the area under the outline is filled with the default color.
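
A quick way to compare the four styles is to draw the same data once per histtype. The sketch below assumes hypothetical random data and the Agg backend; only the drawing style changes between the panels, while the counts stay the same.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
data = rng.standard_normal(500)   # hypothetical sample
kinds = ["bar", "barstacked", "step", "stepfilled"]

fig, axes = plt.subplots(1, 4, figsize=(12, 3))
counts_by_kind = {}
for ax, kind in zip(axes, kinds):
    # Same data and bins in every panel, only histtype differs
    counts, edges, _ = ax.hist(data, bins=10, histtype=kind)
    ax.set_title(kind)
    counts_by_kind[kind] = counts
```

With a single dataset, bar and barstacked look alike; the difference only shows up once several datasets are passed in one call.
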
df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

data = df['Quantity']
bins=np.arange(min(data), max(data) + 1, 1)

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize = (8, 4))

x = df.loc[df['Segment'] == 'Consumer']
y = df.loc[df['Segment'] == 'Corporate']
z = df.loc[df['Segment'] == 'Home Office']

ax1.hist(x['Quantity'], bins, histtype = 'bar', color = 'red', alpha = 0.8, edgecolor = 'g')
ax2.hist(y['Quantity'], bins, color = 'blue', histtype = 'step')
ax3.hist(z['Quantity'], bins, color = 'green',histtype = 'stepfilled')

plt.show()

The log argument accepts a boolean value, and its default is False. If you set it to True, the count axis of the histogram is set to a log scale. Apart from log, there is one more argument called cumulative, which helps display the matplotlib cumulative histogram in Python.

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

data = df['Quantity']
bins=np.arange(min(data), max(data) + 1, 1)

plt.hist(df['Quantity'], bins, color = 'red', alpha = 0.8,
         cumulative = True, log = True)

plt.title('Cumulative Log Example')

plt.show()
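
With cumulative set to True, each bar holds the running total of all bins up to and including its own, so the last bar equals the number of values. A small sketch with hypothetical random data (the Agg backend is an assumption for this illustration):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(3)
x = rng.standard_normal(1000)   # hypothetical sample

# cumulative=True turns the per-bin counts into running totals
counts, edges, _ = plt.hist(x, bins=10, cumulative=True)

never_decreases = bool(np.all(np.diff(counts) >= 0))
print(counts[-1], never_decreases)
```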

Python matplotlib seaborn Histogram

In Python, we have the seaborn module, which helps to draw a histogram along with a density curve. It is very simple and straightforward.

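import matplotlib.pyplot as plt
import numpy as np
import seaborn as sns

x = np.random.randn(1000)
print(x)

# histplot with kde = True draws the histogram along with a density curve
# (the older distplot function is deprecated in recent seaborn releases)
sns.histplot(x, kde = True)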

plt.show()

Python matplotlib 2d Histogram

The Python pyplot module has a hist2d function to draw a two-dimensional (2D) histogram. To draw a matplotlib 2D hist, you need two numerical arrays or array-like values of the same length.

x = np.random.randn(100)
print(x)

y = 2 * np.random.randn(100)
print(y)

plt.hist2d(x, y)

plt.show()
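
Like hist, the hist2d function returns the data it computed: the 2D array of counts, the bin edges along each axis, and the image object that was drawn. A minimal sketch with seeded hypothetical data (the seed and the Agg backend are assumptions for this illustration):

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(100)
y = 2 * rng.standard_normal(100)

# Without a bins argument, hist2d uses 10 bins along each axis
h, xedges, yedges, image = plt.hist2d(x, y)

print(h.shape, len(xedges), len(yedges))
print(h.sum())  # every (x, y) pair lands in exactly one cell
```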

In this example, we use two subplots and change the number of bins in each.

x = np.random.randn(100)
print(x)

y = 2 * np.random.randn(100)
print(y)

fig, (ax1, ax2) = plt.subplots(1, 2)

ax1.hist2d(x, y, bins = 5)

ax2.hist2d(x, y, bins = 10)
plt.show()

It is another example of the Python Matplotlib 2D histogram, this time with 10,000 points and a separate bin count for each axis.

x = np.random.randn(10000)
print(x)

y = 2 * np.random.randn(10000)
print(y)

fig, (ax1, ax2) = plt.subplots(1, 2)

ax1.hist2d(x, y, bins = (10, 10))

ax2.hist2d(x, y, bins = (200, 200))
plt.show()

Let me change the colors of the 2D histograms using the cmap argument.

x = np.random.randn(10000)
print(x)

y = 2 * np.random.randn(10000)
print(y)

fig, (ax1, ax2) = plt.subplots(1, 2)

ax1.hist2d(x, y, bins = (10, 10), cmap = 'cubehelix')

ax2.hist2d(x, y, bins = (200, 200), cmap = 'rainbow')
plt.show()
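
To read the colors as counts, you can attach a colorbar to each panel. This is a sketch under a few assumptions: the data is hypothetical, the viridis and plasma colormaps are picked arbitrarily, and the Agg backend is used so it runs without a display. It passes the image object that hist2d returns as its fourth value to the figure's colorbar method.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(5)
x = rng.standard_normal(10000)
y = 2 * rng.standard_normal(10000)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(9, 3))

# The fourth return value is the drawn image, which carries the colormap
h1, _, _, im1 = ax1.hist2d(x, y, bins=(10, 10), cmap="viridis")
h2, _, _, im2 = ax2.hist2d(x, y, bins=(50, 50), cmap="plasma")

# One colorbar per panel, each tied to its own image
cbar1 = fig.colorbar(im1, ax=ax1)
cbar2 = fig.colorbar(im2, ax=ax2)
```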

Python matplotlib Histograms of an Image

Apart from the above-specified ones, you can use Python Matplotlib histograms to analyze the pixel values in an image. In this section, we read the image in grayscale (the 0 flag in cv2.imread) and show the distribution of its pixel intensities.

import cv2
from matplotlib import pyplot as plt

img = cv2.imread('/Users/suresh/Downloads/IMG_2065.JPG', 0)

plt.hist(img.ravel(), bins = 256, range = [0, 256])
plt.show()
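
To look at the red, green, and blue channels separately, one option is to draw one histogram per channel. The sketch below uses a synthetic random image in place of a real file (an assumption, so that it runs without OpenCV) and the Agg backend. With an image loaded by cv2.imread in color mode, keep in mind that OpenCV stores the channels in BGR order.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend; remove to get a window
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical 32 x 32 RGB image with 8-bit channels
img = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)

channel_counts = {}
for i, color in enumerate(["red", "green", "blue"]):
    # Flatten one channel and count its intensities in 256 bins
    counts, _, _ = plt.hist(img[..., i].ravel(), bins=256, range=(0, 256),
                            color=color, histtype="step", label=color)
    channel_counts[color] = counts

plt.legend()
```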