Python matplotlib Scatter Plot

The Python matplotlib scatter plot is a two-dimensional graphical representation of data. A scatter plot is useful for displaying the correlation between two numerical variables or two sets of data. In general, we use a matplotlib scatter plot to analyze the relationship between two numerical variables, often by adding a regression line on top of the points.

The matplotlib pyplot module has a scatter function that draws a scatter plot in Python. The basic syntax of the pyplot scatter function is

matplotlib.pyplot.scatter(x, y)
  • x: array-like sequence of values plotted along the X-axis.
  • y: array-like sequence of values plotted along the Y-axis. It must have the same length as x.

Python matplotlib Scatter Plot Examples

This is a simple Python scatter plot example where we declare two lists of arbitrary numeric values. Next, we use the pyplot scatter function to draw a scatter plot of y against x.

import matplotlib.pyplot as plt

x = [1, 9, 5, 3, 8, 6, 2, 4, 7]

y = [22, 4, 40, 27, 33, 15, 5, 20, 30]

plt.scatter(x, y)
plt.show()
Python matplotlib Scatter Plot 1
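
The introduction mentions a regression line. The scatter function does not draw one itself, so the following is only a minimal sketch of one common approach, assuming NumPy is available: fit a straight line with numpy.polyfit and draw it over the points with plt.plot. The helper name fit_line is hypothetical and not part of matplotlib.

```python
import matplotlib.pyplot as plt
import numpy as np


def fit_line(x, y):
    """Return (slope, intercept) of the least-squares line through the points."""
    slope, intercept = np.polyfit(x, y, 1)  # degree 1 = straight line
    return slope, intercept


x = np.array([1, 9, 5, 3, 8, 6, 2, 4, 7])
y = np.array([22, 4, 40, 27, 33, 15, 5, 20, 30])

slope, intercept = fit_line(x, y)

plt.scatter(x, y)

# Draw the fitted line across the observed range of x
line_x = np.array([x.min(), x.max()])
plt.plot(line_x, slope * line_x + intercept, color='red')

plt.show()
```

A straight line is only one possible model. For data that clearly curves, a higher polynomial degree or another fitting method may suit better.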

Here, we use the NumPy randint function to generate 50 random integers for x (from 5 to 49) and 50 for y (from 100 to 999); the upper bound of randint is exclusive. Next, we draw the Python scatter plot.

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(5, 50, 50)

y = np.random.randint(100, 1000, 50)

print(x)
print(y)

plt.scatter(x, y)

plt.show()
Create scatter using Random numpy numbers
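
Because the values are random, the chart changes on every run. If you need a repeatable chart, one option is to seed a NumPy random generator. This is a sketch using numpy.random.default_rng; the seed value 42 is an arbitrary choice. As with randint, the upper bound of integers is exclusive by default.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(42)  # fixed seed, so every run gives the same values

x = rng.integers(5, 50, size=50)      # 50 integers from 5 to 49
y = rng.integers(100, 1000, size=50)  # 50 integers from 100 to 999

plt.scatter(x, y)
plt.show()
```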

matplotlib Scatter Chart using CSV

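In this example, we read the Global Superstore data file into a pandas DataFrame. The file used here is an Excel workbook (.xls), so the code calls read_excel; for a CSV file, use read_csv instead. Next, we sum Sales and Profit for each Order Date and draw a Python matplotlib scatter plot with Profit on the X-axis and Sales on the Y-axis.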

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

sales_data = df.groupby('Order Date')[['Sales', 'Profit']].sum()

print(sales_data.sort_values(by = ['Profit']))

plt.scatter(sales_data['Profit'], sales_data['Sales'])

plt.show()
Python matplotlib Scatter Plot using CSV File 3
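
The example above loads an Excel workbook. For an actual CSV file, the same steps work with read_csv. The sketch below reads a tiny in-memory CSV so that it runs without any file; the rows are made-up sample values, not taken from the Global Superstore data, and only the column names are assumed to match.

```python
import io

import pandas as pd
from matplotlib import pyplot as plt

# Made-up sample rows standing in for a real CSV file
csv_text = """Order Date,Sales,Profit
2014-01-01,100.0,20.0
2014-01-01,50.0,-5.0
2014-01-02,200.0,40.0
2014-01-03,80.0,10.0
"""

# For a file on disk, pass its path to read_csv instead of the StringIO object
df = pd.read_csv(io.StringIO(csv_text))

# Total Sales and Profit for each order date
sales_data = df.groupby('Order Date')[['Sales', 'Profit']].sum()

plt.scatter(sales_data['Profit'], sales_data['Sales'])
plt.show()
```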

scatter chart titles

We already covered labeling in the previous chart articles. In this Python matplotlib scatter plot example, we use the title, xlabel, and ylabel functions to add a chart title and the X-axis and Y-axis labels.

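# Continues the previous example, where sales_data is already defined
plt.scatter(sales_data['Profit'], sales_data['Sales'])

plt.title('Example')
plt.xlabel('Profit')
plt.ylabel('Global Sales')
plt.show()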
Add Titles, X and Y axis names to scatter
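
The same labels can be set through the Axes object returned by plt.subplots, which is convenient when a figure holds more than one chart. A minimal sketch, reusing the small lists from the first example rather than the Superstore data; the label texts are placeholders:

```python
import matplotlib.pyplot as plt

x = [1, 9, 5, 3, 8, 6, 2, 4, 7]
y = [22, 4, 40, 27, 33, 15, 5, 20, 30]

fig, ax = plt.subplots()
ax.scatter(x, y)

# Axes methods that correspond to plt.title, plt.xlabel, and plt.ylabel
ax.set_title('Example')
ax.set_xlabel('X values')
ax.set_ylabel('Y values')

plt.show()
```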

Python Scatter plot color and Marker

In all our previous examples, the markers use the default blue color. However, you can change the marker color with the color argument and the opacity with the alpha argument. In this Python scatter plot example, we change the marker color to red and the opacity to 0.3 (fairly transparent).

Apart from this, you can use the marker argument to change the default marker shape. Here, we changed the marker shape to *. I suggest you refer to the matplotlib markers documentation for the list of available markers.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'], 
            color = 'red',
            marker = '*', alpha = 0.3)

plt.title('Example')
plt.show()
Change Scatter Marker type and color

Here, we show three more of the available markers, one in each subplot.

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(5, 50, 50)

y = np.random.randint(100, 1000, 50)

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize = (8, 4))

ax1.scatter(x, y, marker = '+', color = 'red')
ax2.scatter(x, y, marker = '^', color = 'blue')
ax3.scatter(x, y, marker = r'$\clubsuit$', color = 'green',
            alpha = 0.5)

plt.show()
Python matplotlib multiple Scatter Plots

In the previous Python scatter plot examples, we used a single color for all the markers. However, you can give each marker its own color by using the c argument.

Here, we define two random integer arrays and an array of random floats for the colors. Next, we assign that colors array to c, which maps each value to a color through the default colormap. The s argument sets each marker size to y/10.

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(10, 100, 30)

y = np.random.randint(100, 10000, 30)

colors = np.random.rand(30)

plt.scatter(x, y, c = colors, alpha = 0.5, s = y/10)

plt.show()
Add multiple colors to Scatter Bubbles or Marks
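
The c argument also accepts a list of color names, one per marker. As a sketch, the following colors a marker red when its y value is above a threshold and blue otherwise; the threshold of 5000 is an arbitrary value chosen for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(10, 100, 30)
y = np.random.randint(100, 10000, 30)

threshold = 5000  # arbitrary cut-off for this illustration

# One color name per point, chosen from its y value
colors = ['red' if value > threshold else 'blue' for value in y]

plt.scatter(x, y, c=colors, alpha=0.5)
plt.show()
```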

It is another way of assigning different colors to the markers. Apart from the above, you can also apply a color gradient to the markers (for example, a rainbow) using the c and cmap arguments. To do this, first, pass the values that determine the marker colors as the c argument. Second, set cmap to the name of the colormap (the gradient that you want to use), as we did below.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Sales', 'Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'], 
            c = market_data['Quantity'], cmap = 'gist_rainbow_r',
            marker = '*')

plt.title('Markers Example')

plt.show()
Python matplotlib Pyplot Scatter Plot 8
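
When colors come from a colormap, a color bar helps the reader see which value each color stands for. This sketch uses random values instead of the Superstore file and the viridis colormap, both chosen only for illustration. plt.colorbar takes the object returned by scatter.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(10, 100, 30)
y = np.random.randint(100, 10000, 30)

# c supplies the values, cmap names the gradient used to color them
points = plt.scatter(x, y, c=y, cmap='viridis')

# Add a color bar that maps the colors back to the y values
colorbar = plt.colorbar(points)
colorbar.set_label('y value')

plt.show()
```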

Python Scatter plot size and edge colors

The matplotlib scatter function has an s argument that defines the marker size in points squared. It accepts either a single value for all the markers or an array-like of values, one per marker. Here, we assign 150 as the marker size, so all the markers are drawn at that size.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'], 
            color = 'green', marker = '*', alpha = 0.5,
            s = 150)

plt.title('size and edge colors')

plt.show()
Change the Scatter chart size and edge colors 9

In this Python scatter plot example, we assign y/10 as the s values. It means each marker has a different size that depends entirely on its y value.

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(10, 100, 30)

y = np.random.randint(100, 10000, 30)

colors = np.random.rand(30)

plt.scatter(x, y, c = colors, alpha = 0.6, s = y/10)

plt.show()
Change Scatter Mark Value based on y-axis value 10

Let me take an example that uses the data file. Here, we draw the scatter plot using the Profit and Sales totals for each Region. Next, we define the marker size based on the profit, so a marker grows as the profit increases.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

sales_data = df.groupby('Region')[['Sales', 'Profit']].sum()

print(sales_data.sort_values(by = ['Profit']))

plt.scatter(sales_data['Profit'], sales_data['Sales'], marker = 'o',
            color = 'r', s = sales_data['Profit']/ 1000)

plt.show()
Change Scatter Mark size based on Profit 11
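
Dividing by a constant works when all values are positive and in a known range. Profit can be negative, though, and a marker area cannot be. One way to handle this, sketched below under the assumption that a linear rescale is acceptable, is to map the values onto a fixed range of sizes. The helper scale_sizes and the 20 to 300 range are hypothetical choices, not part of matplotlib, and the numbers are made-up sample values.

```python
import matplotlib.pyplot as plt
import numpy as np


def scale_sizes(values, smallest=20, largest=300):
    """Linearly map values onto marker sizes between smallest and largest."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values are equal, so use one mid-range size for every marker
        return np.full(values.shape, (smallest + largest) / 2)
    return smallest + (values - values.min()) / span * (largest - smallest)


# Made-up profit and sales totals, including a negative profit
profit = np.array([-500, 1200, 3400, 800, 2600])
sales = np.array([4000, 9000, 15000, 7000, 12000])

plt.scatter(profit, sales, s=scale_sizes(profit), color='r', marker='o')
plt.show()
```

With this mapping, the lowest profit gets the smallest marker and the highest profit gets the largest, whatever the sign of the values.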

The linewidths argument accepts a scalar value or an array-like, and its default is None, which falls back to the lines.linewidth setting. This argument defines the width of the marker edges. The edgecolors argument sets the edge color of the markers. In this example, we assign a line width of 1.1 and a green edge color.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'], 
            color = 'red', marker = '*', alpha = 0.6,
            s = 100,  linewidths = 1.1, edgecolors = 'g')

plt.title('Line Width Example')

plt.show()
Python matplotlib Scatter Plot linewidths 12

Multiple scatter Charts

The Python matplotlib scatter function also lets you draw multiple data series on one chart by calling it more than once. First, we plot y against x, and then we plot z against x. It displays the y and z values against x in one chart, and to differentiate them, we use blue and red colors.

import matplotlib.pyplot as plt

x = [1, 9, 5, 3, 8, 6, 2, 4, 7]

y = [22, 4, 40, 27, 33, 15, 5, 20, 30]

z = [16, 35, 4, 19, 20, 40, 35, 7, 12]

plt.scatter(x, y, color = 'blue')
plt.scatter(x, z, color = 'red')
plt.show()
Multiple Python matplotlib Scatter Plot 13

It is another example of drawing multiple plots. However, this time, we use the data file and compare the Sales and Profit totals by Region with those by Market.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

region_data = df.groupby('Region')[['Sales', 'Profit']].sum()

market_data = df.groupby('Market')[['Sales', 'Profit']].sum()

plt.scatter(region_data['Profit'], region_data['Sales'],
            s = 100, marker = '*', color = 'yellow',
            linewidths = 1.1, edgecolors = 'g')

plt.scatter(market_data['Profit'], market_data['Sales'],
            s =100, marker = 'o', color = 'r')

plt.title('Multiple ones')

plt.show()
Multiple Scatters sharing same X and Y-axis with different Marks 14

Add legend to Scatter Chart

As you can see from the above screenshot, you cannot tell which markers represent the Region sales and which represent the Market sales. To resolve this, pass a label to each scatter call and use the legend function to add a legend to the Python matplotlib scatter plot.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

region_data = df.groupby('Region')[['Sales', 'Profit']].sum()

market_data = df.groupby('Market')[['Sales', 'Profit']].sum()

# Raw strings keep the backslash in the mathtext marker names intact
plt.scatter(region_data['Profit'], region_data['Sales'],
            label = 'Region Sales',
            s = 100, marker = r'$\heartsuit$', color = 'b',
            linewidths = 1.2, edgecolors = 'g')

plt.scatter(market_data['Profit'], market_data['Sales'],
            label = 'Market Sales',
            s = 100, marker = r'$\clubsuit$', color = 'r')

plt.legend()
plt.show()
Add Legend to Scatter Chart 15

Highlight Area in a Python Scatter plot

In some situations, you might need to focus on a particular location or area within the Python scatter plot. So, you need to highlight that particular area for better focus. For this, all you need to do is add a patch to the existing axes. In this example, we add a rectangle whose lower-left corner is at (50, -50), with a width of 100 and a height of 2000, to highlight the area.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

fig, ax = plt.subplots()

ax.scatter(market_data['Quantity'], market_data['Profit'], 
            color = '#A90303', marker = '*', alpha = 0.6,
            s = 100,  linewidths = 1.1, edgecolors = '#A4F5AF')

ax.add_patch(patches.Rectangle((50, -50), 100, 2000, alpha = 0.3))

plt.show()
Python matplotlib Scatter Plot 16

Similarly, we can add a circle to highlight an area. Apart from this, we can format that circle to make it stand out. In this example, we add a circle to a chart of random values and then set its colors, line width, and line style.

import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np

x = np.random.randint(10, 100, 30)

y = np.random.randint(10, 101, 30)

colors = np.random.rand(30)

fig, ax = plt.subplots()

ax.scatter(x, y, c = colors, alpha = 0.5, s = y*10)

ax.add_patch(
    patches.Circle((40, 60), 20, alpha = 0.3,
                   edgecolor = 'red', facecolor = 'yellowgreen',
                   linewidth = 2, linestyle = 'solid'))

plt.show()
Python matplotlib Scatter Plot 17
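
If the area of interest is a full vertical or horizontal band rather than a box, the axvspan and axhspan functions shade it without you having to build a patch yourself. A brief sketch with random values; the band limits 40 and 60 are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(10, 100, 30)
y = np.random.randint(10, 101, 30)

fig, ax = plt.subplots()
ax.scatter(x, y, color='#A90303', alpha=0.6)

# Shade the vertical band between x = 40 and x = 60
band = ax.axvspan(40, 60, color='yellow', alpha=0.3)

plt.show()
```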

By using the axvline function, you can add a vertical line inside a Python matplotlib scatter plot. Similarly, use the axhline to add a horizontal line.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'], 
            color = '#A90303', marker = '*', alpha = 0.6,
            s = 100,  linewidths = 1.1, edgecolors = '#A4F5AF')

plt.axvline(150, color = 'b')
plt.axhline(1000, color = 'red')
plt.show()
Python matplotlib Scatter Plot 18
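
A common use of these lines is to mark the average of each axis, which splits the chart into four quadrants. This sketch uses random values and the NumPy mean; it is one possible choice of reference lines, not the only one.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(10, 100, 30)
y = np.random.randint(100, 1000, 30)

plt.scatter(x, y, color='#A90303', alpha=0.6)

# Reference lines at the mean of each axis
vertical = plt.axvline(x.mean(), color='b', linestyle='--')
horizontal = plt.axhline(y.mean(), color='red', linestyle='--')

plt.show()
```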