Python matplotlib Scatter Plot

A Python matplotlib scatter plot is a two-dimensional graphical representation of data. It is useful for displaying the correlation between two numerical variables or two data sets. In general, we use a scatter plot to analyze the relationship between two numerical variables, often with a regression line drawn through the points.

The matplotlib pyplot module has a scatter function that draws a scatter plot in Python. The basic syntax of the pyplot scatter function is

matplotlib.pyplot.scatter(x, y)
  • x: sequence of values plotted along the X-axis.
  • y: sequence of values plotted along the Y-axis. It must have the same length as x.
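
As a minimal sketch of the syntax above (the sample values are made up for illustration), the call below passes two equal-length lists and keeps the object that scatter returns. In current matplotlib releases this is a PathCollection, which can be inspected or restyled later.

```python
import matplotlib.pyplot as plt

# Illustrative values only; x and y must contain the same number of items
x = [1, 2, 3, 4]
y = [10, 20, 15, 30]

fig, ax = plt.subplots()
points = ax.scatter(x, y)        # returns a PathCollection

# Each row of the offsets array is one (x, y) marker position
offsets = points.get_offsets()
print(len(offsets))

plt.show()
```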

Python matplotlib Scatter Plot Examples

This is a simple Python scatter plot example where we declare two lists of numeric values. Next, we use the pyplot scatter function to draw a scatter plot of y against x.

import matplotlib.pyplot as plt

x = [1, 9, 5, 3, 8, 6, 2, 4, 7]

y = [22, 4, 40, 27, 33, 15, 5, 20, 30]

plt.scatter(x, y)
plt.show()
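
The introduction mentions drawing a regression line through a scatter plot. One possible way to do that, assuming a straight-line least-squares fit is what is wanted, is the NumPy polyfit function. The sketch below reuses the x and y values from the first example; it is one approach among several, not the only one.

```python
import numpy as np
import matplotlib.pyplot as plt

x = np.array([1, 9, 5, 3, 8, 6, 2, 4, 7])
y = np.array([22, 4, 40, 27, 33, 15, 5, 20, 30])

# Degree-1 polynomial fit: returns the slope and intercept of the line
slope, intercept = np.polyfit(x, y, 1)

# Sorted x values so the line is drawn from left to right
xs = np.sort(x)

fig, ax = plt.subplots()
ax.scatter(x, y)
ax.plot(xs, slope * xs + intercept, color='red')

plt.show()
```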

Here, we used the NumPy randint function to generate 50 random integers for each axis: 5 to 49 for x and 100 to 999 for y (the upper bound passed to randint is exclusive). Next, we draw the Python scatter plot.

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(5, 50, 50)

y = np.random.randint(100, 1000, 50)

print(x)
print(y)

plt.scatter(x, y)

plt.show()
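
Because the values are random, the chart changes on every run. If a repeatable chart is needed, one option is a seeded NumPy generator. This is a sketch that assumes the newer default_rng interface is available; the seed 42 is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

# The seed value 42 is arbitrary; any fixed integer gives repeatable data
rng = np.random.default_rng(42)

# As with randint, the upper bound is exclusive by default
x = rng.integers(5, 50, size=50)       # values from 5 to 49
y = rng.integers(100, 1000, size=50)   # values from 100 to 999

fig, ax = plt.subplots()
ax.scatter(x, y)

plt.show()
```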

matplotlib Scatter Chart using an Excel file

In this example, we read the Global Superstore Excel file with the pandas read_excel function, which returns a DataFrame. After summing Sales and Profit for each Order Date, we draw a Python matplotlib scatter plot with Profit on the X-axis and Sales on the Y-axis.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

sales_data = df.groupby('Order Date')[['Sales', 'Profit']].sum()

print(sales_data.sort_values(by = ['Profit']))

plt.scatter(sales_data['Profit'], sales_data['Sales'])

plt.show()
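
The example above depends on a local file, and reading an .xls file with pandas typically also needs the xlrd package installed. To try the same group-and-plot steps without the file, here is a small sketch that uses a hand-made DataFrame. The rows are hypothetical; only the column names come from the example.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical rows standing in for the Excel file; only the column names
# ('Order Date', 'Sales', 'Profit') are taken from the example above
df = pd.DataFrame({
    'Order Date': ['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-03'],
    'Sales':  [120.0, 80.0, 300.0, 50.0],
    'Profit': [30.0, -10.0, 90.0, 5.0],
})

# One row per order date, with Sales and Profit summed
sales_data = df.groupby('Order Date')[['Sales', 'Profit']].sum()

fig, ax = plt.subplots()
ax.scatter(sales_data['Profit'], sales_data['Sales'])

plt.show()
```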

matplotlib Scatter Chart titles

We already covered labeling in the previous charts. In this Python matplotlib scatter plot example, we use the xlabel, ylabel, and title functions to show the X-axis label, Y-axis label, and chart title.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

sales_data = df.groupby('Order Date')[['Sales', 'Profit']].sum()

print(sales_data.sort_values(by = ['Profit']))

plt.scatter(sales_data['Profit'], sales_data['Sales'])

plt.title('Matplotlib Scatter Plot')
plt.xlabel('Profit')
plt.ylabel('Global Sales')
plt.show()
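
The same labels can also be set through an Axes object instead of the pyplot functions, which is convenient once a figure holds more than one chart. A minimal sketch with made-up values:

```python
import matplotlib.pyplot as plt

# Illustrative values only
profit = [10, 25, 40, 55]
sales = [100, 180, 260, 400]

fig, ax = plt.subplots()
ax.scatter(profit, sales)

# Axes methods that correspond to plt.title, plt.xlabel and plt.ylabel
ax.set_title('Matplotlib Scatter Plot')
ax.set_xlabel('Profit')
ax.set_ylabel('Global Sales')

plt.show()
```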

Python Scatter plot color and Marker

In all our previous examples, the markers use the default blue color. However, you can change the marker color using the color argument and the opacity using the alpha argument. In this Python scatter plot example, we change the marker color to red and the opacity to 0.3 (fairly transparent).

Apart from this, you can use the marker argument to change the default marker shape. Here, we changed the shape of the marker to *. Please refer to the matplotlib markers documentation for the list of available markers.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'], 
            color = 'red',
            marker = '*', alpha = 0.3)

plt.title('Matplotlib Scatter Plot')
plt.show()

Here, we showcase three other available markers in a Python scatter plot, each in its own subplot.

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(5, 50, 50)

y = np.random.randint(100, 1000, 50)

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize = (8, 4))

ax1.scatter(x, y, marker = '+', color = 'red')
ax2.scatter(x, y, marker = '^', color = 'blue')
# Raw string so that \c is not treated as an escape sequence
ax3.scatter(x, y, marker = r'$\clubsuit$', color = 'green',
            alpha = 0.5)

plt.show()

In the previous Python scatter plot examples, we used a single color for all the markers. However, you can give each marker its own color using the c argument. Here, we defined two random integer arrays and a random array of 30 values for the colors. Next, we assigned that colors array to c, so each value is mapped to a color through the default colormap. We also set s to y/10, so the marker size grows with y.

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(10, 100, 30)

y = np.random.randint(100, 10000, 30)

colors = np.random.rand(30)

plt.scatter(x, y, c = colors, alpha = 0.5, s = y/10)

plt.show()

It is another way of assigning different colors to the matplotlib scatter plot markers. You can also apply a color gradient (for example, rainbow colors) using the c and cmap arguments. First, pass the list of values that determine each marker's color as the c argument. Second, set cmap to the name of the colormap (gradient) you want to use, as we did below.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Sales', 'Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'],
            c = market_data['Quantity'], cmap = 'gist_rainbow_r',
            marker = '*')

plt.title('Matplotlib Scatter Plot')

plt.show()
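
When colors come from a colormap, a colorbar helps readers see which value each color stands for. The sketch below uses random stand-in data (the variable names and ranges are assumptions, not values from the Excel file) and passes the object returned by scatter to the figure's colorbar method.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)   # arbitrary seed
quantity = rng.integers(1, 200, size=40)
profit = rng.integers(-500, 3000, size=40)

fig, ax = plt.subplots()
points = ax.scatter(quantity, profit, c=quantity,
                    cmap='gist_rainbow_r', marker='*')

# The colorbar shows which c values each color stands for
cbar = fig.colorbar(points, ax=ax)
cbar.set_label('Quantity')

plt.show()
```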

Python Scatter plot size and edge colors

The matplotlib scatter function has an s argument that defines the marker size. It accepts either a single value applied to all the markers or an array-like of values, one per marker. Here, we assigned 150 as the marker size, so every marker is drawn at that size.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'], 
            color = 'green', marker = '*', alpha = 0.5,
            s = 150)

plt.title('Matplotlib Scatter Plot')

plt.show()

In this Python scatter plot example, we assigned y/10 as the s values. It means each marker has a different size that is based entirely on its y value.

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(10, 100, 30)

y = np.random.randint(100, 10000, 30)

colors = np.random.rand(30)

plt.scatter(x, y, c = colors, alpha = 0.6, s = y/10)

plt.show()
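
The s values are marker areas measured in points squared, so dividing by a fixed number such as 10 only works when the data happens to fall in a convenient range. Values such as profit can also be negative. One possible alternative is to rescale the values onto a chosen size range first. The scale_sizes helper below is hypothetical (it is not part of matplotlib), and the 20 to 400 range is an arbitrary choice.

```python
import numpy as np
import matplotlib.pyplot as plt

def scale_sizes(values, smallest=20.0, largest=400.0):
    """Hypothetical helper: map values linearly onto [smallest, largest]."""
    values = np.asarray(values, dtype=float)
    span = values.max() - values.min()
    if span == 0:
        # All values equal: use the middle of the range for every marker
        return np.full(values.shape, (smallest + largest) / 2)
    return smallest + (values - values.min()) / span * (largest - smallest)

rng = np.random.default_rng(1)   # arbitrary seed
x = rng.integers(10, 100, size=30)
y = rng.integers(100, 10000, size=30)

sizes = scale_sizes(y)

fig, ax = plt.subplots()
points = ax.scatter(x, y, s=sizes, alpha=0.6)

plt.show()
```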

Let me take an example that uses the Excel file. Here, we draw a scatter plot using the Profit and Sales values summed by Region. Next, we set the size of each marker based on the profit, so the marker size increases as the profit increases.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

sales_data = df.groupby('Region')[['Sales', 'Profit']].sum()

print(sales_data.sort_values(by = ['Profit']))

plt.scatter(sales_data['Profit'], sales_data['Sales'], marker = 'o',
            color = 'r', s = sales_data['Profit']/ 1000)

plt.show()

The linewidths argument of the scatter function accepts a scalar value or an array-like and defines the width of the marker edges. Its default in the function signature is None, in which case matplotlib falls back to the lines.linewidth setting. The edgecolors argument sets the edge color of the markers. In this example, we assigned a line width of 1.1 and a green edge color.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'], 
            color = 'red', marker = '*', alpha = 0.6,
            s = 100,  linewidths = 1.1, edgecolors = 'g')

plt.title('Matplotlib Scatter Plot')

plt.show()

Multiple scatter plots in Python

The matplotlib scatter function can also be called more than once to plot multiple sets of values on the same chart. First, we plot y against x, and then we plot z against x. It displays both the y and z values against x in one chart. To differentiate them, we used blue for y and red for z.

import matplotlib.pyplot as plt

x = [1, 9, 5, 3, 8, 6, 2, 4, 7]

y = [22, 4, 40, 27, 33, 15, 5, 20, 30]

z = [16, 35, 4, 19, 20, 40, 35, 7, 12]

plt.scatter(x, y, color = 'blue')
plt.scatter(x, z, color = 'red')
plt.show()

It is another example of drawing multiple scatter plots. However, this time, we use the Excel file and compare the Sales by Region and by Market against the Profit.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

region_data = df.groupby('Region')[['Sales', 'Profit']].sum()

market_data = df.groupby('Market')[['Sales', 'Profit']].sum()

plt.scatter(region_data['Profit'], region_data['Sales'],
            s = 100, marker = '*', color = 'yellow',
            linewidths = 1.1, edgecolors = 'g')

plt.scatter(market_data['Profit'], market_data['Sales'],
            s = 100, marker = 'o', color = 'r')

plt.title('Matplotlib Scatter Plot')

plt.show()

Python matplotlib Scatter plot legend

In the previous chart, you cannot tell which markers represent the Region Sales and which represent the Market Sales. To resolve this, pass a label to each scatter call and then use the legend function.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

region_data = df.groupby('Region')[['Sales', 'Profit']].sum()

market_data = df.groupby('Market')[['Sales', 'Profit']].sum()

# Raw strings so that \h and \c are not treated as escape sequences
plt.scatter(region_data['Profit'], region_data['Sales'],
            label = 'Region Sales',
            s = 100, marker = r'$\heartsuit$', color = 'b',
            linewidths = 1.2, edgecolors = 'g')

plt.scatter(market_data['Profit'], market_data['Sales'],
            label = 'Market Sales',
            s = 100, marker = r'$\clubsuit$', color = 'r')

plt.legend()
plt.show()
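
The legend function also accepts optional arguments such as loc and title to control where the legend sits and what heading it shows. Here is a small sketch with hypothetical values for the two groups; the labels match the example above, while the numbers and the 'Data set' title are made up.

```python
import matplotlib.pyplot as plt

# Hypothetical (profit, sales) values for two groups
groups = {
    'Region Sales': ([10, 20, 30], [100, 150, 210]),
    'Market Sales': ([15, 25, 35], [90, 170, 260]),
}

fig, ax = plt.subplots()
for name, (profit, sales) in groups.items():
    # The label is what the legend displays for this set of markers
    ax.scatter(profit, sales, s=100, label=name)

# loc and title are optional; without loc matplotlib picks a position itself
legend = ax.legend(loc='upper left', title='Data set')

plt.show()
```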

Highlight Area in a Python Scatter plot

In some situations, you might need to focus on a particular location or area within the Python scatter plot. So, you need to highlight that particular area for better focus. For this, all you need to do is add patches to an existing matplotlib scatter plot. In this example, we are adding a rectangle to highlight the area. The Rectangle takes the lower-left corner (50, -50), followed by the width (100) and the height (2000), all in data units.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

fig, ax = plt.subplots()

ax.scatter(market_data['Quantity'], market_data['Profit'], 
            color = '#A90303', marker = '*', alpha = 0.6,
            s = 100,  linewidths = 1.1, edgecolors = '#A4F5AF')

ax.add_patch(patches.Rectangle((50, -50), 100, 2000, alpha = 0.3))

plt.show()

Similarly, we can add a circle to the matplotlib scatter plot area. Apart from this, we can format that circle to view it better. In this example, we add a circle to the scatter plot of random values and then format the color, line widths, etc.

import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np

x = np.random.randint(10, 100, 30)

y = np.random.randint(10, 101, 30)

colors = np.random.rand(30)

fig, ax = plt.subplots()

ax.scatter(x, y, c = colors, alpha = 0.5, s = y*10)

ax.add_patch(
    patches.Circle((40, 60), 20, alpha = 0.3,
                   edgecolor = 'red', facecolor = 'yellowgreen',
                   linewidth = 2, linestyle = 'solid'))

plt.show()
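
A highlighted circle can be combined with a boolean mask so the points that fall inside it stand out as well. This is a sketch under the assumption that "inside" means within the radius in data units; the gray and red colors are arbitrary. Setting an equal aspect ratio keeps the circle from being stretched when the two axes use different scales.

```python
import numpy as np
import matplotlib.pyplot as plt
import matplotlib.patches as patches

rng = np.random.default_rng(2)   # arbitrary seed
x = rng.integers(10, 100, size=30)
y = rng.integers(10, 101, size=30)

center, radius = (40, 60), 20

# True for points whose squared distance from the center is within radius squared
inside = (x - center[0]) ** 2 + (y - center[1]) ** 2 <= radius ** 2

fig, ax = plt.subplots()
ax.scatter(x[~inside], y[~inside], color='gray', alpha=0.5)
ax.scatter(x[inside], y[inside], color='red')
ax.add_patch(patches.Circle(center, radius, alpha=0.3,
                            edgecolor='red', facecolor='yellowgreen'))

# Equal scaling on both axes keeps the circle from looking like an ellipse
ax.set_aspect('equal')

plt.show()
```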

By using the axvline function, you can add a vertical line inside a Python matplotlib scatter plot. Similarly, use the axhline to add a horizontal line.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'], 
            color = '#A90303', marker = '*', alpha = 0.6,
            s = 100,  linewidths = 1.1, edgecolors = '#A4F5AF')

plt.axvline(150, color = 'b')
plt.axhline(1000, color = 'red')
plt.show()
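
The line positions do not have to be hard-coded. As one possibility, the sketch below places the lines at the mean of each variable, which splits the chart into four quadrants. The data is random stand-in data, and using the mean as the reference is an assumption made for illustration.

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)   # arbitrary seed
quantity = rng.integers(1, 300, size=40)
profit = rng.integers(-500, 3000, size=40)

fig, ax = plt.subplots()
ax.scatter(quantity, profit, marker='*', alpha=0.6)

# Reference lines at the averages, drawn dashed to set them apart
vline = ax.axvline(quantity.mean(), color='b', linestyle='--')
hline = ax.axhline(profit.mean(), color='red', linestyle='--')

plt.show()
```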