Python matplotlib Scatter Plot

The Python matplotlib pyplot scatter plot is a two-dimensional graphical representation of data in which each point marks one pair of values. A scatter plot is useful for displaying the correlation between two numeric variables or two data sets. In general, we use it to analyze the relationship between two numeric variables, often by adding a regression line.

The Python matplotlib pyplot module has a scatter function that draws a scatter plot, and its basic syntax is

matplotlib.pyplot.scatter(x, y)
  • x: list or array of values for the X-axis.
  • y: list or array of values for the Y-axis. It must be the same length as x.

Python matplotlib pyplot Scatter Plot Examples

This is a simple scatter plot example where we declared two lists of arbitrary numeric values. Next, we used the Python matplotlib pyplot scatter function to draw a scatter plot of y against x.

import matplotlib.pyplot as plt

x = [1, 9, 5, 3, 8, 6, 2, 4, 7]

y = [22, 4, 40, 27, 33, 15, 5, 20, 30]

plt.scatter(x, y)
plt.show()
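
The introduction mentions drawing a regression line through a scatter plot. The following is a minimal sketch of one way to do that with the same two lists, assuming a straight-line least-squares fit from NumPy's polyfit is good enough. The names slope, intercept, line_x, and line_y are illustrative and not part of the original example.

```python
import matplotlib.pyplot as plt
import numpy as np

x = np.array([1, 9, 5, 3, 8, 6, 2, 4, 7])
y = np.array([22, 4, 40, 27, 33, 15, 5, 20, 30])

# Fit a straight line y = slope * x + intercept by least squares.
slope, intercept = np.polyfit(x, y, 1)

# Evaluate the fitted line across the observed x range.
line_x = np.linspace(x.min(), x.max(), 100)
line_y = slope * line_x + intercept

plt.scatter(x, y)
plt.plot(line_x, line_y, color='red')
plt.show()
```

The scatter call draws the points and the plot call overlays the fitted line on the same axes.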

Here, we used the NumPy random.randint function to generate 50 random integers for each axis: x values from 5 to 49 and y values from 100 to 999 (the upper bound is exclusive). Next, we draw the scatter plot.

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(5, 50, 50)

y = np.random.randint(100, 1000, 50)

print(x)
print(y)

plt.scatter(x, y)

plt.show()
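
Because the values are random, the chart changes on every run. If a repeatable chart is needed, one option is a seeded generator. This is a sketch only; the seed value 42 is an arbitrary assumption, and default_rng with integers is used here in place of the randint call shown above.

```python
import matplotlib.pyplot as plt
import numpy as np

# A seeded generator returns the same values on every run.
rng = np.random.default_rng(42)  # 42 is an arbitrary seed

# integers() excludes the upper bound, just like randint().
x = rng.integers(5, 50, size=50)
y = rng.integers(100, 1000, size=50)

plt.scatter(x, y)
plt.show()
```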

Python matplotlib pyplot Scatter Chart using CSV

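In this example, we read the Global Superstore file and converted it into a DataFrame. The file is an Excel workbook (.xls), so the code calls read_excel rather than read_csv. Next, we grouped the rows by Order Date, summed the Sales and Profit, and drew a scatter plot with Profit on the X-axis and Sales on the Y-axis.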

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

sales_data = df.groupby('Order Date')[['Sales', 'Profit']].sum()

print(sales_data.sort_values(by = ['Profit']))

plt.scatter(sales_data['Profit'], sales_data['Sales'])

plt.show()
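
The example above loads an Excel workbook. For an actual CSV file, the same steps might look like the sketch below. The rows are made up for illustration and are held in a string so the code runs without any file on disk; only the column names are borrowed from the example above.

```python
import io

import pandas as pd
from matplotlib import pyplot as plt

# Made-up rows standing in for a CSV file on disk.
csv_text = """Order Date,Sales,Profit
2024-01-01,200.0,40.0
2024-01-01,150.0,-10.0
2024-01-02,300.0,90.0
2024-01-03,120.0,15.0
2024-01-03,80.0,5.0
"""

# For a real file, pass its path instead of the StringIO object.
df = pd.read_csv(io.StringIO(csv_text))

# One row per order date, with Sales and Profit summed.
sales_data = df.groupby('Order Date')[['Sales', 'Profit']].sum()

plt.scatter(sales_data['Profit'], sales_data['Sales'])
plt.show()
```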

Scatter chart titles

We already covered labeling in the previous chart articles. In this Python matplotlib scatter plot example, we used the xlabel, ylabel, and title functions to show the X-axis label, the Y-axis label, and the chart title.

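# Continues the previous example, where sales_data is already defined.
plt.scatter(sales_data['Profit'], sales_data['Sales'])

plt.title('Example')
plt.xlabel('Profit')
plt.ylabel('Global Sales')
plt.show()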
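
The same labels can also be set through an Axes object, which is the style the later subplot examples rely on. This is a small sketch with made-up profit and sales lists, not the Superstore data.

```python
import matplotlib.pyplot as plt

# Made-up values for illustration.
profit = [10, 25, 40, 55, 70]
sales = [120, 180, 260, 310, 400]

fig, ax = plt.subplots()
ax.scatter(profit, sales)

# Axes methods mirror plt.title, plt.xlabel and plt.ylabel.
ax.set_title('Example')
ax.set_xlabel('Profit')
ax.set_ylabel('Global Sales')

plt.show()
```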

Scatter plot color and Marker

In all our previous examples, the markers use the default blue color. However, you can change the marker color using the color argument and the opacity using the alpha argument. In this scatter plot example, we change the marker color to red and the opacity to 0.3 (mostly transparent).

Apart from this, you can use the marker argument to change the default marker shape. Here, we changed the shape of the marker to *. I suggest you refer to the matplotlib markers documentation for the list of available markers.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'], 
            color = 'red',
            marker = '*', alpha = 0.3)

plt.title('Example')
plt.show()

Here, we showcase three more of the available markers, one in each subplot.

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(5, 50, 50)

y = np.random.randint(100, 1000, 50)

fig, (ax1, ax2, ax3) = plt.subplots(1, 3, figsize = (8, 4))

ax1.scatter(x, y, marker = '+', color = 'red')
ax2.scatter(x, y, marker = '^', color = 'blue')
# Raw string keeps the backslash in the mathtext marker intact.
ax3.scatter(x, y, marker = r'$\clubsuit$', color = 'green',
            alpha = 0.5)

plt.show()
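
To see which built-in marker codes are available without leaving Python, one option is to print the table that matplotlib keeps on its MarkerStyle class. This is a sketch for exploration only. Mathtext markers such as the club symbol above are not part of this table.

```python
from matplotlib.markers import MarkerStyle

# MarkerStyle.markers maps each built-in marker code to a short description.
for code, description in MarkerStyle.markers.items():
    print(repr(code), description)

# Markers that can be drawn with a fill color.
print(MarkerStyle.filled_markers)
```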

In the previous scatter plot examples, we used a single color for all the markers. However, using the c argument, you can give each marker its own color.

Here, we defined two random integer arrays and a random array of color values. Next, we assigned that colors array to c, so each value is mapped to a color through the default colormap. We also set s to y/10 so that the marker size follows the y value.

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(10, 100, 30)

y = np.random.randint(100, 10000, 30)

colors = np.random.rand(30)

plt.scatter(x, y, c = colors, alpha = 0.5, s = y/10)

plt.show()

It is another way of assigning different colors to the markers. Apart from the above, you can also apply a gradient (for example, rainbow) to the markers using the c and cmap arguments. First, assign the values that determine each marker's color to the c argument. Second, set cmap to the colormap (the gradient that you want to use), as we did below.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Sales', 'Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'], 
            c = market_data['Quantity'], cmap = 'gist_rainbow_r',
            marker = '*')

plt.title('Markers Example')

plt.show()
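
When colors come from a colormap, a colorbar helps readers translate a color back into a value. The sketch below assumes made-up quantity and profit arrays in place of the Superstore data. The seed and the value ranges are arbitrary.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)  # arbitrary seed; made-up data
quantity = rng.integers(10, 200, size=40)
profit = rng.integers(-500, 3000, size=40)

# c supplies one number per marker; cmap maps those numbers to colors.
sc = plt.scatter(quantity, profit, c=quantity, cmap='gist_rainbow_r',
                 marker='*')

# The colorbar shows which quantity each color stands for.
plt.colorbar(sc, label='Quantity')
plt.title('Markers Example')
plt.show()
```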

Scatter plot size and edge colors

The matplotlib scatter function has an s argument that defines the marker size as an area in points squared. It accepts either a single value for all the markers or array-like values with one size per marker. Here, we assigned 150 as the marker size, so every marker is drawn at that size.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'], 
            color = 'green', marker = '*', alpha = 0.5,
            s = 150)

plt.title('size and edge colors')

plt.show()

In this scatter plot example, we assigned y/10 as the s values. It means each marker's size will be different and entirely based on its y value.

import matplotlib.pyplot as plt
import numpy as np

x = np.random.randint(10, 100, 30)

y = np.random.randint(100, 10000, 30)

colors = np.random.rand(30)

plt.scatter(x, y, c = colors, alpha = 0.6, s = y/10)

plt.show()

Let me take the Superstore file as an example. Here, we group the data by Region and draw the total Sales against the total Profit. Next, we define the size of each marker based on the profit, so the marker grows as the profit increases.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

sales_data = df.groupby('Region')[['Sales', 'Profit']].sum()

print(sales_data.sort_values(by = ['Profit']))

plt.scatter(sales_data['Profit'], sales_data['Sales'], marker = 'o',
            color = 'r', s = sales_data['Profit']/ 1000)

plt.show()
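
Dividing by a fixed number works only while every value is positive and of a similar scale. As an alternative, the values can be rescaled into a chosen size range. The following is a sketch under assumptions: the profit and sales figures are made up, and the 20 to 400 size range is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np

# Made-up totals; note the negative profit.
profit = np.array([-1200, 300, 4500, 9800, 15000])
sales = np.array([20000, 35000, 60000, 90000, 120000])

# Map profit linearly onto marker areas between 20 and 400 points squared.
sizes = np.interp(profit, (profit.min(), profit.max()), (20, 400))

plt.scatter(profit, sales, s=sizes, color='r', marker='o')
plt.show()
```

With this mapping, the lowest profit gets the smallest marker and the highest profit gets the largest, and no size is ever negative.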

The linewidths argument defines the width of the marker edges. It accepts a scalar value or an array, and its default of None falls back to the lines.linewidth setting. The edgecolors argument sets the color of the marker edges. In this example, we assigned the line width as 1.1 and the edge color as green.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'], 
            color = 'red', marker = '*', alpha = 0.6,
            s = 100,  linewidths = 1.1, edgecolors = 'g')

plt.title('Line Width Example')

plt.show()

Multiple scatter Charts

The Python matplotlib pyplot scatter function also lets you draw multiple sets of values in one chart by calling it more than once. First, we plot y against x, and then we plot z against x. It displays both the y and z values against x in one chart. To differentiate them, we used blue and red colors.

import matplotlib.pyplot as plt

x = [1, 9, 5, 3, 8, 6, 2, 4, 7]

y = [22, 4, 40, 27, 33, 15, 5, 20, 30]

z = [16, 35, 4, 19, 20, 40, 35, 7, 12]

plt.scatter(x, y, color = 'blue')
plt.scatter(x, z, color = 'red')
plt.show()

It is another example of drawing multiple plots. However, this time, we are using the Superstore file to compare the Region sales and the Market sales against their profits.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

region_data = df.groupby('Region')[['Sales', 'Profit']].sum()

market_data = df.groupby('Market')[['Sales', 'Profit']].sum()

plt.scatter(region_data['Profit'], region_data['Sales'],
            s= 100, marker = '*', color = 'yellow',
             linewidths = 1.1, edgecolors = 'g')

plt.scatter(market_data['Profit'], market_data['Sales'],
            s =100, marker = 'o', color = 'r')

plt.title('Multiple ones')

plt.show()

Add a legend

In the previous chart, you cannot tell which markers represent the Region sales and which represent the Market sales. To resolve this, pass a label argument to each scatter call and then use the legend function to add a legend to the scatter plot.

import pandas as pd
from matplotlib import pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

region_data = df.groupby('Region')[['Sales', 'Profit']].sum()

market_data = df.groupby('Market')[['Sales', 'Profit']].sum()

# Raw strings keep the backslashes in the mathtext markers intact.
plt.scatter(region_data['Profit'], region_data['Sales'],
            label = 'Region Sales',
            s = 100, marker = r'$\heartsuit$', color = 'b',
            linewidths = 1.2, edgecolors = 'g')

plt.scatter(market_data['Profit'], market_data['Sales'],
            label = 'Market Sales',
            s = 100, marker = r'$\clubsuit$', color = 'r')

plt.legend()
plt.show()
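
For readers without the Superstore file, here is a self-contained sketch of the same idea. The four lists are made up, and the loc value is only one possible placement for the legend box.

```python
import matplotlib.pyplot as plt

# Made-up totals for illustration.
region_profit, region_sales = [10, 30, 55], [100, 240, 400]
market_profit, market_sales = [20, 45, 70], [150, 310, 520]

fig, ax = plt.subplots()
ax.scatter(region_profit, region_sales, label='Region Sales',
           s=100, marker='*', color='b')
ax.scatter(market_profit, market_sales, label='Market Sales',
           s=100, marker='o', color='r')

# loc picks the corner; 'upper left' keeps the box away from these points.
legend = ax.legend(loc='upper left')
plt.show()
```

Each legend entry takes its text from the label passed to the matching scatter call.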

Highlight Area

In some situations, you might need to focus on a particular area within the scatter plot, so you need to highlight that area. All you need to do for this is add a patch to the existing axes. In this example, we add a rectangle whose lower-left corner is at (50, -50), with a width of 100 and a height of 2000, to highlight the area.

import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.patches as patches

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

fig, ax = plt.subplots()

ax.scatter(market_data['Quantity'], market_data['Profit'], 
            color = '#A90303', marker = '*', alpha = 0.6,
            s = 100,  linewidths = 1.1, edgecolors = '#A4F5AF')

ax.add_patch(patches.Rectangle((50, -50), 100, 2000, alpha = 0.3))

plt.show()
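
The rectangle only shades a region; it does not select points. If the points inside the region should also stand out, a boolean mask can split them from the rest. This is a sketch with made-up random data, reusing the rectangle's corner, width, and height from the example above. The variable names and the gray color are assumptions.

```python
import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np

rng = np.random.default_rng(1)  # arbitrary seed; made-up data
x = rng.integers(0, 200, size=60)
y = rng.integers(-500, 3000, size=60)

# Rectangle((left, bottom), width, height)
left, bottom, width, height = 50, -50, 100, 2000

# True for points that fall inside the rectangle.
inside = ((x >= left) & (x <= left + width) &
          (y >= bottom) & (y <= bottom + height))

fig, ax = plt.subplots()
ax.scatter(x[~inside], y[~inside], color='gray', alpha=0.6)
ax.scatter(x[inside], y[inside], color='#A90303')
ax.add_patch(patches.Rectangle((left, bottom), width, height, alpha=0.3))
plt.show()
```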

Similarly, we can add a circle to the area. Apart from this, we can format that circle to view it better. In this example, we add a circle to this chart of random values and then format the color, line widths, etc.

import matplotlib.pyplot as plt
import matplotlib.patches as patches
import numpy as np

x = np.random.randint(10, 100, 30)

y = np.random.randint(10, 101, 30)

colors = np.random.rand(30)

fig, ax = plt.subplots()

ax.scatter(x, y, c = colors, alpha = 0.5, s = y*10)

ax.add_patch(
    patches.Circle((40, 60), 20, alpha = 0.3,
                   edgecolor = 'red', facecolor = 'yellowgreen',
                   linewidth = 2, linestyle = 'solid'))

plt.show()

By using the axvline function, you can add a vertical line inside a Python matplotlib pyplot scatter plot. Similarly, use the axhline to add a horizontal line.

import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_excel('/Users/suresh/Downloads/Global_Superstore.xls')

market_data = df.groupby('Order Date')[['Quantity', 'Profit']].sum()

plt.scatter(market_data['Quantity'], market_data['Profit'], 
            color = '#A90303', marker = '*', alpha = 0.6,
            s = 100,  linewidths = 1.1, edgecolors = '#A4F5AF')

plt.axvline(150, color = 'b')
plt.axhline(1000, color = 'red')
plt.show()
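
Instead of fixed positions such as 150 and 1000, the lines can be placed at values computed from the data. The sketch below draws them at the mean of each axis. It uses made-up random data, and the dashed line style is an arbitrary choice.

```python
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)  # arbitrary seed; made-up data
x = rng.integers(10, 100, size=30)
y = rng.integers(100, 1000, size=30)

fig, ax = plt.subplots()
ax.scatter(x, y, alpha=0.6)

# Reference lines at the mean of each axis split the plot into quadrants.
vline = ax.axvline(x.mean(), color='b', linestyle='--')
hline = ax.axhline(y.mean(), color='red', linestyle='--')

plt.show()
```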