TUTORIAL

Download Historical Football Data as CSV with Python

A practical Python guide for downloading historical football matches and match stats to CSV. Covers competition IDs, season IDs, pagination, xG, and rate limits.

Published June 8, 2026 · 4 min read

If you want a CSV of historical football data, do not start by guessing match IDs. Start with the catalog, resolve the competition and season IDs, then paginate matches and enrich each match with stats.

This guide uses TheStatsAPI's documented REST flow:

GET /football/competitions to find the league or tournament.
GET /football/competitions/{competition_id}/seasons to find historical seasons.
GET /football/matches?competition_id=...&season_id=... to download fixtures and results.
GET /football/matches/{match_id}/stats to add shots, possession, xG, corners, and cards.

The examples below use the Premier League because it has stable historical coverage and xG data.

Set up the Python client

python -m pip install requests pandas
export THESTATSAPI_KEY="your_api_key"

Create client.py:

import os
import time
import requests

API_KEY = os.environ["THESTATSAPI_KEY"]
BASE_URL = "https://api.thestatsapi.com/api"


def get(endpoint, params=None, retries=4):
    for attempt in range(retries):
        response = requests.get(
            f"{BASE_URL}{endpoint}",
            headers={
                "Authorization": f"Bearer {API_KEY}",
                "Accept": "application/json",
            },
            params=params or {},
            timeout=30,
        )

        if response.status_code == 429:
            # Rate limited: wait 15, 30, 45, then 60 seconds before the next attempt.
            time.sleep(min(15 * (attempt + 1), 60))
            continue

        response.raise_for_status()
        return response.json()

    raise RuntimeError(f"Rate limited after {retries} retries")
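
To see how long the client above can stall when it keeps hitting 429 responses, the wait formula can be worked through on its own. This is a small sketch; backoff_delays is a hypothetical helper name that only mirrors the min(15 * (attempt + 1), 60) expression from client.py.

```python
def backoff_delays(retries=4, step=15, cap=60):
    """Seconds slept after each consecutive 429, mirroring min(15 * (attempt + 1), 60)."""
    return [min(step * (attempt + 1), cap) for attempt in range(retries)]


delays = backoff_delays()
print(delays)       # [15, 30, 45, 60]
print(sum(delays))  # 150
```

With the default retries=4, a request that is rate limited every time waits 150 seconds in total before the RuntimeError is raised.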

Find the competition ID

Search by name, then pick the exact competition row you want. Searching for "Premier League" returns multiple leagues, so check the country field before hard-coding an ID.

import time
from client import get

result = get("/football/competitions", {
    "search": "Premier League",
    "per_page": 10,
})

for competition in result["data"]:
    print(competition["id"], competition["name"], competition["country"])

For the English Premier League, the API returns:

comp_3039 Premier League England
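
If you would rather select the row in code than read it off the printed list, a filter on name and country is one option. The sketch below assumes name and country are plain strings, as the printed output suggests; pick_competition is a hypothetical helper, and the second sample row is invented for illustration.

```python
def pick_competition(rows, name, country):
    """Return the single row whose name and country both match (case-insensitive)."""
    found = [
        row for row in rows
        if row["name"].casefold() == name.casefold()
        and row["country"].casefold() == country.casefold()
    ]
    if len(found) != 1:
        raise LookupError(
            f"Expected 1 match for {name!r} in {country!r}, found {len(found)}"
        )
    return found[0]


# Only comp_3039 comes from the output above; the second row is made up.
sample = [
    {"id": "comp_3039", "name": "Premier League", "country": "England"},
    {"id": "comp_0000", "name": "Premier League", "country": "Examplestan"},
]
print(pick_competition(sample, "Premier League", "England")["id"])  # comp_3039
```

Raising on zero or multiple matches is deliberate: a silent wrong pick here would send every later request to the wrong league.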

Get season IDs

Do not pass plain years like season=2024. Use the season IDs returned by the seasons endpoint.

seasons = get("/football/competitions/comp_3039/seasons", {"per_page": 20})

for season in seasons["data"]:
    print(season["id"], season["name"], season["year"])

Example IDs:

sn_3057848 Premier League 24/25 24/25
sn_606923 Premier League 23/24 23/24
sn_654318 Premier League 22/23 22/23
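
To avoid copying season IDs by hand, you can build a name-to-ID lookup from the seasons response. This sketch assumes the name field reads like "Premier League 24/25", which is how the printed output above appears to break down; season_ids_by_name is a hypothetical helper.

```python
def season_ids_by_name(rows):
    """Map each season's display name to its season ID."""
    return {row["name"]: row["id"] for row in rows}


# Sample rows reconstructed from the example IDs above.
sample = [
    {"id": "sn_3057848", "name": "Premier League 24/25", "year": "24/25"},
    {"id": "sn_606923", "name": "Premier League 23/24", "year": "23/24"},
    {"id": "sn_654318", "name": "Premier League 22/23", "year": "22/23"},
]
lookup = season_ids_by_name(sample)
print(lookup["Premier League 24/25"])  # sn_3057848
```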

Download all matches for a season

List endpoints return a meta block with page, per_page, total, and total_pages. Use that to paginate.

import time

from client import get


def get_all(endpoint, params):
    rows = []
    page = 1

    while True:
        payload = get(endpoint, {**params, "page": page, "per_page": 100})
        rows.extend(payload["data"])

        if page >= payload["meta"]["total_pages"]:
            break

        page += 1
        time.sleep(2)

    return rows


matches = get_all("/football/matches", {
    "competition_id": "comp_3039",
    "season_id": "sn_3057848",
    "status": "finished",
})

print(f"Downloaded {len(matches)} finished matches")
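
You can check the stop condition of a loop like get_all without spending API calls by running the same logic against a stand-in fetcher. The sketch below is an offline dry run: get_all_pages and fake_fetch are hypothetical names, and the fake payload copies only the data and meta keys described above.

```python
def get_all_pages(fetch_page, per_page=100):
    """Collect rows from every page; fetch_page(page, per_page) returns a payload dict."""
    rows = []
    page = 1
    while True:
        payload = fetch_page(page, per_page)
        rows.extend(payload["data"])
        if page >= payload["meta"]["total_pages"]:
            break
        page += 1
    return rows


def fake_fetch(page, per_page, total=380):
    """Stand-in for the API: serves `total` numbered rows, per_page at a time."""
    start = (page - 1) * per_page
    data = list(range(start, min(start + per_page, total)))
    total_pages = -(-total // per_page)  # ceiling division
    meta = {"page": page, "per_page": per_page, "total": total, "total_pages": total_pages}
    return {"data": data, "meta": meta}


rows = get_all_pages(fake_fetch)
print(len(rows))  # 380
```

With 380 rows and 100 per page, the loop makes four requests (100, 100, 100, 80 rows) and then stops. An empty result reports zero total pages, so the loop exits after the first request.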

Each match includes fields like id, utc_date, home_team, away_team, score, status, season_id, competition_id, xg_available, and odds_available.

Save matches to CSV

import pandas as pd

match_rows = []

for match in matches:
    score = match.get("score") or {}
    match_rows.append({
        "match_id": match["id"],
        "utc_date": match["utc_date"],
        "competition_id": match["competition_id"],
        "season_id": match["season_id"],
        "home_team_id": match["home_team"]["id"],
        "home_team": match["home_team"]["name"],
        "away_team_id": match["away_team"]["id"],
        "away_team": match["away_team"]["name"],
        "home_score": score.get("home"),
        "away_score": score.get("away"),
        "status": match["status"],
        "xg_available": match.get("xg_available"),
        "odds_available": match.get("odds_available"),
    })

pd.DataFrame(match_rows).to_csv("premier-league-2024-25-matches.csv", index=False)

At this point you have a normal fixture/results CSV. For modelling, the next step is match stats.

Add match stats and xG

Each stats call is a separate request, so only call the stats endpoint for matches where xg_available is true or where detailed stats matter for your use case.

def flatten_match_stats(match):
    stats = get(f"/football/matches/{match['id']}/stats")["data"]
    overview = stats["overview"]

    return {
        "match_id": match["id"],
        "home_xg": overview["expected_goals"]["all"]["home"],
        "away_xg": overview["expected_goals"]["all"]["away"],
        "home_np_xg": stats.get("np_expected_goals", {}).get("all", {}).get("home"),
        "away_np_xg": stats.get("np_expected_goals", {}).get("all", {}).get("away"),
        "home_shots": overview["total_shots"]["all"]["home"],
        "away_shots": overview["total_shots"]["all"]["away"],
        "home_shots_on_target": overview["shots_on_target"]["all"]["home"],
        "away_shots_on_target": overview["shots_on_target"]["all"]["away"],
        "home_possession": overview["ball_possession"]["all"]["home"],
        "away_possession": overview["ball_possession"]["all"]["away"],
        "home_corners": overview["corner_kicks"]["all"]["home"],
        "away_corners": overview["corner_kicks"]["all"]["away"],
    }


stat_rows = []
for match in matches:
    if not match.get("xg_available"):
        continue
    stat_rows.append(flatten_match_stats(match))
    time.sleep(2)

pd.DataFrame(stat_rows).to_csv("premier-league-2024-25-match-stats.csv", index=False)
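
Because the loop above pauses two seconds after every stats call, it helps to estimate the run time before starting. This is a rough sketch: stats_budget is a hypothetical helper, and the sample assumes a full 380-match season in which every match has xg_available set.

```python
def stats_budget(matches, pause_seconds=2):
    """Return (stats calls needed, minimum seconds spent sleeping between them)."""
    calls = sum(1 for match in matches if match.get("xg_available"))
    return calls, calls * pause_seconds


# 380 is a full season for a 20-team league where each pair meets twice.
season = [{"id": f"m{i}", "xg_available": True} for i in range(380)]
calls, seconds = stats_budget(season)
print(calls, seconds, round(seconds / 60, 1))  # 380 760 12.7
```

The estimate is a floor: it counts only the pauses, not request time or any waits after a 429.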

Keep match rows and stat rows in separate CSVs. That makes joins explicit and keeps your raw fixtures useful even when some lower-coverage competitions do not have xG.

Join the CSVs in pandas

matches_df = pd.read_csv("premier-league-2024-25-matches.csv")
stats_df = pd.read_csv("premier-league-2024-25-match-stats.csv")

df = matches_df.merge(stats_df, on="match_id", how="left")
df["total_goals"] = df["home_score"] + df["away_score"]
df["xg_diff"] = df["home_xg"] - df["away_xg"]

print(df[["home_team", "away_team", "home_score", "away_score", "home_xg", "away_xg"]].head())

Bulk download multiple seasons

season_ids = ["sn_3057848", "sn_606923", "sn_654318"]
all_matches = []

for season_id in season_ids:
    all_matches.extend(get_all("/football/matches", {
        "competition_id": "comp_3039",
        "season_id": season_id,
        "status": "finished",
    }))

print(f"Downloaded {len(all_matches)} matches across {len(season_ids)} seasons")

For a large backfill, store progress after each page. If the script stops, resume from the last completed season_id and page.
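
One way to store progress is a small JSON file that maps each season_id to the last page fully written to disk. The sketch below uses only the standard library; the file name and the load_progress, save_progress, and next_page helpers are hypothetical, not part of the API or its client.

```python
import json
import os
import tempfile


def load_progress(path):
    """Return {season_id: last_completed_page}, or {} when no checkpoint exists."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def save_progress(path, season_id, page):
    """Record a completed page, writing via a temp file so a crash cannot truncate it."""
    progress = load_progress(path)
    progress[season_id] = page
    tmp_path = path + ".tmp"
    with open(tmp_path, "w", encoding="utf-8") as handle:
        json.dump(progress, handle)
    os.replace(tmp_path, path)


def next_page(path, season_id):
    """Page to request next: one past the last completed page, or 1 for a new season."""
    return load_progress(path).get(season_id, 0) + 1


with tempfile.TemporaryDirectory() as folder:
    checkpoint = os.path.join(folder, "progress.json")
    print(next_page(checkpoint, "sn_3057848"))  # 1
    save_progress(checkpoint, "sn_3057848", 3)
    print(next_page(checkpoint, "sn_3057848"))  # 4
```

A checkpoint only helps if the rows for each page are also written to disk before save_progress is called, for example by appending to the CSV page by page instead of holding everything in memory.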

FAQ

What endpoint should I call first?

Call /football/competitions first, then /football/competitions/{competition_id}/seasons. That gives you stable IDs before you request matches.

Can I download everything in one request?

No. Use pagination. per_page supports up to 100 rows, and large historical downloads should loop through competitions, seasons, and pages.

Where does xG come from?

Use /football/matches/{match_id}/stats for aggregate xG and /football/matches/{match_id}/shotmap for shot-level xG.

Should I save JSON or CSV?

Save both if you are building a production pipeline: raw JSON for replay/debugging, CSV or database tables for analysis.
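
For the raw JSON side, writing each page's payload to its own file before flattening keeps replay simple. This is a minimal sketch: the directory layout, the file naming, and the save_raw helper are assumptions, not something the API prescribes.

```python
import json
import tempfile
from pathlib import Path


def save_raw(payload, root, season_id, page):
    """Write one page's unmodified payload to <root>/<season_id>/page-NNNN.json."""
    path = Path(root) / season_id / f"page-{page:04d}.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(payload, ensure_ascii=False), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as root:
    payload = {"data": [{"id": "example_match"}], "meta": {"page": 1, "total_pages": 1}}
    saved = save_raw(payload, root, "sn_3057848", 1)
    print(saved.relative_to(root).as_posix())  # sn_3057848/page-0001.json
```

If the CSV schema changes later, you can rebuild it from these files without calling the API again.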
