Historical Football Data via API — 2026 Guide
Fetch years of historical football match data via API for analytics and prediction models. TheStatsAPI covers 10 years of data. Python tutorial included.
Historical football data is the foundation of prediction models, academic research, betting algorithms, and long-form sports journalism. If you want to know how a team performs away in December, whether a striker's output declines after 30, or how league competitiveness has shifted over the last decade - you need years of structured match data, not just last weekend's results.
This guide shows you how to fetch historical football data via API, bulk-download multiple seasons, handle rate limits properly, and load everything into Pandas for analysis. We will use TheStatsAPI, which covers 10 years of historical match data across 80 competitions (with up to 1,196 available on request) and 84,000+ players.
Who Needs Historical Football Data
Historical data is not just for data scientists. Here are the most common use cases:
- Prediction model builders - training machine learning models on match outcomes, goal totals, and player performance requires thousands of historical matches as training data.
- Fantasy football platforms - projecting player value requires understanding historical performance trends, injury patterns, and seasonal form.
- Academic researchers - sports economics, network analysis of passing, and competitive balance studies all depend on longitudinal data.
- Sports journalists - "first team to win here since 2014" requires a reliable data source, not manual Googling.
- Betting analysts - backtesting strategies against historical odds and outcomes is the basis of any quantitative betting approach.
What Historical Data Is Available
TheStatsAPI provides historical data going back 10 years for major competitions. This includes:
- Match results - home team, away team, scores, date, venue, and match status for every fixture
- Player statistics by season - goals, assists, appearances, minutes, cards, and more, broken down by year and competition
- Competition records - seasons, participating teams, and team records for each year
- Team histories - which teams participated in which competitions in which seasons
Coverage is deepest for the top European leagues (Premier League, La Liga, Bundesliga, Serie A, Ligue 1), the Champions League, and major South American competitions. Smaller leagues may have shorter historical windows. The default catalog ships with 80 competitions, and up to 1,196 competitions are available on request.
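Concretely, a single match record looks roughly like the dict below. The shape is assembled from the fields this guide reads in its fetch examples (id, utc_date, home_team, away_team, score, status) rather than from an official schema, and the "mt_" id prefix is a guess:

```python
# Illustrative shape of one match record; field names follow the ones used
# in the fetch examples in this guide. The "mt_" id prefix is hypothetical.
match = {
    "id": "mt_0000001",                 # hypothetical match id
    "utc_date": "2026-01-17T15:00:00Z",
    "competition_id": "comp_3039",      # Premier League
    "season_id": "sn_6125938",          # 2025-26 season
    "home_team": {"name": "Arsenal"},
    "away_team": {"name": "Chelsea"},
    "score": {"home": 2, "away": 1},
    "status": "finished",
}

print(f"{match['home_team']['name']} {match['score']['home']}-"
      f"{match['score']['away']} {match['away_team']['name']}")
```

Nested objects (home_team, away_team, score) are why the parsing code later in this guide uses .get() with defaults: fixtures that have not kicked off yet may not carry a score.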
Fetching Matches by Season
The core of any historical data pull is fetching matches for a specific competition and season. Here is how to do it in Python:
```python
import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.thestatsapi.com/api"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Accept": "application/json"
}

def fetch_season_matches(competition_id, season_id):
    """Fetch all matches for a competition in a given season."""
    all_matches = []
    page = 1
    while True:
        response = requests.get(
            f"{BASE_URL}/football/matches",
            headers=headers,
            params={
                "competition_id": competition_id,
                "season_id": season_id,
                "page": page
            }
        )
        if response.status_code != 200:
            print(f"Error {response.status_code} on page {page}")
            break
        data = response.json()
        matches = data.get("data", [])
        all_matches.extend(matches)
        total_pages = data["meta"]["total_pages"]
        if page >= total_pages:
            break
        page += 1
    return all_matches

# Fetch Premier League matches for a specific season.
# Use /football/competitions to find your competition_id.
# Use /football/matches?competition_id=comp_3039 to discover season_ids.
matches = fetch_season_matches(
    competition_id="comp_3039",  # Premier League
    season_id="sn_6125938"       # 2025-26 season
)

print(f"Fetched {len(matches)} matches")
for match in matches[:5]:
    date = match["utc_date"][:10]
    home = match["home_team"]["name"]
    away = match["away_team"]["name"]
    score = match.get("score", {})
    print(f"[{date}] {home} {score.get('home', '-')}-{score.get('away', '-')} {away}")
```
All IDs in TheStatsAPI are prefixed strings (e.g. comp_3039 for the Premier League, sn_6125938 for a season). You can discover them by calling /football/competitions first.
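A minimal sketch of that discovery step, assuming the competitions endpoint returns the same `{"data": [...]}` envelope as the matches endpoint, with id and name fields on each entry (the La Liga id below is hypothetical):

```python
import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.thestatsapi.com/api"
headers = {"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"}

def find_competitions(competitions, query):
    """Case-insensitive substring search over a list of competition dicts."""
    q = query.lower()
    return [c for c in competitions if q in c.get("name", "").lower()]

def fetch_competition_catalog():
    """One GET against /football/competitions; returns the data array."""
    response = requests.get(f"{BASE_URL}/football/competitions", headers=headers)
    response.raise_for_status()
    return response.json().get("data", [])

# Offline example in the assumed response shape (call
# fetch_competition_catalog() to get the real catalog):
catalog = [
    {"id": "comp_3039", "name": "Premier League"},
    {"id": "comp_9999", "name": "La Liga"},  # hypothetical id
]
for comp in find_competitions(catalog, "premier"):
    print(comp["id"], comp["name"])  # comp_3039 Premier League
```

Cache the catalog locally: it changes rarely, and looking up IDs offline saves requests against your monthly quota.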
Bulk Fetching Multiple Seasons and Saving to CSV
For serious analysis, you need multiple seasons. Here is a script that fetches all historical matches for a competition and saves them to a CSV file:
```python
import requests
import time
import csv

API_KEY = "your_api_key_here"
BASE_URL = "https://api.thestatsapi.com/api"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Accept": "application/json"
}

def fetch_all_matches(competition_id, status="finished"):
    """Fetch all matches for a competition, handling pagination and rate limits."""
    all_matches = []
    page = 1
    while True:
        response = requests.get(
            f"{BASE_URL}/football/matches",
            headers=headers,
            params={
                "competition_id": competition_id,
                "status": status,
                "page": page,
                "per_page": 100
            }
        )
        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 60))
            print(f"Rate limited. Waiting {retry_after} seconds...")
            time.sleep(retry_after)
            continue
        if response.status_code != 200:
            print(f"Error {response.status_code} on page {page}")
            break
        data = response.json()
        all_matches.extend(data.get("data", []))
        total_pages = data["meta"]["total_pages"]
        print(f"Page {page}/{total_pages} - {len(all_matches)} matches so far")
        if page >= total_pages:
            break
        page += 1
        time.sleep(2)  # Respect rate limits
    return all_matches

def save_to_csv(matches, filename):
    """Save match data to a CSV file."""
    if not matches:
        print("No matches to save.")
        return
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([
            "match_id", "utc_date", "season_id", "competition_id",
            "home_team", "away_team", "home_score", "away_score", "status"
        ])
        for match in matches:
            score = match.get("score", {})
            writer.writerow([
                match["id"],
                match["utc_date"],
                match.get("season_id", ""),
                match.get("competition_id", ""),
                match["home_team"]["name"],
                match["away_team"]["name"],
                score.get("home", ""),
                score.get("away", ""),
                match.get("status", "")
            ])
    print(f"Saved {len(matches)} matches to {filename}")

# Fetch all historical Premier League matches
all_matches = fetch_all_matches(competition_id="comp_3039")
print(f"\nTotal matches: {len(all_matches)}")
print(f"Seasons covered: {len(set(m['season_id'] for m in all_matches))}")
save_to_csv(all_matches, "premier_league_historical.csv")
```
This script fetches all finished Premier League matches across every available season and saves them as a clean CSV. The API paginates at up to 100 results per page, so a full historical download of a single league typically takes well under a hundred requests.
Rate Limit Strategy
When bulk-fetching historical data, you will hit rate limits if you are not careful. Here is how to handle them properly.
Understand your plan limits
| Plan | Requests/month | Rate limit |
|---|---|---|
| Starter ($50/mo) | 100,000 | 30/min |
| Growth ($129/mo) | 500,000 | 60/min |
| Scale ($379/mo) | 5,000,000 | 300/min |
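The simplest way to stay under these caps is a fixed inter-request delay derived from your plan's per-minute limit. A short sketch (plan names and limits taken from the table above):

```python
def min_delay_seconds(requests_per_minute):
    """Smallest fixed delay between requests that respects a per-minute cap."""
    return 60.0 / requests_per_minute

# Per-minute limits from the plan table above
PLAN_RATE_LIMITS = {"Starter": 30, "Growth": 60, "Scale": 300}

for plan, rpm in PLAN_RATE_LIMITS.items():
    print(f"{plan}: sleep at least {min_delay_seconds(rpm):.1f}s between requests")
# Starter: sleep at least 2.0s between requests
# Growth: sleep at least 1.0s between requests
# Scale: sleep at least 0.2s between requests
```

This is where the time.sleep(2) in the bulk-fetch script comes from: 60 seconds divided by the Starter plan's 30 requests per minute.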
Exponential backoff
Instead of a fixed delay after a 429 error, use exponential backoff:
```python
import time

import requests

def fetch_with_backoff(url, headers, params, max_retries=5):
    """Fetch with exponential backoff on rate limit errors."""
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers, params=params)
        if response.status_code == 200:
            return response
        if response.status_code == 429:
            wait_time = min(2 ** attempt * 10, 120)  # 10s, 20s, 40s, 80s, 120s
            print(f"Rate limited. Retry {attempt + 1}/{max_retries} in {wait_time}s")
            time.sleep(wait_time)
        else:
            print(f"Unexpected error: {response.status_code}")
            return response
    raise Exception("Max retries exceeded")
```
Plan selection for bulk downloads
If you are doing a one-time historical data pull across multiple competitions and decades, the Scale plan ($379/month) is the most practical choice. At 300 requests per minute and 5 million requests per month, you can download the entire historical dataset - including the full expanded catalog of up to 1,196 competitions available on request - in a few days. Once downloaded, cancel or downgrade - your local data does not expire.
For smaller pulls (one league, a few seasons), the Starter plan at $50/month is more than sufficient. Ten seasons of a single league requires roughly 50-100 requests, well within the 100,000 monthly limit.
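You can sanity-check those numbers yourself: with 380 matches per Premier League season and 100 results per page, ten seasons is roughly 38 pages of matches plus a handful of discovery calls. A quick back-of-the-envelope helper (the discovery_overhead figure is an assumption, not an API property):

```python
import math

def estimate_requests(seasons, matches_per_season, per_page=100,
                      discovery_overhead=5):
    """Rough request count for a paginated historical pull of one league.

    discovery_overhead covers catalog and season-id lookup calls and is
    an assumed figure, not something the API documents.
    """
    pages = math.ceil(seasons * matches_per_season / per_page)
    return pages + discovery_overhead

# Ten Premier League seasons at 380 matches each
print(estimate_requests(10, 380))  # 43
```

Even doubling that for retries and re-runs leaves you far below the Starter plan's 100,000-request monthly quota; the rate limit per minute, not the monthly quota, is the binding constraint for single-league pulls.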
Loading into Pandas
Once you have your CSV, analysis is straightforward with Pandas:
```python
import pandas as pd

df = pd.read_csv("premier_league_historical.csv")

print(f"Total matches: {len(df)}")
print(f"Seasons: {df['season_id'].nunique()}")
print(f"Teams: {df['home_team'].nunique()}")

# Average goals per match by season
df["total_goals"] = df["home_score"] + df["away_score"]
goals_by_season = df.groupby("season_id")["total_goals"].mean()
print("\nAverage goals per match by season:")
print(goals_by_season.round(2))

# Home win percentage
df["home_win"] = df["home_score"] > df["away_score"]
home_win_pct = df.groupby("season_id")["home_win"].mean()
print("\nHome win percentage by season:")
print((home_win_pct * 100).round(1))

# Top scoring teams (total home + away goals)
home_goals = df.groupby("home_team")["home_score"].sum()
away_goals = df.groupby("away_team")["away_score"].sum()
total_goals = (home_goals.add(away_goals, fill_value=0)
               .sort_values(ascending=False))
print("\nTop 10 scoring teams (all seasons):")
print(total_goals.head(10))
```
This gives you immediate insights: are average goals per match trending up? Is home advantage declining? Which teams have scored the most across a decade? These are the starting points for deeper analysis and model building.
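As one next step, the same CSV columns are enough to reconstruct a league table. A quick sketch using standard 3/1/0 scoring (the team names in the sample frame are purely illustrative):

```python
import pandas as pd

def league_table(df):
    """Total points per team from the CSV columns used above (3 win / 1 draw)."""
    draws = df["home_score"] == df["away_score"]
    home = pd.DataFrame({
        "team": df["home_team"],
        "points": (df["home_score"] > df["away_score"]) * 3 + draws * 1,
    })
    away = pd.DataFrame({
        "team": df["away_team"],
        "points": (df["away_score"] > df["home_score"]) * 3 + draws * 1,
    })
    return (pd.concat([home, away])
              .groupby("team")["points"].sum()
              .sort_values(ascending=False))

# Tiny illustrative frame in the CSV's shape
sample = pd.DataFrame({
    "home_team": ["Arsenal", "Chelsea"],
    "away_team": ["Chelsea", "Arsenal"],
    "home_score": [2, 1],
    "away_score": [1, 1],
})
print(league_table(sample))  # Arsenal 4, Chelsea 1
```

Filter the frame by season_id first to rebuild the table for any single season, which is a useful sanity check that your download is complete.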
Frequently Asked Questions
How far back does the historical data go?
TheStatsAPI provides 10 years of historical data for major competitions. The exact start year varies by league - top European leagues (Premier League, La Liga, Bundesliga, Serie A, Ligue 1) have the deepest history, while smaller or newer competitions may have shorter windows. The default catalog includes 80 competitions, with up to 1,196 competitions available on request.
Can I download all historical data at once?
There is no single "download everything" endpoint. You fetch data by competition and season, paginating through results. This is by design - it lets you pull exactly what you need without downloading terabytes of data you will never use. The bulk-fetch script in this guide automates the process for multiple seasons.
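To automate that across several competitions, you can wrap any single-competition fetcher (such as the fetch_all_matches function in this guide) in a small driver. The per-competition pause here is a conservative assumption, not an API requirement:

```python
import time

def bulk_download(competition_ids, fetch_fn, pause_seconds=2.0):
    """Fetch several competitions sequentially with a pause between them.

    fetch_fn is any callable taking a competition_id and returning a list
    of matches, e.g. the fetch_all_matches function defined earlier.
    Returns a dict mapping competition_id to its match list.
    """
    results = {}
    for i, comp_id in enumerate(competition_ids):
        results[comp_id] = fetch_fn(comp_id)
        if i < len(competition_ids) - 1:
            time.sleep(pause_seconds)
    return results

# Example with a stand-in fetcher (no network; second id is hypothetical)
fake_fetch = lambda comp_id: [{"competition_id": comp_id}]
data = bulk_download(["comp_3039", "comp_9999"], fake_fetch, pause_seconds=0)
print({k: len(v) for k, v in data.items()})
```

Because the results are keyed by competition_id, a crashed run can be resumed by skipping the ids already present in your output directory.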
Will I hit rate limits during a bulk download?
Yes, if you do not add delays between requests. The Starter plan allows 30 requests per minute. Adding a 2-second delay between paginated requests keeps you safely under this limit. For very large downloads (many competitions, many seasons), the Scale plan at 300 requests per minute makes the process significantly faster.
Is the historical data updated retroactively?
Match data is finalized within 1-2 hours of full time and generally does not change after that. In rare cases - such as an administrative decision reversing a result - historical records may be updated. For the vast majority of analysis use cases, you can treat downloaded historical data as stable and immutable.
TheStatsAPI offers a 7-day free trial on all plans, giving full access to every endpoint, 80 competitions out of the box (with up to 1,196 available on request), and 84,000+ players - making it the best way to evaluate a premium football API before committing. Start your trial at thestatsapi.com and begin building your historical football dataset today.
Ready to Power Your Sports App?
Start your 7-day free trial. All endpoints included on every plan.