Historical Football Data via API — 2026 Guide
Fetch years of historical football match data via API for analytics and prediction models. TheStatsAPI covers 10 years of data. Python tutorial included.
Historical football data is the foundation of prediction models, academic research, betting algorithms, and long-form sports journalism. If you want to know how a team performs away in December, whether a striker's output declines after 30, or how league competitiveness has shifted over the last decade - you need years of structured match data, not just last weekend's results.
This guide shows you how to fetch historical football data via API, bulk-download multiple seasons, handle rate limits properly, and load everything into Pandas for analysis. We will use TheStatsAPI, which covers 10 years of historical match data across 80 competitions (with up to 1,196 available on request) and 84,000+ players.
Who Needs Historical Football Data
Historical data is not just for data scientists. Here are the most common use cases:
- Prediction model builders - training machine learning models on match outcomes, goal totals, and player performance requires thousands of historical matches as training data.
- Fantasy football platforms - projecting player value requires understanding historical performance trends, injury patterns, and seasonal form.
- Academic researchers - sports economics, network analysis of passing, and competitive balance studies all depend on longitudinal data.
- Sports journalists - "first team to win here since 2014" requires a reliable data source, not manual Googling.
- Betting analysts - backtesting strategies against historical odds and outcomes is the basis of any quantitative betting approach.
What Historical Data Is Available
TheStatsAPI provides historical data going back 10 years for major competitions. This includes:
- Match results - home team, away team, scores, date, venue, and match status for every fixture
- Player statistics by season - goals, assists, appearances, minutes, cards, and more, broken down by year and competition
- Competition records - seasons, participating teams, and team records for each year
- Team histories - which teams participated in which competitions in which seasons
Coverage is deepest for the top European leagues (Premier League, La Liga, Bundesliga, Serie A, Ligue 1), the Champions League, and major South American competitions. Smaller leagues may have shorter historical windows. The default catalog ships with 80 competitions, and up to 1,196 competitions are available on request.
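Concretely, a single match record looks roughly like the dict below. The shape is assembled from the fields this guide reads in its fetch examples (id, utc_date, home_team, away_team, score, status) rather than from an official schema, and the "mt_" id prefix is a guess:

```python
# Illustrative shape of one match record; field names follow the ones used
# in the fetch examples in this guide. The "mt_" id prefix is hypothetical.
match = {
    "id": "mt_0000001",                 # hypothetical match id
    "utc_date": "2026-01-17T15:00:00Z",
    "competition_id": "comp_3039",      # Premier League
    "season_id": "sn_6125938",          # 2025-26 season
    "home_team": {"name": "Arsenal"},
    "away_team": {"name": "Chelsea"},
    "score": {"home": 2, "away": 1},
    "status": "finished",
}

print(f"{match['home_team']['name']} {match['score']['home']}-"
      f"{match['score']['away']} {match['away_team']['name']}")
```

Nested objects (home_team, away_team, score) are why the parsing code later in this guide uses .get() with defaults: fixtures that have not kicked off yet may not carry a score.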
Fetching Matches by Season
The core of any historical data pull is fetching matches for a specific competition and season. Here is how to do it in Python:
```python
import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.thestatsapi.com/api"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Accept": "application/json"
}

def fetch_season_matches(competition_id, season_id):
    """Fetch all matches for a competition in a given season."""
    all_matches = []
    page = 1
    while True:
        response = requests.get(
            f"{BASE_URL}/football/matches",
            headers=headers,
            params={
                "competition_id": competition_id,
                "season_id": season_id,
                "page": page
            }
        )
        if response.status_code != 200:
            print(f"Error {response.status_code} on page {page}")
            break
        data = response.json()
        matches = data.get("data", [])
        all_matches.extend(matches)
        total_pages = data["meta"]["total_pages"]
        if page >= total_pages:
            break
        page += 1
    return all_matches

# Fetch Premier League matches for a specific season.
# Use /football/competitions to find your competition_id.
# Use /football/matches?competition_id=comp_3039 to discover season_ids.
matches = fetch_season_matches(
    competition_id="comp_3039",  # Premier League
    season_id="sn_6125938"       # 2025-26 season
)

print(f"Fetched {len(matches)} matches")
for match in matches[:5]:
    date = match["utc_date"][:10]
    home = match["home_team"]["name"]
    away = match["away_team"]["name"]
    score = match.get("score", {})
    print(f"[{date}] {home} {score.get('home', '-')}-{score.get('away', '-')} {away}")
```
All IDs in TheStatsAPI are prefixed strings (e.g. comp_3039 for the Premier League, sn_6125938 for a season). You can discover them by calling /football/competitions first.
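A minimal sketch of that discovery step, assuming the competitions endpoint returns the same `{"data": [...]}` envelope as the matches endpoint, with id and name fields on each entry (the La Liga id below is hypothetical):

```python
import requests

API_KEY = "your_api_key_here"
BASE_URL = "https://api.thestatsapi.com/api"
headers = {"Authorization": f"Bearer {API_KEY}", "Accept": "application/json"}

def find_competitions(competitions, query):
    """Case-insensitive substring search over a list of competition dicts."""
    q = query.lower()
    return [c for c in competitions if q in c.get("name", "").lower()]

def fetch_competition_catalog():
    """One GET against /football/competitions; returns the data array."""
    response = requests.get(f"{BASE_URL}/football/competitions", headers=headers)
    response.raise_for_status()
    return response.json().get("data", [])

# Offline example in the assumed response shape (call
# fetch_competition_catalog() to get the real catalog):
catalog = [
    {"id": "comp_3039", "name": "Premier League"},
    {"id": "comp_9999", "name": "La Liga"},  # hypothetical id
]
for comp in find_competitions(catalog, "premier"):
    print(comp["id"], comp["name"])  # comp_3039 Premier League
```

Cache the catalog locally: it changes rarely, and looking up IDs offline saves requests against your monthly quota.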
Bulk Fetching Multiple Seasons and Saving to CSV
For serious analysis, you need multiple seasons. Here is a script that fetches all historical matches for a competition and saves them to a CSV file:
```python
import requests
import time
import csv

API_KEY = "your_api_key_here"
BASE_URL = "https://api.thestatsapi.com/api"

headers = {
    "Authorization": f"Bearer {API_KEY}",
    "Accept": "application/json"
}

def fetch_all_matches(competition_id, status="finished"):
    """Fetch all matches for a competition, handling pagination and rate limits."""
    all_matches = []
    page = 1
    while True:
        response = requests.get(
            f"{BASE_URL}/football/matches",
            headers=headers,
            params={
                "competition_id": competition_id,
                "status": status,
                "page": page,
                "per_page": 100
            }
        )
        if response.status_code == 429:
            retry_after = int(response.headers.get("Retry-After", 60))
            print(f"Rate limited. Waiting {retry_after} seconds...")
            time.sleep(retry_after)
            continue
        if response.status_code != 200:
            print(f"Error {response.status_code} on page {page}")
            break
        data = response.json()
        all_matches.extend(data.get("data", []))
        total_pages = data["meta"]["total_pages"]
        print(f"Page {page}/{total_pages} - {len(all_matches)} matches so far")
        if page >= total_pages:
            break
        page += 1
        time.sleep(2)  # Respect rate limits
    return all_matches

def save_to_csv(matches, filename):
    """Save match data to a CSV file."""
    if not matches:
        print("No matches to save.")
        return
    with open(filename, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow([
            "match_id", "utc_date", "season_id", "competition_id",
            "home_team", "away_team", "home_score", "away_score", "status"
        ])
        for match in matches:
            score = match.get("score", {})
            writer.writerow([
                match["id"],
                match["utc_date"],
                match.get("season_id", ""),
                match.get("competition_id", ""),
                match["home_team"]["name"],
                match["away_team"]["name"],
                score.get("home", ""),
                score.get("away", ""),
                match.get("status", "")
            ])
    print(f"Saved {len(matches)} matches to {filename}")

# Fetch all historical Premier League matches
all_matches = fetch_all_matches(competition_id="comp_3039")
print(f"\nTotal matches: {len(all_matches)}")
print(f"Seasons covered: {len(set(m['season_id'] for m in all_matches))}")
save_to_csv(all_matches, "premier_league_historical.csv")
```
This script fetches all finished Premier League matches across every available season and saves them as a clean CSV. The API paginates at up to 100 results per page, so a full historical download of a single league typically takes well under a hundred requests.
Rate Limit Strategy
When bulk-fetching historical data, you will hit rate limits if you are not careful. Here is how to handle them properly.
Understand your plan limits
| Plan | Requests/month | Rate limit |
|---|---|---|
| Starter ($50/mo) | 100,000 | 30/min |
| Growth ($129/mo) | 500,000 | 60/min |
| Scale ($379/mo) | 5,000,000 | 300/min |
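The simplest way to stay under these caps is a fixed inter-request delay derived from your plan's per-minute limit. A short sketch (plan names and limits taken from the table above):

```python
def min_delay_seconds(requests_per_minute):
    """Smallest fixed delay between requests that respects a per-minute cap."""
    return 60.0 / requests_per_minute

# Per-minute limits from the plan table above
PLAN_RATE_LIMITS = {"Starter": 30, "Growth": 60, "Scale": 300}

for plan, rpm in PLAN_RATE_LIMITS.items():
    print(f"{plan}: sleep at least {min_delay_seconds(rpm):.1f}s between requests")
# Starter: sleep at least 2.0s between requests
# Growth: sleep at least 1.0s between requests
# Scale: sleep at least 0.2s between requests
```

This is where the time.sleep(2) in the bulk-fetch script comes from: 60 seconds divided by the Starter plan's 30 requests per minute.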
Exponential backoff
Instead of a fixed delay after a 429 error, use exponential backoff:
```python
import time

import requests

def fetch_with_backoff(url, headers, params, max_retries=5):
    """Fetch with exponential backoff on rate limit errors."""
    for attempt in range(max_retries):
        response = requests.get(url, headers=headers, params=params)
        if response.status_code == 200:
            return response
        if response.status_code == 429:
            wait_time = min(2 ** attempt * 10, 120)  # 10s, 20s, 40s, 80s, 120s
            print(f"Rate limited. Retry {attempt + 1}/{max_retries} in {wait_time}s")
            time.sleep(wait_time)
        else:
            print(f"Unexpected error: {response.status_code}")
            return response
    raise Exception("Max retries exceeded")
```
Plan selection for bulk downloads
If you are doing a one-time historical data pull across multiple competitions and decades, the Scale plan ($379/month) is the most practical choice. At 300 requests per minute and 5 million requests per month, you can download the entire historical dataset - including the full expanded catalog of up to 1,196 competitions available on request - in a few days. Once downloaded, cancel or downgrade - your local data does not expire.
For smaller pulls (one league, a few seasons), the Starter plan at $50/month is more than sufficient. Ten seasons of a single league requires roughly 50-100 requests, well within the 100,000 monthly limit.
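You can sanity-check those numbers yourself: with 380 matches per Premier League season and 100 results per page, ten seasons is roughly 38 pages of matches plus a handful of discovery calls. A quick back-of-the-envelope helper (the discovery_overhead figure is an assumption, not an API property):

```python
import math

def estimate_requests(seasons, matches_per_season, per_page=100,
                      discovery_overhead=5):
    """Rough request count for a paginated historical pull of one league.

    discovery_overhead covers catalog and season-id lookup calls and is
    an assumed figure, not something the API documents.
    """
    pages = math.ceil(seasons * matches_per_season / per_page)
    return pages + discovery_overhead

# Ten Premier League seasons at 380 matches each
print(estimate_requests(10, 380))  # 43
```

Even doubling that for retries and re-runs leaves you far below the Starter plan's 100,000-request monthly quota; the rate limit per minute, not the monthly quota, is the binding constraint for single-league pulls.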
Loading into Pandas
Once you have your CSV, analysis is straightforward with Pandas:
```python
import pandas as pd

df = pd.read_csv("premier_league_historical.csv")

print(f"Total matches: {len(df)}")
print(f"Seasons: {df['season_id'].nunique()}")
print(f"Teams: {df['home_team'].nunique()}")

# Average goals per match by season
df["total_goals"] = df["home_score"] + df["away_score"]
goals_by_season = df.groupby("season_id")["total_goals"].mean()
print("\nAverage goals per match by season:")
print(goals_by_season.round(2))

# Home win percentage
df["home_win"] = df["home_score"] > df["away_score"]
home_win_pct = df.groupby("season_id")["home_win"].mean()
print("\nHome win percentage by season:")
print((home_win_pct * 100).round(1))

# Top scoring teams (total home + away goals)
home_goals = df.groupby("home_team")["home_score"].sum()
away_goals = df.groupby("away_team")["away_score"].sum()
total_goals = (home_goals.add(away_goals, fill_value=0)
               .sort_values(ascending=False))
print("\nTop 10 scoring teams (all seasons):")
print(total_goals.head(10))
```
This gives you immediate insights: are average goals per match trending up? Is home advantage declining? Which teams have scored the most across a decade? These are the starting points for deeper analysis and model building.
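As one next step, the same CSV columns are enough to reconstruct a league table. A quick sketch using standard 3/1/0 scoring (the team names in the sample frame are purely illustrative):

```python
import pandas as pd

def league_table(df):
    """Total points per team from the CSV columns used above (3 win / 1 draw)."""
    draws = df["home_score"] == df["away_score"]
    home = pd.DataFrame({
        "team": df["home_team"],
        "points": (df["home_score"] > df["away_score"]) * 3 + draws * 1,
    })
    away = pd.DataFrame({
        "team": df["away_team"],
        "points": (df["away_score"] > df["home_score"]) * 3 + draws * 1,
    })
    return (pd.concat([home, away])
              .groupby("team")["points"].sum()
              .sort_values(ascending=False))

# Tiny illustrative frame in the CSV's shape
sample = pd.DataFrame({
    "home_team": ["Arsenal", "Chelsea"],
    "away_team": ["Chelsea", "Arsenal"],
    "home_score": [2, 1],
    "away_score": [1, 1],
})
print(league_table(sample))  # Arsenal 4, Chelsea 1
```

Filter the frame by season_id first to rebuild the table for any single season, which is a useful sanity check that your download is complete.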
Frequently Asked Questions
How far back does the historical data go?
TheStatsAPI provides 10 years of historical data for major competitions. The exact start year varies by league - top European leagues (Premier League, La Liga, Bundesliga, Serie A, Ligue 1) have the deepest history, while smaller or newer competitions may have shorter windows. The default catalog includes 80 competitions, with up to 1,196 competitions available on request.
Can I download all historical data at once?
There is no single "download everything" endpoint. You fetch data by competition and season, paginating through results. This is by design - it lets you pull exactly what you need without downloading terabytes of data you will never use. The bulk-fetch script in this guide automates the process for multiple seasons.
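To automate that across several competitions, you can wrap any single-competition fetcher (such as the fetch_all_matches function in this guide) in a small driver. The per-competition pause here is a conservative assumption, not an API requirement:

```python
import time

def bulk_download(competition_ids, fetch_fn, pause_seconds=2.0):
    """Fetch several competitions sequentially with a pause between them.

    fetch_fn is any callable taking a competition_id and returning a list
    of matches, e.g. the fetch_all_matches function defined earlier.
    Returns a dict mapping competition_id to its match list.
    """
    results = {}
    for i, comp_id in enumerate(competition_ids):
        results[comp_id] = fetch_fn(comp_id)
        if i < len(competition_ids) - 1:
            time.sleep(pause_seconds)
    return results

# Example with a stand-in fetcher (no network; second id is hypothetical)
fake_fetch = lambda comp_id: [{"competition_id": comp_id}]
data = bulk_download(["comp_3039", "comp_9999"], fake_fetch, pause_seconds=0)
print({k: len(v) for k, v in data.items()})
```

Because the results are keyed by competition_id, a crashed run can be resumed by skipping the ids already present in your output directory.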
Will I hit rate limits during a bulk download?
Yes, if you do not add delays between requests. The Starter plan allows 30 requests per minute. Adding a 2-second delay between paginated requests keeps you safely under this limit. For very large downloads (many competitions, many seasons), the Scale plan at 300 requests per minute makes the process significantly faster.
Is the historical data updated retroactively?
Match data is finalized within 1-2 hours of full time and generally does not change after that. In rare cases - such as an administrative decision reversing a result - historical records may be updated. For the vast majority of analysis use cases, you can treat downloaded historical data as stable and immutable.
TheStatsAPI offers a 7-day free trial on all plans, giving full access to every endpoint, 80 competitions out of the box (with up to 1,196 available on request), and 84,000+ players - making it the best way to evaluate a premium football API before committing. Start your trial at thestatsapi.com and begin building your historical football dataset today.
Ready to Power Your Sports App?
Start your 7-day free trial. All endpoints included on every plan.