Backfill Historical Football Match Stats into PostgreSQL
Design a PostgreSQL backfill pipeline for historical football data: competitions, seasons, matches, match stats, shotmaps, and odds using TheStatsAPI.
CSV is fine for notebooks. PostgreSQL is better when you are building a product, model pipeline, dashboard, or betting analytics backend that needs repeatable syncs.
This guide shows a practical schema and backfill flow for TheStatsAPI:
- Store competitions.
- Store seasons per competition.
- Store matches per competition season.
- Store match-level stats.
- Store shotmaps and odds in separate tables when you need them.
Minimal schema
create table football_competitions (
id text primary key,
name text not null,
country text,
country_code text,
type text,
has_team_stats boolean,
has_player_stats boolean,
odds_available boolean,
live_odds_available boolean,
xg_available boolean
);
create table football_seasons (
id text primary key,
competition_id text not null references football_competitions(id),
name text not null,
year text
);
create table football_matches (
id text primary key,
competition_id text not null references football_competitions(id),
season_id text references football_seasons(id),
utc_date timestamptz not null,
status text not null,
matchday integer,
stage_name text,
group_label text,
home_team_id text,
home_team_name text,
away_team_id text,
away_team_name text,
home_score integer,
away_score integer,
final_home_score integer,
final_away_score integer,
xg_available boolean,
odds_available boolean,
live_odds_available boolean,
updated_at timestamptz not null default now()
);
create table football_match_stats (
match_id text primary key references football_matches(id),
home_xg numeric,
away_xg numeric,
home_np_xg numeric,
away_np_xg numeric,
home_shots integer,
away_shots integer,
home_shots_on_target integer,
away_shots_on_target integer,
home_possession integer,
away_possession integer,
home_corners integer,
away_corners integer,
raw jsonb not null,
updated_at timestamptz not null default now()
);
Keep the raw jsonb field on stats rows. It lets you add fields later without re-running the entire historical sync.
Backfill order
1. Competitions
GET /football/competitions?page=1&per_page=100
Store each returned competition. For a focused backfill, use search=Premier League and pick the exact row:
comp_3039 Premier League England
2. Seasons
GET /football/competitions/comp_3039/seasons?per_page=20
Example season IDs:
sn_3057848 Premier League 24/25
sn_606923 Premier League 23/24
sn_654318 Premier League 22/23
3. Matches
GET /football/matches?competition_id=comp_3039&season_id=sn_3057848&status=finished&page=1&per_page=100
Upsert matches by id. The important point is to store competition_id and season_id from the response, not just the team names.
4. Match stats
GET /football/matches/{match_id}/stats
Only request stats for matches where you need advanced fields. The match object includes xg_available; use that to avoid wasting requests on lower-coverage rows.
Upsert matches from Node.js
import pg from "pg";
import { get } from "./client.mjs";
const pool = new pg.Pool({ connectionString: process.env.DATABASE_URL });
async function upsertMatch(match) {
const score = match.score ?? {};
const finalScore = score.final_score ?? {};
await pool.query(
`
insert into football_matches (
id, competition_id, season_id, utc_date, status, matchday, stage_name, group_label,
home_team_id, home_team_name, away_team_id, away_team_name,
home_score, away_score, final_home_score, final_away_score,
xg_available, odds_available, live_odds_available, updated_at
)
values (
$1, $2, $3, $4, $5, $6, $7, $8,
$9, $10, $11, $12,
$13, $14, $15, $16,
$17, $18, $19, now()
)
on conflict (id) do update set
utc_date = excluded.utc_date,
status = excluded.status,
home_score = excluded.home_score,
away_score = excluded.away_score,
final_home_score = excluded.final_home_score,
final_away_score = excluded.final_away_score,
xg_available = excluded.xg_available,
odds_available = excluded.odds_available,
live_odds_available = excluded.live_odds_available,
updated_at = now()
`,
[
match.id,
match.competition_id,
match.season_id,
match.utc_date,
match.status,
match.matchday,
match.stage_name,
match.group_label,
match.home_team?.id,
match.home_team?.name,
match.away_team?.id,
match.away_team?.name,
score.home,
score.away,
finalScore.home,
finalScore.away,
match.xg_available,
match.odds_available,
match.live_odds_available,
],
);
}
Upsert match stats
async function upsertMatchStats(matchId) {
const { data } = await get(`/football/matches/${matchId}/stats`);
const overview = data.overview;
await pool.query(
`
insert into football_match_stats (
match_id, home_xg, away_xg, home_np_xg, away_np_xg,
home_shots, away_shots, home_shots_on_target, away_shots_on_target,
home_possession, away_possession, home_corners, away_corners,
raw, updated_at
)
values ($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,now())
on conflict (match_id) do update set
home_xg = excluded.home_xg,
away_xg = excluded.away_xg,
home_np_xg = excluded.home_np_xg,
away_np_xg = excluded.away_np_xg,
home_shots = excluded.home_shots,
away_shots = excluded.away_shots,
raw = excluded.raw,
updated_at = now()
`,
[
matchId,
overview.expected_goals?.all?.home,
overview.expected_goals?.all?.away,
data.np_expected_goals?.all?.home,
data.np_expected_goals?.all?.away,
overview.total_shots?.all?.home,
overview.total_shots?.all?.away,
overview.shots_on_target?.all?.home,
overview.shots_on_target?.all?.away,
overview.ball_possession?.all?.home,
overview.ball_possession?.all?.away,
overview.corner_kicks?.all?.home,
overview.corner_kicks?.all?.away,
JSON.stringify(data),
],
);
}
Shotmaps and odds
Use separate tables for one-to-many data.
GET /football/matches/{match_id}/shotmap
GET /football/matches/{match_id}/odds
Shotmaps return one row per shot. Odds return bookmakers and nested market maps such as match_odds, btts, total_goals, match_corners, and asian_handicap. Keep them separate from football_matches.
Backfill strategy
- Backfill one competition-season at a time.
- Store progress as
{competition_id, season_id, page}. - Upsert matches first, then stats for matches with
xg_available. - Use a queue or a two-second delay on Starter plan limits.
- Re-run recent seasons nightly; older completed seasons can be treated as mostly stable.
FAQ
Should match stats be columns or JSON?
Use both. Store common model fields as columns, and keep the full response in raw jsonb for future fields.
How do I avoid duplicate rows?
Use API IDs as primary keys. Match IDs, competition IDs, and season IDs are stable prefixed strings.
Where should odds live?
In a separate odds table keyed by match_id, bookmaker, market, line, and outcome. Odds are nested and can have many rows per match.
Ready to Power Your Sports App?
Start your 7-day free trial. All endpoints included on every plan.