rdmpy: UK Rail Incidents & Delay Propagation

A comprehensive toolkit for analyzing UK rail network incidents and delay propagation. It integrates train schedules, incident records, and delay data from the Rail Data Marketplace to link system-level performance metrics with component-level operational events.

This toolkit enables researchers and engineers to build unified datasets that connect network-wide delays and cancellations with the temporal and spatial patterns of initial incidents, train movements, and passenger impacts. In doing so, it addresses a critical gap in data availability for systems engineering in the rail domain.

Overview

The rdmpy repository serves to:

  • Integrate diverse data sources: Combine schedule, incident, and delay data from the Rail Data Marketplace

  • Preprocess railway operations data: Clean and structure raw data into analysis-ready formats

  • Support delay propagation analysis: Track how local incidents cascade across the rail network

  • Enable system-level assessment: Evaluate station performance, identify delay patterns, and explore cause-and-effect relationships

  • Provide interactive visualizations: Explore data through multiple perspectives (aggregate, incident, station, time-based, and train-level views)

Repository Structure

├─ rdmpy/                     # Main package
│  ├─ preprocessor/           # Data preprocessing module
│  └─ outputs/                # Analysis tools and data loaders
├─ demo/                      # Interactive Jupyter notebooks
│  ├─ aggregate_view.ipynb
│  ├─ incident_view.ipynb
│  ├─ station_view.ipynb
│  ├─ time_view.ipynb
│  └─ train_view.ipynb
├─ docs/                      # Documentation (Sphinx)
├─ tests/                     # Unit and integration tests
├─ processed_data/            # Preprocessed datasets by station (generated by user with preprocessor)
└─ LICENSE, README.md
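
Because processed_data/ is generated locally by the preprocessor, it can help to confirm what the folder contains before opening the notebooks. The following is a minimal sketch using only the Python standard library. It assumes it is run from the repository root, and it makes no assumption about the file names or formats the preprocessor writes; the helper name summarize_directory is hypothetical and not part of rdmpy.

```python
from pathlib import Path


def summarize_directory(root):
    """Return {entry name: number of files beneath it} for each top-level entry.

    Hypothetical helper for illustration; it is not part of the rdmpy API.
    """
    root = Path(root)
    if not root.is_dir():
        # Nothing has been generated yet (or the path is wrong).
        return {}
    summary = {}
    for entry in sorted(root.iterdir()):
        if entry.is_dir():
            # Count every regular file at any depth below this entry.
            summary[entry.name] = sum(1 for p in entry.rglob("*") if p.is_file())
        else:
            summary[entry.name] = 1
    return summary


if __name__ == "__main__":
    # Assumes the current working directory is the repository root.
    for name, n_files in summarize_directory("processed_data").items():
        print(f"{name}: {n_files} file(s)")
```

An empty result suggests that the preprocessing step has not been run yet, or that the script was started from a different directory.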

Key Features

  • Unified Dataset: Connects system-level performance with component-level events

  • Flexible Preprocessing: Process all stations or target specific categories (A, B, C1, C2)

  • Multiple Analytical Perspectives: Five interactive demos for different levels of network analysis

  • Station Performance Assessment: Evaluate performance under varying demand scenarios

  • Delay Propagation Tracking: Understand how incidents ripple across the network

  • Open & Extensible: Built for integration with new data sources and analysis methods

Summary Workflow

Detailed setup and usage instructions are in the docs/ folder: the Getting Started guide covers installation and setup, and the User Guide covers data preprocessing and analysis. The general workflow is as follows:

  1. Install the toolkit:

    pip install rdmpy
    
  2. Download data from the Rail Data Marketplace:

    • NWR Historic Delay Attribution data (Transparency files)

    • NWR Schedule data (CIF_ALL_FULL_DAILY_toc-full.json.gz)

  3. Preprocess the data:

    python -m rdmpy.preprocessor --all-categories
    
  4. Explore the data using the Jupyter notebooks in the demo/ folder for different analytical perspectives (aggregate, incident, station, time-based, train-level).

  5. Contribute to the project by following the How to Contribute guide.
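
Before running the preprocessing step, it can be useful to check that the downloaded schedule file is readable. The sketch below is not part of rdmpy. It assumes the schedule file is gzipped, newline-delimited JSON in which each line is an object whose top-level key names its record type; that layout is an assumption and should be checked against the Rail Data Marketplace documentation. The function name count_record_types and the file location are hypothetical.

```python
import gzip
import json
from collections import Counter
from pathlib import Path


def count_record_types(path, limit=None):
    """Count top-level keys in a gzipped newline-delimited JSON file.

    `limit` caps the number of lines read (blank lines included), so that
    a large file can be sampled quickly. Assumes one JSON object per line.
    """
    counts = Counter()
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line_number, line in enumerate(handle):
            if limit is not None and line_number >= limit:
                break
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)
            if isinstance(record, dict):
                counts.update(record.keys())
    return counts


if __name__ == "__main__":
    # Hypothetical location: adjust to wherever the download was saved.
    schedule_path = Path("CIF_ALL_FULL_DAILY_toc-full.json.gz")
    if schedule_path.exists():
        sample = count_record_types(schedule_path, limit=100_000)
        for name, count in sample.most_common():
            print(f"{name}: {count}")
    else:
        print(f"{schedule_path} not found; download it first")
```

If the file is not line-delimited JSON, json.loads raises an error on the first line, which is itself a quick signal that the download is incomplete or in a different format than assumed here.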

License

This project is licensed under the MIT License. See the LICENSE file for details.
