API Reference

This section provides detailed documentation of the rdmpy Python API, organized by module.

Data Pre-Processing

Core preprocessor module for matching schedules with delays and organizing results by station.

Save processed schedule and delay data to parquet files for railway stations by DFT category. This script can process any DFT category (A, B, C1, C2) or all categories at once. Processes schedule data, applies delays, and saves the results as pandas DataFrames organized by day of the week for each station.

Usage:
  • Single station: python -m preprocess.preprocessor <STANOX_CODE>

  • Category A: python -m preprocess.preprocessor --category-A

  • Category B: python -m preprocess.preprocessor --category-B

  • Category C1: python -m preprocess.preprocessor --category-C1

  • Category C2: python -m preprocess.preprocessor --category-C2

  • All categories: python -m preprocess.preprocessor --all-categories

  • Interactive: python -m preprocess.preprocessor

rdmpy.preprocessor.get_weekday_from_schedule_entry(entry)[source]

Extract the primary weekday from a schedule entry for sorting purposes.

Parameters:

entry – Schedule entry dictionary

Returns:

Weekday index (0=Monday, 6=Sunday) for sorting

Return type:

int

rdmpy.preprocessor.load_stations(category=None)[source]

Load stations from the reference JSON file.

Parameters:

category (str, optional) – DFT category to filter by (e.g., ‘A’, ‘B’, ‘C1’, ‘C2’). If None, returns all stations.

Returns:

List of STANOX codes for the specified category or all stations

Return type:

list
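
A minimal sketch of the category filter described above, using an in-memory stations reference. The JSON structure and the `stanox`/`dft_category` field names are assumptions for illustration; the real reference file may be laid out differently.

```python
import json

# Hypothetical reference data; the real JSON file's layout may differ.
STATIONS_JSON = json.dumps({
    "stations": [
        {"stanox": "87701", "dft_category": "A"},
        {"stanox": "88601", "dft_category": "A"},
        {"stanox": "86981", "dft_category": "C1"},
    ]
})

def load_stations(category=None):
    """Return STANOX codes, optionally filtered by DFT category."""
    stations = json.loads(STATIONS_JSON)["stations"]
    if category is None:
        return [s["stanox"] for s in stations]
    return [s["stanox"] for s in stations if s["dft_category"] == category]
```

With the sample data, `load_stations("A")` returns the two Category A codes and `load_stations()` returns all three.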

rdmpy.preprocessor.save_processed_data_by_weekday_to_dataframe(st_code, output_dir='processed_data', schedule_data_loaded=None, stanox_ref=None, tiploc_to_stanox=None, incident_data_loaded=None)[source]

Process schedule and delay data, then save to pandas DataFrames organized by weekday. This optimized version accepts pre-loaded data to avoid file I/O for each station.

Parameters:
  • st_code (str) – STANOX code to process

  • output_dir (str) – Directory to save output files

  • schedule_data_loaded (pd.DataFrame, optional) – Pre-loaded schedule data

  • stanox_ref (pd.DataFrame, optional) – Pre-loaded STANOX reference data

  • tiploc_to_stanox (dict, optional) – Pre-loaded TIPLOC to STANOX mapping

  • incident_data_loaded (dict, optional) – Pre-loaded incident data by period

Returns:

Dictionary containing processed data as pandas DataFrames organized by weekday

Return type:

dict

rdmpy.preprocessor.save_stations_by_category(category=None, output_dir='processed_data')[source]

Process and save data for stations by DFT category as parquet files. This fully optimized version loads schedule and delay data once and reuses them for all stations, avoiding redundant file I/O during batch processing.

Parameters:
  • category (str, optional) – DFT category to process (‘A’, ‘B’, ‘C1’, ‘C2’). If None, processes all categories.

  • output_dir (str) – Directory to save output files

Returns:

Summary of processing results

Return type:

dict

rdmpy.preprocessor.save_all_category_a_stations(output_dir='processed_data')[source]

Backward compatibility function for processing Category A stations only.

Parameters:

output_dir (str) – Directory to save output files

Returns:

Summary of processing results

Return type:

dict

rdmpy.preprocessor.main(st_code=None, process_category=None, process_all_categories=False)[source]

Main function to demonstrate the data processing and saving functionality.

Parameters:
  • st_code (str, optional) – STANOX code to process. If not provided and no category processing is requested, the user is prompted for input.

  • process_category (str, optional) – DFT category to process (‘A’, ‘B’, ‘C1’, ‘C2’).

  • process_all_categories (bool) – If True, process all categories instead of a single station.

Utilities

Shared utility functions that support the preprocessing pipeline, including schedule and delay processing.

rdmpy.utils.load_schedule_data(st_code, schedule_data, reference_files)[source]

Load all necessary data for schedule processing.

Returns:

(train_count (deprecated, returns None), tiploc, schedule_data_loaded, stanox_ref, tiploc_to_stanox)

Return type:

tuple

rdmpy.utils.get_day_code_mapping()[source]

Create a mapping for day codes used throughout the application.

Returns:

Mapping from day indices to day codes (0=Monday, 1=Tuesday, …, 6=Sunday)

Return type:

dict
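
The mapping this returns can be sketched directly from the documented convention (0=Monday through 6=Sunday, two-letter codes):

```python
def get_day_code_mapping():
    """Map weekday index (0=Monday ... 6=Sunday) to two-letter day codes."""
    return {0: "MO", 1: "TU", 2: "WE", 3: "TH", 4: "FR", 5: "SA", 6: "SU"}
```

This matches Python's own `datetime.weekday()` convention, so `mapping[dt.weekday()]` yields the day code for any datetime.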

rdmpy.utils.extract_schedule_days_runs(schedule_entry)[source]

Extract schedule_days_runs from a schedule entry.

Parameters:

schedule_entry – Schedule entry dictionary

Returns:

Binary string representing days the schedule runs, or None if not found

Return type:

str

rdmpy.utils.get_english_day_types_from_schedule(schedule_entry)[source]

Convert schedule_days_runs to list of ENGLISH_DAY_TYPE values.

Parameters:

schedule_entry – Schedule entry dictionary

Returns:

List of ENGLISH_DAY_TYPE values that this schedule runs on

Return type:

list
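
A sketch of the conversion described above, assuming schedule_days_runs is a 7-character binary string with Monday first and that the ENGLISH_DAY_TYPE values are the two-letter codes used elsewhere in this reference (an assumption, not confirmed by the source):

```python
DAY_CODES = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]

def days_runs_to_day_types(schedule_days_runs):
    """Convert a 7-character binary string (Monday first) to day codes."""
    if schedule_days_runs is None or len(schedule_days_runs) != 7:
        return []
    # Keep the code for each position flagged with '1'.
    return [code for flag, code in zip(schedule_days_runs, DAY_CODES) if flag == "1"]
```

For example, a weekday-only schedule string of "1111100" yields ["MO", "TU", "WE", "TH", "FR"].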

rdmpy.utils.is_valid_schedule_entry(schedule_entry)[source]

Validate that a schedule entry has required structure.

Parameters:

schedule_entry – Schedule entry dictionary

Returns:

True if entry has required fields

Return type:

bool

rdmpy.utils.validate_schedule_locations(schedule_locations)[source]

Validate that schedule_locations is iterable and contains valid entries.

Parameters:

schedule_locations – List of location dictionaries

Returns:

True if valid

Return type:

bool

rdmpy.utils.is_valid_location_entry(location)[source]

Validate a single location entry has get method (dict-like).

Parameters:

location – Location entry to validate

Returns:

True if location is dict-like

Return type:

bool

rdmpy.utils.has_time_information(location)[source]

Check if location has either departure or arrival time.

Parameters:

location – Location dictionary

Returns:

True if either departure or arrival exists

Return type:

bool

rdmpy.utils.extract_location_time(location)[source]

Extract departure or arrival time from location, preferring departure.

Parameters:

location – Location dictionary

Returns:

Time in format ‘HHMM’ or None if unavailable

Return type:

str or None
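
The "prefer departure" behaviour can be sketched as below; the "departure" and "arrival" field names are illustrative assumptions about the location dictionary:

```python
def extract_location_time(location):
    """Return the departure time if present, otherwise the arrival time.

    Times are 'HHMM' strings; returns None when neither field exists.
    Field names here are assumptions for illustration.
    """
    return location.get("departure") or location.get("arrival")
```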

rdmpy.utils.get_train_service_code(schedule_entry)[source]

Extract CIF_train_service_code from schedule entry.

Parameters:

schedule_entry – Schedule entry dictionary

Returns:

Train service code or None if not found

Return type:

str or None

rdmpy.utils.clean_dataframe_types(df, columns_to_convert)[source]

Standardize data types in a DataFrame for consistent merging.

Parameters:
  • df – DataFrame to clean

  • columns_to_convert – List of (column_name, target_type) tuples

Returns:

DataFrame with converted types

Return type:

DataFrame

rdmpy.utils.filter_valid_delay_entries(delay_df)[source]

Filter delay entries to keep only those with valid datetime strings.

Parameters:

delay_df – DataFrame of delay entries

Returns:

Filtered DataFrame

Return type:

DataFrame

rdmpy.utils.process_schedule(st_code, schedule_data=None, reference_files=None, train_count=None, tiploc=None, schedule_data_loaded=None, stanox_ref=None, tiploc_to_stanox=None)[source]

Generate a schedule timeline for all trains that match the specified STANOX code. This optimized version accepts pre-loaded data to avoid reloading from files.

Parameters:
  • st_code (str) – STANOX code to process.

  • schedule_data (dict, optional) – Dictionary containing schedule data file paths.

  • reference_files (dict, optional) – Dictionary containing reference file paths.

  • train_count (int, optional) – Expected simple count of number of trains (from pre-loaded data).

  • tiploc (str, optional) – TIPLOC code corresponding to st_code.

  • schedule_data_loaded (list, optional) – Pre-loaded schedule data.

  • stanox_ref (dict, optional) – Pre-loaded STANOX reference data.

  • tiploc_to_stanox (dict, optional) – Pre-loaded TIPLOC to STANOX mapping.

Returns:

Sorted schedule timeline.

Return type:

list

rdmpy.utils.extract_day_of_week_from_delay(delay_entry)[source]

Extract the day of the week from PLANNED_ORIGIN_WTT_DATETIME in delay data.

Parameters:

delay_entry – Delay entry dictionary

Returns:

Day of week code (MO, TU, WE, TH, FR, SA, SU) or None if parsing fails

Return type:

str
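
A minimal sketch of this extraction, returning None when parsing fails as documented. The '%d-%b-%Y %H:%M' datetime format is an assumption; the real PLANNED_ORIGIN_WTT_DATETIME strings may use a different layout.

```python
from datetime import datetime

DAY_CODES = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]

def day_code_from_wtt_datetime(value, fmt="%d-%b-%Y %H:%M"):
    """Parse a planned-origin datetime string and return its day code.

    The format string is an assumption; returns None on parse failure.
    """
    try:
        return DAY_CODES[datetime.strptime(value, fmt).weekday()]
    except (TypeError, ValueError):
        return None
```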

rdmpy.utils.schedule_runs_on_day(schedule_entry, target_day)[source]

Check if a schedule entry runs on a specific day of the week.

Parameters:
  • schedule_entry – Schedule entry dictionary (from processed schedule)

  • target_day – Day code (MO, TU, WE, TH, FR, SA, SU)

Returns:

True if the schedule runs on the target day

Return type:

bool

rdmpy.utils.find_location_by_tiploc(schedule_locations, target_tiploc)[source]

Find first location matching target TIPLOC.

Parameters:
  • schedule_locations – List of location dictionaries

  • target_tiploc – TIPLOC code to match

Returns:

Matching location or None

Return type:

dict or None

rdmpy.utils.find_origin_location(schedule_locations, target_tiploc)[source]

Find origin location (LO or L0) at target TIPLOC.

Parameters:
  • schedule_locations – List of location dictionaries

  • target_tiploc – TIPLOC code to match

Returns:

Origin location or None

Return type:

dict or None

rdmpy.utils.find_destination_location(schedule_locations, target_tiploc)[source]

Find destination location (LT) at target TIPLOC.

Parameters:
  • schedule_locations – List of location dictionaries

  • target_tiploc – TIPLOC code to match

Returns:

Destination location or None

Return type:

dict or None

rdmpy.utils.determine_station_role(relevant_location, origin_location, destination_location, tiploc)[source]

Determine the role of station in train’s journey.

Parameters:
  • relevant_location – Matched location at target TIPLOC

  • origin_location – Origin location (if exists)

  • destination_location – Destination location (if exists)

  • tiploc – Target TIPLOC code

Returns:

“Origin”, “Destination”, “Intermediate”, or “Unknown”

Return type:

str

rdmpy.utils.build_train_record(train_service_code, origin_location, destination_location, relevant_location, s_time, schedule_day_types, tiploc, tiploc_to_stanox, stanox_ref, st_code)[source]

Build a single train record for processed schedule.

Parameters:
  • train_service_code – CIF service code

  • origin_location – Origin location dict

  • destination_location – Destination location dict

  • relevant_location – Location at target station

  • s_time – Time in HHMM format

  • schedule_day_types – List of day codes

  • tiploc – Target TIPLOC

  • tiploc_to_stanox – TIPLOC to STANOX mapping

  • stanox_ref – STANOX reference data (dict)

  • st_code – Target STANOX code

Returns:

Train record with all fields populated

Return type:

dict

rdmpy.utils.extract_time_components_from_delays(delays_df)[source]

Extract time components from delay DataFrame for matching.

Parameters:

delays_df – DataFrame of delays

Returns:

DataFrame with added origin_time, dest_time, event_time columns

Return type:

DataFrame

rdmpy.utils.expand_schedule_by_days(schedule_df)[source]

Expand schedule entries for multi-day schedules (one row per day).

Parameters:

schedule_df – Schedule DataFrame

Returns:

Expanded DataFrame with current_day column

Return type:

DataFrame
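
The one-row-per-day expansion maps naturally onto pandas explode. This sketch assumes a list-valued SCHEDULE_DAYS column (a hypothetical name; the real column may differ):

```python
import pandas as pd

def expand_schedule_by_days(schedule_df):
    """One row per (schedule, day): explode the list-valued day column.

    Assumes a 'SCHEDULE_DAYS' column holding lists of day codes.
    """
    expanded = schedule_df.explode("SCHEDULE_DAYS").rename(
        columns={"SCHEDULE_DAYS": "current_day"}
    )
    return expanded.reset_index(drop=True)
```

A schedule running on ["MO", "TU"] becomes two rows, one with current_day "MO" and one with "TU".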

rdmpy.utils.process_delays(incident_files, st_code, output_dir)[source]

Processes delay files by converting them to vertical JSON, removing irrelevant columns, and filtering rows.

Parameters:
  • incident_files (dict) – Dictionary with period names as keys and file paths as values.

  • output_dir (str) – Directory to save the converted JSON files.

  • st_code (str) – The station code to filter delays.

Returns:

Dictionary with period names as keys and processed DataFrames as values.

Return type:

dict

rdmpy.utils.extract_day_from_each_delay(delays_df)[source]

Extract day of week for each delay entry.

Parameters:

delays_df – DataFrame of delays

Returns:

List of day codes (MO, TU, etc.) aligned with rows

Return type:

list

rdmpy.utils.add_delay_day_column(delays_df)[source]

Add delay_day column to delays DataFrame, filtering for valid entries.

Parameters:

delays_df – DataFrame of delays

Returns:

DataFrame with delay_day column (filtered to valid entries)

Return type:

DataFrame

rdmpy.utils.find_matched_delays_info(matched_results_df)[source]

Extract matched delay information for comparison with unmatched.

Parameters:

matched_results_df – Filtered DataFrame of matched results

Returns:

Set of tuples (TRAIN_SERVICE_CODE, DELAY_DAY, PFPI_MINUTES)

Return type:

set

rdmpy.utils.identify_unmatched_delays(delays_df, matched_delay_info)[source]

Identify delays that were not matched with schedule entries.

Parameters:
  • delays_df – DataFrame of all delays

  • matched_delay_info – Set of matched delay tuples

Returns:

DataFrame of unmatched delays

Return type:

DataFrame

rdmpy.utils.determine_planned_call_time(row, st_code)[source]

Determine planned call time for unmatched delay based on station role.

Parameters:
  • row – Delay row (Series or dict-like)

  • st_code – Station code being analyzed

Returns:

Time in HHMM format

Return type:

str

rdmpy.utils.build_unmatched_entry(delay_row, st_code)[source]

Build a record for unmatched delay entry.

Parameters:
  • delay_row – Delay Series/dict

  • st_code – Station code being analyzed

Returns:

Complete delay entry record

Return type:

dict

rdmpy.utils.apply_delays_to_matches(result_df, matched_mask)[source]

Update actual times and delay info for matched entries.

Parameters:
  • result_df – Result DataFrame (modified in-place)

  • matched_mask – Boolean mask of matched entries

Returns:

Updated DataFrame

Return type:

DataFrame

rdmpy.utils.filter_result_columns(combined_df)[source]

Filter result DataFrame to required columns only.

Parameters:

combined_df – Combined results DataFrame

Returns:

Filtered DataFrame with core columns

Return type:

DataFrame

rdmpy.utils.adjust_schedule_timeline(processed_schedule, processed_delays, st_code=None)[source]

Adjust the schedule timeline based on delays and generate an updated timeline. This pandas-optimized version uses DataFrames for fast matching operations.

Parameters:
  • processed_schedule (list) – List of processed schedule dictionaries.

  • processed_delays (list) – List of delay records from all days.

  • st_code (str, optional) – The station code being analyzed to determine correct planned call times.

Returns:

Adjusted schedule timeline sorted by actual calls.

Return type:

list

rdmpy.utils.load_schedule_data_once(schedule_data, reference_files)[source]

Load schedule data once to avoid reloading for each station.

Parameters:
  • schedule_data (dict) – Dictionary containing schedule data file paths

  • reference_files (dict) – Dictionary containing reference file paths

Returns:

(schedule_data_loaded, stanox_ref, tiploc_to_stanox)

Return type:

tuple

rdmpy.utils.load_incident_data_once(incident_files)[source]

Load all incident data once to avoid reloading for each station.

Parameters:

incident_files (dict) – Dictionary with period names as keys and file paths as values

Returns:

Dictionary with period names as keys and loaded DataFrames as values

Return type:

dict

rdmpy.utils.process_delays_optimized(incident_data_loaded, st_code, output_dir=None)[source]

Process delays using pre-loaded incident data to avoid file I/O.

Parameters:
  • incident_data_loaded (dict) – Pre-loaded incident data by period

  • st_code (str) – The station code to filter delays

  • output_dir (str, optional) – Directory to save converted JSON files (not used in optimized mode)

Returns:

Dictionary with period names as keys and processed DataFrames as values

Return type:

dict

Data Loading

Tools for loading processed data from the processed_data/ folder.

rdmpy.outputs.load_data.load_processed_data(base_dir='processed_data')[source]

Load all .parquet files from the processed_data folder (recursively) into a single pandas DataFrame.

Automatically tries both pyarrow and fastparquet engines. Adds STANOX (folder name) and DAY (file name) columns.

Parameters:

base_dir (str)

Return type:

DataFrame
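
The STANOX and DAY columns are derived from the file layout described above (processed_data/<STANOX>/<DAY>.parquet, inferred from the docstring). A sketch of that derivation:

```python
from pathlib import Path

def stanox_and_day_from_path(parquet_path):
    """Derive STANOX (parent folder name) and DAY (file stem) for a parquet file.

    Mirrors the documented behaviour: processed_data/<STANOX>/<DAY>.parquet.
    """
    p = Path(parquet_path)
    return p.parent.name, p.stem
```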

Analysis Tools

Functions for analyzing and visualizing network-level data across stations.

rdmpy.outputs.analysis_tools.find_processed_data_path()[source]

Find the processed_data directory by checking multiple possible locations. Returns the path if found, None otherwise.

rdmpy.outputs.analysis_tools.aggregate_view(incident_number, start_date)[source]

Multi-day incident analysis with clean separation of concerns. Creates 3 charts: hourly delays, severity distribution, and event timeline.

Parameters:
  • incident_number (int/str) – The incident number to analyze

  • start_date (str) – Starting date in ‘DD-MMM-YYYY’ format

Returns:

Summary statistics for the incident

Return type:

dict

rdmpy.outputs.analysis_tools.calculate_incident_summary_stats(df, delay_data_all, unique_dates, files_processed, files_with_data, incident_number, num_days)[source]

Calculate final summary statistics for the incident.

Parameters:
  • df (pd.DataFrame) – The incident data

  • delay_data_all (pd.DataFrame) – Filtered DataFrame with only delay events

  • unique_dates (list) – List of unique dates in the incident

  • files_processed (int) – Number of files processed

  • files_with_data (int) – Number of files with matching data

  • incident_number (int/str) – The incident number

  • num_days (int) – Number of days the incident spans

Returns:

Summary statistics dictionary

Return type:

dict

rdmpy.outputs.analysis_tools.aggregate_view_multiday(incident_number, start_date)[source]

Multi-day incident analysis that handles incidents spanning multiple days. Creates separate charts for each day with labels (a), (b), (c) or (a.1), (a.2), etc.

For single-day incidents, labels are simply (a), (b), (c). For multi-day incidents: (a.1), (a.2) for hourly charts, (b) for severity, (c.1), (c.2) for timelines.

Warning: if the incident spans more than 3 days, the same incident number may refer to multiple separate incidents.

Parameters:
  • incident_number (int/str) – The incident number to analyze

  • start_date (str) – Starting date in ‘DD-MMM-YYYY’ format (used for display purposes only; all days are loaded)

Returns:

Summary statistics for the incident across all days

Return type:

dict

rdmpy.outputs.analysis_tools.incident_view(incident_code, incident_date, analysis_date, analysis_hhmm, period_minutes)[source]

Generate a detailed table showing each station affected by an incident with their calls and delays for a specific time period during the incident lifecycle. Shows trains that were shifted between time periods due to delays.

Parameters:
  • incident_code (int/float) – The incident number to analyze

  • incident_date (str) – Incident date in ‘DD-MMM-YYYY’ format (used to locate the incident)

  • analysis_date (str) – Specific date to analyze in ‘DD-MMM-YYYY’ format

  • analysis_hhmm (str) – Start time for analysis in ‘HHMM’ format (e.g., ‘1830’ for 18:30)

  • period_minutes (int) – Minutes from the analysis start time to analyze

Returns:

(pandas.DataFrame, str, str) – Results table, incident start time string, and analysis period string

Return type:

tuple

rdmpy.outputs.analysis_tools.incident_view_heatmap_html(incident_code, incident_date, analysis_date, analysis_hhmm, period_minutes, interval_minutes=10, output_file=None)[source]

Create dynamic interactive HTML heatmap showing railway network delays. Displays delay intensity as vibrant heatmap visualization with incident locations and timeline animation.

Parameters:
  • incident_code (int/float) – The incident number to analyze

  • incident_date (str) – Date when the incident started in ‘DD-MMM-YYYY’ format

  • analysis_date (str) – Specific date to analyze in ‘DD-MMM-YYYY’ format

  • analysis_hhmm (str) – Start time for analysis in ‘HHMM’ format (e.g., ‘1900’)

  • period_minutes (int) – Total duration of the analysis period in minutes

  • interval_minutes (int) – Duration of each interval in minutes (default: 10)

  • output_file (str, optional) – HTML file path to save

Returns:

HTML content of the interactive heatmap

Return type:

str

rdmpy.outputs.analysis_tools.train_view(all_data, origin_code, destination_code, input_date_str)[source]

View all train journeys between an OD pair and check for incidents on a specific date. Corrects PLANNED_CALLS using ACTUAL_CALLS - PFPI_MINUTES.

Refactored for single responsibility: data filtering and transformation, then display.

Parameters:
  • all_data (pd.DataFrame) – Complete train data with OD information

  • origin_code (str or int) – Origin location code

  • destination_code (str or int) – Destination location code

  • input_date_str (str) – Date in ‘DD-MMM-YYYY’ format

Returns:

Incident data or message

Return type:

pd.DataFrame or str

rdmpy.outputs.analysis_tools.get_stanox_for_service(all_data, train_service_code, origin_code, destination_code, date_str=None)[source]

Get all unique STANOX codes that a train service calls at, regardless of the specific train instance. Returns a list of all stations that this service code stops at.

Strategy:
  1. Filter to the specified service code and OD pair.

  2. Optionally filter by date if provided.

  3. Collect all unique STANOX codes that appear with valid scheduled stops.

  4. Return the complete set (the map will connect stations by proximity).

rdmpy.outputs.analysis_tools.map_train_journey_with_incidents(all_data, service_stanox, incident_results=None, stations_ref_path=None, incident_color='purple', service_code=None, date_str=None)[source]

Map the train journey by connecting stations based on geographic proximity (not chronological order).

  1. Load reference stations and prepare STANOX coordinate data

  2. Connect service stations + incident stations using minimum spanning tree

  3. Color-grade station markers by total delay

  4. Map each incident with chronologically-ranked numbered markers

Refactored to use focused helper functions for data preparation, calculations, and visualization.

rdmpy.outputs.analysis_tools.train_view_2(all_data, service_stanox, service_code, stations_ref_path=None)[source]

Compute reliability metrics for each station in the service_stanox list for a given train service code.

Metrics exclude PFPI_MINUTES == 0.0 when computing mean/variance and incident counts. OnTime% is computed on the original PFPI distribution (<= 0), so it still reflects punctuality.

Returns a DataFrame with columns: ServiceCode, StationName, MeanDelay, DelayVariance, OnTime%, IncidentCount

Also includes stations from all_data that experienced delays for this service code.

rdmpy.outputs.analysis_tools.plot_reliability_graphs(all_data, service_stanox, service_code, stations_ref_path=None, cap_minutes=75)[source]

Generate overlapping density (KDE) curves and cumulative distribution plots of the delay distribution per station (all curves overlapping, in different colours), excluding delay == 0.0 and capping values at cap_minutes.

Also includes stations from all_data that experienced delays for this service code.

rdmpy.outputs.analysis_tools.create_time_view_html(date_str, all_data)[source]

Create an HTML map showing affected stations for a given date, with markers sized by incident count and colored by total PFPI minutes. Prints incident statistics for the specific date before generating the map.

Refactored to use focused helper functions for statistics, data aggregation, marker creation, and finalization.

rdmpy.outputs.analysis_tools.station_view_yearly(station_id, interval_minutes=30)[source]

Station analysis for yearly data across all incidents - simplified output. Analyzes all days of the week for a station and separates incident vs normal operations.

rdmpy.outputs.analysis_tools.plot_trains_in_system_vs_delay(station_id, all_data, time_window_minutes=60, num_platforms=12, figsize=(12, 8), max_delay_percentile=98, dwell_time_minutes=5, time_range=None)[source]

Visualize the relationship between normalized trains in system and mean delay per hour.

Similar to plot_variable_relationships but uses trains in system (occupancy) instead of flow (throughput) on the x-axis.

Merged analysis: combines weekdays and weekends into a single comprehensive view.

Uses exactly the same logic as plot_variable_relationships:
  • X-axis: normalized trains in system per hour (from the plot_bottleneck_analysis calculation)

  • Y-axis: mean delay per hour computed only from delayed trains (delay > 0), not all trains

  • One scatter point per hour (not per train)

  • Binned by trains in system, with Q25-Q75 delay ranges

Theory:
  • As trains accumulate in the system (high occupancy), delays should increase.

  • If delays remain low despite a high number of trains in system, this indicates good platform management.

  • If delays spike at a low number of trains in system, this indicates operational inefficiencies.

Parameters:
  • station_id (str) – The station STANOX code

  • all_data (pd.DataFrame) – The complete dataset containing all train records

  • time_window_minutes (int) – Time window in minutes (default: 60)

  • num_platforms (int) – Number of platforms for normalization (default: 12)

  • figsize (tuple) – Figure size (default: (12, 8))

  • max_delay_percentile (int) – Percentile at which to trim extreme values (default: 98)

  • dwell_time_minutes (int) – Typical dwell time at the station (default: 5 minutes)

  • time_range (tuple or None) – Optional (start, end) tuple to filter by time range (default: None)

rdmpy.outputs.analysis_tools.explore_delay_outliers(station_id, all_data, num_platforms=6, dwell_time_minutes=5, figsize=(12, 8), time_range=None)[source]

Specialized visualization to explore delay outliers and extreme cases. Shows delay percentiles vs system load with binned averages.

Parameters:
  • station_id (str) – The STANOX code for the station

  • all_data (pd.DataFrame) – The complete dataset with all station data

  • num_platforms (int) – Number of platforms at the station (for normalization)

  • dwell_time_minutes (int) – Typical dwell time at the station in minutes

  • figsize (tuple) – Figure size (width, height)

  • time_range (tuple or None) – Optional (start, end) tuple to filter by time range (default: None)

Returns:

Hourly statistics including delay percentiles and system load metrics

Return type:

pd.DataFrame

rdmpy.outputs.analysis_tools.station_view(station_id, all_data, num_platforms=6, time_window_minutes=60, max_delay_percentile=98, dwell_time_minutes=5, figsize=(8, 4.7), time_range=None)[source]

Comprehensive merged station performance analysis combining 3 visualization functions. Analyzes on-time performance and system load relationships.

Parameters:
  • station_id (str) – The STANOX code for the station

  • all_data (pd.DataFrame) – The complete dataset with all station data

  • num_platforms (int) – Number of platforms at the station (for normalization)

  • time_window_minutes (int) – Time window for analysis (typically 60 for hourly)

  • max_delay_percentile (int) – Maximum delay percentile to consider (typically 98)

  • dwell_time_minutes (int) – Typical dwell time at the station in minutes

  • figsize (tuple) – Figure size (width, height), applied to all plots

  • time_range (tuple or None) – Optional (start, end) tuple to filter by time range. Filters valid_data to only the rows whose arrival_time falls within this range (default: None)

Returns:

Dictionary containing hourly_stats and bin_stats DataFrames

Return type:

dict

rdmpy.outputs.analysis_tools.comprehensive_station_analysis(station_id, all_data, num_platforms=6, dwell_time_minutes=5, max_delay_percentile=98, time_range=None)[source]

Combined comprehensive station analysis displaying all visualizations in a single column figure.

Combines plot_trains_in_system_vs_delay, explore_delay_outliers, and station_view without changing any of their internal logic.

Parameters:
  • station_id (str) – The STANOX code for the station

  • all_data (pd.DataFrame) – The complete dataset with all station data

  • num_platforms (int) – Number of platforms at the station (for normalization)

  • dwell_time_minutes (int) – Typical dwell time at the station in minutes

  • max_delay_percentile (int) – Maximum delay percentile to consider (typically 98)

  • time_range (tuple or None) – Optional (start, end) tuple to filter by time range (default: None)

Returns:

Dictionary containing all results from the three analyses

Return type:

dict

rdmpy.outputs.analysis_tools.station_analysis_with_time_range(station_id, all_data, time_range=None, num_platforms=6, dwell_time_minutes=5, max_delay_percentile=98)[source]

Wrapper around comprehensive_station_analysis that adds time_range filtering.

Filters data by optional time_range, then calls the original function with the filtered dataset. Original function logic remains unchanged.

Parameters:
  • station_id (str) – The STANOX code for the station

  • all_data (pd.DataFrame) – Complete dataset with all station data

  • time_range (tuple or None) – Tuple of (start, end) as dates or datetimes. Dates are expanded to the full day (00:00 to 23:59:59); using the same date for both covers the entire day; None uses all data (default). Examples: (‘2024-01-15’, ‘2024-01-15’) for a single day; (‘2024-01-01’, ‘2024-06-30’) for a date range; (‘2024-01-15 08:00’, ‘2024-01-15 17:00’) for specific times.

  • num_platforms (int) – Number of platforms at the station (default: 6)

  • dwell_time_minutes (int) – Typical dwell time at the station in minutes (default: 5)

  • max_delay_percentile (int) – Maximum delay percentile to consider (default: 98)

Returns:

Dictionary containing all results from comprehensive_station_analysis

Return type:

dict
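
The documented time_range handling (date-only bounds expanded to a full day, None meaning all data) can be sketched as below; `normalize_time_range` is a hypothetical helper name, not part of the rdmpy API:

```python
import pandas as pd

def normalize_time_range(time_range):
    """Expand date-only bounds to a full-day (00:00:00 to 23:59:59) range.

    Returns (start, end) as pandas Timestamps, or None when time_range is None.
    """
    if time_range is None:
        return None
    start = pd.Timestamp(time_range[0])
    end = pd.Timestamp(time_range[1])
    if end == end.normalize():  # date-only bound: extend to end of day
        end = end + pd.Timedelta(hours=23, minutes=59, seconds=59)
    return start, end
```

With this, (‘2024-01-15’, ‘2024-01-15’) covers the entire day, while bounds that already carry a time of day (e.g., ‘2024-01-15 17:00’) are left unchanged.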

rdmpy.outputs.analysis_tools.station_view_yearly_with_time_range(station_id, interval_minutes=30, time_range=None)[source]

Wrapper around station_view_yearly that adds time_range filtering.

Calls the original function and filters its results by optional time_range. Original function logic remains unchanged.

Parameters:
  • station_id (str) – The STANOX code for the station

  • interval_minutes (int) – Interval size for analysis in minutes (default: 30)

  • time_range (tuple or None) – Tuple of (start, end) as dates or datetimes. Dates are expanded to the full day (00:00 to 23:59:59); using the same date for both covers the entire day; None uses all data (default). Examples: (‘2024-01-15’, ‘2024-01-15’) for a single day; (‘2024-01-01’, ‘2024-06-30’) for a date range; (‘2024-01-15 08:00’, ‘2024-01-15 17:00’) for specific times.

Returns:

(incident_summary, normal_summary) DataFrames filtered by time_range

Return type:

tuple