API Reference

This section provides detailed documentation of the rdmpy Python API, organized by module.

Data Pre-Processing

Core preprocessor module for matching schedules with delays and organizing results by station.

Save processed schedule and delay data to parquet files for railway stations by DFT category. This script can process any DFT category (A, B, C1, C2) or all categories at once. Processes schedule data, applies delays, and saves the results as pandas DataFrames organized by day of the week for each station.

Usage:
  • Single station: python -m preprocess.preprocessor <STANOX_CODE>

  • Category A: python -m preprocess.preprocessor --category-A

  • Category B: python -m preprocess.preprocessor --category-B

  • Category C1: python -m preprocess.preprocessor --category-C1

  • Category C2: python -m preprocess.preprocessor --category-C2

  • All categories: python -m preprocess.preprocessor --all-categories

  • Interactive: python -m preprocess.preprocessor

rdmpy.preprocessor.get_weekday_from_schedule_entry(entry)[source]

Extract the primary weekday from a schedule entry for sorting purposes.

Parameters:

entry – Schedule entry dictionary

Returns:

Weekday index (0=Monday, 6=Sunday) for sorting

Return type:

int

rdmpy.preprocessor.load_stations(category=None)[source]

Load stations from the reference JSON file.

Parameters:

category (str, optional) – DFT category to filter by (e.g., ‘A’, ‘B’, ‘C1’, ‘C2’). If None, returns all stations.

Returns:

List of STANOX codes for the specified category or all stations

Return type:

list
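
A minimal sketch of the category filter described above, using an in-memory stations reference. The JSON structure and the `stanox`/`dft_category` field names are assumptions for illustration; the real reference file may be laid out differently.

```python
import json

# Hypothetical reference data; the real JSON file's layout may differ.
STATIONS_JSON = json.dumps({
    "stations": [
        {"stanox": "87701", "dft_category": "A"},
        {"stanox": "88601", "dft_category": "A"},
        {"stanox": "86981", "dft_category": "C1"},
    ]
})

def load_stations(category=None):
    """Return STANOX codes, optionally filtered by DFT category."""
    stations = json.loads(STATIONS_JSON)["stations"]
    if category is None:
        return [s["stanox"] for s in stations]
    return [s["stanox"] for s in stations if s["dft_category"] == category]
```

With the sample data, `load_stations("A")` returns the two Category A codes and `load_stations()` returns all three.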

rdmpy.preprocessor.save_processed_data_by_weekday_to_dataframe(st_code, output_dir='processed_data', schedule_data_loaded=None, stanox_ref=None, tiploc_to_stanox=None, incident_data_loaded=None)[source]

Process schedule and delay data, then save to pandas DataFrames organized by weekday. This optimized version accepts pre-loaded data to avoid file I/O for each station.

Parameters:
  • st_code (str) – STANOX code to process

  • output_dir (str) – Directory to save output files

  • schedule_data_loaded (pd.DataFrame, optional) – Pre-loaded schedule data

  • stanox_ref (pd.DataFrame, optional) – Pre-loaded STANOX reference data

  • tiploc_to_stanox (dict, optional) – Pre-loaded TIPLOC to STANOX mapping

  • incident_data_loaded (dict, optional) – Pre-loaded incident data by period

Returns:

Dictionary containing processed data as pandas DataFrames organized by weekday

Return type:

dict

rdmpy.preprocessor.save_stations_by_category(category=None, output_dir='processed_data')[source]

Process and save data for stations by DFT category as parquet files. This fully optimized version loads schedule and delay data once and reuses them for all stations, avoiding redundant file I/O during batch processing.

Parameters:
  • category (str, optional) – DFT category to process (‘A’, ‘B’, ‘C1’, ‘C2’). If None, processes all categories.

  • output_dir (str) – Directory to save output files

Returns:

Summary of processing results

Return type:

dict

rdmpy.preprocessor.save_all_category_a_stations(output_dir='processed_data')[source]

Backward compatibility function for processing Category A stations only.

Parameters:

output_dir (str) – Directory to save output files

Returns:

Summary of processing results

Return type:

dict

rdmpy.preprocessor.main(st_code=None, process_category=None, process_all_categories=False)[source]

Main function to demonstrate the data processing and saving functionality.

Parameters:
  • st_code (str, optional) – STANOX code to process. If not provided and no category processing is requested, the user is prompted for input.

  • process_category (str, optional) – DFT category to process (‘A’, ‘B’, ‘C1’, ‘C2’).

  • process_all_categories (bool) – If True, process all categories instead of a single station.

Utilities

Shared utility functions that support the preprocessing pipeline, including schedule and delay processing.

rdmpy.utils.load_schedule_data(st_code, schedule_data, reference_files)[source]

Load all necessary data for schedule processing.

Returns:

(train_count (deprecated, returns None), tiploc, schedule_data_loaded, stanox_ref, tiploc_to_stanox)

Return type:

tuple

rdmpy.utils.get_day_code_mapping()[source]

Create a mapping for day codes used throughout the application.

Returns:

Mapping from day indices to day codes (0=Monday, 1=Tuesday, …, 6=Sunday)

Return type:

dict
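
The mapping this returns can be sketched directly from the documented convention (0=Monday through 6=Sunday, two-letter codes):

```python
def get_day_code_mapping():
    """Map weekday index (0=Monday ... 6=Sunday) to two-letter day codes."""
    return {0: "MO", 1: "TU", 2: "WE", 3: "TH", 4: "FR", 5: "SA", 6: "SU"}
```

This matches Python's own `datetime.weekday()` convention, so `mapping[dt.weekday()]` yields the day code for any datetime.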

rdmpy.utils.extract_schedule_days_runs(schedule_entry)[source]

Extract schedule_days_runs from a schedule entry.

Parameters:

schedule_entry – Schedule entry dictionary

Returns:

Binary string representing days the schedule runs, or None if not found

Return type:

str

rdmpy.utils.get_english_day_types_from_schedule(schedule_entry)[source]

Convert schedule_days_runs to list of ENGLISH_DAY_TYPE values.

Parameters:

schedule_entry – Schedule entry dictionary

Returns:

List of ENGLISH_DAY_TYPE values that this schedule runs on

Return type:

list
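
A sketch of the conversion described above, assuming schedule_days_runs is a 7-character binary string with Monday first and that the ENGLISH_DAY_TYPE values are the two-letter codes used elsewhere in this reference (an assumption, not confirmed by the source):

```python
DAY_CODES = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]

def days_runs_to_day_types(schedule_days_runs):
    """Convert a 7-character binary string (Monday first) to day codes."""
    if schedule_days_runs is None or len(schedule_days_runs) != 7:
        return []
    # Keep the code for each position flagged with '1'.
    return [code for flag, code in zip(schedule_days_runs, DAY_CODES) if flag == "1"]
```

For example, a weekday-only schedule string of "1111100" yields ["MO", "TU", "WE", "TH", "FR"].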

rdmpy.utils.is_valid_schedule_entry(schedule_entry)[source]

Validate that a schedule entry has required structure.

Parameters:

schedule_entry – Schedule entry dictionary

Returns:

True if entry has required fields

Return type:

bool

rdmpy.utils.validate_schedule_locations(schedule_locations)[source]

Validate that schedule_locations is iterable and contains valid entries.

Parameters:

schedule_locations – List of location dictionaries

Returns:

True if valid

Return type:

bool

rdmpy.utils.is_valid_location_entry(location)[source]

Validate a single location entry has get method (dict-like).

Parameters:

location – Location entry to validate

Returns:

True if location is dict-like

Return type:

bool

rdmpy.utils.has_time_information(location)[source]

Check if location has either departure or arrival time.

Parameters:

location – Location dictionary

Returns:

True if either departure or arrival exists

Return type:

bool

rdmpy.utils.extract_location_time(location)[source]

Extract departure or arrival time from location, preferring departure.

Parameters:

location – Location dictionary

Returns:

Time in format ‘HHMM’ or None if unavailable

Return type:

str or None
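
The "prefer departure" behaviour can be sketched as below; the "departure" and "arrival" field names are illustrative assumptions about the location dictionary:

```python
def extract_location_time(location):
    """Return the departure time if present, otherwise the arrival time.

    Times are 'HHMM' strings; returns None when neither field exists.
    Field names here are assumptions for illustration.
    """
    return location.get("departure") or location.get("arrival")
```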

rdmpy.utils.get_train_service_code(schedule_entry)[source]

Extract CIF_train_service_code from schedule entry.

Parameters:

schedule_entry – Schedule entry dictionary

Returns:

Train service code or None if not found

Return type:

str or None

rdmpy.utils.clean_dataframe_types(df, columns_to_convert)[source]

Standardize data types in a DataFrame for consistent merging.

Parameters:
  • df – DataFrame to clean

  • columns_to_convert – List of (column_name, target_type) tuples

Returns:

DataFrame with converted types

Return type:

DataFrame

rdmpy.utils.filter_valid_delay_entries(delay_df)[source]

Filter delay entries to keep only those with valid datetime strings.

Parameters:

delay_df – DataFrame of delay entries

Returns:

Filtered DataFrame

Return type:

DataFrame

rdmpy.utils.process_schedule(st_code, schedule_data=None, reference_files=None, train_count=None, tiploc=None, schedule_data_loaded=None, stanox_ref=None, tiploc_to_stanox=None)[source]

Generate a schedule timeline for all trains that match the specified STANOX code. This optimized version accepts pre-loaded data to avoid reloading from files.

Parameters:
  • st_code (str) – STANOX code to process.

  • schedule_data (dict, optional) – Dictionary containing schedule data file paths.

  • reference_files (dict, optional) – Dictionary containing reference file paths.

  • train_count (int, optional) – Expected simple count of number of trains (from pre-loaded data).

  • tiploc (str, optional) – TIPLOC code corresponding to st_code.

  • schedule_data_loaded (list, optional) – Pre-loaded schedule data.

  • stanox_ref (dict, optional) – Pre-loaded STANOX reference data.

  • tiploc_to_stanox (dict, optional) – Pre-loaded TIPLOC to STANOX mapping.

Returns:

Sorted schedule timeline.

Return type:

list

rdmpy.utils.extract_day_of_week_from_delay(delay_entry)[source]

Extract the day of the week from PLANNED_ORIGIN_WTT_DATETIME in delay data.

Parameters:

delay_entry – Delay entry dictionary

Returns:

Day of week code (MO, TU, WE, TH, FR, SA, SU) or None if parsing fails

Return type:

str
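
A minimal sketch of this extraction, returning None when parsing fails as documented. The '%d-%b-%Y %H:%M' datetime format is an assumption; the real PLANNED_ORIGIN_WTT_DATETIME strings may use a different layout.

```python
from datetime import datetime

DAY_CODES = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]

def day_code_from_wtt_datetime(value, fmt="%d-%b-%Y %H:%M"):
    """Parse a planned-origin datetime string and return its day code.

    The format string is an assumption; returns None on parse failure.
    """
    try:
        return DAY_CODES[datetime.strptime(value, fmt).weekday()]
    except (TypeError, ValueError):
        return None
```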

rdmpy.utils.schedule_runs_on_day(schedule_entry, target_day)[source]

Check if a schedule entry runs on a specific day of the week.

Parameters:
  • schedule_entry – Schedule entry dictionary (from processed schedule)

  • target_day – Day code (MO, TU, WE, TH, FR, SA, SU)

Returns:

True if the schedule runs on the target day

Return type:

bool

rdmpy.utils.find_location_by_tiploc(schedule_locations, target_tiploc)[source]

Find first location matching target TIPLOC.

Parameters:
  • schedule_locations – List of location dictionaries

  • target_tiploc – TIPLOC code to match

Returns:

Matching location or None

Return type:

dict or None

rdmpy.utils.find_origin_location(schedule_locations, target_tiploc)[source]

Find origin location (LO or L0) at target TIPLOC.

Parameters:
  • schedule_locations – List of location dictionaries

  • target_tiploc – TIPLOC code to match

Returns:

Origin location or None

Return type:

dict or None

rdmpy.utils.find_destination_location(schedule_locations, target_tiploc)[source]

Find destination location (LT) at target TIPLOC.

Parameters:
  • schedule_locations – List of location dictionaries

  • target_tiploc – TIPLOC code to match

Returns:

Destination location or None

Return type:

dict or None

rdmpy.utils.determine_station_role(relevant_location, origin_location, destination_location, tiploc)[source]

Determine the role of station in train’s journey.

Parameters:
  • relevant_location – Matched location at target TIPLOC

  • origin_location – Origin location (if exists)

  • destination_location – Destination location (if exists)

  • tiploc – Target TIPLOC code

Returns:

“Origin”, “Destination”, “Intermediate”, or “Unknown”

Return type:

str

rdmpy.utils.build_train_record(train_service_code, origin_location, destination_location, relevant_location, s_time, schedule_day_types, tiploc, tiploc_to_stanox, stanox_ref, st_code)[source]

Build a single train record for processed schedule.

Parameters:
  • train_service_code – CIF service code

  • origin_location – Origin location dict

  • destination_location – Destination location dict

  • relevant_location – Location at target station

  • s_time – Time in HHMM format

  • schedule_day_types – List of day codes

  • tiploc – Target TIPLOC

  • tiploc_to_stanox – TIPLOC to STANOX mapping

  • stanox_ref – STANOX reference data (dict)

  • st_code – Target STANOX code

Returns:

Train record with all fields populated

Return type:

dict

rdmpy.utils.extract_time_components_from_delays(delays_df)[source]

Extract time components from delay DataFrame for matching.

Parameters:

delays_df – DataFrame of delays

Returns:

DataFrame with added origin_time, dest_time, event_time columns

Return type:

DataFrame

rdmpy.utils.expand_schedule_by_days(schedule_df)[source]

Expand schedule entries for multi-day schedules (one row per day).

Parameters:

schedule_df – Schedule DataFrame

Returns:

Expanded DataFrame with current_day column

Return type:

DataFrame
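
The one-row-per-day expansion maps naturally onto pandas explode. This sketch assumes a list-valued SCHEDULE_DAYS column (a hypothetical name; the real column may differ):

```python
import pandas as pd

def expand_schedule_by_days(schedule_df):
    """One row per (schedule, day): explode the list-valued day column.

    Assumes a 'SCHEDULE_DAYS' column holding lists of day codes.
    """
    expanded = schedule_df.explode("SCHEDULE_DAYS").rename(
        columns={"SCHEDULE_DAYS": "current_day"}
    )
    return expanded.reset_index(drop=True)
```

A schedule running on ["MO", "TU"] becomes two rows, one with current_day "MO" and one with "TU".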

rdmpy.utils.process_delays(incident_files, st_code, output_dir)[source]

Processes delay files by converting them to vertical JSON, removing irrelevant columns, and filtering rows.

Parameters:
  • incident_files (dict) – Dictionary with period names as keys and file paths as values.

  • output_dir (str) – Directory to save the converted JSON files.

  • st_code (str) – The station code to filter delays.

Returns:

Dictionary with period names as keys and processed DataFrames as values.

Return type:

dict

rdmpy.utils.extract_day_from_each_delay(delays_df)[source]

Extract day of week for each delay entry.

Parameters:

delays_df – DataFrame of delays

Returns:

List of day codes (MO, TU, etc.) aligned with rows

Return type:

list

rdmpy.utils.add_delay_day_column(delays_df)[source]

Add delay_day column to delays DataFrame, filtering for valid entries.

Parameters:

delays_df – DataFrame of delays

Returns:

DataFrame with delay_day column (filtered to valid entries)

Return type:

DataFrame

rdmpy.utils.find_matched_delays_info(matched_results_df)[source]

Extract matched delay information for comparison with unmatched.

Parameters:

matched_results_df – Filtered DataFrame of matched results

Returns:

Set of tuples (TRAIN_SERVICE_CODE, DELAY_DAY, PFPI_MINUTES)

Return type:

set

rdmpy.utils.identify_unmatched_delays(delays_df, matched_delay_info)[source]

Identify delays that were not matched with schedule entries.

Parameters:
  • delays_df – DataFrame of all delays

  • matched_delay_info – Set of matched delay tuples

Returns:

DataFrame of unmatched delays

Return type:

DataFrame

rdmpy.utils.determine_planned_call_time(row, st_code)[source]

Determine planned call time for unmatched delay based on station role.

Parameters:
  • row – Delay row (Series or dict-like)

  • st_code – Station code being analyzed

Returns:

Time in HHMM format

Return type:

str

rdmpy.utils.build_unmatched_entry(delay_row, st_code)[source]

Build a record for unmatched delay entry.

Parameters:
  • delay_row – Delay Series/dict

  • st_code – Station code being analyzed

Returns:

Complete delay entry record

Return type:

dict

rdmpy.utils.apply_delays_to_matches(result_df, matched_mask)[source]

Update actual times and delay info for matched entries.

Parameters:
  • result_df – Result DataFrame (modified in-place)

  • matched_mask – Boolean mask of matched entries

Returns:

Updated DataFrame

Return type:

DataFrame

rdmpy.utils.filter_result_columns(combined_df)[source]

Filter result DataFrame to required columns only.

Parameters:

combined_df – Combined results DataFrame

Returns:

Filtered DataFrame with core columns

Return type:

DataFrame

rdmpy.utils.adjust_schedule_timeline(processed_schedule, processed_delays, st_code=None)[source]

Adjust the schedule timeline based on delays and generate an updated timeline. This pandas-optimized version uses DataFrames for fast matching operations.

Parameters:
  • processed_schedule (list) – List of processed schedule dictionaries.

  • processed_delays (list) – List of delay records from all days.

  • st_code (str, optional) – The station code being analyzed to determine correct planned call times.

Returns:

Adjusted schedule timeline sorted by actual calls.

Return type:

list

rdmpy.utils.load_schedule_data_once(schedule_data, reference_files)[source]

Load schedule data once to avoid reloading for each station.

Parameters:
  • schedule_data (dict) – Dictionary containing schedule data file paths

  • reference_files (dict) – Dictionary containing reference file paths

Returns:

(schedule_data_loaded, stanox_ref, tiploc_to_stanox)

Return type:

tuple

rdmpy.utils.load_incident_data_once(incident_files)[source]

Load all incident data once to avoid reloading for each station.

Parameters:

incident_files (dict) – Dictionary with period names as keys and file paths as values

Returns:

Dictionary with period names as keys and loaded DataFrames as values

Return type:

dict

rdmpy.utils.process_delays_optimized(incident_data_loaded, st_code, output_dir=None)[source]

Process delays using pre-loaded incident data to avoid file I/O.

Parameters:
  • incident_data_loaded (dict) – Pre-loaded incident data by period

  • st_code (str) – The station code to filter delays

  • output_dir (str, optional) – Directory to save converted JSON files (not used in optimized mode)

Returns:

Dictionary with period names as keys and processed DataFrames as values

Return type:

dict

Data Loading

Tools for loading processed data from the processed_data/ folder.

rdmpy.outputs.load_data.load_processed_data(base_dir='processed_data')[source]

Load all .parquet files from the processed_data folder (recursively) into a single pandas DataFrame.

Automatically tries both pyarrow and fastparquet engines. Adds STANOX (folder name) and DAY (file name) columns.

Parameters:

base_dir (str)

Return type:

DataFrame
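
The STANOX and DAY columns are derived from the file layout described above (processed_data/<STANOX>/<DAY>.parquet, inferred from the docstring). A sketch of that derivation:

```python
from pathlib import Path

def stanox_and_day_from_path(parquet_path):
    """Derive STANOX (parent folder name) and DAY (file stem) for a parquet file.

    Mirrors the documented behaviour: processed_data/<STANOX>/<DAY>.parquet.
    """
    p = Path(parquet_path)
    return p.parent.name, p.stem
```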

Analysis Tools

Functions for analyzing and visualizing network-level data across stations.

rdmpy.outputs.analysis_tools.find_processed_data_path()[source]

Find the processed_data directory by checking multiple possible locations. Returns the path if found, None otherwise.

rdmpy.outputs.analysis_tools.aggregate_view(incident_number, start_date)[source]

Multi-day incident analysis with clean separation of concerns. Creates 3 charts: hourly delays, severity distribution, and event timeline.

Parameters:
  • incident_number (int/str) – The incident number to analyze

  • start_date (str) – Starting date in ‘DD-MMM-YYYY’ format

Returns:

Summary statistics for the incident

Return type:

dict

rdmpy.outputs.analysis_tools.calculate_incident_summary_stats(df, delay_data_all, unique_dates, files_processed, files_with_data, incident_number, num_days)[source]

Calculate final summary statistics for the incident.

Parameters:
  • df (pd.DataFrame) – The incident data

  • delay_data_all (pd.DataFrame) – Filtered DataFrame with only delay events

  • unique_dates (list) – List of unique dates in the incident

  • files_processed (int) – Number of files processed

  • files_with_data (int) – Number of files with matching data

  • incident_number (int/str) – The incident number

  • num_days (int) – Number of days the incident spans

Returns:

Summary statistics dictionary

Return type:

dict

rdmpy.outputs.analysis_tools.aggregate_view_multiday(incident_number, start_date)[source]

Multi-day incident analysis that handles incidents spanning multiple days. Creates separate charts for each day with labels (a), (b), (c) or (a.1), (a.2), etc.

For single-day incidents, labels are simply (a), (b), (c). For multi-day incidents: (a.1), (a.2) for hourly charts, (b) for severity, (c.1), (c.2) for timelines.

Warning: if the incident spans more than 3 days, the same incident number may refer to multiple separate incidents.

Parameters:
  • incident_number (int/str) – The incident number to analyze

  • start_date (str) – Starting date in ‘DD-MMM-YYYY’ format (used for display purposes only; all days are loaded)

Returns:

Summary statistics for the incident across all days

Return type:

dict

rdmpy.outputs.analysis_tools.incident_view(incident_code, incident_date, analysis_date, analysis_hhmm, period_minutes)[source]

Generate a detailed table showing each station affected by an incident with their calls and delays for a specific time period during the incident lifecycle. Shows trains that were shifted between time periods due to delays.

Parameters:
  • incident_code (int/float) – The incident number to analyze

  • incident_date (str) – Incident date in ‘DD-MMM-YYYY’ format (used to locate the incident)

  • analysis_date (str) – Specific date to analyze in ‘DD-MMM-YYYY’ format

  • analysis_hhmm (str) – Start time for analysis in ‘HHMM’ format (e.g., ‘1830’ for 18:30)

  • period_minutes (int) – Minutes from the analysis start time to analyze

Returns:

(pandas.DataFrame, str, str) – Results table, incident start time string, and analysis period string

Return type:

tuple

rdmpy.outputs.analysis_tools.incident_view_heatmap_html(incident_code, incident_date, analysis_date, analysis_hhmm, period_minutes, interval_minutes=10, output_file=None)[source]

Create dynamic interactive HTML heatmap showing railway network delays. Displays delay intensity as vibrant heatmap visualization with incident locations and timeline animation.

Parameters:
  • incident_code (int/float) – The incident number to analyze

  • incident_date (str) – Date when the incident started in ‘DD-MMM-YYYY’ format

  • analysis_date (str) – Specific date to analyze in ‘DD-MMM-YYYY’ format

  • analysis_hhmm (str) – Start time for analysis in ‘HHMM’ format (e.g., ‘1900’)

  • period_minutes (int) – Total duration of the analysis period in minutes

  • interval_minutes (int) – Duration of each interval in minutes (default: 10)

  • output_file (str, optional) – HTML file path to save

Returns:

HTML content of the interactive heatmap

Return type:

str

rdmpy.outputs.analysis_tools.train_view(all_data, origin_code, destination_code, input_date_str)[source]

View all train journeys between an OD pair and check for incidents on a specific date. Corrects PLANNED_CALLS using ACTUAL_CALLS - PFPI_MINUTES.

Refactored for single responsibility: data filtering and transformation, then display.

Parameters:
  • all_data (pd.DataFrame) – Complete train data with OD information

  • origin_code (str or int) – Origin location code

  • destination_code (str or int) – Destination location code

  • input_date_str (str) – Date in ‘DD-MMM-YYYY’ format

Returns:

Incident data or message

Return type:

pd.DataFrame or str

rdmpy.outputs.analysis_tools.get_stanox_for_service(all_data, train_service_code, origin_code, destination_code, date_str=None)[source]

Get all unique STANOX codes that a train service calls at, regardless of the specific train instance. Returns a list of all stations that this service code stops at.

Strategy:
  1. Filter to the specified service code and OD pair.

  2. Optionally filter by date if provided.

  3. Collect all unique STANOX codes that appear with valid scheduled stops.

  4. Return the complete set (the map will connect stations by proximity).

rdmpy.outputs.analysis_tools.map_train_journey_with_incidents(all_data, service_stanox, incident_results=None, stations_ref_path=None, incident_color='purple', service_code=None, date_str=None)[source]

Map the train journey by connecting stations based on geographic proximity (not chronological order).

  1. Load reference stations and prepare STANOX coordinate data

  2. Connect service stations + incident stations using minimum spanning tree

  3. Color-grade station markers by total delay

  4. Map each incident with chronologically-ranked numbered markers

Refactored to use focused helper functions for data preparation, calculations, and visualization.

rdmpy.outputs.analysis_tools.train_view_2(all_data, service_stanox, service_code, stations_ref_path=None)[source]

Compute reliability metrics for each station in the service_stanox list for a given train service code.

Metrics exclude PFPI_MINUTES == 0.0 when computing mean/variance and incident counts. OnTime% is computed on the original PFPI distribution (<= 0), so it still reflects punctuality.

Returns a DataFrame with columns: ServiceCode, StationName, MeanDelay, DelayVariance, OnTime%, IncidentCount

Also includes stations from all_data that experienced delays for this service code.

rdmpy.outputs.analysis_tools.plot_reliability_graphs(all_data, service_stanox, service_code, stations_ref_path=None, cap_minutes=75)[source]

Generate overlapping density (KDE) curves and cumulative distribution plots of the delay distribution per station (all curves overlapping, in different colours), excluding delay == 0.0 and capping values at cap_minutes.

Also includes stations from all_data that experienced delays for this service code.

rdmpy.outputs.analysis_tools.create_time_view_html(date_str, all_data)[source]

Create an HTML map showing affected stations for a given date, with markers sized by incident count and colored by total PFPI minutes. Prints incident statistics for the specific date before generating the map.

Refactored to use focused helper functions for statistics, data aggregation, marker creation, and finalization.

rdmpy.outputs.analysis_tools.station_view_yearly(station_id, interval_minutes=30)[source]

Station analysis for yearly data across all incidents - simplified output. Analyzes all days of the week for a station and separates incident vs normal operations.

rdmpy.outputs.analysis_tools.plot_trains_in_system_vs_delay(station_id, all_data, time_window_minutes=60, num_platforms=12, figsize=(12, 8), max_delay_percentile=98, dwell_time_minutes=5, time_range=None)[source]

Visualize the relationship between normalized trains in system and mean delay per hour.

Similar to plot_variable_relationships but uses trains in system (occupancy) instead of flow (throughput) on the x-axis.

Merged analysis: combines weekdays and weekends into a single comprehensive view.

Uses exactly the same logic as plot_variable_relationships:
  • X-axis: normalized trains in system per hour (from the plot_bottleneck_analysis calculation)

  • Y-axis: mean delay per hour computed only from delayed trains (delay > 0), not all trains

  • One scatter point per hour (not per train)

  • Binned by trains in system, with Q25-Q75 delay ranges

Theory:
  • As trains accumulate in the system (high occupancy), delays should increase.

  • If delays remain low despite a high number of trains in system, this indicates good platform management.

  • If delays spike at a low number of trains in system, this indicates operational inefficiencies.

Parameters:
  • station_id (str) – The station STANOX code

  • all_data (pd.DataFrame) – The complete dataset containing all train records

  • time_window_minutes (int) – Time window in minutes (default: 60)

  • num_platforms (int) – Number of platforms for normalization (default: 12)

  • figsize (tuple) – Figure size (default: (12, 8))

  • max_delay_percentile (int) – Percentile at which to trim extreme values (default: 98)

  • dwell_time_minutes (int) – Typical dwell time at the station (default: 5 minutes)

  • time_range (tuple or None) – Optional (start, end) tuple to filter by time range (default: None)

rdmpy.outputs.analysis_tools.explore_delay_outliers(station_id, all_data, num_platforms=6, dwell_time_minutes=5, figsize=(12, 8), time_range=None)[source]

Specialized visualization to explore delay outliers and extreme cases. Shows delay percentiles vs system load with binned averages.

Parameters:
  • station_id (str) – The STANOX code for the station

  • all_data (pd.DataFrame) – The complete dataset with all station data

  • num_platforms (int) – Number of platforms at the station (for normalization)

  • dwell_time_minutes (int) – Typical dwell time at the station in minutes

  • figsize (tuple) – Figure size (width, height)

  • time_range (tuple or None) – Optional (start, end) tuple to filter by time range (default: None)

Returns:

Hourly statistics including delay percentiles and system load metrics

Return type:

pd.DataFrame

rdmpy.outputs.analysis_tools.station_view(station_id, all_data, num_platforms=6, time_window_minutes=60, max_delay_percentile=98, dwell_time_minutes=5, figsize=(8, 4.7), time_range=None)[source]

Comprehensive merged station performance analysis combining 3 visualization functions. Analyzes on-time performance and system load relationships.

Parameters:
  • station_id (str) – The STANOX code for the station

  • all_data (pd.DataFrame) – The complete dataset with all station data

  • num_platforms (int) – Number of platforms at the station (for normalization)

  • time_window_minutes (int) – Time window for analysis (typically 60 for hourly)

  • max_delay_percentile (int) – Maximum delay percentile to consider (typically 98)

  • dwell_time_minutes (int) – Typical dwell time at the station in minutes

  • figsize (tuple) – Figure size (width, height), applied to all plots

  • time_range (tuple or None) – Optional (start, end) tuple to filter by time range. Filters valid_data to only the rows whose arrival_time falls within this range (default: None)

Returns:

Dictionary containing hourly_stats and bin_stats DataFrames

Return type:

dict

rdmpy.outputs.analysis_tools.comprehensive_station_analysis(station_id, all_data, num_platforms=6, dwell_time_minutes=5, max_delay_percentile=98, time_range=None)[source]

Combined comprehensive station analysis displaying all visualizations in a single column figure.

Combines plot_trains_in_system_vs_delay, explore_delay_outliers, and station_view without changing any of their internal logic.

Parameters:
  • station_id (str) – The STANOX code for the station

  • all_data (pd.DataFrame) – The complete dataset with all station data

  • num_platforms (int) – Number of platforms at the station (for normalization)

  • dwell_time_minutes (int) – Typical dwell time at the station in minutes

  • max_delay_percentile (int) – Maximum delay percentile to consider (typically 98)

  • time_range (tuple or None) – Optional (start, end) tuple to filter by time range (default: None)

Returns:

Dictionary containing all results from the three analyses

Return type:

dict

rdmpy.outputs.analysis_tools.station_analysis_with_time_range(station_id, all_data, time_range=None, num_platforms=6, dwell_time_minutes=5, max_delay_percentile=98)[source]

Wrapper around comprehensive_station_analysis that adds time_range filtering.

Filters data by optional time_range, then calls the original function with the filtered dataset. Original function logic remains unchanged.

Parameters:
  • station_id (str) – The STANOX code for the station

  • all_data (pd.DataFrame) – Complete dataset with all station data

  • time_range (tuple or None) – Tuple of (start, end) as dates or datetimes. Dates are expanded to the full day (00:00 to 23:59:59); using the same date for both covers the entire day; None uses all data (default). Examples: (‘2024-01-15’, ‘2024-01-15’) for a single day; (‘2024-01-01’, ‘2024-06-30’) for a date range; (‘2024-01-15 08:00’, ‘2024-01-15 17:00’) for specific times.

  • num_platforms (int) – Number of platforms at the station (default: 6)

  • dwell_time_minutes (int) – Typical dwell time at the station in minutes (default: 5)

  • max_delay_percentile (int) – Maximum delay percentile to consider (default: 98)

Returns:

Dictionary containing all results from comprehensive_station_analysis

Return type:

dict
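
The documented time_range handling (date-only bounds expanded to a full day, None meaning all data) can be sketched as below; `normalize_time_range` is a hypothetical helper name, not part of the rdmpy API:

```python
import pandas as pd

def normalize_time_range(time_range):
    """Expand date-only bounds to a full-day (00:00:00 to 23:59:59) range.

    Returns (start, end) as pandas Timestamps, or None when time_range is None.
    """
    if time_range is None:
        return None
    start = pd.Timestamp(time_range[0])
    end = pd.Timestamp(time_range[1])
    if end == end.normalize():  # date-only bound: extend to end of day
        end = end + pd.Timedelta(hours=23, minutes=59, seconds=59)
    return start, end
```

With this, (‘2024-01-15’, ‘2024-01-15’) covers the entire day, while bounds that already carry a time of day (e.g., ‘2024-01-15 17:00’) are left unchanged.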

rdmpy.outputs.analysis_tools.station_view_yearly_with_time_range(station_id, interval_minutes=30, time_range=None)[source]

Wrapper around station_view_yearly that adds time_range filtering.

Calls the original function and filters its results by optional time_range. Original function logic remains unchanged.

Parameters:
  • station_id (str) – The STANOX code for the station

  • interval_minutes (int) – Interval size for analysis in minutes (default: 30)

  • time_range (tuple or None) – Tuple of (start, end) as dates or datetimes. Dates are expanded to the full day (00:00 to 23:59:59); using the same date for both covers the entire day; None uses all data (default). Examples: (‘2024-01-15’, ‘2024-01-15’) for a single day; (‘2024-01-01’, ‘2024-06-30’) for a date range; (‘2024-01-15 08:00’, ‘2024-01-15 17:00’) for specific times.

Returns:

(incident_summary, normal_summary) DataFrames filtered by time_range

Return type:

tuple