API Reference
This section provides detailed documentation of the rdmpy Python API, organized by module.
Data Pre-Processing
Core preprocessor module for matching schedules with delays and organizing results by station.
Save processed schedule and delay data to parquet files for railway stations by DFT category. This script can process any DFT category (A, B, C1, C2) or all categories at once. Processes schedule data, applies delays, and saves the results as pandas DataFrames organized by day of the week for each station.
Usage:
- Single station: python -m preprocess.preprocessor <STANOX_CODE>
- Category A: python -m preprocess.preprocessor --category-A
- Category B: python -m preprocess.preprocessor --category-B
- Category C1: python -m preprocess.preprocessor --category-C1
- Category C2: python -m preprocess.preprocessor --category-C2
- All categories: python -m preprocess.preprocessor --all-categories
- Interactive: python -m preprocess.preprocessor
- rdmpy.preprocessor.get_weekday_from_schedule_entry(entry)[source]
Extract the primary weekday from a schedule entry for sorting purposes.
- Parameters:
entry – Schedule entry dictionary
- Returns:
Weekday index (0=Monday, 6=Sunday) for sorting
- Return type:
int
- rdmpy.preprocessor.load_stations(category=None)[source]
Load stations from the reference JSON file.
- Parameters:
category (str, optional) – DFT category to filter by (e.g., ‘A’, ‘B’, ‘C1’, ‘C2’). If None, returns all stations.
- Returns:
List of STANOX codes for the specified category or all stations
- Return type:
list
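A minimal sketch of the category filtering this function performs, assuming an in-memory stations list; the real function reads a reference JSON file, and the field name `dft_category` and the sample STANOX codes here are illustrative assumptions:

```python
# Illustrative sketch of load_stations; the real function reads a reference
# JSON file, and the "dft_category" field name is an assumption.
stations = [
    {"stanox": "87701", "dft_category": "A"},
    {"stanox": "72410", "dft_category": "B"},
    {"stanox": "86981", "dft_category": "A"},
]

def load_stations(category=None, stations=stations):
    """Return STANOX codes for one DFT category, or all stations if None."""
    if category is None:
        return [s["stanox"] for s in stations]
    return [s["stanox"] for s in stations if s["dft_category"] == category]

print(load_stations("A"))   # ['87701', '86981']
print(load_stations())      # all three codes
```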
- rdmpy.preprocessor.save_processed_data_by_weekday_to_dataframe(st_code, output_dir='processed_data', schedule_data_loaded=None, stanox_ref=None, tiploc_to_stanox=None, incident_data_loaded=None)[source]
Process schedule and delay data, then save to pandas DataFrames organized by weekday. Optimized to accept pre-loaded data, avoiding file I/O for each station.
- Parameters:
st_code (str) – STANOX code to process
output_dir (str) – Directory to save output files
schedule_data_loaded (pd.DataFrame, optional) – Pre-loaded schedule data
stanox_ref (pd.DataFrame, optional) – Pre-loaded STANOX reference data
tiploc_to_stanox (dict, optional) – Pre-loaded TIPLOC to STANOX mapping
incident_data_loaded (dict, optional) – Pre-loaded incident data by period
- Returns:
Dictionary containing processed data as pandas DataFrames organized by weekday
- Return type:
dict
- rdmpy.preprocessor.save_stations_by_category(category=None, output_dir='processed_data')[source]
Process and save data for stations by DFT category as parquet files. Fully optimized: schedule and delay data are loaded once and reused for all stations, with no redundant file I/O during batch processing.
- Parameters:
category (str, optional) – DFT category to process (‘A’, ‘B’, ‘C1’, ‘C2’). If None, processes all categories.
output_dir (str) – Directory to save output files
- Returns:
Summary of processing results
- Return type:
dict
- rdmpy.preprocessor.save_all_category_a_stations(output_dir='processed_data')[source]
Backward compatibility function for processing Category A stations only.
- Parameters:
output_dir (str) – Directory to save output files
- Returns:
Summary of processing results
- Return type:
dict
- rdmpy.preprocessor.main(st_code=None, process_category=None, process_all_categories=False)[source]
Main function to demonstrate the data processing and saving functionality.
- Parameters:
st_code (str, optional) – STANOX code to process. If not provided and no category processing, will prompt for input.
process_category (str, optional) – DFT category to process (‘A’, ‘B’, ‘C1’, ‘C2’).
process_all_categories (bool) – If True, process all categories instead of a single station.
Utilities
Shared utility functions that support the preprocessing pipeline, including schedule and delay processing.
- rdmpy.utils.load_schedule_data(st_code, schedule_data, reference_files)[source]
Load all necessary data for schedule processing.
- Returns:
(train_count (deprecated, returns None), tiploc, schedule_data_loaded, stanox_ref, tiploc_to_stanox)
- Return type:
tuple
- rdmpy.utils.get_day_code_mapping()[source]
Create a mapping for day codes used throughout the application.
- Returns:
Mapping from day indices to day codes (0=Monday, 1=Tuesday, …, 6=Sunday)
- Return type:
dict
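Based on the 0=Monday through 6=Sunday convention stated above, the mapping can be sketched directly; note that Python's `datetime.weekday()` uses the same Monday-first convention, so the two line up:

```python
import datetime

# Sketch of the day-code mapping described above (0=Monday ... 6=Sunday),
# matching the MO/TU/.../SU codes used elsewhere in this module.
DAY_CODES = {0: "MO", 1: "TU", 2: "WE", 3: "TH", 4: "FR", 5: "SA", 6: "SU"}

# datetime.weekday() also counts 0=Monday, so indexing is direct:
d = datetime.date(2024, 1, 15)  # a Monday
print(DAY_CODES[d.weekday()])   # MO
```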
- rdmpy.utils.extract_schedule_days_runs(schedule_entry)[source]
Extract schedule_days_runs from a schedule entry.
- Parameters:
schedule_entry – Schedule entry dictionary
- Returns:
Binary string representing days the schedule runs, or None if not found
- Return type:
str
- rdmpy.utils.get_english_day_types_from_schedule(schedule_entry)[source]
Convert schedule_days_runs to list of ENGLISH_DAY_TYPE values.
- Parameters:
schedule_entry – Schedule entry dictionary
- Returns:
List of ENGLISH_DAY_TYPE values that this schedule runs on
- Return type:
list
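The conversion can be sketched as decoding the `schedule_days_runs` binary string, assuming the CIF convention of a Monday-first, 7-character string of 1s and 0s (an assumption based on the descriptions above):

```python
# Hedged sketch: decode a schedule_days_runs binary string (assumed
# Monday-first, e.g. '1111100' = weekdays only) into day-type codes.
DAY_CODES = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]

def days_from_binary(schedule_days_runs):
    """Return the day codes whose position in the string is set to '1'."""
    return [code for code, flag in zip(DAY_CODES, schedule_days_runs) if flag == "1"]

print(days_from_binary("1111100"))  # ['MO', 'TU', 'WE', 'TH', 'FR']
print(days_from_binary("0000011"))  # ['SA', 'SU']
```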
- rdmpy.utils.is_valid_schedule_entry(schedule_entry)[source]
Validate that a schedule entry has required structure.
- Parameters:
schedule_entry – Schedule entry dictionary
- Returns:
True if entry has required fields
- Return type:
bool
- rdmpy.utils.validate_schedule_locations(schedule_locations)[source]
Validate that schedule_locations is iterable and contains valid entries.
- Parameters:
schedule_locations – List of location dictionaries
- Returns:
True if valid
- Return type:
bool
- rdmpy.utils.is_valid_location_entry(location)[source]
Validate a single location entry has get method (dict-like).
- Parameters:
location – Location entry to validate
- Returns:
True if location is dict-like
- Return type:
bool
- rdmpy.utils.has_time_information(location)[source]
Check if location has either departure or arrival time.
- Parameters:
location – Location dictionary
- Returns:
True if either departure or arrival exists
- Return type:
bool
- rdmpy.utils.extract_location_time(location)[source]
Extract departure or arrival time from location, preferring departure.
- Parameters:
location – Location dictionary
- Returns:
Time in format ‘HHMM’ or None if unavailable
- Return type:
str or None
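The departure-preferred extraction can be sketched as below; the key names `departure` and `arrival` are assumptions about the location dictionary's structure:

```python
# Sketch of extract_location_time: prefer departure, fall back to arrival,
# return None when neither is present. Key names are assumed.
def extract_location_time(location):
    return location.get("departure") or location.get("arrival")

print(extract_location_time({"departure": "0915", "arrival": "0913"}))  # 0915
print(extract_location_time({"arrival": "0913"}))                       # 0913
print(extract_location_time({}))                                        # None
```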
- rdmpy.utils.get_train_service_code(schedule_entry)[source]
Extract CIF_train_service_code from schedule entry.
- Parameters:
schedule_entry – Schedule entry dictionary
- Returns:
Train service code or None if not found
- Return type:
str or None
- rdmpy.utils.clean_dataframe_types(df, columns_to_convert)[source]
Standardize data types in a DataFrame for consistent merging.
- Parameters:
df – DataFrame to clean
columns_to_convert – List of (column_name, target_type) tuples
- Returns:
DataFrame with converted types
- Return type:
DataFrame
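A sketch of the `(column_name, target_type)` conversion the signature describes, useful before merging frames with mixed dtypes; the sample columns are illustrative, not taken from the library:

```python
import pandas as pd

# Sketch of clean_dataframe_types: apply (column, target_type) tuples to
# standardize dtypes ahead of a merge. Sample data is illustrative.
def clean_dataframe_types(df, columns_to_convert):
    df = df.copy()
    for col, target in columns_to_convert:
        if col in df.columns:
            df[col] = df[col].astype(target)
    return df

df = pd.DataFrame({"STANOX": [87701, 72410], "PFPI_MINUTES": ["3.0", "12.5"]})
clean = clean_dataframe_types(df, [("STANOX", str), ("PFPI_MINUTES", float)])
print(clean.dtypes)
```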
- rdmpy.utils.filter_valid_delay_entries(delay_df)[source]
Filter delay entries to keep only those with valid datetime strings.
- Parameters:
delay_df – DataFrame of delay entries
- Returns:
Filtered DataFrame
- Return type:
DataFrame
- rdmpy.utils.process_schedule(st_code, schedule_data=None, reference_files=None, train_count=None, tiploc=None, schedule_data_loaded=None, stanox_ref=None, tiploc_to_stanox=None)[source]
Generate a schedule timeline for all trains that match the specified STANOX code. Optimized to accept pre-loaded data, avoiding reloading from files.
- Parameters:
st_code (str) – STANOX code to process.
schedule_data (dict, optional) – Dictionary containing schedule data file paths.
reference_files (dict, optional) – Dictionary containing reference file paths.
train_count (int, optional) – Expected simple count of number of trains (from pre-loaded data).
tiploc (str, optional) – TIPLOC code corresponding to st_code.
schedule_data_loaded (list, optional) – Pre-loaded schedule data.
stanox_ref (dict, optional) – Pre-loaded STANOX reference data.
tiploc_to_stanox (dict, optional) – Pre-loaded TIPLOC to STANOX mapping.
- Returns:
Sorted schedule timeline.
- Return type:
list
- rdmpy.utils.extract_day_of_week_from_delay(delay_entry)[source]
Extract the day of the week from PLANNED_ORIGIN_WTT_DATETIME in delay data.
- Parameters:
delay_entry – Delay entry dictionary
- Returns:
Day of week code (MO, TU, WE, TH, FR, SA, SU) or None if parsing fails
- Return type:
str
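The parse-and-map step can be sketched as below; the exact datetime format of `PLANNED_ORIGIN_WTT_DATETIME` is an assumption, and the None-on-failure behaviour mirrors the description above:

```python
from datetime import datetime

# Sketch: parse a planned-origin datetime string and map it to a two-letter
# day code. The "%d-%b-%Y %H:%M" format is an assumption.
DAY_CODES = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]

def day_code_from_delay(dt_string, fmt="%d-%b-%Y %H:%M"):
    try:
        return DAY_CODES[datetime.strptime(dt_string, fmt).weekday()]
    except (ValueError, TypeError):
        return None  # mirrors the documented None-on-parse-failure behaviour

print(day_code_from_delay("15-Jan-2024 08:30"))  # MO
print(day_code_from_delay("not a date"))         # None
```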
- rdmpy.utils.schedule_runs_on_day(schedule_entry, target_day)[source]
Check if a schedule entry runs on a specific day of the week.
- Parameters:
schedule_entry – Schedule entry dictionary (from processed schedule)
target_day – Day code (MO, TU, WE, TH, FR, SA, SU)
- Returns:
True if the schedule runs on the target day
- Return type:
bool
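Assuming the Monday-first binary `schedule_days_runs` string used elsewhere in this module, the day check reduces to indexing that string by the target day's position:

```python
# Sketch of schedule_runs_on_day using the assumed Monday-first binary
# schedule_days_runs string.
DAY_CODES = ["MO", "TU", "WE", "TH", "FR", "SA", "SU"]

def schedule_runs_on_day(schedule_days_runs, target_day):
    """True if the position for target_day is set to '1'."""
    return schedule_days_runs[DAY_CODES.index(target_day)] == "1"

print(schedule_runs_on_day("1111100", "WE"))  # True
print(schedule_runs_on_day("1111100", "SU"))  # False
```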
- rdmpy.utils.find_location_by_tiploc(schedule_locations, target_tiploc)[source]
Find first location matching target TIPLOC.
- Parameters:
schedule_locations – List of location dictionaries
target_tiploc – TIPLOC code to match
- Returns:
Matching location or None
- Return type:
dict or None
- rdmpy.utils.find_origin_location(schedule_locations, target_tiploc)[source]
Find origin location (LO or L0) at target TIPLOC.
- Parameters:
schedule_locations – List of location dictionaries
target_tiploc – TIPLOC code to match
- Returns:
Origin location or None
- Return type:
dict or None
- rdmpy.utils.find_destination_location(schedule_locations, target_tiploc)[source]
Find destination location (LT) at target TIPLOC.
- Parameters:
schedule_locations – List of location dictionaries
target_tiploc – TIPLOC code to match
- Returns:
Destination location or None
- Return type:
dict or None
- rdmpy.utils.determine_station_role(relevant_location, origin_location, destination_location, tiploc)[source]
Determine the role of station in train’s journey.
- Parameters:
relevant_location – Matched location at target TIPLOC
origin_location – Origin location (if exists)
destination_location – Destination location (if exists)
tiploc – Target TIPLOC code
- Returns:
“Origin”, “Destination”, “Intermediate”, or “Unknown”
- Return type:
str
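A hedged sketch of the role decision: the station is the Origin if the matched location is the schedule's origin record, the Destination if it is the terminating record, and Intermediate otherwise. This simplifies the real signature (the `tiploc` parameter is dropped) and the identity comparison is an assumption about how the matched locations relate:

```python
# Hypothetical sketch of the Origin/Destination/Intermediate decision;
# the real function also takes the target TIPLOC into account.
def determine_station_role(relevant, origin, destination):
    if origin is not None and relevant is origin:
        return "Origin"
    if destination is not None and relevant is destination:
        return "Destination"
    if relevant is not None:
        return "Intermediate"
    return "Unknown"

origin = {"tiploc": "EUSTON", "record": "LO"}
dest = {"tiploc": "MNCRPIC", "record": "LT"}
print(determine_station_role(origin, origin, dest))                 # Origin
print(determine_station_role({"tiploc": "MKNSCEN"}, origin, dest))  # Intermediate
```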
- rdmpy.utils.build_train_record(train_service_code, origin_location, destination_location, relevant_location, s_time, schedule_day_types, tiploc, tiploc_to_stanox, stanox_ref, st_code)[source]
Build a single train record for processed schedule.
- Parameters:
train_service_code – CIF service code
origin_location – Origin location dict
destination_location – Destination location dict
relevant_location – Location at target station
s_time – Time in HHMM format
schedule_day_types – List of day codes
tiploc – Target TIPLOC
tiploc_to_stanox – TIPLOC to STANOX mapping
stanox_ref – STANOX reference data (dict)
st_code – Target STANOX code
- Returns:
Train record with all fields populated
- Return type:
dict
- rdmpy.utils.extract_time_components_from_delays(delays_df)[source]
Extract time components from delay DataFrame for matching.
- Parameters:
delays_df – DataFrame of delays
- Returns:
DataFrame with added origin_time, dest_time, event_time columns
- Return type:
DataFrame
- rdmpy.utils.expand_schedule_by_days(schedule_df)[source]
Expand schedule entries for multi-day schedules (one row per day).
- Parameters:
schedule_df – Schedule DataFrame
- Returns:
Expanded DataFrame with current_day column
- Return type:
DataFrame
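The one-row-per-day expansion maps naturally onto pandas `explode`; this sketch assumes each schedule row carries a list of day codes (column names here are illustrative):

```python
import pandas as pd

# Sketch of expand_schedule_by_days: explode a per-row list of day codes
# into one row per day, held in a "current_day" column.
schedule_df = pd.DataFrame({
    "train_service_code": ["22215000", "22215001"],
    "days": [["MO", "TU", "WE"], ["SA"]],
})
expanded = schedule_df.assign(current_day=schedule_df["days"]).explode("current_day")
print(expanded[["train_service_code", "current_day"]])
```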
- rdmpy.utils.process_delays(incident_files, st_code, output_dir)[source]
Processes delay files by converting them to vertical JSON, removing irrelevant columns, and filtering rows.
- Parameters:
incident_files (dict) – Dictionary with period names as keys and file paths as values.
output_dir (str) – Directory to save the converted JSON files.
st_code (str) – The station code to filter delays.
- Returns:
Dictionary with period names as keys and processed DataFrames as values.
- Return type:
dict
- rdmpy.utils.extract_day_from_each_delay(delays_df)[source]
Extract day of week for each delay entry.
- Parameters:
delays_df – DataFrame of delays
- Returns:
List of day codes (MO, TU, etc.) aligned with rows
- Return type:
list
- rdmpy.utils.add_delay_day_column(delays_df)[source]
Add delay_day column to delays DataFrame, filtering for valid entries.
- Parameters:
delays_df – DataFrame of delays
- Returns:
DataFrame with delay_day column (filtered to valid entries)
- Return type:
DataFrame
- rdmpy.utils.find_matched_delays_info(matched_results_df)[source]
Extract matched delay information for comparison with unmatched.
- Parameters:
matched_results_df – Filtered DataFrame of matched results
- Returns:
Set of tuples (TRAIN_SERVICE_CODE, DELAY_DAY, PFPI_MINUTES)
- Return type:
set
- rdmpy.utils.identify_unmatched_delays(delays_df, matched_delay_info)[source]
Identify delays that were not matched with schedule entries.
- Parameters:
delays_df – DataFrame of all delays
matched_delay_info – Set of matched delay tuples
- Returns:
DataFrame of unmatched delays
- Return type:
DataFrame
- rdmpy.utils.determine_planned_call_time(row, st_code)[source]
Determine planned call time for unmatched delay based on station role.
- Parameters:
row – Delay row (Series or dict-like)
st_code – Station code being analyzed
- Returns:
Time in HHMM format
- Return type:
str
- rdmpy.utils.build_unmatched_entry(delay_row, st_code)[source]
Build a record for unmatched delay entry.
- Parameters:
delay_row – Delay Series/dict
st_code – Station code being analyzed
- Returns:
Complete delay entry record
- Return type:
dict
- rdmpy.utils.apply_delays_to_matches(result_df, matched_mask)[source]
Update actual times and delay info for matched entries.
- Parameters:
result_df – Result DataFrame (modified in-place)
matched_mask – Boolean mask of matched entries
- Returns:
Updated DataFrame
- Return type:
DataFrame
- rdmpy.utils.filter_result_columns(combined_df)[source]
Filter result DataFrame to required columns only.
- Parameters:
combined_df – Combined results DataFrame
- Returns:
Filtered DataFrame with core columns
- Return type:
DataFrame
- rdmpy.utils.adjust_schedule_timeline(processed_schedule, processed_delays, st_code=None)[source]
Adjust the schedule timeline based on delays and generate an updated timeline. Optimized with pandas DataFrames for fast vectorized matching.
- Parameters:
processed_schedule (list) – List of processed schedule dictionaries.
processed_delays (list) – List of delay records from all days.
st_code (str, optional) – The station code being analyzed to determine correct planned call times.
- Returns:
Adjusted schedule timeline sorted by actual calls.
- Return type:
list
- rdmpy.utils.load_schedule_data_once(schedule_data, reference_files)[source]
Load schedule data once to avoid reloading for each station.
- Parameters:
schedule_data (dict) – Dictionary containing schedule data file paths
reference_files (dict) – Dictionary containing reference file paths
- Returns:
(schedule_data_loaded, stanox_ref, tiploc_to_stanox)
- Return type:
tuple
- rdmpy.utils.load_incident_data_once(incident_files)[source]
Load all incident data once to avoid reloading for each station.
- Parameters:
incident_files (dict) – Dictionary with period names as keys and file paths as values
- Returns:
Dictionary with period names as keys and loaded DataFrames as values
- Return type:
dict
- rdmpy.utils.process_delays_optimized(incident_data_loaded, st_code, output_dir=None)[source]
Process delays using pre-loaded incident data to avoid file I/O.
- Parameters:
incident_data_loaded (dict) – Pre-loaded incident data by period
st_code (str) – The station code to filter delays
output_dir (str, optional) – Directory to save converted JSON files (not used in optimized mode)
- Returns:
Dictionary with period names as keys and processed DataFrames as values
- Return type:
dict
Data Loading
Tools for loading processed data from the processed_data/ folder.
- rdmpy.outputs.load_data.load_processed_data(base_dir='processed_data')[source]
Load all .parquet files from the processed_data folder (recursively) into a single pandas DataFrame.
Automatically tries both pyarrow and fastparquet engines. Adds STANOX (folder name) and DAY (file name) columns.
- Parameters:
base_dir (str)
- Return type:
DataFrame
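The STANOX and DAY columns follow from the on-disk layout; this sketch assumes the layout is `processed_data/<STANOX>/<DAY>.parquet`, which is inferred from the description above:

```python
from pathlib import Path

# Sketch of how the STANOX (folder name) and DAY (file stem) columns are
# derived; the processed_data/<STANOX>/<DAY>.parquet layout is an assumption.
def stanox_and_day(path):
    p = Path(path)
    return p.parent.name, p.stem

print(stanox_and_day("processed_data/87701/MO.parquet"))  # ('87701', 'MO')
```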
Analysis Tools
Functions for analyzing and visualizing network-level data across stations.
- rdmpy.outputs.analysis_tools.find_processed_data_path()[source]
Find the processed_data directory by checking multiple possible locations. Returns the path if found, None otherwise.
- rdmpy.outputs.analysis_tools.aggregate_view(incident_number, start_date)[source]
Multi-day incident analysis with clean separation of concerns. Creates 3 charts: hourly delays, severity distribution, and event timeline.
- Parameters:
incident_number (int/str) – The incident number to analyze
start_date (str) – Starting date in ‘DD-MMM-YYYY’ format
- Returns:
Summary statistics for the incident
- Return type:
dict
- rdmpy.outputs.analysis_tools.calculate_incident_summary_stats(df, delay_data_all, unique_dates, files_processed, files_with_data, incident_number, num_days)[source]
Calculate final summary statistics for the incident.
- Parameters:
df (pd.DataFrame) – The incident data
delay_data_all (pd.DataFrame) – Filtered DataFrame with only delay events
unique_dates (list) – List of unique dates in the incident
files_processed (int) – Number of files processed
files_with_data (int) – Number of files with matching data
incident_number (int/str) – The incident number
num_days (int) – Number of days the incident spans
- Returns:
Summary statistics dictionary
- Return type:
dict
- rdmpy.outputs.analysis_tools.aggregate_view_multiday(incident_number, start_date)[source]
Multi-day incident analysis that handles incidents spanning multiple days. Creates separate charts for each day with labels (a), (b), (c) or (a.1), (a.2), etc.
For single-day incidents, labels are simply (a), (b), (c). For multi-day incidents: (a.1), (a.2) for hourly charts, (b) for severity, (c.1), (c.2) for timelines.
WARNING: If an incident spans more than 3 days, the same incident number may refer to multiple separate incidents.
- Parameters:
incident_number (int/str) – The incident number to analyze
start_date (str) – Starting date in ‘DD-MMM-YYYY’ format (used for display purposes only; all days are loaded)
- Returns:
Summary statistics for the incident across all days
- Return type:
dict
- rdmpy.outputs.analysis_tools.incident_view(incident_code, incident_date, analysis_date, analysis_hhmm, period_minutes)[source]
Generate a detailed table showing each station affected by an incident with their calls and delays for a specific time period during the incident lifecycle. Shows trains that were shifted between time periods due to delays.
- Parameters:
incident_code (int/float) – The incident number to analyze
incident_date (str) – Incident date in ‘DD-MMM-YYYY’ format (used to locate the incident)
analysis_date (str) – Specific date to analyze in ‘DD-MMM-YYYY’ format
analysis_hhmm (str) – Start time for analysis in ‘HHMM’ format (e.g., ‘1830’ for 18:30)
period_minutes (int) – Minutes from analysis start time to analyze
- Returns:
(pandas.DataFrame, str, str) – results table, incident start time string, and analysis period string
- Return type:
tuple
- rdmpy.outputs.analysis_tools.incident_view_heatmap_html(incident_code, incident_date, analysis_date, analysis_hhmm, period_minutes, interval_minutes=10, output_file=None)[source]
Create dynamic interactive HTML heatmap showing railway network delays. Displays delay intensity as vibrant heatmap visualization with incident locations and timeline animation.
- Parameters:
incident_code (int/float) – The incident number to analyze
incident_date (str) – Date when the incident started in ‘DD-MMM-YYYY’ format
analysis_date (str) – Specific date to analyze in ‘DD-MMM-YYYY’ format
analysis_hhmm (str) – Start time for analysis in ‘HHMM’ format (e.g., ‘1900’)
period_minutes (int) – Total duration of the analysis period in minutes
interval_minutes (int) – Duration of each interval in minutes (default: 10)
output_file (str, optional) – HTML file path to save the output to
- Returns:
HTML content of the interactive heatmap
- Return type:
str
- rdmpy.outputs.analysis_tools.train_view(all_data, origin_code, destination_code, input_date_str)[source]
View all train journeys between an OD pair and check for incidents on a specific date. Corrects PLANNED_CALLS using ACTUAL_CALLS - PFPI_MINUTES.
Refactored for single responsibility: Data filtering + transformation -> Display
- Parameters:
all_data (pd.DataFrame) – Complete train data with OD information
origin_code (str or int) – Origin location code
destination_code (str or int) – Destination location code
input_date_str (str) – Date in ‘DD-MMM-YYYY’ format
- Returns:
Incident data or message
- Return type:
pd.DataFrame or str
- rdmpy.outputs.analysis_tools.get_stanox_for_service(all_data, train_service_code, origin_code, destination_code, date_str=None)[source]
Get all unique STANOX codes that a train service calls at, regardless of the specific train instance. Returns a list of all stations that this service code stops at.
Strategy:
1. Filter to the specified service code and OD pair
2. Optionally filter by date if provided
3. Collect all unique STANOX codes that appear with valid scheduled stops
4. Return the complete set (the map will connect them by proximity)
- rdmpy.outputs.analysis_tools.map_train_journey_with_incidents(all_data, service_stanox, incident_results=None, stations_ref_path=None, incident_color='purple', service_code=None, date_str=None)[source]
Map train journey by connecting stations based on GEOGRAPHIC PROXIMITY (not chronological order).
- Load reference stations and prepare STANOX coordinate data
- Connect service stations and incident stations using a minimum spanning tree
- Color-grade station markers by total delay
- Map each incident with chronologically ranked, numbered markers
Refactored to use focused helper functions for data preparation, calculations, and visualization.
- rdmpy.outputs.analysis_tools.train_view_2(all_data, service_stanox, service_code, stations_ref_path=None)[source]
Compute reliability metrics for each station in the service_stanox list for a given train service code.
Metrics now exclude PFPI_MINUTES == 0.0 when computing mean/variance and incident counts. OnTime% is computed on the original PFPI distribution (<=0) so it still reflects punctuality.
Returns a DataFrame with columns: ServiceCode, StationName, MeanDelay, DelayVariance, OnTime%, IncidentCount
Also includes stations from all_data that experienced delays for this service code.
- rdmpy.outputs.analysis_tools.plot_reliability_graphs(all_data, service_stanox, service_code, stations_ref_path=None, cap_minutes=75)[source]
Generate overlapping density (KDE) curves and cumulative distribution plots: Delay distribution per station (all curves overlapping, different colours), excluding delay==0.0 and capped at cap_minutes.
Also includes stations from all_data that experienced delays for this service code.
- rdmpy.outputs.analysis_tools.create_time_view_html(date_str, all_data)[source]
Create an HTML map showing affected stations for a given date, with markers sized by incident count and colored by total PFPI minutes. Prints incident statistics for the specific date before generating the map.
Refactored to use focused helper functions for statistics, data aggregation, marker creation, and finalization.
- rdmpy.outputs.analysis_tools.station_view_yearly(station_id, interval_minutes=30)[source]
Station analysis for yearly data across all incidents - simplified output. Analyzes all days of the week for a station and separates incident vs normal operations.
- rdmpy.outputs.analysis_tools.plot_trains_in_system_vs_delay(station_id, all_data, time_window_minutes=60, num_platforms=12, figsize=(12, 8), max_delay_percentile=98, dwell_time_minutes=5, time_range=None)[source]
Visualize the relationship between normalized trains in system and mean delay per hour.
Similar to plot_variable_relationships but uses trains in system (occupancy) instead of flow (throughput) on the x-axis.
Merged analysis: combines weekdays and weekends into a single comprehensive view.
Uses the same logic as plot_variable_relationships:
- X-axis: normalized trains in system per hour (from the plot_bottleneck_analysis calculation)
- Y-axis: mean delay per hour computed only from delayed trains (delay > 0), not all trains
- One scatter point per hour (not per train)
- Binned by trains in system, with Q25–Q75 delay ranges
Theory:
- As trains accumulate in the system (high occupancy), delays should increase
- Delays that remain low despite high occupancy indicate good platform management
- Delays that spike at low occupancy indicate operational inefficiencies
- Parameters:
station_id (str) – The station STANOX code
all_data (pd.DataFrame) – The complete dataset containing all train records
time_window_minutes (int) – Time window in minutes (default: 60)
num_platforms (int) – Number of platforms for normalization (default: 12)
figsize (tuple) – Figure size (default: (12, 8))
max_delay_percentile (int) – Percentile used to trim extreme values (default: 98)
dwell_time_minutes (int) – Typical dwell time at the station (default: 5 minutes)
time_range (tuple, optional) – Optional (start, end) tuple to filter by time range (default: None)
- rdmpy.outputs.analysis_tools.explore_delay_outliers(station_id, all_data, num_platforms=6, dwell_time_minutes=5, figsize=(12, 8), time_range=None)[source]
Specialized visualization to explore delay outliers and extreme cases. Shows delay percentiles vs system load with binned averages.
- Parameters:
station_id (str) – The STANOX code for the station
all_data (pd.DataFrame) – The complete dataset with all station data
num_platforms (int) – Number of platforms at the station (for normalization)
dwell_time_minutes (int) – Typical dwell time at the station in minutes
figsize (tuple) – Figure size (width, height)
time_range (tuple, optional) – Optional (start, end) tuple to filter by time range (default: None)
- Returns:
Hourly statistics including delay percentiles and system load metrics
- Return type:
pd.DataFrame
- rdmpy.outputs.analysis_tools.station_view(station_id, all_data, num_platforms=6, time_window_minutes=60, max_delay_percentile=98, dwell_time_minutes=5, figsize=(8, 4.7), time_range=None)[source]
Comprehensive merged station performance analysis combining 3 visualization functions. Analyzes on-time performance and system load relationships.
- Parameters:
station_id (str) – The STANOX code for the station
all_data (pd.DataFrame) – The complete dataset with all station data
num_platforms (int) – Number of platforms at the station (for normalization)
time_window_minutes (int) – Time window for analysis (typically 60 for hourly)
max_delay_percentile (int) – Maximum delay percentile to consider (typically 98)
dwell_time_minutes (int) – Typical dwell time at the station in minutes
figsize (tuple) – Figure size (width, height), applied to all plots
time_range (tuple, optional) – Optional (start, end) tuple to filter by time range; valid_data is filtered to rows whose arrival_time falls within this range (default: None)
- Returns:
Dictionary containing hourly_stats and bin_stats DataFrames
- Return type:
dict
- rdmpy.outputs.analysis_tools.comprehensive_station_analysis(station_id, all_data, num_platforms=6, dwell_time_minutes=5, max_delay_percentile=98, time_range=None)[source]
Combined comprehensive station analysis displaying all visualizations in a single column figure.
Combines plot_trains_in_system_vs_delay, explore_delay_outliers, and station_view without changing any of their internal logic.
- Parameters:
station_id (str) – The STANOX code for the station
all_data (pd.DataFrame) – The complete dataset with all station data
num_platforms (int) – Number of platforms at the station (for normalization)
dwell_time_minutes (int) – Typical dwell time at the station in minutes
max_delay_percentile (int) – Maximum delay percentile to consider (typically 98)
time_range (tuple, optional) – Optional (start, end) tuple to filter by time range (default: None)
- Returns:
Dictionary containing all results from the three analyses
- Return type:
dict
- rdmpy.outputs.analysis_tools.station_analysis_with_time_range(station_id, all_data, time_range=None, num_platforms=6, dwell_time_minutes=5, max_delay_percentile=98)[source]
Wrapper around comprehensive_station_analysis that adds time_range filtering.
Filters data by optional time_range, then calls the original function with the filtered dataset. Original function logic remains unchanged.
- Parameters:
station_id (str) – The STANOX code for the station
all_data (pd.DataFrame) – Complete dataset with all station data
time_range (tuple, optional) – Tuple of (start, end) as dates or datetimes. Dates are expanded to the full day (00:00 to 23:59:59); the same date for both covers the entire day; None uses all data (default). Examples: (‘2024-01-15’, ‘2024-01-15’) for a single day, (‘2024-01-01’, ‘2024-06-30’) for a date range, (‘2024-01-15 08:00’, ‘2024-01-15 17:00’) for specific times
num_platforms (int) – Number of platforms at the station (default: 6)
dwell_time_minutes (int) – Typical dwell time at the station in minutes (default: 5)
max_delay_percentile (int) – Maximum delay percentile to consider (default: 98)
- Returns:
Dictionary containing all results from comprehensive_station_analysis
- Return type:
dict
- rdmpy.outputs.analysis_tools.station_view_yearly_with_time_range(station_id, interval_minutes=30, time_range=None)[source]
Wrapper around station_view_yearly that adds time_range filtering.
Calls the original function and filters its results by optional time_range. Original function logic remains unchanged.
- Parameters:
station_id (str) – The STANOX code for the station
interval_minutes (int) – Interval size for analysis in minutes (default: 30)
time_range (tuple, optional) – Tuple of (start, end) as dates or datetimes. Dates are expanded to the full day (00:00 to 23:59:59); the same date for both covers the entire day; None uses all data (default). Examples: (‘2024-01-15’, ‘2024-01-15’) for a single day, (‘2024-01-01’, ‘2024-06-30’) for a date range, (‘2024-01-15 08:00’, ‘2024-01-15 17:00’) for specific times
- Returns:
(incident_summary, normal_summary) DataFrames filtered by time_range
- Return type:
tuple
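The date-to-full-day expansion that both time_range wrappers describe can be sketched as below; the exact format strings accepted by the library are assumptions, chosen to match the examples in the parameter descriptions:

```python
from datetime import datetime, time

# Sketch of the time_range expansion: bare dates become full-day bounds
# (00:00 to 23:59:59), while datetime strings pass through as given.
def expand_time_range(start, end, fmt_date="%Y-%m-%d", fmt_dt="%Y-%m-%d %H:%M"):
    def parse(value, is_end):
        try:
            d = datetime.strptime(value, fmt_date)
            return datetime.combine(d.date(), time(23, 59, 59) if is_end else time(0, 0))
        except ValueError:
            return datetime.strptime(value, fmt_dt)
    return parse(start, False), parse(end, True)

lo, hi = expand_time_range("2024-01-15", "2024-01-15")  # same date = whole day
print(lo, hi)  # 2024-01-15 00:00:00 2024-01-15 23:59:59
```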