Trajectory Forecasting with Gated Recurrent Unit Autoencoders
def qry_database(dbname, start_time, stop_time):
d_threshold = 200000 # max distance (in meters) between two messages before assuming separate tracks
s_threshold = 50 # max speed (in knots) between two AIS messages before splitting tracks
t_threshold = timedelta(hours=24) # max time (in hours) between messages of a track
try:
with aisdb.DBConn() as dbconn:
tracks = aisdb.TrackGen(
aisdb.DBQuery(
dbconn=dbconn, dbpath=os.path.join(ROOT, PATH, dbname),
callback=aisdb.database.sql_query_strings.in_timerange,
start=start_time, end=stop_time).gen_qry(),
decimate=False) # trajectory compression disabled
tracks = aisdb.split_timedelta(tracks, t_threshold) # split trajectories by time without AIS message transmission
tracks = aisdb.encode_greatcircledistance(tracks, distance_threshold=d_threshold, speed_threshold=s_threshold)
tracks = aisdb.interp_time(tracks, step=timedelta(minutes=5)) # interpolate every n-minutes
# tracks = vessel_info(tracks, dbconn=dbconn) # scrapes vessel metadata
return list(tracks) # list of segmented pre-processed tracks
except SyntaxError as e: return [] # no results for query

def get_tracks(dbname, start_ddmmyyyy, stop_ddmmyyyy):
stop_time = datetime.strptime(stop_ddmmyyyy, "%d%m%Y")
start_time = datetime.strptime(start_ddmmyyyy, "%d%m%Y")
# returns a list with all tracks from AISdb
return qry_database(dbname, start_time, stop_time)

def batch_tracks(dbname, start_ddmmyyyy, stop_ddmmyyyy, hours2batch):
stop_time = datetime.strptime(stop_ddmmyyyy, "%d%m%Y")
start_time = datetime.strptime(start_ddmmyyyy, "%d%m%Y")
# yields a list of results every delta_time iterations
delta_time = timedelta(hours=hours2batch)
anchor_time, next_time = start_time, start_time + delta_time
while next_time < stop_time:
yield qry_database(dbname, anchor_time, next_time)
anchor_time = next_time
next_time += delta_time
# yields a list of final results (if any)
yield qry_database(dbname, anchor_time, stop_time)
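
# Usage sketch (not part of the original pipeline; the database name and dates below are hypothetical):
# tracks = get_tracks("ais_eastcoast.db", "01062018", "30062018")  # one month of curated track segments
# for batch in batch_tracks("ais_eastcoast.db", "01062018", "30062018", hours2batch=24):
#     print(f"{len(batch)} track segments in this 24h batch")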

# Create the map with specific latitude and longitude limits and a Mercator projection
m = Basemap(llcrnrlat=42, urcrnrlat=52, llcrnrlon=-70, urcrnrlon=-50, projection="merc", resolution="h")
# Draw state, country, coastline borders, and counties
m.drawstates(0.5)
m.drawcountries(0.5)
m.drawcoastlines(0.5)
m.drawcounties(color="gray", linewidth=0.5)
# Fill continents and oceans
m.fillcontinents(color="tan", lake_color="#91A3B0")
m.drawmapboundary(fill_color="#91A3B0")
coordinates = [
(51.26912, -57.53759), (48.92733, -58.87786),
(47.49307, -59.41325), (42.54760, -62.17624),
(43.21702, -60.49943), (44.14955, -60.59600),
(45.42599, -59.76398), (46.99134, -60.02403)]
# Draw 100km-radius circles
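# One degree of longitude spans roughly 111.32 km * cos(latitude), so the 100 km radius is
# converted to an equivalent angular radius below (an approximation that worsens at higher latitudes)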
for lat, lon in coordinates:
radius_in_degrees = 100 / (111.32 * np.cos(np.deg2rad(lat)))
m.tissot(lon, lat, radius_in_degrees, 100, facecolor="r", edgecolor="k", alpha=0.5)
# Add text annotation with an arrow pointing to the circle
plt.annotate("AIS Coverage", xy=m(lon, lat), xytext=(40, -40),
textcoords="offset points", ha="left", va="bottom", fontweight="bold",
arrowprops=dict(arrowstyle="->", color="k", alpha=0.7, lw=2.5))
# Add labels
ocean_labels = {
"Atlantic Ocean": [(-59, 44), 16],
"Gulf of\nMaine": [(-67, 44.5), 12],
"Gulf of St. Lawrence": [(-64.5, 48.5), 11],
}
for label, (coords, fontsize) in ocean_labels.items():
plt.annotate(label, xy=m(*coords), xytext=(6, 6), textcoords="offset points",
fontsize=fontsize, color="#DBE2E9", fontweight="bold")
# Add a scale in kilometers
m.drawmapscale(-67.5, 42.7, -67.5, 42.7, 500, barstyle="fancy", fontsize=8, units="km", labelstyle="simple")
# Set the map title
_ = plt.title("100km-AIS radius-coverage on Atlantic Canada", fontweight="bold")
# The circle diameter is 200km, and it does not match the km scale exactly (approximation)

land_polygons = gpd.read_file(os.path.join(ROOT, SHAPES, "ne_50m_land.shp"))

def is_on_land(lat, lon, land_polygons):
return land_polygons.contains(Point(lon, lat)).any()

def is_track_on_land(track, land_polygons):
for lat, lon in zip(track["lat"], track["lon"]):
if is_on_land(lat, lon, land_polygons):
return True
return False

def process_mmsi(item, polygons):
mmsi, tracks = item
filtered_tracks = [t for t in tracks if not is_track_on_land(t, polygons)]
return mmsi, filtered_tracks, len(tracks)

def process_voyages(voyages, land_polygons):
# Tracking progress with TQDM
def process_mmsi_callback(result, progress_bar):
mmsi, filtered_tracks, _ = result
voyages[mmsi] = filtered_tracks
progress_bar.update(1)
# Initialize the progress bar with the total number of MMSIs
progress_bar = tqdm(total=len(voyages), desc="MMSIs processed")
with ThreadPoolExecutor(max_workers=multiprocessing.cpu_count()) as executor:
# Submit all MMSIs for processing
futures = {executor.submit(process_mmsi, item, land_polygons): item for item in voyages.items()}
# Retrieve the results as they become available and update the Voyages dictionary
for future in as_completed(futures):
result = future.result()
process_mmsi_callback(result, progress_bar)
# Close the progress bar after processing completes
progress_bar.close()
return voyages
file_name = "curated-ais.pkl"
full_path = os.path.join(ROOT, ESRF, file_name)
if not os.path.exists(full_path):
voyages = process_voyages(voyages, land_polygons)
pkl.dump(voyages, open(full_path, "wb"))
else: voyages = pkl.load(open(full_path, "rb"))

voyages_counts = {k: len(voyages[k]) for k in voyages.keys()}

def plot_voyage_segments_distribution(voyages_counts, bar_color="#ba1644"):
data = pd.DataFrame({"Segments": list(voyages_counts.values())})
return alt.Chart(data).mark_bar(color=bar_color).encode(
alt.X("Segments:Q", bin=alt.Bin(maxbins=90), title="Segments"),
alt.Y("count(Segments):Q", title="Count", scale=alt.Scale(type="log")))\
.properties(title="Distribution of Voyage Segments", width=600, height=400)\
.configure_axisX(titleFontSize=16).configure_axisY(titleFontSize=16)\
.configure_title(fontSize=18).configure_view(strokeOpacity=0)
alt.data_transformers.enable("default", max_rows=None)
plot_voyage_segments_distribution(voyages_counts).display()

long_term_voyages, short_term_voyages = [], []
# Separating voyages
for k in voyages_counts:
if voyages_counts[k] < 30:
short_term_voyages.append(k)
else: long_term_voyages.append(k)
# Shuffling for random distribution
random.shuffle(short_term_voyages)
random.shuffle(long_term_voyages)

train_voyage, test_voyage = {}, {}
# Iterate over short-term voyages:
for i, k in enumerate(short_term_voyages):
if i < int(0.8 * len(short_term_voyages)):
train_voyage[k] = voyages[k]
else: test_voyage[k] = voyages[k]
# Iterate over long-term voyages:
for i, k in enumerate(long_term_voyages):
if i < int(0.8 * len(long_term_voyages)):
train_voyage[k] = voyages[k]
else: test_voyage[k] = voyages[k]

def plot_voyage_length_distribution(data, title, bar_color, min_time=144, force_print=True):
total_time = []
for key in data.keys():
for track in data[key]:
if len(track["time"]) > min_time or force_print:
total_time.append(len(track["time"]))
plot_data = pd.DataFrame({'Length': total_time})
chart = alt.Chart(plot_data).mark_bar(color=bar_color).encode(
alt.Y("count(Length):Q", title="Count", scale=alt.Scale(type="symlog")),
alt.X("Length:Q", bin=alt.Bin(maxbins=90), title="Length")
).properties(title=title, width=600, height=400)\
.configure_axisX(titleFontSize=16).configure_axisY(titleFontSize=16)\
.configure_title(fontSize=18).configure_view(strokeOpacity=0)
print("\n\n")
return chart
display(plot_voyage_length_distribution(train_voyage, "TRAINING: Distribution of Voyage Length", "#287561"))
display(plot_voyage_length_distribution(test_voyage, "TEST: Distribution of Voyage Length", "#3e57ab"))

INPUT_TIMESTEPS = 48 # 4 hours * 12 AIS messages/h
INPUT_VARIABLES = 4 # Longitude, Latitude, COG, and SOG
OUTPUT_TIMESTEPS = 96 # 8 hours * 12 AIS messages/h
OUTPUT_VARIABLES = 2 # Longitude and Latitude
NUM_WORKERS = multiprocessing.cpu_count()

INPUT_VARIABLES *= 2 # Double the features with deltas

def filter_and_transform_voyages(voyages):
filtered_voyages = {}
for k, v in voyages.items():
voyages_track = []
for voyage in v:
if len(voyage["time"]) > (INPUT_TIMESTEPS + OUTPUT_TIMESTEPS):
mtx = np.vstack([voyage["lon"], voyage["lat"],
voyage["cog"], voyage["sog"]]).T
# Compute deltas
deltas = np.diff(mtx, axis=0)
# Add zeros at the first row for deltas
deltas = np.vstack([np.zeros(deltas.shape[1]), deltas])
# Concatenate the original matrix with the deltas matrix
mtx = np.hstack([mtx, deltas])
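# Resulting column order: [lon, lat, cog, sog, dlon, dlat, dcog, dsog] (8 input features)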
voyages_track.append(mtx)
if len(voyages_track) > 0:
filtered_voyages[k] = voyages_track
return filtered_voyages
# Checking how the data behaves for the previously set hyperparameters
display(plot_voyage_length_distribution(train_voyage, "TRAINING: Distribution of Voyage Length", "#287561",
min_time=INPUT_TIMESTEPS + OUTPUT_TIMESTEPS, force_print=False))
display(plot_voyage_length_distribution(test_voyage, "TEST: Distribution of Voyage Length", "#3e57ab",
min_time=INPUT_TIMESTEPS + OUTPUT_TIMESTEPS, force_print=False))
# Filter and transform train and test voyages and prepare for training
train_voyage = filter_and_transform_voyages(train_voyage)
test_voyage = filter_and_transform_voyages(test_voyage)

def print_voyage_statistics(header, voyage_dict):
total_time = 0
for mmsi, trajectories in voyage_dict.items():
for trajectory in trajectories:
total_time += trajectory.shape[0]
print(f"{header}")
print(f"Hours of sequential data: {total_time // 12}.")
print(f"Number of unique MMSIs: {len(voyage_dict)}.", end=" \n\n")
return total_time
time_test = print_voyage_statistics("[TEST DATA]", test_voyage)
time_train = print_voyage_statistics("[TRAINING DATA]", train_voyage)
# The resulting split still resembles the 80-20 train-test ratio
print(f"Training hourly-rate: {(time_train * 100) / (time_train + time_test)}%")
print(f"Test hourly-rate: {(time_test * 100) / (time_train + time_test)}%")def haversine_distance(lon_1, lat_1, lon_2, lat_2):
lon_1, lat_1, lon_2, lat_2 = map(np.radians, [lon_1, lat_1, lon_2, lat_2]) # convert latitude and longitude to radians
a = np.sin((lat_2 - lat_1) / 2) ** 2 + np.cos(lat_1) * np.cos(lat_2) * np.sin((lon_2 - lon_1) / 2) ** 2
return (2 * np.arcsin(np.sqrt(a))) * 6371000 # R: 6,371,000 meters
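# Sanity check (not in the original notebook): one degree of latitude is ~111.2 km,
# so haversine_distance(0, 0, 0, 1) should return roughly 111195 (meters)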

def trajectory_straightness(x):
start_point, end_point = x[0, :2], x[-1, :2]
x_coordinates, y_coordinates = x[:-1, 0], x[:-1, 1]
x_coordinates_next, y_coordinates_next = x[1:, 0], x[1:, 1]
consecutive_distances = np.array(haversine_distance(x_coordinates, y_coordinates, x_coordinates_next, y_coordinates_next))
straight_line_distance = np.array(haversine_distance(start_point[0], start_point[1], end_point[0], end_point[1]))
result = straight_line_distance / np.sum(consecutive_distances)
return result if not np.isnan(result) else 1
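# The straightness ratio lies roughly in (0, 1]: values near 1 indicate a nearly straight track,
# lower values indicate curvier trajectories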

def process_voyage(voyage, mmsi, max_size, overlap_size=1):
straightness_ratios, mmsis, x, y = [], [], [], []
for j in range(0, voyage.shape[0] - max_size, 1):
x_sample = voyage[(0 + j):(INPUT_TIMESTEPS + j)]
y_sample = voyage[(INPUT_TIMESTEPS + j - overlap_size):(max_size + j), 0:OUTPUT_VARIABLES]
straightness = trajectory_straightness(x_sample)
straightness_ratios.append(straightness)
x.append(x_sample.T)
y.append(y_sample.T)
mmsis.append(mmsi)
return straightness_ratios, mmsis, x, y

def process_data(voyages):
max_size = INPUT_TIMESTEPS + OUTPUT_TIMESTEPS
# Callback function to update tqdm progress bar
def process_voyage_callback(result, pbar):
pbar.update(1)
return result
with Pool(NUM_WORKERS) as pool, tqdm(total=sum(len(v) for v in voyages.values()), desc="Voyages") as pbar:
results = []
# Submit tasks to the pool and store the results
for mmsi in voyages:
for voyage in voyages[mmsi]:
callback = partial(process_voyage_callback, pbar=pbar)
results.append(pool.apply_async(process_voyage, (voyage, mmsi, max_size), callback=callback))
pool.close()
pool.join()
# Gather the results
straightness_ratios, mmsis, x, y = [], [], [], []
for result in results:
s_ratios, s_mmsis, s_x, s_y = result.get()
straightness_ratios.extend(s_ratios)
mmsis.extend(s_mmsis)
x.extend(s_x)
y.extend(s_y)
# Process the results
x, y = np.stack(x), np.stack(y)
x, y = np.transpose(x, (0, 2, 1)), np.transpose(y, (0, 2, 1))
straightness_ratios = np.array(straightness_ratios)
min_straightness, max_straightness = np.min(straightness_ratios), np.max(straightness_ratios)
scaled_straightness_ratios = (straightness_ratios - min_straightness) / (max_straightness - min_straightness)
scaled_straightness_ratios = 1. - scaled_straightness_ratios
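# Inverting the scaled ratios assigns weights closer to 1 to curvier (less straight) trajectories;
# these appear intended for the optional sample_weight argument commented out in the fit() calls below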
print(f"Final number of samples = {len(x)}", end="\n\n")
return mmsis, x, y, scaled_straightness_ratios
mmsi_train, x_train, y_train, straightness_ratios = process_data(train_voyage)
mmsi_test, x_test, y_test, _ = process_data(test_voyage)

def normalize_dataset(x_train, x_test, y_train,
lat_min=42, lat_max=52, lon_min=-70, lon_max=-50, max_sog=50):
def normalize(arr, min_val, max_val):
return (arr - min_val) / (max_val - min_val)
# Initial normalization
x_train[:, :, :2] = normalize(x_train[:, :, :2], np.array([lon_min, lat_min]), np.array([lon_max, lat_max]))
y_train[:, :, :2] = normalize(y_train[:, :, :2], np.array([lon_min, lat_min]), np.array([lon_max, lat_max]))
x_test[:, :, :2] = normalize(x_test[:, :, :2], np.array([lon_min, lat_min]), np.array([lon_max, lat_max]))
x_train[:, :, 2:4] = x_train[:, :, 2:4] / np.array([360, max_sog])
x_test[:, :, 2:4] = x_test[:, :, 2:4] / np.array([360, max_sog])
# Standardize X and Y
x_mean, x_std = np.mean(x_train, axis=(0, 1)), np.std(x_train, axis=(0, 1))
y_mean, y_std = np.mean(y_train, axis=(0, 1)), np.std(y_train, axis=(0, 1))
x_train = (x_train - x_mean) / x_std
y_train = (y_train - y_mean) / y_std
x_test = (x_test - x_mean) / x_std
# Final zero-one normalization
x_min, x_max = np.min(x_train, axis=(0, 1)), np.max(x_train, axis=(0, 1))
y_min, y_max = np.min(y_train, axis=(0, 1)), np.max(y_train, axis=(0, 1))
x_train = (x_train - x_min) / (x_max - x_min)
y_train = (y_train - y_min) / (y_max - y_min)
x_test = (x_test - x_min) / (x_max - x_min)
return x_train, x_test, y_train, y_mean, y_std, y_min, y_max, x_mean, x_std, x_min, x_max
x_train, x_test, y_train, y_mean, y_std, y_min, y_max, x_mean, x_std, x_min, x_max = normalize_dataset(x_train, x_test, y_train)

def denormalize_y(y_data, y_mean, y_std, y_min, y_max,
lat_min=42, lat_max=52, lon_min=-70, lon_max=-50):
y_data = y_data * (y_max - y_min) + y_min # reverse zero-one normalization
y_data = y_data * y_std + y_mean # reverse standardization
# Reverse initial normalization for longitude and latitude
y_data[:, :, 0] = y_data[:, :, 0] * (lon_max - lon_min) + lon_min
y_data[:, :, 1] = y_data[:, :, 1] * (lat_max - lat_min) + lat_min
return y_data

def denormalize_x(x_data, x_mean, x_std, x_min, x_max,
lat_min=42, lat_max=52, lon_min=-70, lon_max=-50):
x_data = x_data * (x_max - x_min) + x_min # reverse zero-one normalization
x_data = x_data * x_std + x_mean # reverse standardization
# Reverse initial normalization for longitude and latitude
x_data[:, :, 0] = x_data[:, :, 0] * (lon_max - lon_min) + lon_min
x_data[:, :, 1] = x_data[:, :, 1] * (lat_max - lat_min) + lat_min
return x_data
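
# Optional sanity check (not in the original notebook): a denormalization round trip should
# recover coordinates roughly within the map bounds used above (lon in [-70, -50], lat in [42, 52]), e.g.
# x_check = denormalize_x(x_train.copy(), x_mean, x_std, x_min, x_max)
# print(x_check[..., 0].min(), x_check[..., 0].max())  # approximate longitude extent of the training data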

tf.keras.backend.clear_session() # Clear the Keras session to prevent potential conflicts
_ = wandb.login(force=True) # Log in to Weights & Biases

class ProbabilisticTeacherForcing(Layer):
def __init__(self, **kwargs):
super(ProbabilisticTeacherForcing, self).__init__(**kwargs)
def call(self, inputs):
decoder_gt_input, decoder_output, mixing_prob = inputs
mixing_prob = tf.expand_dims(mixing_prob, axis=-1) # Add an extra dimension for broadcasting
mixing_prob = tf.broadcast_to(mixing_prob, tf.shape(decoder_gt_input)) # Broadcast to match the shape
return tf.where(tf.random.uniform(tf.shape(decoder_gt_input)) < mixing_prob, decoder_gt_input, decoder_output)
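# At each position the layer samples uniform noise and keeps the ground-truth value where the draw falls
# below mixing_prob, otherwise the model's own prediction: mixing_prob=1.0 gives classic teacher forcing,
# while mixing_prob=0.0 feeds back only the model's output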

def build_model(rnn_unit="GRU", hidden_size=64):
encoder_input = Input(shape=(INPUT_TIMESTEPS, INPUT_VARIABLES), name="Encoder_Input")
decoder_gt_input = Input(shape=((OUTPUT_TIMESTEPS - 1), OUTPUT_VARIABLES), name="Decoder-GT-Input")
mixing_prob_input = Input(shape=(1,), name="Mixing_Probability")
# Encoder
encoder_gru = eval(rnn_unit)(hidden_size, activation="relu", name="Encoder")(encoder_input)
repeat_vector = RepeatVector((OUTPUT_TIMESTEPS - 1), name="Repeater")(encoder_gru)
# Inference Decoder
decoder_gru = eval(rnn_unit)(hidden_size, activation="relu", return_sequences=True, name="Decoder")
decoder_output = decoder_gru(repeat_vector, initial_state=encoder_gru)
# Adjust decoder_output shape
dense_output_adjust = TimeDistributed(Dense(OUTPUT_VARIABLES), name="Output_Adjust")
adjusted_decoder_output = dense_output_adjust(decoder_output)
# Training Decoder
decoder_gru_tf = eval(rnn_unit)(hidden_size, activation="relu", return_sequences=True, name="Decoder-TF")
probabilistic_tf_layer = ProbabilisticTeacherForcing(name="Probabilistic_Teacher_Forcing")
mixed_input = probabilistic_tf_layer([decoder_gt_input, adjusted_decoder_output, mixing_prob_input])
tf_output = decoder_gru_tf(mixed_input, initial_state=encoder_gru)
tf_output = dense_output_adjust(tf_output) # Use dense_output_adjust layer for training output
training_model = Model(inputs=[encoder_input, decoder_gt_input, mixing_prob_input], outputs=tf_output, name="Training")
inference_model = Model(inputs=encoder_input, outputs=adjusted_decoder_output, name="Inference")
return training_model, inference_model
training_model, model = build_model()

def denormalize_y(y_data, y_mean, y_std, y_min, y_max, lat_min=42, lat_max=52, lon_min=-70, lon_max=-50):
scales = tf.constant([lon_max - lon_min, lat_max - lat_min], dtype=tf.float32)
biases = tf.constant([lon_min, lat_min], dtype=tf.float32)
# Reverse zero-one normalization and standardization
y_data = y_data * (y_max - y_min) + y_min
y_data = y_data * y_std + y_mean
# Reverse initial normalization for longitude and latitude
return y_data * scales + biases

def haversine_distance(lon1, lat1, lon2, lat2):
lon1, lat1, lon2, lat2 = [tf.math.multiply(x, tf.divide(tf.constant(np.pi), 180.)) for x in [lon1, lat1, lon2, lat2]] # lat and lon to radians
a = tf.math.square(tf.math.sin((lat2 - lat1) / 2.)) + tf.math.cos(lat1) * tf.math.cos(lat2) * tf.math.square(tf.math.sin((lon2 - lon1) / 2.))
return 2 * 6371000 * tf.math.asin(tf.math.sqrt(a)) # Earth radius: 6,371,000 meters

def custom_loss(y_true, y_pred):
tf.debugging.check_numerics(y_true, "y_true contains NaNs")
tf.debugging.check_numerics(y_pred, "y_pred contains NaNs")
# Denormalize true and predicted y
y_true_denorm = denormalize_y(y_true, y_mean, y_std, y_min, y_max)
y_pred_denorm = denormalize_y(y_pred, y_mean, y_std, y_min, y_max)
# Compute haversine distance for true and predicted y from the second time-step
true_dist = haversine_distance(y_true_denorm[:, 1:, 0], y_true_denorm[:, 1:, 1], y_true_denorm[:, :-1, 0], y_true_denorm[:, :-1, 1])
pred_dist = haversine_distance(y_pred_denorm[:, 1:, 0], y_pred_denorm[:, 1:, 1], y_pred_denorm[:, :-1, 0], y_pred_denorm[:, :-1, 1])
# Convert maximum speed from knots to meters per 5 minutes
max_speed_m_per_5min = 50 * 1.852 * 1000 * 5 / 60
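# 50 kn * 1.852 km/h per knot = 92.6 km/h, i.e. about 7,716.7 m per 5-minute step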
# Compute the difference in distances
dist_diff = tf.abs(true_dist - pred_dist)
# Apply penalty if the predicted distance is greater than the maximum possible distance
dist_diff = tf.where(pred_dist > max_speed_m_per_5min, pred_dist - max_speed_m_per_5min, dist_diff)
# Penalty for the first output coordinate not being the same as the last input
input_output_diff = haversine_distance(y_true_denorm[:, 0, 0], y_true_denorm[:, 0, 1], y_pred_denorm[:, 0, 0], y_pred_denorm[:, 0, 1])
# Compute RMSE excluding the first element
rmse = K.sqrt(K.mean(K.square(y_true_denorm[:, 1:, :] - y_pred_denorm[:, 1:, :]), axis=1))
tf.debugging.check_numerics(y_true_denorm, "y_true_denorm contains NaNs")
tf.debugging.check_numerics(y_pred_denorm, "y_pred_denorm contains NaNs")
tf.debugging.check_numerics(true_dist, "true_dist contains NaNs")
tf.debugging.check_numerics(pred_dist, "pred_dist contains NaNs")
tf.debugging.check_numerics(dist_diff, "dist_diff contains NaNs")
tf.debugging.check_numerics(input_output_diff, "input_output_diff contains NaNs")
tf.debugging.check_numerics(rmse, "rmse contains NaNs")
# Final loss with weights
# return 0.25 * K.mean(input_output_diff) + 0.35 * K.mean(dist_diff) + 0.40 * K.mean(rmse)
return K.mean(rmse)

def compile_model(model, learning_rate, clipnorm, jit_compile, skip_summary=False):
optimizer = AdamW(learning_rate=learning_rate, clipnorm=clipnorm, jit_compile=jit_compile)
model.compile(optimizer=optimizer, loss=custom_loss, metrics=["mae", "mape"], weighted_metrics=[], jit_compile=jit_compile)
if not skip_summary: model.summary() # print a summary of the model architecture
compile_model(training_model, learning_rate=0.001, clipnorm=1, jit_compile=True)
compile_model(model, learning_rate=0.001, clipnorm=1, jit_compile=True)

Model: "Training"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
Encoder_Input (InputLayer) [(None, 48, 8)] 0 []
Encoder (GRU) (None, 64) 14208 ['Encoder_Input[0][0]']
Repeater (RepeatVector) (None, 95, 64) 0 ['Encoder[0][0]']
Decoder (GRU) (None, 95, 64) 24960 ['Repeater[0][0]',
'Encoder[0][0]']
Output_Adjust (TimeDistributed (None, 95, 2) 130 ['Decoder[0][0]',
) 'Decoder-TF[0][0]']
Decoder-GT-Input (InputLayer) [(None, 95, 2)] 0 []
Mixing_Probability (InputLayer [(None, 1)] 0 []
)
Probabilistic_Teacher_Forcing (None, 95, 2) 0 ['Decoder-GT-Input[0][0]',
(ProbabilisticTeacherForcing) 'Output_Adjust[0][0]',
'Mixing_Probability[0][0]']
Decoder-TF (GRU) (None, 95, 64) 13056 ['Probabilistic_Teacher_Forcing[0
][0]',
'Encoder[0][0]']
==================================================================================================
Total params: 52,354
Trainable params: 52,354
Non-trainable params: 0
__________________________________________________________________________________________________
Model: "Inference"
__________________________________________________________________________________________________
Layer (type) Output Shape Param # Connected to
==================================================================================================
Encoder_Input (InputLayer) [(None, 48, 8)] 0 []
Encoder (GRU) (None, 64) 14208 ['Encoder_Input[0][0]']
Repeater (RepeatVector) (None, 95, 64) 0 ['Encoder[0][0]']
Decoder (GRU) (None, 95, 64) 24960 ['Repeater[0][0]',
'Encoder[0][0]']
Output_Adjust (TimeDistributed (None, 95, 2) 130 ['Decoder[0][0]']
)
==================================================================================================
Total params: 39,298
Trainable params: 39,298
Non-trainable params: 0
__________________________________________________________________________________________________

def create_callbacks(model_name, monitor="val_loss", factor=0.2, lr_patience=3, ep_patience=12, min_lr=0, verbose=0, restore_best_weights=True, skip_wandb=False):
return ([wandb.keras.WandbMetricsLogger()] if not skip_wandb else []) + [#tf.keras.callbacks.TerminateOnNaN(),
ReduceLROnPlateau(monitor=monitor, factor=factor, patience=lr_patience, min_lr=min_lr, verbose=verbose),
EarlyStopping(monitor=monitor, patience=ep_patience, verbose=verbose, restore_best_weights=restore_best_weights),
tf.keras.callbacks.ModelCheckpoint(os.path.join(ROOT, MODELS, model_name), monitor="val_loss", mode="min", save_best_only=True, verbose=verbose)]

def train_model(model, x_train, y_train, batch_size, epochs, validation_split, model_name):
run = wandb.init(project="kAISdb", anonymous="allow") # start the wandb run
# Set the initial mixing probability
mixing_prob = 0.5
# Update y_train to have the same dimensions as the output
y_train = y_train[:, :(OUTPUT_TIMESTEPS - 1), :]
# Create the ground truth input for the decoder by appending a padding at the beginning of the sequence
decoder_ground_truth_input_data = (np.zeros((y_train.shape[0], 1, y_train.shape[2])), y_train[:, :-1, :])
decoder_ground_truth_input_data = np.concatenate(decoder_ground_truth_input_data, axis=1)
try:
# Train the model with Teacher Forcing
with tf.device(tf.test.gpu_device_name()):
training_model.fit([x_train, decoder_ground_truth_input_data, np.full((x_train.shape[0], 1), mixing_prob)], y_train, batch_size=batch_size, epochs=epochs,
verbose=2, validation_split=validation_split, callbacks=create_callbacks(model_name))
# , sample_weight=straightness_ratios)
except KeyboardInterrupt as e:
print("\nRestoring best weights [...]")
# Load the weights of the teacher-forcing model
training_model.load_weights(os.path.join(ROOT, MODELS, model_name)) # checkpoints are saved under ROOT/MODELS
# Transfering the weights to the inference model
for layer in model.layers:
if layer.name in [l.name for l in training_model.layers]:
layer.set_weights(training_model.get_layer(layer.name).get_weights())
run.finish() # finish the wandb run
model_name = "TF-GRU-AE.h5"
full_path = os.path.join(ROOT, MODELS, model_name)
if True: # not os.path.exists(full_path):
train_model(model, x_train, y_train, batch_size=1024,
epochs=250, validation_split=0.2,
model_name=model_name)
else:
training_model.load_weights(full_path)
for layer in model.layers: # inference model initialization
if layer.name in [l.name for l in training_model.layers]:
layer.set_weights(training_model.get_layer(layer.name).get_weights())

def evaluate_model(model, x_test, y_test, y_mean, y_std, y_min, y_max, y_pred=None):
def single_trajectory_error(y_test, y_pred, index):
distances = haversine_distance(y_test[index, :, 0], y_test[index, :, 1], y_pred[index, :, 0], y_pred[index, :, 1])
return np.min(distances), np.max(distances), np.mean(distances), np.median(distances)
# Modify this function to handle teacher-forced models with 95 output variables instead of 96
def all_trajectory_error(y_test, y_pred):
errors = [single_trajectory_error(y_test[:, 1:], y_pred, i) for i in range(y_test.shape[0])]
min_errors, max_errors, mean_errors, median_errors = zip(*errors)
return min(min_errors), max(max_errors), np.mean(mean_errors), np.median(median_errors)
def plot_trajectory(x_test, y_test, y_pred, sample_index):
min_error, max_error, mean_error, median_error = single_trajectory_error(y_test, y_pred, sample_index)
fig = go.Figure()
fig.add_trace(go.Scatter(x=x_test[sample_index, :, 0], y=x_test[sample_index, :, 1], mode="lines", name="Input Data", line=dict(color="green")))
fig.add_trace(go.Scatter(x=y_test[sample_index, :, 0], y=y_test[sample_index, :, 1], mode="lines", name="Ground Truth", line=dict(color="blue")))
fig.add_trace(go.Scatter(x=y_pred[sample_index, :, 0], y=y_pred[sample_index, :, 1], mode="lines", name="Forecasted Trajectory", line=dict(color="red")))
fig.update_layout(title=f"Sample Index: {sample_index} | Distance Errors (in meteres):<br>Min: {min_error:.2f}m, Max: {max_error:.2f}m, "
f"Mean: {mean_error:.2f}m, Median: {median_error:.2f}m", xaxis_title="Longitude", yaxis_title="Latitude",
plot_bgcolor="#e4eaf0", paper_bgcolor="#fcfcfc", width=700, height=600)
max_lon, max_lat = -58.705587131108196, 47.89066160591873
min_lon, min_lat = -61.34247286889181, 46.09201839408127
fig.update_xaxes(range=[min_lon, max_lon])
fig.update_yaxes(range=[min_lat, max_lat])
return fig
if y_pred is None:
with tf.device(tf.test.gpu_device_name()):
y_pred = model.predict(x_test, verbose=0)
y_pred_o = y_pred # preserve the result
x_test = denormalize_x(x_test, x_mean, x_std, x_min, x_max)
y_pred = denormalize_y(y_pred_o, y_mean, y_std, y_min, y_max)
# Modify this line to handle teacher-forced models with 95 output variables instead of 96
for sample_index in [1000, 2500, 5000, 7500]:
display(plot_trajectory(x_test, y_test[:, 1:], y_pred, sample_index))
# The metrics require a lower dimension (no impact on the results)
y_test_reshaped = np.reshape(y_test[:, 1:], (-1, y_test.shape[2]))
y_pred_reshaped = np.reshape(y_pred, (-1, y_pred.shape[2]))
# Physical Distance Error given in meters
all_min_error, all_max_error, all_mean_error, all_median_error = all_trajectory_error(y_test, y_pred)
print("\nAll Trajectories Min DE: {:.4f}m".format(all_min_error))
print("All Trajectories Max DE: {:.4f}m".format(all_max_error))
print("All Trajectories Mean DE: {:.4f}m".format(all_mean_error))
print("All Trajectories Median DE: {:.4f}m".format(all_median_error))
# Calculate evaluation metrics on the test data
r2 = r2_score(y_test_reshaped, y_pred_reshaped)
mse = mean_squared_error(y_test_reshaped, y_pred_reshaped)
mae = mean_absolute_error(y_test_reshaped, y_pred_reshaped)
evs = explained_variance_score(y_test_reshaped, y_pred_reshaped)
mape = mean_absolute_percentage_error(y_test_reshaped, y_pred_reshaped)
rmse = np.sqrt(mse)
print(f"\nTest R^2: {r2:.4f}")
print(f"Test MAE: {mae:.4f}")
print(f"Test MSE: {mse:.4f}")
print(f"Test RMSE: {rmse:.4f}")
print(f"Test MAPE: {mape:.4f}")
print(f"Test Explained Variance Score: {evs:.4f}")
return y_pred_o
_ = evaluate_model(model, x_test, y_test, y_mean, y_std, y_min, y_max)

def save_history(history, model_name):
history_name = model_name.replace('.h5', '.pkl')
history_name = os.path.join(ROOT, MODELS, history_name)
with open(history_name, 'wb') as f:
pkl.dump(history, f)

def load_history(model_name):
history_name = model_name.replace('.h5', '.pkl')
history_name = os.path.join(ROOT, MODELS, history_name)
with open(history_name, 'rb') as f:
history = pkl.load(f)
return history

def build_model(rnn_unit="GRU", enc_units_1=64, dec_units_1=64):
encoder_input = Input(shape=(INPUT_TIMESTEPS, INPUT_VARIABLES), name="Encoder_Input")
decoder_gt_input = Input(shape=((OUTPUT_TIMESTEPS - 1), OUTPUT_VARIABLES), name="Decoder-GT-Input")
mixing_prob_input = Input(shape=(1,), name="Mixing_Probability")
# Encoder
encoder_gru = eval(rnn_unit)(enc_units_1, activation="relu", name="Encoder")(encoder_input)
repeat_vector = RepeatVector((OUTPUT_TIMESTEPS - 1), name="Repeater")(encoder_gru)
# Inference Decoder
decoder_gru = eval(rnn_unit)(dec_units_1, activation="relu", return_sequences=True, name="Decoder")
decoder_output = decoder_gru(repeat_vector, initial_state=encoder_gru)
# Adjust decoder_output shape
dense_output_adjust = TimeDistributed(Dense(OUTPUT_VARIABLES), name="Output_Adjust")
adjusted_decoder_output = dense_output_adjust(decoder_output)
# Training Decoder
decoder_gru_tf = eval(rnn_unit)(dec_units_1, activation="relu", return_sequences=True, name="Decoder-TF")
probabilistic_tf_layer = ProbabilisticTeacherForcing(name="Probabilistic_Teacher_Forcing")
mixed_input = probabilistic_tf_layer([decoder_gt_input, adjusted_decoder_output, mixing_prob_input])
tf_output = decoder_gru_tf(mixed_input, initial_state=encoder_gru)
tf_output = dense_output_adjust(tf_output) # Use dense_output_adjust layer for training output
training_model = Model(inputs=[encoder_input, decoder_gt_input, mixing_prob_input], outputs=tf_output, name="Training")
inference_model = Model(inputs=encoder_input, outputs=adjusted_decoder_output, name="Inference")
return training_model, inference_model

def objective(hyperparams, x_train, y_train, straightness_ratios, model_prefix):
# Get the best hyperparameters from the optimization results
enc_units_1 = hyperparams["enc_units_1"]
dec_units_1 = hyperparams["dec_units_1"]
mixing_prob = hyperparams["mixing_prob"]
lr = hyperparams["learning_rate"]
# Create the model name using the best hyperparameters
model_name = f"{model_prefix}-{enc_units_1}-{dec_units_1}-{mixing_prob}-{lr}.h5"
full_path = os.path.join(ROOT, MODELS, model_name) # best model full path
# Check if the model results file with this name already exists
if not os.path.exists(full_path.replace(".h5", ".pkl")):
print(f"Saving under {model_name}.")
# Define the model architecture
training_model, _ = build_model(enc_units_1=enc_units_1, dec_units_1=dec_units_1)
compile_model(training_model, learning_rate=lr, clipnorm=1, jit_compile=True, skip_summary=True)
# Update y_train to have the same dimensions as the output
y_train = y_train[:, :(OUTPUT_TIMESTEPS - 1), :]
# Create the ground truth input for the decoder by appending a padding at the beginning of the sequence
decoder_ground_truth_input_data = (np.zeros((y_train.shape[0], 1, y_train.shape[2])), y_train[:, :-1, :])
decoder_ground_truth_input_data = np.concatenate(decoder_ground_truth_input_data, axis=1)
# Train the model on the data, using GPU if available
with tf.device(tf.test.gpu_device_name()):
history = training_model.fit([x_train, decoder_ground_truth_input_data, np.full((x_train.shape[0], 1), mixing_prob)], y_train,
batch_size=10240, epochs=250, validation_split=.2, verbose=0,
workers=multiprocessing.cpu_count(), use_multiprocessing=True,
callbacks=create_callbacks(model_name, skip_wandb=True))
#, sample_weight=straightness_ratios)
# Save the training history
save_history(history.history, model_name)
# Clear the session to release resources
del training_model; tf.keras.backend.clear_session()
else:
print("Loading pre-trained weights.")
history = load_history(model_name)
if type(history) == dict: # validation loss of the model
return {"loss": history["val_loss"][-1], "status": STATUS_OK}
else: return {"loss": history.history["val_loss"][-1], "status": STATUS_OK}def optimize_hyperparameters(max_evals, model_prefix, x_train, y_train, sample_size=5000):
def build_space(n_min=2, n_steps=9):
# Defining a custom 2^N range function
n_range = lambda n_min, n_steps: np.array(
[2**n for n in range(n_min, n_steps) if 2**n >= n_min])
# Defining the unconstrained search space
encoder_1_range = n_range(n_min, n_steps)
decoder_1_range = n_range(n_min, n_steps)
learning_rate_range = [.01, .001, .0001]
mixing_prob_range = [.25, .5, .75]
# Enforcing constraints on the search space
enc_units_1 = np.random.choice(encoder_1_range)
dec_units_1 = np.random.choice(decoder_1_range[np.where(decoder_1_range == enc_units_1)])
learning_rate = np.random.choice(learning_rate_range)
mixing_prob = np.random.choice(mixing_prob_range)
# Returns a single element of the search space
return dict(enc_units_1=enc_units_1, dec_units_1=dec_units_1, learning_rate=learning_rate, mixing_prob=mixing_prob)
# Select the search space based on a pre-set sampled random space
search_space = hp.choice("hyperparams", [build_space() for _ in range(sample_size)])
trials = Trials() # initialize Hyperopt trials
# Define the objective function for Hyperopt
fn = lambda hyperparams: objective(hyperparams, x_train, y_train, straightness_ratios, model_prefix)
# Perform Hyperopt optimization and find the best hyperparameters
best = fmin(fn=fn, space=search_space, algo=tpe.suggest, max_evals=max_evals, trials=trials)
best_hyperparams = space_eval(search_space, best)
# Get the best hyperparameters from the optimization results
enc_units_1 = best_hyperparams["enc_units_1"]
dec_units_1 = best_hyperparams["dec_units_1"]
mixing_prob = best_hyperparams["mixing_prob"]
lr = best_hyperparams["learning_rate"]
# Create the model name using the best hyperparameters
model_name = f"{model_prefix}-{enc_units_1}-{dec_units_1}-{mixing_prob}-{lr}.h5"
full_path = os.path.join(ROOT, MODELS, model_name) # best model full path
t_model, i_model = build_model(enc_units_1=enc_units_1, dec_units_1=dec_units_1)
t_model = tf.keras.models.load_model(full_path, custom_objects={"ProbabilisticTeacherForcing": ProbabilisticTeacherForcing})
for layer in i_model.layers: # inference model initialization
if layer.name in [l.name for l in t_model.layers]:
layer.set_weights(t_model.get_layer(layer.name).get_weights())
print(f"Best hyperparameters:")
print(f" Encoder units 1: {enc_units_1}")
print(f" Decoder units 1: {dec_units_1}")
print(f" Mixing proba.: {mixing_prob}")
print(f" Learning rate: {lr}")
return i_model
max_evals, model_prefix = 100, "TF-GRU"
# best_model = optimize_hyperparameters(max_evals, model_prefix, x_train, y_train)
# [NOTE] YOU CAN SKIP THIS STEP BY LOADING THE PRE-TRAINED WEIGHTS ON THE NEXT CELL.

def find_best_model(root_folder, model_prefix):
best_model_name, best_val_loss = None, float('inf')
for f in os.listdir(root_folder):
if (f.endswith(".h5") and f.startswith(model_prefix)):
try:
history = load_history(f)
# Get the validation loss
if type(history) == dict:
val_loss = history["val_loss"][-1]
else: val_loss = history.history["val_loss"][-1]
# Storing the best model
if val_loss < best_val_loss:
best_val_loss = val_loss
best_model_name = f
except: pass
# Load the best model
full_path = os.path.join(ROOT, MODELS, best_model_name)
t_model, i_model = build_model(enc_units_1=int(best_model_name.split("-")[2]), dec_units_1=int(best_model_name.split("-")[3]))
t_model = tf.keras.models.load_model(full_path, custom_objects={"ProbabilisticTeacherForcing": ProbabilisticTeacherForcing})
for layer in i_model.layers: # inference model initialization
if layer.name in [l.name for l in t_model.layers]:
layer.set_weights(t_model.get_layer(layer.name).get_weights())
# Print summary of the best model
print(f"Loss: {best_val_loss}")
i_model.summary()
return i_model
best_model = find_best_model(os.path.join(ROOT, MODELS), model_prefix)

_ = evaluate_model(best_model, x_test, y_test, y_mean, y_std, y_min, y_max)

def permutation_feature_importance(model, x_test, y_test, metric):
# Function to calculate permutation feature importance
def PFI(model, x, y_true, metric):
# Reshape the true values for easier comparison with predictions
y_true = np.reshape(y_true, (-1, y_true.shape[2]))
# Predict using the model and reshape the predicted values
with tf.device(tf.test.gpu_device_name()):
y_pred = model.predict(x, verbose=0)
y_pred = np.reshape(y_pred, (-1, y_pred.shape[2]))
# Calculate the baseline score using the given metric
baseline_score = metric(y_true, y_pred)
# Initialize an array for feature importances
feature_importances = np.zeros(x.shape[2])
# Calculate the importance for each feature
for feature_idx in range(x.shape[2]):
x_permuted = x.copy()
x_permuted[:, :, feature_idx] = np.random.permutation(x[:, :, feature_idx])
# Predict using the permuted input and reshape the predicted values
with tf.device(tf.test.gpu_device_name()):
y_pred_permuted = model.predict(x_permuted, verbose=0)
y_pred_permuted = np.reshape(y_pred_permuted, (-1, y_pred_permuted.shape[2]))
# Calculate the score with permuted input
permuted_score = metric(y_true, y_pred_permuted)
# Compute the feature importance as the difference between permuted and baseline scores
feature_importances[feature_idx] = permuted_score - baseline_score
return feature_importances
feature_importances = PFI(model, x_test, y_test, metric)
# Prepare the data for plotting (Altair requires a dataframe)
feature_names = ["Longitude", "Latitude", "COG", "SOG", "Delta Longitude", "Delta Latitude", "Delta COG", "Delta SOG"] # 8 input features: raw values plus their deltas
feature_importance_df = pd.DataFrame({"features": feature_names, "importance": feature_importances})
# Create the bar plot with Altair
bar_plot = alt.Chart(feature_importance_df).mark_bar(size=40, color="mediumblue", opacity=0.8).encode(
x=alt.X("features:N", title="Features", axis=alt.Axis(labelFontSize=12, titleFontSize=14)),
y=alt.Y("importance:Q", title="Permutation Importance", axis=alt.Axis(labelFontSize=12, titleFontSize=14)),
).properties(title=alt.TitleParams(text="Feature Importance", fontSize=16, fontWeight="bold"), width=400, height=300)
return bar_plot, feature_importances
permutation_feature_importance(best_model, x_test, y_test, mean_absolute_error)[0].display()

def sensitivity_analysis(model, x_sample, perturbation_range=(-0.1, 0.1), num_steps=10, plot_nrows=4):
# Get the number of features and outputs
num_features = x_sample.shape[1]
num_outputs = model.output_shape[-1] * model.output_shape[-2]
# Create an array of perturbations
perturbations = np.linspace(perturbation_range[0], perturbation_range[1], num_steps)
# Initialize sensitivity array
sensitivity = np.zeros((num_features, num_outputs, num_steps))
# Get the original prediction for the input sample
original_prediction = model.predict(x_sample.reshape(1, -1, x_sample.shape[-1]), verbose=0).reshape(-1)
# Iterate over input features and perturbations
for feature_idx in range(num_features):
for i, perturbation in enumerate(perturbations):
# Create a perturbed version of the input sample
perturbed_sample = x_sample.copy()
perturbed_sample[:, feature_idx] += perturbation
# Get the prediction for the perturbed input sample
perturbed_prediction = model.predict(perturbed_sample.reshape(1, -1, x_sample.shape[-1]), verbose=0).reshape(-1)
# Calculate the absolute prediction change and store it in the sensitivity array
sensitivity[feature_idx, :, i] = np.abs(perturbed_prediction - original_prediction)
# Determine the number of rows and columns in the plot
ncols = 6
nrows = max(min(plot_nrows, math.ceil(num_outputs / ncols)), 1)
# Define feature names (the inputs carry 8 features: raw values plus their deltas)
feature_names = ["Longitude", "Latitude", "COG", "SOG", "Delta Longitude", "Delta Latitude", "Delta COG", "Delta SOG"]
# Create the sensitivity plot
fig, axs = plt.subplots(nrows, ncols, figsize=(18, 3 * nrows), sharex=True, sharey=True)
axs = axs.ravel()
output_idx = 0
for row in range(nrows):
for col in range(ncols):
if output_idx < num_outputs:
# Plot sensitivity curves for each feature
for feature_idx in range(num_features):
axs[output_idx].plot(perturbations, sensitivity[feature_idx, output_idx], label=f'{feature_names[feature_idx]}')
# Set the title for each subplot
axs[output_idx].set_title(f'Output {output_idx // 2 + 1}, {"Longitude" if output_idx % 2 == 0 else "Latitude"}')
output_idx += 1
# Set common labels and legend
fig.text(0.5, 0.04, 'Perturbation', ha='center', va='center')
fig.text(0.06, 0.5, 'Absolute Prediction Change', ha='center', va='center', rotation='vertical')
handles, labels = axs[0].get_legend_handles_labels()
fig.legend(handles, labels, loc='upper center', ncol=num_features, bbox_to_anchor=(.5, .87))
plt.tight_layout()
plt.subplots_adjust(top=0.8, bottom=0.1, left=0.1, right=0.9)
plt.show()
return sensitivity
x_sample = x_test[100] # Select a sample from the test set
sensitivity = sensitivity_analysis(best_model, x_sample)

def visualize_intermediate_representations(model, x_test_subset, y_test_subset, n_neighbors=15, min_dist=0.1, n_components=2):
# Extract intermediate representations from your model
intermediate_layer_model = keras.Model(inputs=model.input, outputs=model.layers[-2].output)
intermediate_output = intermediate_layer_model.predict(x_test_subset, verbose=0)
# Flatten the last two dimensions of the intermediate_output
flat_intermediate_output = intermediate_output.reshape(intermediate_output.shape[0], -1)
# UMAP
reducer = umap.UMAP(n_neighbors=n_neighbors, min_dist=min_dist, n_components=n_components, random_state=seed_value)
umap_output = reducer.fit_transform(flat_intermediate_output)
# Convert y_test_subset to strings
y_test_str = np.array([str(label) for label in y_test_subset])
# Map string labels to colors
unique_labels = np.unique(y_test_str)
colormap = plt.cm.get_cmap('viridis', len(unique_labels))
label_to_color = {label: colormap(i) for i, label in enumerate(unique_labels)}
colors = np.array([label_to_color[label] for label in y_test_str])
# Create plot with Matplotlib
fig, ax = plt.subplots(figsize=(10, 8))
sc = ax.scatter(umap_output[:, 0], umap_output[:, 1], c=colors, s=5)
ax.set_title("UMAP Visualization", fontsize=14, fontweight="bold")
ax.set_xlabel("X Dimension", fontsize=12)
ax.set_ylabel("Y Dimension", fontsize=12)
ax.grid(True, linestyle='--', alpha=0.5)
# Add a colorbar to the plot
sm = plt.cm.ScalarMappable(cmap=colormap, norm=plt.Normalize(vmin=0, vmax=len(unique_labels)-1))
sm.set_array([])
cbar = plt.colorbar(sm, ticks=range(len(unique_labels)), ax=ax)
cbar.ax.set_yticklabels(unique_labels)
cbar.set_label("MMSIs")
plt.show()
visualize_intermediate_representations(best_model, x_test[:10000], mmsi_test[:10000], n_neighbors=10, min_dist=0.5)