A little bit about where we stand.
Welcome to AISdb - a comprehensive gateway for Automatic Identification System (AIS) data uses and applications. AISdb is part of the Making Vessels Tracking Data Available to Everyone (AISViz) project within the Marine Environmental Research Infrastructure for Data Integration and Application Network (MERIDIAN) initiative at Dalhousie University, designed to streamline the collection, processing, and analysis of AIS data, both in live-streaming scenarios and through historical records.
The primary features AISdb provides include:
SQL database for storing AIS position reports and vessel metadata: At the heart of AISdb is a robust database system built on SQLite, giving users a friendly Python interface with which to interact. This interface simplifies tasks like database creation, data querying, processing, visualization, and even exporting data to CSV format for diverse uses. To cater to advanced needs, AISdb supports using Postgres, offering superior concurrency handling and data-sharing capabilities for collaborative team environments.
Vessel data cleaning and trajectory modeling: AISdb includes features for vessel position cleaning and trajectory modeling. This ensures that the data used for analyses is accurate and reliable, providing a solid foundation for further studies and applications.
Integration with environmental context and external metadata: One of AISdb's unique features is its ability to enrich AIS datasets with environmental context. Users can seamlessly integrate oceanographic and bathymetric data in raster formats to bring depth to their analyses — quite literally, as the tool allows for incorporating seafloor depth data underneath vessel positions. Such versatility ensures that AISdb users can merge various environmental data points with AIS information, resulting in richer, multi-faceted maritime studies.
Advanced features for maritime studies: AISdb offers network graph analysis, MMSI deduplication, interpolation, and other processing utilities. These features enable advanced data processing and analysis, supporting complex maritime studies and applications.
Python interface and machine learning for vessel behavior modeling: AISdb includes a Python interface with a Rust backend, which paves the way for incorporating machine learning and deep learning techniques into vessel behavior modeling efficiently. This aspect of AISdb enhances the reproducibility and scalability of research, whether for academic exploration or practical industry applications.
Research support: AISdb is not just a storage and processing utility; it is a platform that facilitates robust research methods. Academics, industry experts, and researchers will find value in what AISdb offers, including extensive Canadian AIS data covering up to 100 km from the Canadian coast, from January 2012 to the present, with monthly updates. This regularly updated repository offers raw and parsed data formats readily available for AIS-related research. By removing the preprocessing barrier, AISdb accelerates and simplifies the research process for everyone involved. Although AISdb is open-source and can be used with any AIS dataset, accessing our preprocessed dataset requires a formal partnership with our research initiative.
The AISViz team, under the umbrella of the MERIDIAN project, is based in the Maritime Risk and Safety (MARS) research group at Dalhousie University. Funded by the Department of Fisheries and Oceans Canada (DFO), our mission revolves around democratizing AIS data use, making it accessible and understandable across multiple sectors, from government and academia to NGOs and the broader public. AISViz also aims to introduce advanced machine learning applications into AISdb's AIS data handling. This innovation seeks to streamline user interactions with AIS data, enhancing the user experience by simplifying data access and manipulation.
Our commitment goes beyond just providing tools. Through AISViz, we're opening doors to innovative research and policy development, targeting environmental conservation, maritime traffic management, and much more. Whether you're a professional in the field, an educator, or a maritime enthusiast, AISViz and its components, including AISdb, offer the knowledge and technology to deepen your understanding and significantly impact marine vessel tracking and the well-being of our oceans.
Ruixin Song is a research assistant in the Computer Science Department at Dalhousie University. She has an M.Sc. in Computer Science and a B.Eng. in Spatial Information and Digital Technology. Her recent work focuses on marine traffic data analysis and physics-inspired models, particularly in relation to biological invasions in the ocean. Her research interests include mobility data mining, graph neural networks, and network flow and optimization problems.
Contact: rsong@dal.ca
Jinkun Chen is a Ph.D. student in Computer Science at Dalhousie University, specializing in Explainable AI, Natural Language Processing (NLP), and Visualization. He earned a bachelor's degree in Computer Science with First-Class Honours from Dalhousie University. Jinkun is actively involved in research, working on advancing fairness, responsibility, trustworthiness, and explainability within Large Language Models (LLMs) and AI. In addition to his academic pursuits, Jinkun also serves as an AIS Data Analyst at MERIDIAN and is a valuable member of the HyperMatrix Lab and MALNIS Lab, all of which contribute to his research-related activities.
Contact: jinkun.chen@dal.ca
Gabriel Spadon is an Assistant Professor at the Faculty of Computer Science at Dalhousie University, Halifax - NS, Canada. He holds a Ph.D. and an MSc in Computer Science from the University of Sao Paulo, Sao Carlos - SP, Brazil. His research focuses on spatio-temporal analytics, time-series forecasting, and complex network mining, with deep involvement in Data Science & Engineering and GeoInformatics and a particular interest in mobility-related problems.
Contact: spadon@dal.ca
Ron Pelot has a Ph.D. in Management Sciences and is a Professor of Industrial Engineering at Dalhousie University. For the last 30 years, he and his team (MARS) have been working on developing new software tools and analysis methods for maritime traffic safety, coastal zone security, and marine spills. Their research methods include spatial risk analysis, vessel traffic modeling, data processing, pattern analysis, location models for response resource allocation, safety analyses, and cumulative shipping impact studies.
Contact: ronald.pelot@dal.ca
Adjunct Members
Vaishnav Vaidheeswaran is a Master's student in Computer Science at Dalhousie University. He holds a B.Tech in Computer Science and Engineering and has three years of experience as a software engineer in India, working at cutting-edge startups. His ongoing work addresses ways to handle the curse of dimensionality in machine learning models, using vessel tracking and multi-source high-dimensional datasets. His research interests include large language models, graph neural networks, and reinforcement learning.
Contact: vaishnav@dal.ca
Jay Kumar has a Ph.D. in Computer Science and Technology and was a postdoctoral fellow at the Department of Industrial Engineering at Dalhousie University. He has researched AI models for time-series data for over five years, focusing on Recurrent Neural models, probabilistic modeling, and feature engineering data analytics applied to ocean traffic. His research interests include Spatio-temporal Data Mining, Stochastic Modeling, Machine Learning, and Deep Learning.
Matthew Smith has a BSc degree in Applied Computer Science from Dalhousie University and specializes in managing and analyzing vessel tracking data. He is currently a Software Engineer at Radformation in Toronto, ON. Matt served as the AIS data manager on the MERIDIAN project, where he supported research groups across Canada in accessing and utilizing AIS data. The data was used to answer a range of scientific queries, including the impact of shipping on underwater noise pollution and the danger posed to endangered marine mammals by vessel collisions.
Casey Hilliard has a BSc degree in Computer Science from Dalhousie University and was a Senior Data Manager at the Institute for Big Data Analytics. He is currently a Chief Architect at GSTS (Global Spatial Technology Solutions) in Dartmouth, NS. Casey was a long-time research support staff member at the Institute and an expert in managing and using AIS vessel-tracking data. During his time, he assisted in advancing the Institute's research projects by managing and organizing large datasets, ensuring data integrity, and facilitating data usage in research.
Stan Matwin was the director of the Institute for Big Data Analytics, Dalhousie University, Halifax, Nova Scotia; he is a professor and Canada Research Chair (Tier 1) in Interpretability for Machine Learning. He is also a distinguished professor (Emeritus) at the University of Ottawa and a full professor with the Institute of Computer Science, Polish Academy of Sciences. His main research interests include big data, text mining, machine learning, and data privacy. He is a member of the Editorial Boards of IEEE Transactions on Knowledge and Data Engineering and the Journal of Intelligent Information Systems. He received the Lifetime Achievement Award of the Canadian AI Association (CAIAC).
We are passionate about fostering a collaborative and engaged community. We welcome your questions, insights, and feedback as vital components of our continuous improvement and innovation. Should you have any inquiries about AISdb, desire further information on our research, or wish to explore potential collaborations, please don't hesitate to contact us. Staying connected with users and researchers plays a crucial role in shaping the tool's development and ensuring it meets the diverse needs of our growing user base. You can easily contact our team via email or our GitHub team platform. In addition to addressing individual queries, we are committed to organizing webinars and workshops and presenting at conferences to share knowledge, gather feedback, and widen our outreach (stay tuned for more information about these). Together, let's advance the understanding and utilization of marine data for a brighter, more informed future in ocean research and preservation.
When loading data into the database, messages are sorted into SQL tables determined by message type and month. Table names contain a {YYYYMM} placeholder, which indicates the table's year and month in the format YYYYMM (for example, ais_{YYYYMM}_dynamic and ais_{YYYYMM}_static).
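As a small illustration of the naming scheme, the sketch below builds the monthly table names for a given date. The helper `monthly_table_names` is hypothetical and not part of AISdb; the name patterns are the ones used in the table descriptions of this document.

```python
from datetime import date


def monthly_table_names(day: date) -> dict:
    """Return the per-month table names that would cover the given date.

    Hypothetical helper for illustration only; not an AISdb function.
    """
    yyyymm = f"{day.year:04d}{day.month:02d}"  # e.g. 2022-01-15 -> "202201"
    return {
        "dynamic": f"ais_{yyyymm}_dynamic",
        "static": f"ais_{yyyymm}_static",
        "aggregate": f"static_{yyyymm}_aggregate",
    }


names = monthly_table_names(date(2022, 1, 15))
print(names["dynamic"])
```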
Some additional tables containing computed data may be created depending on the indexes used, for example, an aggregate of vessel static data by month or a virtual table used as a covering index.
Additional tables are also included for storing data not directly derived from AIS message reports.
For quick reference to data types and detailed explanations of these table entries, please see the Detailed Table Description.
In addition to querying the database using the `DBQuery` module, you can customize the query with your own SQL code.
Example of listing all the tables in your database:
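One way to do this with Python's built-in `sqlite3` module is sketched below. It uses an in-memory stand-in database with two made-up monthly tables so that it runs on its own; with a real database you would pass the path of your `.db` file to `sqlite3.connect()` instead.

```python
import sqlite3

# In-memory stand-in database; replace ":memory:" with the path to your .db file.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ais_202201_dynamic (mmsi INTEGER, time INTEGER)")
conn.execute("CREATE TABLE ais_202201_static (mmsi INTEGER, time INTEGER)")

# sqlite_master is SQLite's built-in catalogue of schema objects.
rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [name for (name,) in rows]
print(table_names)
conn.close()
```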
As messages are separated into tables by message type and month, queries spanning multiple message types or months should use UNIONs and JOINs to combine results as appropriate.
Example of querying tables with `JOIN`:
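The following is a minimal sketch rather than the project's own example: it builds two small stand-in tables in memory, using a subset of the columns listed in the table descriptions, and joins position reports to vessel metadata on `mmsi`. The MMSI values and vessel name are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ais_202201_dynamic (
        mmsi INTEGER, time INTEGER, longitude REAL, latitude REAL
    );
    CREATE TABLE ais_202201_static (
        mmsi INTEGER, time INTEGER, vessel_name TEXT, ship_type INTEGER
    );
    -- Made-up rows for illustration only.
    INSERT INTO ais_202201_dynamic VALUES (316000001, 1641000000, -63.57, 44.64);
    INSERT INTO ais_202201_dynamic VALUES (316000002, 1641000060, -63.50, 44.60);
    INSERT INTO ais_202201_static VALUES (316000001, 1641000000, 'EXAMPLE ONE', 70);
""")

# LEFT JOIN keeps every position report, even when no static record exists.
joined = conn.execute("""
    SELECT d.mmsi, d.time, d.longitude, d.latitude, s.vessel_name, s.ship_type
    FROM ais_202201_dynamic AS d
    LEFT JOIN ais_202201_static AS s ON s.mmsi = d.mmsi
    ORDER BY d.time
""").fetchall()

for row in joined:
    print(row)
conn.close()
```

A `LEFT JOIN` is used here so that positions without matching metadata are still returned; an inner `JOIN` would drop the second vessel.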
More information about SQL queries can be found in online tutorials.
The R* tree virtual tables should be queried for AIS position reports instead of the default tables. Query performance can be significantly improved using the R* tree index when restricting output to a narrow range of MMSIs, timestamps, longitudes, and latitudes. However, querying a wide range will not yield much benefit. If custom indexes are required for specific manual queries, these should be defined on message tables 1_2_3, 5, 18, and 24 directly instead of upon the virtual tables.
Timestamps are stored as epoch minutes in the database. To facilitate querying the database manually, use the `dt_2_epoch()` function to convert datetime values to epoch minutes and the `epoch_2_dt()` function to convert epoch minutes back to datetime values. Here is how you can use `dt_2_epoch()` with the example above:
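The sketch below shows the idea with plain-Python stand-ins, so it runs without AISdb installed. The helpers `to_epoch` and `from_epoch` are hypothetical names, not the AISdb functions, and the `unit_seconds` argument is an assumption added so that the same helper can produce epoch seconds or epoch minutes; check which unit your database actually stores before building a query.

```python
from datetime import datetime, timezone


def to_epoch(dt: datetime, unit_seconds: int = 1) -> int:
    """Convert a naive UTC datetime to whole epoch units (1 = seconds, 60 = minutes)."""
    seconds = dt.replace(tzinfo=timezone.utc).timestamp()
    return int(seconds // unit_seconds)


def from_epoch(value: int, unit_seconds: int = 1) -> datetime:
    """Convert epoch units back to a naive UTC datetime."""
    aware = datetime.fromtimestamp(value * unit_seconds, tz=timezone.utc)
    return aware.replace(tzinfo=None)


start = to_epoch(datetime(2022, 1, 1))
end = to_epoch(datetime(2022, 1, 2))

# Parameterised query text; the table name follows the monthly naming scheme.
sql = "SELECT mmsi, time FROM ais_202201_dynamic WHERE time BETWEEN ? AND ?"
params = (start, end)
print(sql, params)
```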
For more examples, please see the SQL code in `aisdb_sql/` that is used to create database tables and associated queries.
ais_{YYYYMM}_dynamic tables

| Column | Data Type | Description |
| --- | --- | --- |
| mmsi | INTEGER | Maritime Mobile Service Identity, a unique identifier for vessels. |
| time | INTEGER | Timestamp of the AIS message, in epoch seconds. |
| longitude | REAL | Longitude of the vessel in decimal degrees. |
| latitude | REAL | Latitude of the vessel in decimal degrees. |
| rot | REAL | Rate of turn, indicating how fast the vessel is turning. |
| sog | REAL | Speed over ground, in knots. |
| cog | REAL | Course over ground, in degrees. |
| heading | REAL | Heading of the vessel, in degrees. |
| maneuver | BOOLEAN | Indicator for whether the vessel is performing a special maneuver. |
| utc_second | INTEGER | Second of the UTC timestamp when the message was generated. |
| source | TEXT | Source of the AIS data. |
ais_{YYYYMM}_static tables

| Column | Data Type | Description |
| --- | --- | --- |
| mmsi | INTEGER | Maritime Mobile Service Identity, a unique identifier for vessels. |
| time | INTEGER | Timestamp of the AIS message, in epoch seconds. |
| vessel_name | TEXT | Name of the vessel. |
| ship_type | INTEGER | Numeric code representing the type of ship. |
| call_sign | TEXT | International radio call sign of the vessel. |
| imo | INTEGER | International Maritime Organization number, another unique vessel identifier. |
| dim_bow | INTEGER | Distance from the AIS transmitter to the bow (front) of the vessel. |
| dim_stern | INTEGER | Distance from the AIS transmitter to the stern (back) of the vessel. |
| dim_port | INTEGER | Distance from the AIS transmitter to the port (left) side of the vessel. |
| dim_star | INTEGER | Distance from the AIS transmitter to the starboard (right) side of the vessel. |
| draught | REAL | Maximum depth of the vessel's hull below the waterline, in meters. |
| destination | TEXT | Destination port or location where the vessel is heading. |
| ais_version | INTEGER | AIS protocol version used by the vessel. |
| fixing_device | TEXT | Type of device used for fixing the vessel's position (e.g., GPS). |
| eta_month | INTEGER | Estimated time of arrival month. |
| eta_day | INTEGER | Estimated time of arrival day. |
| eta_hour | INTEGER | Estimated time of arrival hour. |
| eta_minute | INTEGER | Estimated time of arrival minute. |
| source | TEXT | Source of the AIS data (e.g., specific AIS receiver or data provider). |
static_{YYYYMM}_aggregate tables

| Column | Data Type | Description |
| --- | --- | --- |
| mmsi | INTEGER | Maritime Mobile Service Identity, a unique identifier for vessels. |
| imo | INTEGER | International Maritime Organization number, another unique vessel identifier. |
| vessel_name | TEXT | Name of the vessel. |
| ship_type | INTEGER | Numeric code representing the type of ship. |
| call_sign | TEXT | International radio call sign of the vessel. |
| dim_bow | INTEGER | Distance from the AIS transmitter to the bow (front) of the vessel. |
| dim_stern | INTEGER | Distance from the AIS transmitter to the stern (back) of the vessel. |
| dim_port | INTEGER | Distance from the AIS transmitter to the port (left) side of the vessel. |
| dim_star | INTEGER | Distance from the AIS transmitter to the starboard (right) side of the vessel. |
| draught | REAL | Maximum depth of the vessel's hull below the waterline, in meters. |
| destination | TEXT | Destination port or location where the vessel is heading. |
| eta_month | INTEGER | Estimated time of arrival month. |
| eta_day | INTEGER | Estimated time of arrival day. |
| eta_hour | INTEGER | Estimated time of arrival hour. |
| eta_minute | INTEGER | Estimated time of arrival minute. |
This tutorial will guide you in using the AISdb package to load AIS data into a database and perform queries. We will begin with AISdb installation and environment setup, then proceed to examples of querying the loaded data and creating simple visualizations.
Preparing a Python virtual environment for AISdb is a safe practice. It allows you to manage dependencies and prevent conflicts with other projects, ensuring a clean and isolated setup for your work with AISdb. Run these commands in your terminal based on the operating system you are using:
Now you can check your installation by running:
If you're using AISdb in Jupyter Notebook, please include the following commands in your notebook cells:
Then, import the required packages:
This section will show you how to efficiently load AIS data into a database.
AISdb includes two database connection approaches:
SQLite database connection; and,
PostgreSQL database connection.
Most usage scenarios in this documentation work with the SQLite database. Here is an example of loading data using the sample data included in the AISdb package:
The code above decodes the AIS messages from the CSV file specified in `filepaths` and inserts them into the SQLite database connected via `dbconn`.
Following is a quick example of a query and visualization of the data we just loaded with AISdb:
In addition to the SQLite connection, AISdb supports PostgreSQL for its superior concurrency handling and data-sharing capabilities, making it suitable for collaborative environments and for handling larger datasets efficiently. The structure and interactions with PostgreSQL are designed to provide robust and scalable solutions for AIS data storage and querying. For PostgreSQL, you need the `psycopg2` library:
To connect to a PostgreSQL database, AISdb uses the `PostgresDBConn` class:
Example of performing queries and visualizations with PostgreSQL database:
Moreover, if you wish to use your own AIS data to create and process a database with AISdb, please check out our instructional guide on data processing and database creation: Using Your AIS Data.
A hands-on quick start guide for using AISdb.
To work with the AISdb Python package, please ensure that you have Python version 3.8 or higher. If you plan to use SQLite, no additional installation is required, as it is included with Python by default. However, for those who prefer using a PostgreSQL server, it will need to be installed separately.
The AISdb Python package can be conveniently installed using pip. It's highly recommended that a virtual Python environment be created and the package installed within it.
Alternatively, you may also use AISdb on Docker. Regardless of the installation procedure you decide to use, you can test your installation by running the following commands:
Notice that if you are running Jupyter, ensure it is installed in the same environment as AISdb:
The Python code in the rest of this document can be run in the Python environment you created.
To use nightly builds (optional), you can install AISdb from source:
We may introduce new changes on different branches; however, the master branch contains changes that have passed testing and is generally more stable.
This option requires the optional dependency `psycopg` for interfacing with Postgres databases. Note that Postgres accepts these keyword arguments. Alternatively, a connection string may be used. Information on connection strings and Postgres URI format can be found here.
Querying SQLite is as easy as providing the name of a ".db" file with the same entity-relationship structure as the databases supported by AISdb, which are detailed in the SQL Database section. We prepared an example SQLite database, `example_data.db`, based on two months of AIS data (01/01/2022 - 03/01/2022) from Marine Cadastre, which is available in this Tutorial GitHub repository.
If you want to create your own database using your data, we have a tutorial with examples that shows you how to create an SQLite database from open-source data.
Parameters for the database query can be defined using `aisdb.database.dbqry.DBQuery`. Iterate over rows returned from the database for each vessel with `aisdb.database.dbqry.DBQuery.gen_qry()`. Convert the results into a generator yielding dictionaries with NumPy arrays describing position vectors, e.g., lon, lat, and time, using `aisdb.track_gen.TrackGen()`.
The following query will return vessel trajectories from a given 1-hour time window:
A specific region can be queried for AIS data using `aisdb.gis.Domain` or one of its sub-classes to define a collection of `shapely` polygon features. For this example, the domain contains a single bounding box polygon derived from a longitude/latitude coordinate pair and radial distance specified in meters. If multiple features are included in the domain object, the domain boundaries will encompass the convex hull of all features.
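To make the geometry concrete, here is a rough sketch of how a bounding box can be derived from a center point and a radial distance in meters. It is a spherical-Earth approximation written in plain Python, not the AISdb domain classes; the helper name `bbox_from_point` and the `xmin`/`xmax`/`ymin`/`ymax` keys are illustrative assumptions.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, spherical approximation


def bbox_from_point(lon: float, lat: float, radius_m: float) -> dict:
    """Approximate lon/lat bounding box around a point (hypothetical helper)."""
    # One radian of latitude spans EARTH_RADIUS_M meters everywhere.
    dlat = math.degrees(radius_m / EARTH_RADIUS_M)
    # Meridians converge toward the poles, so longitude spans shrink by cos(lat).
    dlon = math.degrees(radius_m / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return {
        "xmin": lon - dlon,
        "xmax": lon + dlon,
        "ymin": lat - dlat,
        "ymax": lat + dlat,
    }


box = bbox_from_point(lon=-63.6, lat=44.6, radius_m=5000)
print(box)
```

The approximation is adequate for small radii away from the poles and the antimeridian; for anything more demanding, the domain classes described above are the better route.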
Additional query callbacks for filtering by region, timeframe, identifier, etc. can be found in `aisdb.database.sql_query_strings` and `aisdb.database.sqlfcn_callbacks`.
The above generator can be input into a processing function, yielding modified results. For example, to model the activity of vessels on a per-voyage or per-transit basis, each voyage is defined as a continuous vector of vessel positions where the time between observed timestamps never exceeds a 24-hour period.
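The voyage definition above can be sketched in plain Python as follows. This is an illustrative stand-in, not the AISdb processing function: `split_on_gaps` is a hypothetical name, tracks are assumed to be dictionaries with `mmsi`, `time` (epoch seconds), `lon`, and `lat` entries, and plain lists are used in place of NumPy arrays.

```python
from datetime import timedelta


def _slice(track: dict, start: int, stop: int) -> dict:
    """Copy one contiguous piece of a track."""
    return {
        "mmsi": track["mmsi"],
        "time": track["time"][start:stop],
        "lon": track["lon"][start:stop],
        "lat": track["lat"][start:stop],
    }


def split_on_gaps(track: dict, max_gap: timedelta = timedelta(hours=24)):
    """Yield voyages, cutting wherever consecutive timestamps exceed max_gap."""
    times = track["time"]
    if not times:
        return
    limit = max_gap.total_seconds()
    start = 0
    for i in range(1, len(times)):
        if times[i] - times[i - 1] > limit:
            yield _slice(track, start, i)
            start = i
    yield _slice(track, start, len(times))


track = {
    "mmsi": 316000001,  # made-up identifier
    "time": [0, 3600, 3600 + 86401, 3600 + 86461],
    "lon": [-63.60, -63.58, -63.10, -63.09],
    "lat": [44.60, 44.61, 44.90, 44.91],
}
voyages = list(split_on_gaps(track))
print([v["time"] for v in voyages])
```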
A common problem with AIS data is noise, where multiple vessels might broadcast using the same identifier simultaneously. AISdb integrates data cleaning techniques to denoise the vessel track data; for details:
(1) Denoising with Encoder: The `aisdb.denoising_encoder.encode_greatcircledistance()` function checks the approximate distance between each vessel’s position. It separates vectors where a vessel couldn’t reasonably travel using the most direct path, such as speeds over 50 knots.
(2) Distance and Speed Thresholds: A distance and speed threshold limits the maximum distance or time between messages that can be considered continuous.
(3) Scoring and Segment Concatenation: A score is computed for each position delta, with sequential messages nearby at shorter intervals given a higher score. This score is calculated by dividing the Haversine distance by elapsed time. Any deltas with a score not reaching the minimum threshold are considered the start of a new segment. New segments are compared to the end of existing segments with the same vessel identifier; if the score exceeds the minimum, they are concatenated. If multiple segments meet the minimum score, the new segment is concatenated to the existing segment with the highest score.
Processing functions may be executed in sequence as a processing chain or pipeline, so after segmenting the individual voyages as shown above, results can be input into the encoder to remove noise and correct for vessels with duplicate identifiers effectively.
Building on the above processing pipeline, the resulting cleaned trajectories can be geofenced and filtered for results contained by at least one domain polygon and interpolated for uniformity.
Additional processing functions can be found in the `aisdb.track_gen` module.
The resulting processed voyage data can be exported in CSV format instead of being printed:
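As a library-independent sketch of the idea, the snippet below flattens track dictionaries into CSV rows with Python's `csv` module. The helper `write_tracks_csv` is hypothetical, the tracks are made-up sample values, and the assumed track layout (`mmsi`, `time`, `lon`, `lat`) mirrors the position vectors described above.

```python
import csv
import io


def write_tracks_csv(tracks, handle) -> int:
    """Write one CSV row per position report and return the row count."""
    writer = csv.writer(handle)
    writer.writerow(["mmsi", "time", "lon", "lat"])
    count = 0
    for track in tracks:
        for t, x, y in zip(track["time"], track["lon"], track["lat"]):
            writer.writerow([track["mmsi"], t, x, y])
            count += 1
    return count


sample_tracks = [
    {"mmsi": 316000001, "time": [0, 60], "lon": [-63.6, -63.59], "lat": [44.6, 44.61]},
    {"mmsi": 316000002, "time": [30], "lon": [-63.5], "lat": [44.5]},
]

# StringIO keeps the example self-contained; use open("tracks.csv", "w", newline="")
# to write a real file instead.
buffer = io.StringIO()
n_rows = write_tracks_csv(sample_tracks, buffer)
print(buffer.getvalue())
```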
AISdb supports integrating external data sources such as bathymetric charts and other raster grids. To determine the approximate ocean depth at each vessel position, the `aisdb.webdata.bathymetry` module can be used.
Once the data has been downloaded, the `Gebco()` class may be used to append bathymetric data to tracks in the context of a `TrackGen()` processing pipeline like the processing functions described above.
Also, see `aisdb.webdata.shore_dist.ShoreDist` for determining the approximate nearest distance to shore from vessel positions.
Similarly, arbitrary raster coordinate-gridded data may be appended to vessel tracks.
AIS data from the database may be overlaid on a map such as the one shown above using the `aisdb.web_interface.visualize()` function. This function accepts a generator of track dictionaries such as those output by `aisdb.track_gen.TrackGen()`.
Data querying with AISdb involves setting up a connection to the database, defining query parameters, creating and executing the query, and processing the results. Following the previous tutorial, Database Loading, we set up a database connection and made simple queries and visualizations. This tutorial will dig into data query functions and parameters and show you the queries you can make with AISdb.
Data querying with AISdb includes two components: `DBQuery` and `TrackGen`. In this section, we will introduce each component with examples. Before starting data querying, please ensure you have connected to the database. If you have not done so, please follow the instructions and examples in Database Loading or Quick Start.
The `DBQuery` class is used to create a query object that specifies the parameters for data retrieval, including the time range, spatial domain, and any filtering callbacks. Here is an example of creating a `DBQuery` object and using parameters to specify the time range and geographical locations:
Callback functions are used in the `DBQuery` class to filter data based on specific criteria. Some common callbacks include `in_bbox`, `in_time_bbox`, `valid_mmsi`, and `in_time_bbox_validmmsi`. These callbacks ensure that the data retrieved matches the specific criteria defined in the query. Please find examples of using different callbacks with other parameters in Query types with practical examples.
gen_qry
The function `gen_qry` is a method of the `DBQuery` class in AISdb. It is responsible for generating rows of data that match the query criteria specified when creating the `DBQuery` object. This function acts as a generator, yielding one row at a time and efficiently handling large datasets.
After creating the `DBQuery` object, we can generate rows with `gen_qry`:
Each row from `gen_qry` is a tuple or dictionary representing a record in the database.
The `TrackGen` class converts the generated rows from `gen_qry` into tracks (trajectories). It takes the row generator and, optionally, a `decimate` parameter to control point reduction. This conversion is essential for analyzing vessel movements, identifying patterns, and visualizing trajectories in later steps.
Following the generated rows above, here is how to use the `TrackGen` class:
The `TrackGen` class returns a generator object, `tracks`. While iterating over `tracks`, each item is a dictionary representing a track for a specific vessel:
This is the output with our sample data:
In this section, we will provide practical examples of the most common query types you can make using the `DBQuery` class, including querying within a time range, querying geographical areas, and tracking vessels by MMSI. Different queries can be achieved by changing the `callbacks` parameter and other parameters defined in the `DBQuery` class. Then, we will use `TrackGen` to convert these query results into structured tracks for further analysis and visualization.
First, we need to import the necessary packages and prepare data:
Querying data within a specified time range can be done by using the `in_timerange_validmmsi` callback in the `DBQuery` class:
This will display the queried vessel tracks (within a time range, has a valid MMSI) on the map:
You may find noise in some of the track data. In Data Cleaning, we introduce the de-noising methods in AISdb that can effectively remove unreasonable or erroneous data points, ensuring more accurate and reliable vessel trajectories.
In practical scenarios, people may have specific points or areas of interest. `DBQuery` includes parameters to define a bounding box and has relevant callbacks. Let's look at an example:
This will show all the vessel tracks with valid MMSI in the defined bounding box:
In the above examples, we queried data in a time range and a geographical area. If you want to combine multiple query criteria, please check out available types of callbacks in the API Docs. In the last example above, we can simply modify the callback type to obtain vessel tracks within both the time range and geographical area:
The displayed vessel tracks:
In addition to time and location range, you can track single and multiple vessel(s) of interest by specifying their MMSI in the query. Here is an example of tracking several vessels within a time range:
A common issue with AIS data is noise, where multiple vessels may broadcast using the same identifier simultaneously. AISdb incorporates data cleaning techniques to remove noise from vessel track data. For more details:
Denoising with Encoder: The `aisdb.denoising_encoder.encode_greatcircledistance()` function checks the approximate distance between each vessel’s position. It separates vectors where a vessel couldn’t reasonably travel using the most direct path, such as speeds over 50 knots.
Distance and Speed Thresholds: Distance and speed thresholds limit the maximum distance or time between messages that can be considered continuous.
Scoring and Segment Concatenation: A score is computed for each position delta, with sequential messages nearby at shorter intervals given a higher score. This score is calculated by dividing the Haversine distance by elapsed time. Any deltas with a score not reaching the minimum threshold are considered the start of a new segment. New segments are compared to the end of existing segments with the same vessel identifier; if the score exceeds the minimum, they are concatenated. If multiple segments meet the minimum score, the new segment is concatenated to the existing segment with the highest score.
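The speed-threshold idea behind these steps can be illustrated with a simplified sketch. It is not the AISdb encoder: it only flags position deltas whose implied speed (Haversine distance divided by elapsed time) exceeds a limit, and it leaves out the scoring and concatenation logic. The function names and the sample track are assumptions made for illustration.

```python
import math

KNOTS_PER_M_PER_S = 3600 / 1852  # 1 knot = 1852 m per hour


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6_371_000.0):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def segment_breaks(track, max_knots=50.0):
    """Return indices where a new segment would start (implied speed too high)."""
    breaks = []
    for i in range(1, len(track["time"])):
        dist_m = haversine_m(
            track["lon"][i - 1], track["lat"][i - 1], track["lon"][i], track["lat"][i]
        )
        dt_s = track["time"][i] - track["time"][i - 1]
        # Non-increasing timestamps are treated as a break in this sketch.
        speed = math.inf if dt_s <= 0 else dist_m / dt_s * KNOTS_PER_M_PER_S
        if speed > max_knots:
            breaks.append(i)
    return breaks


# Made-up track: the third point jumps several degrees of longitude in 10 minutes.
noisy = {
    "time": [0, 600, 1200, 1800],
    "lon": [-63.60, -63.59, -60.00, -59.99],
    "lat": [44.60, 44.60, 44.60, 44.60],
}
breaks = segment_breaks(noisy)
print(breaks)
```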
Processing functions may be executed in sequence as a processing chain or pipeline, so after segmenting the individual voyages, results can be input into the encoder to remove noise and correct for vessels with duplicate identifiers effectively.
After segmentation and encoding, the tracks are shown as:
For comparison, this is a shot of tracks before cleaning:
AISdb includes a function called `aisdb.gis.delta_meters` that calculates the Haversine distance in meters between consecutive positions within a vessel track. This function is essential for analyzing vessel movement patterns and ensuring accurate distance calculations on the Earth's curved surface. It is also integrated into the denoising encoder, which compares distances against a threshold to aid in the data-cleaning process.
Here is an example of calculating the Haversine distance between each pair of consecutive points on a track:
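For readers who want to see the underlying calculation, the sketch below implements the Haversine formula directly in plain Python on a small made-up track. It is a stand-in for illustration, not the AISdb function; `consecutive_distances_m` is a hypothetical name and a spherical Earth with a 6,371 km mean radius is assumed.

```python
import math


def haversine_m(lon1, lat1, lon2, lat2, radius_m=6_371_000.0):
    """Great-circle distance in meters between two lon/lat points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius_m * math.asin(math.sqrt(a))


def consecutive_distances_m(track):
    """Distance between each pair of consecutive points; length is len(track) - 1."""
    lon, lat = track["lon"], track["lat"]
    return [
        haversine_m(lon[i - 1], lat[i - 1], lon[i], lat[i])
        for i in range(1, len(lon))
    ]


# Made-up track: one degree of latitude north, then a repeated position.
track = {"lon": [0.0, 0.0, 0.0], "lat": [0.0, 1.0, 1.0]}
distances = consecutive_distances_m(track)
print(distances)
```

One degree of latitude on this sphere is roughly 111.2 km, which is a convenient sanity check for the first distance.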
If we visualize this track on the map, we can observe:
Track interpolation with AISdb involves generating estimated positions of vessels at specific intervals when actual AIS data points are unavailable. This process is important for filling in gaps in the vessel's trajectory, which can occur due to signal loss, data filtering, or other disruptions.
In this tutorial, we introduce different types of track interpolation implemented in AISdb with usage examples.
First, we defined functions to transform and visualize the track data (a generator object), with options to view the data points or the tracks:
We will use an actual track retrieved from the database for the examples in this tutorial and interpolate additional data points based on this track. The visualization will show the original track data points:
Linear interpolation estimates the vessel's position by drawing a straight line between two known points and calculating the positions at intermediate times. It is simple, fast, and straightforward but may not accurately represent complex movements.
This method estimates the position of a vessel at regular time intervals (e.g., every 10 minutes). To perform linear interpolation with an equal time window on the track defined above:
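The calculation itself can be sketched in plain Python as below. This is an illustrative stand-in rather than the AISdb interpolation function: `interp_equal_time` is a hypothetical name, the track is made up, and timestamps are assumed to be strictly increasing integers in epoch seconds.

```python
def interp_equal_time(track, step_s=600):
    """Linearly interpolate lon/lat at a fixed time step between first and last fix."""
    times = track["time"]
    if len(times) < 2:
        return dict(track)
    new_times = list(range(times[0], times[-1] + 1, step_s))
    out = {"mmsi": track["mmsi"], "time": new_times, "lon": [], "lat": []}
    j = 0  # index of the known point at or before the current target time
    for t in new_times:
        while j < len(times) - 2 and times[j + 1] < t:
            j += 1
        # Fraction of the way from known point j to known point j + 1.
        frac = (t - times[j]) / (times[j + 1] - times[j])
        for key in ("lon", "lat"):
            out[key].append(track[key][j] + frac * (track[key][j + 1] - track[key][j]))
    return out


track = {
    "mmsi": 316000001,  # made-up identifier
    "time": [0, 1200, 1800],
    "lon": [0.0, 1.2, 1.8],
    "lat": [10.0, 10.0, 11.0],
}
resampled = interp_equal_time(track, step_s=600)
print(resampled["time"], resampled["lon"], resampled["lat"])
```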
This method estimates the position of a vessel at regular spatial intervals (e.g., every 1 km along its path). To perform linear interpolation with equal distance intervals on the track defined above:
This method estimates the positions of a vessel along a curved path using the principles of geometry, particularly involving great-circle routes.
Given a set of data points, cubic spline interpolation fits a smooth curve through these points. The curve is represented as a series of cubic polynomials between each pair of data points. Each polynomial ensures a smooth curve at the data points (i.e., the first and second derivatives are continuous).
In addition to the standard interpolation methods provided by AISdb, users can implement other interpolation techniques tailored to their specific analytical needs. For instance, B-spline (Basis Spline) interpolation is a mathematical technique that creates a smooth curve through data points. This smoothness is important in trajectory analysis as it avoids sharp, unrealistic turns and maintains a natural flow.
Here is an implementation and example of using B-spline interpolation:
Then, we can apply the function just implemented on the vessel tracks generator:
The interpolated track is visualized as follows:
This tutorial introduces visualization options for vessel trajectories processed using AISdb, including AISdb's integrated web interface and alternative approaches with popular Python visualization packages. Practical examples are provided for each tool, illustrating how to process and visualize AISdb tracks effectively.
AISdb provides an integrated data visualization feature through the aisdb.web_interface.visualize module, which allows users to generate interactive maps displaying vessel tracks. This built-in tool is designed for simplicity and ease of use, offering customizable visualizations directly from AIS data without requiring extensive setup.
Here is an example of using the web interface module to show queried data with colors. To display vessel tracks in a single color:
If you want to visualize vessel tracks in different colors based on MMSI, here's an example that demonstrates how to color-code tracks for easy identification:
Several alternative Python packages can be leveraged by users seeking more advanced or specialized visualization capabilities. For instance, Basemap and Cartopy are excellent for creating detailed 2D plots, while Plotly offers powerful interactive graphs. Additionally, Kepler.gl caters to users needing dynamic, large-scale visualizations or 3D mapping. These alternatives allow for a deeper exploration of AIS data, offering flexibility in how data is presented and analyzed beyond the default capabilities of AISdb.
How to deploy your own Automatic Identification System (AIS) receiver.
In addition to utilizing AIS data provided by Spire for the Canadian coasts, you can install AIS receiver hardware to capture AIS data directly. The received data can be processed and stored in databases, which can then be used with AISdb. This approach offers additional data sources and allows users to collect and process their data (as illustrated in the pipeline below). Doing so allows you to customize your data collection efforts to meet specific needs and seamlessly integrate the data with AISdb for enhanced analysis and application. At the same time, you can share the data you collect with others.
Raspberry Pi or another computer with networking capability
162MHz receiver, such as the Wegmatt dAISy 2 Channel Receiver
An antenna in the VHF frequency band (30MHz - 300MHz) e.g. Shakespeare QC-4 VHF Antenna
Optionally, you may want
Antenna mount
A filtered preamp, such as this one sold by Uputronics, to improve signal range and quality
An additional option includes free AIS receivers from MarineTraffic. This option may require you to share the data with the organization to help expand its AIS-receiving network.
When setting up your antenna, place it as high as possible and far away from obstructions and other equipment as is practical.
Connect the antenna to the receiver. If using a preamp filter, connect it between the antenna and the receiver.
Connect the receiver to your Linux device via a USB cable. If using a preamp filter, power it with a USB cable.
Validate the hardware configuration
When connected via USB, the AIS receiver is typically found under /dev/ with a name beginning with ttyACM, for example /dev/ttyACM0. Ensure the device is listed in this directory.
To test the receiver, use the command sudo cat /dev/ttyACM0 to display its output. If all works as intended, you will see streams of bytes appearing on the screen.
A visual example of the antenna hardware setup that MERIDIAN has available is as follows:
Connect the receiver to the Raspberry Pi via a USB port, and then run the configure_rpi.sh script. This will install the Rust toolchain, AISdb dispatcher, and AISdb system service (described below), allowing the receiver to start at boot.
Install Raspberry Pi OS with SSH enabled: Visit https://www.raspberrypi.com/software/ to download and install the Raspberry Pi OS. If using the RPi imager, please ensure you run it as an administrator.
Connect the receiver: Attach the receiver to the Raspberry Pi using a USB cable. Then log in to the Raspberry Pi and update the system with the following command: sudo apt-get update
Install the Rust toolchain: Install the Rust toolchain on the Raspberry Pi using the following command: curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
Afterward, log out and log back in to add Rust and Cargo to the system path.
Install the network client and dispatcher: (a) From crates.io, using cargo install mproxy-client
(b) To install from the source, use the local path instead, e.g. cargo install --path ./dispatcher/client
Install systemd services: Set up new systemd services to run the AIS receiver and dispatcher. First, create a new text file ./ais_rcv.service with the contents in the block below, replacing User=ais and /home/ais with the username and home directory chosen in step 1.
This service will broadcast receiver input downstream to aisdb.meridian.cs.dal.ca via UDP. You can add additional endpoints at this stage; for more information, see mproxy-client --help.
Additional AIS networking tools, such as mproxy-forward, mproxy-server, and mproxy-reverse, are available in the ./dispatcher source directory.
Next, link and enable the service on the Raspberry Pi to ensure the receiver starts at boot:
See more examples in docker-compose.yml
This section demonstrates integrating AIS data with external bathymetric data to enrich our analysis. In the following example, we identified all vessels within a 500-kilometer radius around the central area of Halifax, Canada, on January 1, 2018.
First, we imported the necessary packages and prepared the bathymetry data. It’s important to note that the downloaded bathymetric data is divided into eight segments, organized by latitude and longitude. In a later step, you will need to select the appropriate bathymetric raster file based on the geographical region covered by your vessel track data.
We defined a coloring criterion to classify tracks based on their average depths relative to the bathymetry. Tracks that traverse shallow waters with an average depth of less than 100 meters are colored in yellow. Those spanning depths between 100 and 1,000 meters are represented in orange, indicating a transition to deeper waters. As the depth increases, tracks reaching up to 20 kilometers are marked pink. The deepest tracks, descending beyond 20 kilometers, are distinctly colored in red.
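The coloring rule described above can be sketched as a small classifier. This is only an illustration: the function names are hypothetical, depths are assumed to be positive values in meters, and the treatment of the exact boundary values (100 m, 1,000 m, 20,000 m) is an assumption, since the text does not specify it.

```python
def mean_depth_m(depths_m):
    """Average depth (meters, positive down) over all positions of a track."""
    return sum(depths_m) / len(depths_m)


def depth_color(avg_depth_m):
    """Map an average depth in meters to the color classes described in the text."""
    if avg_depth_m < 100:
        return "yellow"   # shallow waters
    if avg_depth_m < 1_000:
        return "orange"   # transition to deeper waters
    if avg_depth_m < 20_000:
        return "pink"     # up to 20 km
    return "red"          # beyond 20 km


# Hypothetical depth samples (meters) under three tracks
for depths in ([20, 45, 80], [300, 650, 900], [2500, 3100, 4000]):
    print(depth_color(mean_depth_m(depths)))
```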
Next, we query the AIS data to be integrated with the bathymetric raster file and apply the coloring function to mark the tracks based on their average depths relative to the bathymetry.
The integrated results are color-coded and can be visualized as shown below:
In addition to accessing data stored on the AISdb server, you can download open-source AIS data or import your datasets for processing and analysis using AISdb. This tutorial guides you through downloading AIS data from popular websites, creating SQLite and PostgreSQL databases compatible with AISdb, and establishing database connections. We provide two examples: the first demonstrates working with small data samples and creating an SQLite database, and the second outlines our approach to handling multiple data file downloads and creating a PostgreSQL database.
The U.S. vessel traffic data across user-defined geographies and periods are available at MarineCadastre. This resource offers comprehensive AIS data that can be accessed for various maritime analysis purposes. We can tailor the dataset based on research needs by selecting specific regions and timeframes.
In the following example, we will show how to download and process a single data file and import the data to a newly created SQLite database.
First, download the AIS data of the day using the curl command:
Then, extract the downloaded ZIP file to a specific path:
We will look into the number of columns in the downloaded CSV file.
The required columns for AISdb have specific names and may differ from the imported dataset. Therefore, let's define the exact list of columns needed.
Next, we update the column names in the existing dataframe df_ and change the time format as required. The timestamp of an AIS message is represented by BaseDateTime in the default format YYYY-MM-DDTHH:MM:SS. For AISdb, however, the time is represented in UNIX format. We now read the CSV and apply the necessary changes to the date format:
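As a standalone illustration of the time conversion, the sketch below turns a BaseDateTime string into a UNIX timestamp using only the standard library. The function name to_unix is hypothetical, and the timestamps are assumed to be in UTC; adjust the time zone if your data source states otherwise.

```python
from datetime import datetime, timezone


def to_unix(base_datetime):
    """Convert 'YYYY-MM-DDTHH:MM:SS' (assumed UTC) to integer UNIX seconds."""
    dt = datetime.strptime(base_datetime, "%Y-%m-%dT%H:%M:%S")
    # Attach UTC explicitly so the result does not depend on the local time zone
    return int(dt.replace(tzinfo=timezone.utc).timestamp())


print(to_unix("2023-01-01T00:00:00"))
print(to_unix("2023-01-01T00:10:00"))
```

Setting the time zone explicitly matters: a naive datetime would be interpreted in the machine's local time zone and could shift every message by several hours.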
In the code, we have mapped the column names accordingly and changed the data type of some columns. In addition, an nm4 file usually contains raw messages, separating static messages from dynamic ones. However, the MarineCadastre data does not have a Message_ID field to indicate the message type. Thus, adding static messages is necessary for database creation so that a table related to metadata is created.
Let's process the CSV to create an SQLite database using the aisdb package.
A SQLite database has been created now.
To download and extract the data, simply run the two scripts in sequence:
After downloading and extracting the AIS data, the 2-merge.py script consolidates the daily CSV files into monthly files, while the 3-deduplicate.py script removes duplicate rows, retaining unique AIS messages. To execute them, simply run:
The output of these two scripts will be cleaned CSV files, which will be stored in a new folder named /merged in your working directory.
The final script, 4-postgresql-database.py, creates a PostgreSQL database with a specified name. To do this, the script connects to a PostgreSQL server, requiring you to provide your username and password to establish the connection. After creating the database, the script verifies that the number of columns in the CSV files matches the headers. The script creates a corresponding table in the database for each CSV file and loads the data into it. To run this script, you need to provide three command-line arguments: -dbname for the new database name, -user for your PostgreSQL username, and -password for your PostgreSQL password. Additionally, there are two optional arguments: -host (default is localhost) and -port (default is 5432). You can adjust the -host and -port values if your PostgreSQL server is running on a different host or port.
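To make the argument layout concrete, here is a hypothetical sketch of how such a command line could be parsed with argparse. It is not the code of 4-postgresql-database.py, whose internals are not shown here; only the option names and defaults are taken from the description above, and the sample values are made up.

```python
import argparse


def build_parser():
    """Parser mirroring the described options: three required, two optional."""
    parser = argparse.ArgumentParser(description="Create a PostgreSQL database from CSV files")
    parser.add_argument("-dbname", required=True, help="name of the new database")
    parser.add_argument("-user", required=True, help="PostgreSQL username")
    parser.add_argument("-password", required=True, help="PostgreSQL password")
    parser.add_argument("-host", default="localhost", help="server host")
    parser.add_argument("-port", type=int, default=5432, help="server port")
    return parser


# Parse a made-up command line instead of sys.argv so the sketch runs anywhere
args = build_parser().parse_args(
    ["-dbname", "aisdb_demo", "-user", "postgres", "-password", "secret"]
)
print(args.dbname, args.user, args.host, args.port)
```

Omitting -host and -port falls back to localhost and 5432, matching the defaults described above.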
When the program prompts that the task is finished, you may check the created database and loaded tables by connecting to the PostgreSQL server and using the psql command-line interface:
Once connected, you can list all tables in the database by running the \dt command. In our example using 2023 AIS data (default download), the tables will appear as follows:
Extracting distance features from and to points-of-interest using raster files.
The distances of a vessel from the nearest shore, coast, and port are essential to perform particular tasks such as vessel behavior analysis, environmental monitoring, and maritime safety assessments. AISdb offers functions to acquire these distances for specific vessel positions. In this tutorial, we provide examples of calculating the distance in kilometers from shore and from the nearest port for a given point.
First, we create a sample track:
Here is what the sample track looks like:
Similar to acquiring the distance from shore, CoastDist is implemented to obtain the distance between the given track positions and the coastline.
In AISdb, the speed of a vessel is calculated using the aisdb.gis.delta_knots function, which computes the speed over ground (SOG) in knots between consecutive positions within a given track. This calculation is important for data cleaning, as the vessel's speed is compared against a set threshold.
Vessel speed calculation requires the distance the vessel has traveled between two consecutive positions and the time interval between them. The time interval is simply the difference in timestamps between the two consecutive AIS position reports. The speed is then computed using the formula:
speed (knots) = distance (m) / time interval (s) × 1.9438445
The factor 1.9438445 converts the speed from meters per second to knots, the standard speed unit used in maritime contexts.
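The unit conversion can be checked with a short standalone sketch. The function below is hypothetical and is not the implementation of aisdb.gis.delta_knots; it only applies the conversion factor quoted above to a distance in meters and two UNIX timestamps in seconds.

```python
MPS_TO_KNOTS = 1.9438445  # meters per second -> knots


def speed_knots(distance_m, t0, t1):
    """Speed over ground in knots from a distance (m) and two UNIX timestamps (s)."""
    dt = t1 - t0
    if dt <= 0:
        raise ValueError("timestamps must be strictly increasing")
    return distance_m / dt * MPS_TO_KNOTS


# One nautical mile (1852 m) covered in one hour is one knot by definition
print(speed_knots(1852.0, 0, 3600))
```

Guarding against a zero or negative time difference is worthwhile in practice, because duplicate or out-of-order AIS messages would otherwise produce a division by zero or a negative speed.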
With the example track we created earlier, we can calculate the vessel speed between each two consecutive positions:
This tutorial demonstrates how to access vessel metadata using MMSI and SQLite databases. In many cases, AIS messages do not contain metadata. Therefore, this tutorial introduces the built-in functions in AISdb and external APIs to extract detailed vessel information associated with a specific MMSI from web sources.
We introduced two methods implemented in AISdb for scraping metadata: using session requests for direct access and employing web drivers with browsers to handle modern websites with dynamic content. Additionally, we provided an example of utilizing a third-party API to access vessel information.
The session request method in Python is a straightforward and efficient approach for retrieving metadata from websites. In AISdb, the aisdb.webdata._scraper.search_metadata_vesselfinder function leverages this method to scrape detailed information about vessels based on their MMSI numbers. This function efficiently gathers a range of data, including vessel name, type, flag, tonnage, and navigation status.
This is an example of how to use the search_metadata_vesselfinder function in AISdb to scrape data from the website:
In addition to metadata scraping, we may also use an API offered by the data provider. MarineTraffic offers an option to subscribe to its API to access vessel data, forecast voyages, position the vessels, etc. Here is an example:
If you already have a database containing AIS track data, then vessel metadata can be downloaded and stored in a separate database.
Building on the previous section, where we used AIS data to create AISdb databases, users can export AIS data from these databases into CSV format. In this section, we provide examples of exporting data from SQLite or PostgreSQL databases into CSV files. While we demonstrate these operations using internal data, you can apply the same techniques to your databases.
In the first example, we connected to a SQLite database, queried data in a specific time range and area of interest, and then exported the queried data to a CSV file:
Now we can check the data in the exported CSV file:
Exporting from PostgreSQL works the same way as exporting from SQLite; the only difference is that you connect to your PostgreSQL database before querying the data you want to export to CSV. A full example follows:
We can check the output CSV file now:
This section provides an example of downloading and processing multiple files, creating a PostgreSQL database, and loading data into tables. The steps are implemented as a series of numbered pipeline scripts, which should be executed in the order indicated by their numbers.
The first script, 0-download-ais.py, allows you to download AIS data by specifying the years you need. If no years are specified, the script will default to downloading data for 2023. The downloaded ZIP files will be stored in a /data folder created in your current working directory. The second script, 1-zip2csv.py, extracts the CSV files from the downloaded ZIP files in /data and saves them in a new directory named /zip.
The ShoreDist class is used to calculate the nearest distance to shore, along with a raster file containing shore distance data. Currently, calling the get_distance function in ShoreDist will automatically download the shore distance raster file from our server. The function then merges the tracks in the provided track list, creates a new key, "km_from_shore", and stores the shore distance as the value for this key.
Like the distances from the coast and shore, a dedicated class determines the distance between the track positions and the nearest ports.
Loading Data from Database
If you want vessel metadata
Some approaches remove pings near the shore. An example to calculate the distance is provided: Distance from Shore
Interpolating Tracks
Saving into CSV
QGIS is a cross-platform desktop geographic information system application that supports viewing, editing, printing, and analyzing geospatial data.
The CSV can be imported via the menu: Layer > Add Layer > Add Delimited Text Layer
The tracks can be generated by Points to Path, a function in the QGIS tools, using Track_ID as a grouping parameter.
Reading the CSV and transforming the data depend on the type of task we want to perform on the tracks. Here, we provide an example of using the CSV in a sequence-to-sequence model that predicts the next 3 points as output, given 10 AIS messages as input.
Trajectory Forecasting with Gated Recurrent Unit AutoEncoders
By the end of this tutorial, you will understand the benefits of using teacher forcing to improve model accuracy, as well as other tweaks to enhance forecasting capabilities. We'll use AutoEncoders, neural networks that learn compressed data representations, to achieve this.
We will guide you through preparing AIS data for training an AutoEncoder, setting up layers, compiling the model, and defining the training process with teacher forcing.
Given the complexity of this task, we will revisit it to explore the benefits of teacher forcing, a technique that can improve sequence-to-sequence learning in neural networks.
This tutorial focuses on Trajectory Forecasting, which predicts an object's future path based on past positions. We will work with AIS messages, a type of temporal data that provides information about vessels' location, speed, and heading over time.
Automatic Identification System (AIS) messages broadcast essential ship information such as position, speed, and course. The temporal nature of these messages is pivotal for our tutorial, where we'll train an auto-encoder neural network for trajectory forecasting. This task involves predicting a ship's future path based on its past AIS messages, making it ideal for auto-encoders, which are optimized for learning patterns in sequential data.
For querying the entire database at once, use the following code:
For querying the database in batches of hours, use the following code:
Several functions were defined using AISdb, an AIS framework developed by MERIDIAN at Dalhousie University, to efficiently extract AIS messages from SQLite databases. AISdb is designed for effective data storage, retrieval, and preparation for AIS-related tasks. It provides comprehensive tools for interacting with AIS data, including APIs for data reading and writing, parsing AIS messages, and performing various data transformations.
Our next step is to create a coverage map of Atlantic Canada to visualize our dataset. We will include a 100km radius circle on the map to show the areas of the ocean where vessels can send AIS messages. Although overlapping circles may contain duplicate data from the same MMSI, we have already eliminated those from our dataset. However, messages might still appear incorrectly in inland areas.
Loading a shapefile to help us define whether a vessel is on land or in water during the trajectory:
Check if a given coordinate (latitude, longitude) is on land:
Check if any coordinate of a track is on land:
Filter out tracks with any point on land for a given MMSI:
Use a ThreadPoolExecutor to parallelize the processing of MMSIs:
Count the number of segments per MMSI after removing duplicates and inaccurate track segments:
In this analysis, we observe that most MMSIs in the dataset exhibit between 1 and 49 segments during the search period within AISdb. However, a minor fraction of vessels have significantly more segments, with some reaching up to 176. Efficient processing involves categorizing the data by MMSI instead of merely considering its volume. This method allows us to better evaluate the model's ability to discern various movement behaviors from both the same vessel and different ones.
To prevent our model from favoring shorter trajectories, we need a balanced mix of short-term and long-term voyages in the training and test sets. We'll categorize trajectories with 30 or more segments as long-term and those with fewer segments as short-term. Implement an 80-20 split strategy to ensure an equitable distribution of both types in the datasets.
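The split strategy above can be sketched as a stratified shuffle over MMSIs. The function name, the fixed random seed, and the sample segment counts are hypothetical; only the 30-segment threshold and the 80-20 ratio come from the text.

```python
import random


def stratified_split(segments_per_mmsi, long_threshold=30, train_frac=0.8, seed=42):
    """Split MMSIs into train/test sets, keeping the long/short voyage ratio in both."""
    rng = random.Random(seed)  # fixed seed for a reproducible split
    long_ids = sorted(m for m, n in segments_per_mmsi.items() if n >= long_threshold)
    short_ids = sorted(m for m, n in segments_per_mmsi.items() if n < long_threshold)
    train, test = [], []
    for group in (long_ids, short_ids):
        rng.shuffle(group)
        cut = int(round(len(group) * train_frac))
        train.extend(group[:cut])
        test.extend(group[cut:])
    return train, test


# Hypothetical dataset: 10 long-term MMSIs (40 segments) and 20 short-term ones (5 segments)
segments = {100 + i: 40 for i in range(10)}
segments.update({200 + i: 5 for i in range(20)})
train_ids, test_ids = stratified_split(segments)
print(len(train_ids), len(test_ids))
```

Splitting each group separately guarantees that both sets contain long-term and short-term voyages in the same proportion as the full dataset.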
Splitting the data respecting the voyage length distribution:
Visualizing the distribution of the dataset:
Understanding input and output timesteps and variables is crucial in trajectory forecasting tasks. Trajectory data comprises spatial coordinates and related features that depict an object's movement over time. The aim is to predict future positions of the object based on its historical data and associated features.
INPUT_TIMESTEPS: This parameter determines the consecutive observations used to predict future trajectories. Its selection impacts the model's ability to capture temporal dependencies and patterns. Too few time steps may prevent the model from capturing all movement dynamics, resulting in inaccurate predictions. Conversely, too many time steps can add noise and complexity, increasing the risk of overfitting.
INPUT_VARIABLES: Features describe each timestep in the input sequence for trajectory forecasting. These variables can include spatial coordinates, velocities, accelerations, object types, and relevant features that aid in predicting system dynamics. Choosing the right input variables is crucial; irrelevant or redundant ones may confuse the model while missing important variables can result in poor predictions.
OUTPUT_TIMESTEPS: This parameter sets the number of future time steps the model should predict, known as the prediction horizon. Choosing the right horizon size is critical. Predicting too few timesteps may not serve the application's needs while predicting too many can increase uncertainty and degrade performance. Select a value based on your application's specific requirements and data quality.
OUTPUT_VARIABLES: In trajectory forecasting, output variables include predicted spatial coordinates and sometimes other relevant features. Reducing the number of output variables can simplify prediction tasks and decrease model complexity. However, this approach might also lead to a less effective model.
Understanding the roles of input and output timesteps and variables is key to developing accurate trajectory forecasting models. By carefully selecting these elements, we can create models that effectively capture object movement dynamics, resulting in more accurate and meaningful predictions across various applications.
For this tutorial, we'll input 4 hours of data into the model to forecast the next 8 hours of vessel movement. Consequently, we'll filter out all voyages with less than 12 hours of AIS messages. By interpolating the messages every 5 minutes, we require a minimum of 144 sequential messages (12 hours at 12 messages/hour).
With data provided by AISdb, we have AIS information, including Longitude, Latitude, Course Over Ground (COG), and Speed Over Ground (SOG), representing a ship's position and movement. Longitude and Latitude specify the ship's location, while COG and SOG indicate its heading and speed. We use all four features to train the neural network, and the output is the Longitude and Latitude pair. This methodology allows the model to predict the ship's future positions based on historical data.
In this tutorial, we'll include AIS data deltas as features, which were excluded in the previous tutorial. Incorporating deltas can help the model capture temporal changes and patterns, enhancing its effectiveness in sequence-to-sequence modeling. Deltas provide information on the rate of change in features, improving the model's accuracy, especially in predicting outcomes that depend on temporal dynamics.
To improve our model, we'll prioritize training samples based on trajectory straightness. We'll compute the geographical distance between a segment's start and end points using the Haversine formula. Comparing this to the total distance of all consecutive points will give a straightness metric. Our model will focus on complex trajectories with multiple direction changes, leading to better generalization and more accurate predictions.
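One way to express the straightness metric is the ratio of the start-to-end distance to the summed distance along the path, which is 1 for a perfectly straight segment and approaches 0 for a winding one. The sketch below is a standalone illustration with hypothetical function names and a spherical-Earth assumption; the tutorial's own implementation and its sample weighting are not reproduced.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean Earth radius


def haversine_m(lon1, lat1, lon2, lat2):
    """Great-circle distance in meters between two points in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


def straightness(lons, lats):
    """Ratio of start-to-end distance to total path length, in [0, 1]."""
    path = sum(
        haversine_m(lons[i], lats[i], lons[i + 1], lats[i + 1])
        for i in range(len(lons) - 1)
    )
    if path == 0:
        return 1.0  # assumption: a stationary segment is treated as straight
    direct = haversine_m(lons[0], lats[0], lons[-1], lats[-1])
    return direct / path


# A track along a meridian is straight; a zigzag track is not
print(straightness([0.0, 0.0, 0.0], [0.0, 1.0, 2.0]))
print(straightness([0.0, 1.0, 0.0, 1.0], [0.0, 0.5, 1.0, 1.5]))
```

Segments with a low ratio are the ones with many direction changes, so a weight that decreases with this ratio would emphasize complex trajectories.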
Trajectory straightness calculation using the Haversine:
To predict 96 data points (output) using the preceding 48 data points (input) in a trajectory time series, we create a sliding window. First, we select the initial 48 data points as the input sequence and the subsequent 96 as the output sequence. We then slide the window forward by one step and repeat the process. This continues until the end of the sequence, helping our model capture temporal dependencies and patterns in the data.
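The window sizes follow from the 5-minute sampling: 4 hours of input is 4 × 12 = 48 points and 8 hours of output is 8 × 12 = 96 points, for 144 points per window. A minimal sketch of the sliding-window procedure, with a hypothetical function name and a plain Python list standing in for the trajectory, could look like this:

```python
def sliding_windows(series, n_in=48, n_out=96):
    """Return (input, output) pairs obtained by sliding a window one step at a time."""
    windows = []
    total = n_in + n_out
    for start in range(len(series) - total + 1):
        x = series[start:start + n_in]            # past observations
        y = series[start + n_in:start + total]    # future points to predict
        windows.append((x, y))
    return windows


# A hypothetical trajectory of 150 interpolated points (indices stand in for messages)
trajectory = list(range(150))
pairs = sliding_windows(trajectory)
print(len(pairs), len(pairs[0][0]), len(pairs[0][1]))
```

A trajectory of N points yields N - 144 + 1 windows, so trajectories shorter than 144 points produce none, which is consistent with filtering out voyages shorter than 12 hours.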
Our training strategy uses the sliding window technique, requiring unique weights for each sample. Sliding Windows (SW) transforms time series data into an appropriate format for machine learning. They generate overlapping windows with a fixed number of consecutive points by sliding the window one step at a time through the series.
In this project, the input data includes four features: Longitude, Latitude, COG (Course over Ground), and SOG (Speed over Ground), while the output data includes only Longitude and Latitude. To enhance the model's learning, we need to normalize the data through three main steps.
First, normalize Longitude, Latitude, COG, and SOG to the [0, 1] range using domain-specific parameters. This ensures the model performs well in Atlantic Canada waters by restricting the geographical scope of the AIS data and maintaining a similar scale for all features.
Second, the input and output data are standardized by subtracting the mean and dividing by the standard deviation. This centers the data around zero and scales it to unit variance, which helps mitigate vanishing gradients during training.
Finally, another zero-one normalization is applied to scale the data to the [0, 1] range, aligning it with the expected range for many neural network activation functions.
Denormalizing Y output to the original scale of the data:
Denormalizing X output to the original scale of the data:
We have successfully prepared the data for our machine-learning task. With the data ready, it's time for the modeling phase. Next, we will create, train, and evaluate a machine-learning model to forecast vessel trajectories using the processed dataset. Let's explore how our model performs in Atlantic Canada!
A GRU Autoencoder is a neural network that compresses and reconstructs sequential data utilizing a Gated Recurrent Unit. GRUs are highly effective at handling time-series data, which are sequential data points captured over time, as they can model intricate temporal dependencies and patterns. To perform time-series forecasting, a GRU Autoencoder can be trained on a historical time-series dataset to discern patterns and trends, subsequently compressing a sequence of future data points into a lower-dimensional representation that can be decoded to generate a forecast of the upcoming data points. With this in mind, we will begin by constructing a model architecture composed of two GRU layers with 64 units each, taking input of shape (48, 4) and (96, 4), respectively, followed by a dense layer with 2 units.
The following function lists callbacks used during the model training process. Callbacks are utilities invoked at specific points during training to monitor progress or take actions based on the model's performance. The function pre-defines the parameters and behavior of these callbacks:
WandbMetricsLogger: This callback logs the training and validation metrics for visualization and monitoring on the Weights & Biases (W&B) platform. This can be useful for tracking the training progress but may introduce additional overhead due to the logging process. You can remove this callback if you don't need to use W&B or want to reduce the overhead.
TerminateOnNaN: This callback terminates training if the loss becomes NaN (Not a Number) during the training process. It helps to stop the training process early when the model diverges and encounters an unstable state.
ReduceLROnPlateau: This callback reduces the learning rate by a specified factor when the monitored metric has stopped improving for several epochs. It helps fine-tune the model using a lower learning rate when it no longer improves significantly.
EarlyStopping: This callback stops the training process early when the monitored metric has not improved for a specified number of epochs. It restores the model's best weights when the training is terminated, preventing overfitting and reducing the training time.
ModelCheckpoint: This callback saves the best model (based on the monitored metric) to a file during training.
WandbMetricsLogger is the most computationally costly among these callbacks due to the logging process. The other callbacks help optimize the training process and are less computationally demanding. Note that the Weights & Biases (W&B) platform is also used in other parts of the code. If you decide to remove the WandbMetricsLogger callback, please ensure that you also remove any other references to W&B in the code to avoid potential issues. If you choose to use W&B for monitoring and logging, you must register and log in to the W&B website. During the execution of the code, you'll be prompted for an authentication key to connect your script to your W&B account. This key can be obtained from your W&B account settings and enables W&B's monitoring and logging features.
In this step, we define a function called model_placeholder that uses the Keras Tuner to create a model with tunable hyperparameters. The function takes a hyperparameter object as input, which defines the search space for the hyperparameters of interest. Specifically, we are searching for the best number of units in the encoder and decoder GRU layers and the optimal learning rate for the AdamW optimizer. The model_placeholder function constructs a GRU-AutoEncoder model with these tunable hyperparameters and compiles the model using the Mean Absolute Error (MAE) as the loss function. Keras Tuner will use this model during the hyperparameter optimization process to find the combination of hyperparameters that minimizes the validation loss, at the expense of long computing time.
Helper for saving the training history:
Helper for restoring the training history:
Defining the model to be optimized:
HyperOpt Objective Function:
Scanning the project folder for other pre-trained weights shared with this tutorial:
Deep learning models, although powerful, are often criticized for their lack of explainability, making it difficult to comprehend their decision-making process and raising concerns about trust and reliability. To address this issue, we can use techniques like the Permutation Feature Importance (PFI) method, a simple, model-agnostic approach that helps visualize the importance of features in deep learning models. This method works by shuffling individual feature values in the dataset and observing the impact on the model's performance. By measuring the change in a designated metric when each feature's values are randomly permuted, we can infer the importance of that specific feature. The idea is that if a feature is crucial for the model's performance, shuffling its values should lead to a significant shift in performance; otherwise, if a feature has little impact, permuting its values should result in only a minor change. Applying the permutation feature importance method to the best model, obtained after hyperparameter tuning, can give us a more transparent understanding of how the model makes its decisions.
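The mechanics of permutation feature importance can be shown on a toy problem. Everything in this sketch is hypothetical: the function names, the stand-in model (which depends only on the first feature), and the data. It uses MAE as the metric and reports the average increase in error after shuffling each feature; applying it to the tuned GRU model would require that model's own predict function and data.

```python
import random


def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Average increase in MAE when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = mae(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [column[i]] + row[j + 1:] for i, row in enumerate(X)]
            increases.append(mae(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


def toy_predict(row):
    """Stand-in model: the output depends on feature 0 only."""
    return 2.0 * row[0]


X = [[float(i), float((i * 7) % 5)] for i in range(20)]
y = [toy_predict(row) for row in X]
importance = permutation_importance(toy_predict, X, y)
print(importance)
```

Shuffling the first feature degrades the predictions, while shuffling the second leaves the error unchanged, so the first feature receives a positive score and the second a score of zero.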
Permutation feature importance has some limitations: it assumes that features are independent and can produce biased results when features are highly correlated. It also doesn't provide detailed explanations for individual data points. An alternative is sensitivity analysis, which studies how changes in the input features affect the model's predictions. By perturbing each input feature individually and observing how the predictions change, we can identify which features most strongly influence the model's output. However, because features are perturbed one at a time, this approach does not account for feature interactions, and it can be computationally expensive when there are many features or perturbation steps.
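A one-at-a-time sensitivity analysis can be sketched as follows. This is a simplified illustration under stated assumptions: `sensitivity_analysis` and `toy_predict` are hypothetical names, the perturbation is a constant additive shift applied to one feature across all time steps, and the toy predictor stands in for the trained model.

```python
import numpy as np


def sensitivity_analysis(predict, X, deltas=(-0.1, 0.1)):
    """Mean absolute change in predictions when one feature is shifted.

    X is assumed to have shape (samples, time steps, features). Each feature
    is shifted by every value in `deltas` while the others stay unchanged.
    """
    base = predict(X)
    sensitivities = np.zeros(X.shape[-1])
    for feature in range(X.shape[-1]):
        changes = []
        for delta in deltas:
            X_shift = X.copy()
            X_shift[..., feature] += delta
            changes.append(np.mean(np.abs(predict(X_shift) - base)))
        sensitivities[feature] = np.mean(changes)
    return sensitivities


def toy_predict(X):
    """Stand-in for a trained model: a linear map of the last time step."""
    return 2.0 * X[:, -1, 0] + 0.5 * X[:, -1, 1]


X_demo = np.random.default_rng(0).normal(size=(100, 10, 3))
sens_scores = sensitivity_analysis(toy_predict, X_demo)
```

For this linear stand-in the scores are proportional to the coefficients (about 0.2, 0.05, and 0 for a shift of 0.1), which makes the output easy to sanity-check. The cost grows with the number of features times the number of perturbation steps, since each combination requires a full prediction pass.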
Uniform Manifold Approximation and Projection (UMAP) is a nonlinear dimensionality reduction technique that projects high-dimensional data into a lower-dimensional space while aiming to preserve local structure and much of the global structure. In trajectory forecasting, UMAP can project high-dimensional model representations into 2D or 3D to help reveal relationships between input features and outputs. Unlike sensitivity analysis, which measures how predictions change when input features are perturbed, UMAP exposes the structure of the data without any perturbation. It also differs from permutation feature importance, which ranks features by shuffling their values and measuring the change in model performance. UMAP instead focuses on visualizing the intrinsic structure and relationships in the data.
GRUs can effectively forecast vessel trajectories but have notable downsides. A primary limitation is their difficulty with long-term dependencies: gating mitigates the vanishing gradient problem but does not eliminate it, so relevant information from earlier time steps can still be lost. This makes capturing long-term patterns in vessel trajectories challenging. Additionally, because GRUs process time steps sequentially, they become computationally expensive with large datasets and long sequences, resulting in longer training times and higher memory use. While they outperform basic RNNs, they may not always surpass architectures such as LSTMs or Transformers. Furthermore, GRU-based models are hard to interpret, which can hinder their adoption in safety-critical applications like vessel trajectory forecasting.