This tutorial will guide you in using the AISdb package to load AIS data into a database and perform queries. We will begin with AISdb installation and environment setup, then proceed to examples of querying the loaded data and creating simple visualizations.
Install Requirements
Preparing a Python virtual environment for AISdb is a safe practice. It allows you to manage dependencies and prevent conflicts with other projects, ensuring a clean and isolated setup for your work with AISdb. Run these commands in your terminal based on the operating system you are using:
Linux
python -m venv AISdb # create a python virtual environment
source ./AISdb/bin/activate # activate the virtual environment
pip install aisdb # from https://pypi.org/project/aisdb/
Windows
python -m venv AISdb # create a virtual environment
./AISdb/Scripts/activate # activate the virtual environment
pip install aisdb # install the AISdb package using pip
Now you can check your installation by running:
$ python
>>> import aisdb
>>> aisdb.__version__ # should return '1.7.0' or newer
If you're using AISdb in Jupyter Notebook, please include the following commands in your notebook cells:
# install nest-asyncio for enabling asyncio.run() in Jupyter Notebook
%pip install nest-asyncio
# Some of the systems may show the following error when running the user interface:
# urllib3 v2.0 only supports OpenSSL 1.1.1+; currently, the 'SSL' module is compiled with 'LibreSSL 2.8.3'.
# install urllib3 v1.26.6 to avoid this error
%pip install urllib3==1.26.6
Then, import the required packages:
from datetime import datetime, timedelta
import os
import aisdb
import nest_asyncio
nest_asyncio.apply()
Load AIS data into a database
This section will show you how to efficiently load AIS data into a database.
AISdb includes two database connection approaches:
SQLite database connection; and,
PostgreSQL database connection.
SQLite database connection
We are working with the SQLite database in most of the usage scenarios. Here is an example of loading data using the sample data included in the AISdb package:
# List the test data files included in the package
print(os.listdir(os.path.join(aisdb.sqlpath, '..', 'tests', 'testdata')))
# You will see the print result:
# ['test_data_20210701.csv', 'test_data_20211101.nm4', 'test_data_20211101.nm4.gz']
# Set the path for the SQLite database file to be used
test_database
# Use test_data_20210701.csv as the test data
filepaths = [os.path.join(aisdb.sqlpath, '..', 'tests', 'testdata', 'test_data_20210701.csv')]
with aisdb.DBConn(dbpath = dbpath) as dbconn:
aisdb.decode_msgs(filepaths=filepaths, dbconn=dbconn, source='TESTING')
The code above decodes the AIS messages from the CSV file specified in filepaths and inserts them into the SQLite database connected via dbconn.
Following is a quick example of a query and visualization of the data we just loaded with AISdb:
In addition to SQLite database connection, PostgreSQL is used in AISdb for its superior concurrency handling and data-sharing capabilities, making it suitable for collaborative environments and handling larger datasets efficiently. The structure and interactions with PostgreSQL are designed to provide robust and scalable solutions for AIS data storage and querying. For PostgreSQL, you need the psycopg2 library:
pip install psycopg2
To connect to a PostgreSQL database, AISdb uses the PostgresDBConn class:
from aisdb.database.dbconn import PostgresDBConn
# Option 1: Using keyword arguments
dbconn = PostgresDBConn(
hostaddr='127.0.0.1', # Replace with the PostgreSQL address
port=5432, # Replace with the PostgreSQL running port
user='USERNAME', # Replace with the PostgreSQL username
password='PASSWORD', # Replace with your password
dbname='aisviz' # Replace with your database name
)
# Option 2: Using a connection string
dbconn = PostgresDBConn('postgresql://USERNAME:PASSWORD@HOST:PORT/DATABASE')
After establishing a connection to the PostgreSQL database, specifying the path of the data files, and using the aisdb.decode_msgs function for data processing, the following operations will be performed in order: data files processing, table creation, data insertion, and index rebuilding.
Please pay close attention to the flags in aisdb.decode_msgs, as recent updates provide more flexibility for database configurations. These updates include support for ingesting NOAA data into the aisdb format and the option to structure tables using either the original B-Tree indexes or TimescaleDBβs structure when the extension is enabled. In particular, please take care of the following parameters:
source(str, optional)
Specifies the data source to be processed and loaded into the database.
Options: "Spire", "NOAA"/"noaa", or leave empty.
Default: empty but will progress with Spire source.
raw_insertion(bool, optional)
If False, the function will drop and rebuild indexes to speed up data loading.
Default: True.
timescaledb(bool, optional)
Set to Trueonly if using the TimescaleDB extension in your PostgreSQL database.
Example: Processing a Full Year of Spire Data (2024)
The following example demonstrates how to process and load Spire data for the entire year 2024 into an aisdb database with the TimescaleDB extension installed:
start_year = 2024
end_year = 2024
start_month = 1
end_month = 12
overall_start_time = time.time()
for year in range(start_year, end_year + 1):
for month in range(start_month, end_month + 1):
print(f'Loading {year}{month:02d}')
month_start_time = time.time()
filepaths = aisdb.glob_files(f'/slow-array/Spire/{year}{month:02d}/','.zip')
filepaths = sorted([f for f in filepaths if f'{year}{month:02d}' in f])
print(f'Number of files: {len(filepaths)}')
with aisdb.PostgresDBConn(libpq_connstring=psql_conn_string) as dbconn:
try:
aisdb.decode_msgs(filepaths,
dbconn=dbconn,
source='Spire',
verbose=True,
skip_checksum=True,
raw_insertion=True,
workers=6,
timescaledb=True,
)
except Exception as e:
print(f'Error loading {year}{month:02d}: {e}')
continue
Example of performing queries and visualizations with PostgreSQL database:
from aisdb.gis import DomainFromPoints
from aisdb.database.dbqry import DBQuery
from datetime import datetime
# Define a spatial domain centered around the point (-63.6, 44.6) with a radial distance of 50000 meters.
domain = DomainFromPoints(points=[(-63.6, 44.6)], radial_distances=[50000])
# Create a query object to fetch AIS data within the specified time range and spatial domain.
qry = DBQuery(
dbconn=dbconn,
start=datetime(2023, 1, 1), end=datetime(2023, 2, 1),
xmin=domain.boundary['xmin'], xmax=domain.boundary['xmax'],
ymin=domain.boundary['ymin'], ymax=domain.boundary['ymax'],
callback=aisdb.database.sqlfcn_callbacks.in_time_bbox_validmmsi
)
# Generate rows from the query
rowgen = qry.gen_qry()
# Convert the generated rows into tracks
tracks = aisdb.track_gen.TrackGen(rowgen, decimate=False)
# Visualize the tracks on a map
aisdb.web_interface.visualize(
tracks, # The tracks (trajectories) to visualize.
domain=domain, # The spatial domain to use for the visualization.
visualearth=True, # If True, use Visual Earth for the map background.
open_browser=True # If True, automatically open the visualization in a web browser.
)
Moreover, if you wish to use your own AIS data to create and process a database with AISdb, please check out our instructional guide on data processing and database creation: Using Your AIS Data.