In addition to accessing data stored on the AISdb server, you can download open-source AIS data or import your own datasets for processing and analysis with AISdb. This tutorial guides you through downloading AIS data from popular websites, creating SQLite and PostgreSQL databases compatible with AISdb, and establishing database connections. We provide two examples: Downloading and Processing Individual Files, which demonstrates working with a small data sample and creating an SQLite database, and Pipeline for Bulk File Downloads and Database Integration, which outlines our approach to downloading multiple data files and creating a PostgreSQL database.
Data Source
U.S. vessel traffic data for user-defined regions and time periods are available from MarineCadastre. This resource offers comprehensive AIS data for a variety of maritime analysis purposes, and the dataset can be tailored to research needs by selecting specific regions and timeframes.
Downloading and Processing Individual Files
In the following example, we show how to download and process a single data file and import the data into a newly created SQLite database.
First, download one day of AIS data using the curl command:
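The command itself was not shown above; a sketch is given below. The URL follows the pattern used by MarineCadastre's data host (coast.noaa.gov) at the time of writing and is an assumption here; adjust the date in the filename to the day you need.

```shell
# Download one day of AIS data (2020-01-01) and extract the CSV into ./data
# NOTE: the URL pattern is assumed; verify it against MarineCadastre's site
URL="https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2020/AIS_2020_01_01.zip"
curl -f --create-dirs -o ./data/AIS_2020_01_01.zip "$URL"
unzip -o ./data/AIS_2020_01_01.zip -d ./data
```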
Next, we rename the columns of the dataframe df_ and convert the timestamps as required. In the MarineCadastre data, the timestamp of an AIS message is stored in the BaseDateTime column in the format YYYY-MM-DDTHH:MM:SS; AISdb expects a different timestamp representation, so we reformat it while building the new dataframe. We now read the CSV and apply the necessary changes:
```python
import pandas as pd

# Headers of the AISdb-compatible CSV
# (assumed here; adjust to match your data)
list_of_headers_ = ['MMSI', 'Time', 'Latitude', 'Longitude', 'Vessel_Name',
                    'Call_sign', 'Ship_Type', 'Navigational_status',
                    'Draught', 'Message_ID', 'Millisecond']

# Read the raw MarineCadastre CSV downloaded above
df_ = pd.read_csv("./data/AIS_2020_01_01.csv")

# Take the first 40,000 records from the original dataframe
df = df_.iloc[0:40000]

# Create a new dataframe with the specified headers
df_new = pd.DataFrame(columns=list_of_headers_)

# Populate the new dataframe with formatted data from the original dataframe
df_new['Time'] = pd.to_datetime(df['BaseDateTime']).dt.strftime('%Y%m%d_%H%M%S')
df_new['Latitude'] = df['LAT']
df_new['Longitude'] = df['LON']
df_new['Vessel_Name'] = df['VesselName']
df_new['Call_sign'] = df['CallSign']
df_new['Ship_Type'] = df['VesselType'].fillna(0).astype(int)
df_new['Navigational_status'] = df['Status']
df_new['Draught'] = df['Draft']
df_new['Message_ID'] = 1  # Mark all messages as dynamic by default
df_new['Millisecond'] = 0

# Transfer additional columns from the original dataframe, if they exist
for col_n in df_new:
    if col_n in df.columns:
        df_new[col_n] = df[col_n]

# Extract static messages for each unique vessel
filtered_df = df_new[df_new['Ship_Type'].notnull() & (df_new['Ship_Type'] != 0)]
filtered_df = filtered_df.drop_duplicates(subset='MMSI', keep='first')
filtered_df = filtered_df.reset_index(drop=True)
filtered_df['Message_ID'] = 5  # Mark these as static messages

# Merge dynamic and static messages into a single dataframe
df_new = pd.concat([filtered_df, df_new])

# Save the final dataframe to a CSV file
# The quoting parameter is necessary because the csvreader reads each
# column value as a string by default
df_new.to_csv("./data/AIS_2020_01_01_aisdb.csv", index=False, quoting=1)
```
In the code above, we mapped the columns to the names AISdb expects and converted the data types of some columns. Raw message files (e.g., .nm4) normally distinguish static messages from dynamic ones, but the MarineCadastre data has no such Message_ID field to indicate the message type. We therefore add static messages explicitly so that a metadata table is created during database creation.
Let's process the CSV to create an SQLite database using the aisdb package.
```python
import aisdb

# Establish a connection to the SQLite database and decode messages
# from the CSV file
with aisdb.SQLiteDBConn('./data/test_decode_msgs.db') as dbconn:
    aisdb.decode_msgs(
        filepaths=["./data/AIS_2020_01_01_aisdb.csv"],
        dbconn=dbconn,
        source='Testing',
        verbose=True,
    )
```
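To confirm the decode step worked, you can inspect the resulting database file with Python's standard sqlite3 module. This helper is illustrative (it is not part of the aisdb API) and simply lists the tables the decoder created.

```python
import sqlite3

def list_tables(db_path):
    """Return the names of all tables in an SQLite database file."""
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]

# e.g. list_tables('./data/test_decode_msgs.db')
```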
Pipeline for Bulk File Downloads and Database Integration
This section provides an example of downloading and processing multiple files, creating a PostgreSQL database, and loading data into tables. The steps are outlined in a series of pipeline scripts available in this GitHub repository, which should be executed in the order indicated by their numbers.
AIS Data Download and Extraction
The first script, 0-download-ais.py, downloads AIS data from MarineCadastre for the years you specify. If no years are specified, the script defaults to downloading data for 2023. The downloaded ZIP files are stored in a /data folder created in your current working directory. The second script, 1-zip2csv.py, extracts the CSV files from the downloaded ZIP files in /data and saves them in a new directory named /zip.
To download and extract the data, simply run the two scripts in sequence:
```shell
python 0-download-ais.py
python 1-zip2csv.py
```
Preprocessing - Merge and Deduplication
After downloading and extracting the AIS data, the 2-merge.py script consolidates the daily CSV files into monthly files, while the 3-deduplicate.py script removes duplicate rows, retaining only unique AIS messages. To execute them, run:
```shell
python 2-merge.py
python 3-deduplicate.py
```
The output of these two scripts is a set of cleaned CSV files, stored in a new folder named /merged in your working directory.
PostgreSQL Database Creation and Data Loading to Tables
The final script, 4-postgresql-database.py, creates a PostgreSQL database with a specified name. To do this, the script connects to a PostgreSQL server, which requires your username and password. After creating the database, the script verifies that the number of columns in each CSV file matches its header, creates a corresponding table in the database for each CSV file, and loads the data into it. To run this script, you need to provide three command-line arguments: -dbname for the new database name, -user for your PostgreSQL username, and -password for your PostgreSQL password. There are also two optional arguments, -host (default: localhost) and -port (default: 5432), which you can adjust if your PostgreSQL server is running on a different host or port.
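The column-count verification mentioned above can be sketched with Python's standard csv module: every data row must have the same number of fields as the header before the file is loaded into a table. The function name is illustrative, not taken from the pipeline script.

```python
import csv

def verify_column_counts(csv_path):
    """Return True if every data row has as many fields as the header."""
    with open(csv_path, newline='') as f:
        reader = csv.reader(f)
        header = next(reader)
        n_cols = len(header)
        return all(len(row) == n_cols for row in reader)
```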
When the script reports that the task is finished, you can check the created database and loaded tables by connecting to the PostgreSQL server with the psql command-line interface:
```shell
psql -U USERNAME -d DBNAME -h localhost -p 5432
```
Once connected, you can list all tables in the database by running the \dt command. In our example using 2023 AIS data (default download), the tables will appear as follows:
```
ais_pgdb=# \dt
             List of relations
 Schema |    Name     | Type  |  Owner
--------+-------------+-------+----------
 public | ais_2023_01 | table | postgres
 public | ais_2023_02 | table | postgres
 public | ais_2023_03 | table | postgres
 public | ais_2023_04 | table | postgres
 public | ais_2023_05 | table | postgres
 public | ais_2023_06 | table | postgres
 public | ais_2023_07 | table | postgres
 public | ais_2023_08 | table | postgres
 public | ais_2023_09 | table | postgres
 public | ais_2023_10 | table | postgres
 public | ais_2023_11 | table | postgres
 public | ais_2023_12 | table | postgres
(12 rows)
```