πŸ“” Working with Your AIS Data

This page provides a detailed tutorial on downloading freely available AIS data, understanding its structure, and creating a database with AISdb.

1. Downloading Freely Available AIS Data From NOAA

U.S. vessel traffic data for user-defined geographies and time periods are available at https://marinecadastre.gov/AIS/ β–Ά AIS Broadcast Points. Below is an example of downloading the data for a particular day.

# download one day of AIS data with curl

curl --remote-name https://coast.noaa.gov/htdata/CMSP/AISDataHandler/2020/AIS_2020_01_01.zip
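
The download is a ZIP archive, so it needs to be extracted before the CSV can be read. Below is a minimal sketch using Python's standard zipfile module; the paths assume the archive was saved under /home, so adjust them to your environment.

import zipfile

# hypothetical paths: adjust to wherever the archive was downloaded
archive_path = "/home/AIS_2020_01_01.zip"
extract_dir = "/home"

# extract AIS_2020_01_01.csv next to the archive
with zipfile.ZipFile(archive_path) as zf:
    zf.extractall(extract_dir)
    print(zf.namelist())  # list the extracted file names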

2. Exploring the CSV File

After extracting the archive, let's look at the columns the downloaded CSV file contains.

import pandas as pd  # pandas v2.1.2

# read the CSV file and parse the timestamp column
df_ = pd.read_csv("/home/AIS_2020_01_01.csv", parse_dates=["BaseDateTime"])

print(df_.columns)
Output
Index(['MMSI', 'BaseDateTime', 'LAT', 'LON', 'SOG', 'COG', 'Heading',
       'VesselName', 'IMO', 'CallSign', 'VesselType', 'Status', 'Length',
       'Width', 'Draft', 'Cargo', 'TransceiverClass'],
      dtype='object')
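
Optionally, it can also help to check the number of rows and the dtypes pandas inferred before converting anything:

# number of rows and columns, and the inferred dtype of each column
print(df_.shape)
print(df_.dtypes)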

AISdb expects different column names, and more of them, than the downloaded file provides, so let's define the full list of columns it requires.

list_of_headers_ = ["MMSI","Message_ID","Repeat_indicator","Time","Millisecond","Region","Country","Base_station","Online_data","Group_code","Sequence_ID","Channel","Data_length","Vessel_Name","Call_sign","IMO","Ship_Type","Dimension_to_Bow","Dimension_to_stern","Dimension_to_port","Dimension_to_starboard","Draught","Destination","AIS_version","Navigational_status","ROT","SOG","Accuracy","Longitude","Latitude","COG","Heading","Regional","Maneuver","RAIM_flag","Communication_flag","Communication_state","UTC_year","UTC_month","UTC_day","UTC_hour","UTC_minute","UTC_second","Fixing_device","Transmission_control","ETA_month","ETA_day","ETA_hour","ETA_minute","Sequence","Destination_ID","Retransmit_flag","Country_code","Functional_ID","Data","Destination_ID_1","Sequence_1","Destination_ID_2","Sequence_2","Destination_ID_3","Sequence_3","Destination_ID_4","Sequence_4","Altitude","Altitude_sensor","Data_terminal","Mode","Safety_text","Non-standard_bits","Name_extension","Name_extension_padding","Message_ID_1_1","Offset_1_1","Message_ID_1_2","Offset_1_2","Message_ID_2_1","Offset_2_1","Destination_ID_A","Offset_A","Increment_A","Destination_ID_B","offsetB","incrementB","data_msg_type","station_ID","Z_count","num_data_words","health","unit_flag","display","DSC","band","msg22","offset1","num_slots1","timeout1","Increment_1","Offset_2","Number_slots_2","Timeout_2","Increment_2","Offset_3","Number_slots_3","Timeout_3","Increment_3","Offset_4","Number_slots_4","Timeout_4","Increment_4","ATON_type","ATON_name","off_position","ATON_status","Virtual_ATON","Channel_A","Channel_B","Tx_Rx_mode","Power","Message_indicator","Channel_A_bandwidth","Channel_B_bandwidth","Transzone_size","Longitude_1","Latitude_1","Longitude_2","Latitude_2","Station_Type","Report_Interval","Quiet_Time","Part_Number","Vendor_ID","Mother_ship_MMSI","Destination_indicator","Binary_flag","GNSS_status","spare","spare2","spare3","spare4"]
# data type constraints for the CSV file

MMSI: int
Ship_Type: int
IMO: int
Time: string formatted as YYYYMMDD_HHMMSS (e.g., 20210701_185151)
Draught: int
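
As an optional illustration, these constraints can be checked programmatically before the CSV is written out. The helper below is a minimal sketch with a hypothetical name (check_aisdb_types); it is not part of AISdb.

import pandas as pd

def check_aisdb_types(df: pd.DataFrame) -> None:
    # integer-typed identifier columns
    for col in ("MMSI", "Ship_Type", "IMO"):
        if col in df.columns and not pd.api.types.is_integer_dtype(df[col]):
            print(f"warning: column {col} is not integer-typed")
    # Time must look like YYYYMMDD_HHMMSS, e.g. 20210701_185151
    if "Time" in df.columns:
        bad = ~df["Time"].astype(str).str.match(r"^\d{8}_\d{6}$")
        if bad.any():
            print(f"warning: {bad.sum()} Time values are not in YYYYMMDD_HHMMSS format")

# usage: call check_aisdb_types(df_new) just before writing the CSV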

Now we will map the columns of the existing dataframe df_ to the required names in a new dataframe and convert the time format. The timestamp of an AIS message is given by BaseDateTime, whose default format is YYYY-MM-DDTHH:MM:SS, whereas AISdb expects the column to be named Time and formatted as YYYYMMDD_HHMMSS. Reading the CSV and modifying the date format:

df_ = df_.iloc[0:40000]  # take only a subset of records to keep the example fast

df_new = pd.DataFrame(columns=list_of_headers_)

df_new['Time'] = pd.to_datetime(df_['BaseDateTime']).dt.strftime('%Y%m%d_%H%M%S')
df_new['Latitude'] = df_['LAT']
df_new['Longitude'] = df_['LON']
df_new['Vessel_Name'] = df_['VesselName']
df_new['Call_sign'] = df_['CallSign']
df_new['Ship_Type'] = df_['VesselType'].fillna(0).astype(int)
df_new['Navigational_status'] = df_['Status']
df_new['Draught'] = df_['Draft']
df_new['Message_ID'] = 1 # considering all messages as dynamic
df_new['Millisecond'] = 0
# copy over columns whose names already match (e.g., MMSI, IMO, SOG, COG, Heading)
for col_n in df_new:
    if col_n in df_.columns:
        df_new[col_n] = df_[col_n]

# derive static messages (one row per vessel) from records with a known ship type
filtered_df = df_new[df_new['Ship_Type'].notnull() & (df_new['Ship_Type'] != 0)]
filtered_df = filtered_df.drop_duplicates(subset='MMSI', keep='first')
filtered_df = filtered_df.reset_index(drop=True)
filtered_df['Message_ID'] = 5

# merging dynamic and static messages into single dataframe
df_new = pd.concat([filtered_df, df_new])

# save the dataframe to a file; quoting=1 (csv.QUOTE_ALL) is required because csvreader.rs reads every column value as a string by default
df_new.to_csv("/home/AIS_2020_01_01_aisdb.csv", index=False, quoting=1)

In the code above, we mapped the columns to the names AISdb expects and converted the data types of some columns. An .nm4 file usually contains raw AIS messages, in which static messages are separated from dynamic messages; the MarineCadastre data, however, carries no Message_ID to indicate the message type. We therefore add static messages explicitly so that the metadata table is created during database creation.
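
To confirm that both message types ended up in the merged dataframe, a quick, purely illustrative check of the Message_ID distribution can be run:

# count dynamic (Message_ID 1) vs. static (Message_ID 5) rows
print(df_new['Message_ID'].value_counts())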

3. Creating AISdb database

Let's now process the CSV to create an SQLite database using the aisdb package.

import aisdb

with aisdb.SQLiteDBConn('/home/test_decode_msgs.db') as dbconn:
    aisdb.decode_msgs(filepaths=["/home/AIS_2020_01_01_aisdb.csv"],
                      dbconn=dbconn,
                      source='Testing',
                      verbose=True)
Output
generating file checksums...
checking file dates...
creating tables and dropping table indexes...
Memory: 8.39GB remaining.  CPUs: 8.  Average file size: 3114.38MB  Spawning 1 workers
processing /home/AIS_2020_01_01_aisdb.csv
SQLite3 version: 3.41.2
AIS_2020_01_01_aisdb.csv                                         count: 7040389    elapsed:  398.96s    rate:    17647 msgs/s
saving checksums...
error processing /home/AIS_2020_01_01_aisdb.csv, skipping checksum...
cleaning temporary data...

An SQLite database has now been created. However, the log reports "error processing" for the CSV file before skipping the checksum; the cause of this message still needs to be investigated.
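
As an optional sanity check, the standard sqlite3 module can be used to list the tables that decode_msgs created; no particular table names are assumed here.

import sqlite3

# open the database created by decode_msgs and list its tables
con = sqlite3.connect('/home/test_decode_msgs.db')
tables = con.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name"
).fetchall()
print([t[0] for t in tables])
con.close()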

4. Exploring Data from AISHub

We have sample data from AISHub in CSV format.

import pandas as pd  # pandas v2.1.2

# read the AISHub sample CSV and parse the timestamp column
df_ = pd.read_csv("/home/AISHub_Sample.csv", parse_dates=["TSTAMP"])

print(df_.columns)
Output
Index(['MMSI', 'TSTAMP', 'LATITUDE', 'LONGITUDE', 'COG', 'SOG', 'HEADING',
       'NAVSTAT', 'IMO', 'NAME', 'CALLSIGN', 'TYPE', 'A', 'B', 'C', 'D',
       'DRAUGHT', 'DEST', 'ETA'],
      dtype='object')

Let's process this dataframe so that its columns match those AISdb requires when reading a CSV.

df_new = pd.DataFrame(columns=list_of_headers_) # we already defined the variable list_of_headers_

df_new['Time'] = pd.to_datetime(df_['TSTAMP']).dt.strftime('%Y%m%d_%H%M%S')
df_new['Latitude'] = df_['LATITUDE']
df_new['Longitude'] = df_['LONGITUDE']
df_new['Heading'] = df_['HEADING']
df_new['Navigational_status'] = df_['NAVSTAT']
df_new['Vessel_Name'] = df_['NAME']
df_new['Call_sign'] = df_['CALLSIGN']
df_new['Ship_Type'] = df_['TYPE']
df_new['Dimension_to_Bow'] = df_['A']
df_new['Dimension_to_stern'] = df_['B']
df_new['Dimension_to_port'] = df_['C']
df_new['Dimension_to_starboard'] = df_['D']
df_new['Draught'] = df_['DRAUGHT']
df_new['Destination'] = df_['DEST']
# same name columns
df_new['MMSI'] = df_['MMSI']
df_new['IMO'] = df_['IMO']
df_new['COG'] = df_['COG']
df_new['SOG'] = df_['SOG']
df_new['Message_ID'] = 1

filtered_df = df_new[ df_new['Ship_Type'].notnull() & (df_new['Ship_Type'] != 0) ]
filtered_df = filtered_df.drop_duplicates(subset='MMSI', keep='first')
filtered_df = filtered_df.reset_index(drop=True)
filtered_df['Message_ID'] = 5
df_new = pd.concat([filtered_df, df_new])

# save the dataframe to a file; quoting=1 (csv.QUOTE_ALL) is required because csvreader.rs reads every column value as a string by default
df_new.to_csv("/home/AISHub_Sample_aisdb.csv", index=False, quoting=1)

Now, this new CSV can be processed exactly as in #3.-creating-aisdb-database to create the database, as shown in the sketch below.
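
For completeness, here is a minimal sketch of that step applied to the AISHub-derived CSV; the database path /home/aishub_sample.db and the source label are arbitrary choices for this example.

import aisdb

# decode the AISHub-derived CSV into its own SQLite database
with aisdb.SQLiteDBConn('/home/aishub_sample.db') as dbconn:
    aisdb.decode_msgs(filepaths=["/home/AISHub_Sample_aisdb.csv"],
                      dbconn=dbconn,
                      source='AISHub',
                      verbose=True)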
