This page provides a detailed tutorial about downloading the freely available AIS data, understanding the data, and creating a database using AISdb-Client.
1. Downloading Freely Available AIS Data From NOAA
The U.S. vessel traffic data across user-defined geographies and time periods are available at https://marinecadastre.gov/AIS/ βΆ AIS Broadcast Points.
We show an example of downloading the data of a particular day.
# downloading AIS data of a day using curl commandcurl--remote-namehttps://coast.noaa.gov/htdata/CMSP/AISDataHandler/2020/AIS_2020_01_01.zip
2. Exploring CSV File
We will look into the number of columns the downloaded CSV file has.
import pandas as pd # pandas v2.1.2# reading CSV file in df_ = pd.read_csv("/home/AIS_2020_01_01.csv", parse_dates=["BaseDateTime"])print(df_.columns)
// data type constraint for the CSV fileMMSI:intShip_Type:intIMO:intTime:20210701_185151Draft:int
Now, we will update the name of columns in the existing dataframe df_ and change the time format as required. The timestamp of an AIS message is indicated by BaseDateTime and its default format is YYYY-MM-DDTHH:MM:SS Whereas for the AISdb, the column name should be Time and its format should be accepted Reading CSV and modifying the date format
df_ = df_.iloc[0:40000]# we are taking some records df_new = pd.DataFrame(columns=list_of_headers_)df_new['Time']= pd.to_datetime(df_['BaseDateTime']).dt.strftime('%Y%m%d_%H%M%S')df_new['Latitude']= df_['LAT']df_new['Longitude']= df_['LON']df_new['Vessel_Name']= df_['VesselName']df_new['Call_sign']= df_['CallSign']df_new['Ship_Type']= df_['VesselType'].fillna(0).astype(int)df_new['Navigational_status']= df_['Status']df_new['Draught']= df_['Draft']df_new['Message_ID']=1# considering all messages as dynamicdf_new['Millisecond']=0# Transferring data columnsfor col_n in df_new:if col_n in df_.columns: df_new[col_n]= df_[col_n]# Perhaps adding static messages filtered_df = df_new[ df_new['Ship_Type'].notnull()& (df_new['Ship_Type']!=0) ]filtered_df = filtered_df.drop_duplicates(subset='MMSI', keep='first')filtered_df = filtered_df.reset_index(drop=True)filtered_df['Message_ID']=5# merging dynamic and static messages into single dataframedf_new = pd.concat([filtered_df, df_new])# saving dataframe into file, the parameter `quoting` is necessary as the csvreader.rs read by default string value for each column value
df_new.to_csv("/home/AIS_2020_01_01_aisdb.csv", index=False, quoting=1)
In the code we can see have mapped the column named accordingly. Additionally, the data type of some columns is also changed. Additionally, in the nm4 file usually contains raw messages, where the static messages are separated from dynamic messages. However, the MarineCadastre Data does not have such Message_ID to indicate the type. Thus, adding static messages is necessary for database creation so that a table related to metadata is created.
3. Creating AISdb database
Let's now process the CSV to create an SQLite database using the aisdb package.
Let's process this dataframe to match the columns required by AISdb for reading CSV.
df_new = pd.DataFrame(columns=list_of_headers_)# we already defined the variable list_of_headers_df_new['Time']= pd.to_datetime(df_['TSTAMP']).dt.strftime('%Y%m%d_%H%M%S')df_new['Latitude']= df_['LATITUDE']df_new['Longitude']= df_['LONGITUDE']df_new['Heading']= df_['HEADING']df_new['Navigational_status']= df_['NAVSTAT']df_new['Vessel_Name']= df_['NAME']df_new['Call_sign']= df_['CALLSIGN']df_new['Ship_Type']= df_['TYPE']df_new['Dimension_to_Bow']= df_['A']df_new['Dimension_to_stern']= df_['B']df_new['Dimension_to_port']= df_['C']df_new['Dimension_to_starboard']= df_['D']df_new['Draught']= df_['DRAUGHT']df_new['Destination']= df_['DEST']# same name columnsdf_new['MMSI']= df_['MMSI']df_new['IMO']= df_['IMO']df_new['COG']= df_['COG']df_new['SOG']= df_['SOG']df_new['Message_ID']=1filtered_df = df_new[ df_new['Ship_Type'].notnull()& (df_new['Ship_Type']!=0) ]filtered_df = filtered_df.drop_duplicates(subset='MMSI', keep='first')filtered_df = filtered_df.reset_index(drop=True)filtered_df['Message_ID']=5df_new = pd.concat([filtered_df, df_new])# saving dataframe into file, the parameter `quoting` is necessary as the csvreader.rs read by default string value for each column value
df_new.to_csv("/home/AIS_2020_01_01_aisdb.csv", index=False, quoting=1)