To use the AISdb Python package, you must have Python version 3.8 or above. If you want to use SQLite, no extra installation is needed, as SQLite is included in Python's standard library. However, if you prefer to use a PostgreSQL server, you will need to install it separately. The AISdb package itself can be installed with pip. It is highly recommended to create a virtual Python environment and install the package within it.
Linux
```sh
python -m venv AISdb          # create a python virtual environment
source ./AISdb/bin/activate   # activate the virtual environment
pip install aisdb             # from https://pypi.org/project/aisdb/
```
Alternatively, you may also use AISdb on Docker. Regardless of the installation procedure you decide to use, you can test your installation by running the following commands:
```sh
$ python
>>> import aisdb
>>> aisdb.__version__  # should return '1.7.0' or newer
```
Note that if you are running Jupyter, ensure it is installed in the same environment as AISdb.
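For example, with the virtual environment from the Linux instructions above activated, Jupyter can be installed alongside AISdb (the kernel registration step is optional and assumes `ipykernel` is available, which is installed together with Jupyter):

```sh
source ./AISdb/bin/activate                       # activate the environment created earlier
pip install jupyter                               # install Jupyter into the same environment
python -m ipykernel install --user --name=AISdb   # optionally register the environment as a kernel
```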
The Python code in the rest of this document can be run in the Python environment you created.
Database Handling
Connecting to a Postgres database
This option requires the optional dependency psycopg for interfacing with Postgres databases. The PostgresDBConn class accepts the keyword arguments shown below; alternatively, a connection string may be used. Information on connection strings and the Postgres URI format can be found here.
```python
from aisdb.database.dbconn import PostgresDBConn

# [OPTION 1]
dbconn = PostgresDBConn(
    hostaddr='127.0.0.1',      # Replace this with the Postgres address (supports IPv6)
    port=5432,                 # Replace this with the Postgres running port (if not the default)
    user='postgres',           # Replace this with the Postgres username
    password='YOUR-PASSWORD',  # Replace this with your password
    dbname='postgres',         # Replace this with your database name
)

# [OPTION 2]
dbconn = PostgresDBConn('postgresql://USERNAME:PASSWORD@HOST:PORT/DATABASE')
```
Attaching a SQLite database
Querying SQLite is as simple as providing the name of a ".db" file with the same entity-relationship schema as the databases supported by AISdb, which are detailed in the SQL Database section.
```python
# Generate tracks using the database query
qry = aisdb.DBQuery(
    dbconn=aisdb.DBConn('/home/test_database.db'),      # new connection
    start="15012020", end="31012020",                   # time range of interest
    xmin=-68, ymin=45, xmax=-56, ymax=51.5,             # Gulf of St. Lawrence
    callback=aisdb.database.sql_query_strings.in_bbox,  # callback for the data
)
```
A specific region can be queried for AIS data using aisdb.gis.Domain or one of its sub-classes to define a collection of shapely polygon features. For this example, the domain contains a single bounding box polygon derived from a longitude/latitude coordinate pair and radial distance specified in meters. If multiple features are included in the domain object, the domain boundaries will encompass the convex hull of all features.
```python
import aisdb
from datetime import datetime, timedelta
from aisdb import DBConn

with DBConn('/home/test_database.db') as dbconn:
    # Define a bounding box with a central point and radius
    domain = aisdb.DomainFromPoints(points=[(-63.6, 44.6)], radial_distances=[5000])

    # Make a query within the specified area
    qry = aisdb.DBQuery(
        dbconn=dbconn,
        dbpath='AIS.sqlitedb',
        callback=aisdb.database.sqlfcn_callbacks.in_bbox_time_validmmsi,
        start=datetime.utcnow() - timedelta(hours=48),
        end=datetime.utcnow(),
        xmin=domain.boundary['xmin'], xmax=domain.boundary['xmax'],
        ymin=domain.boundary['ymin'], ymax=domain.boundary['ymax'],
    )

    # Output the queried information
    for vessel in aisdb.TrackGen(qry.gen_qry(), decimate=False):
        print(vessel)
```
The above generator can be input into a processing function, yielding modified results. For example, to model the activity of vessels on a per-voyage or per-transit basis, each voyage is defined as a continuous vector of vessel positions where the time between observed timestamps never exceeds a 24-hour period.
```python
import aisdb
from datetime import datetime, timedelta

# Define a maximum time interval of 24 hours
maxdelta = timedelta(hours=24)

# Create a query object to fetch data within a specific time range (last 48 hours)
with aisdb.DBConn('/home/test_database.db') as dbconn:
    qry = aisdb.DBQuery(
        dbconn=dbconn,
        dbpath='AIS.sqlitedb',
        callback=aisdb.database.sql_query_strings.in_timerange,
        start=datetime.utcnow() - timedelta(hours=48),
        end=datetime.utcnow(),
    )

    # Generate tracks based on the query results
    tracks = aisdb.TrackGen(qry.gen_qry(), decimate=False)

    # Split the generated tracks into segments, each no longer than 24 hours
    track_segments = aisdb.split_timedelta(tracks, maxdelta)

    for segment in track_segments:
        print(segment)
```
Data cleaning and MMSI deduplication
A common problem with AIS data is noise, where multiple vessels might broadcast using the same identifier simultaneously. AISdb integrates data cleaning techniques to denoise the vessel track data; for details:
(1) Denoising with Encoder: The aisdb.denoising_encoder.encode_greatcircledistance() function checks the approximate distance between each vessel's consecutive positions. It separates vectors where a vessel could not reasonably travel using the most direct path, such as at speeds over 50 knots.
(2) Distance and Speed Thresholds: Distance and speed thresholds limit the maximum distance between messages, and the maximum implied speed, for those messages to be considered continuous.
(3) Scoring and Segment Concatenation: A score is computed for each position delta, with sequential messages nearby at shorter intervals given a higher score. This score is calculated by dividing the Haversine distance by elapsed time. Any deltas with a score not reaching the minimum threshold are considered the start of a new segment. New segments are compared to the end of existing segments with the same vessel identifier; if the score exceeds the minimum, they are concatenated. If multiple segments meet the minimum score, the new segment is concatenated to the existing segment with the highest score.
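The distance term used in the scoring step above is the Haversine (great-circle) distance. The following standalone sketch shows how it can be computed; this is an illustrative reimplementation for clarity, not the library's internal code:

```python
from math import radians, sin, cos, asin, sqrt

def haversine(lon1, lat1, lon2, lat2):
    ''' Great-circle distance in meters between two lon/lat points,
        as used when scoring position deltas between AIS messages. '''
    lon1, lat1, lon2, lat2 = map(radians, (lon1, lat1, lon2, lat2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371000 * asin(sqrt(a))  # mean Earth radius of 6371 km

# Distance between two nearby positions off Halifax
print(haversine(-63.6, 44.6, -63.5, 44.7))
```

Dividing this distance by the elapsed time between two messages yields the implied speed, which the encoder compares against the speed and score thresholds described above.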
Processing functions may be executed in sequence as a processing chain or pipeline, so after segmenting the individual voyages as shown above, results can be input into the encoder to remove noise and correct for vessels with duplicate identifiers effectively.
```python
import aisdb
from datetime import datetime, timedelta

maxdelta = timedelta(hours=24)  # the maximum time interval
distance_threshold = 200000     # the maximum allowed distance (meters) between consecutive AIS messages
speed_threshold = 50            # the maximum allowed vessel speed (knots) in consecutive AIS messages
minscore = 1e-6                 # the minimum score threshold for track segment validation

with aisdb.DBConn('/home/test_database.db') as dbconn:
    qry = aisdb.DBQuery(
        dbconn=dbconn,
        dbpath='AIS.sqlitedb',
        callback=aisdb.database.sql_query_strings.in_timerange,
        start=datetime.utcnow() - timedelta(hours=48),
        end=datetime.utcnow(),
    )

    # Generate tracks from the database query results
    tracks = aisdb.TrackGen(qry.gen_qry())

    # Split the generated tracks into segments based on the maximum time interval (maxdelta)
    track_segments = aisdb.split_timedelta(tracks, maxdelta)

    # Encode the track segments to clean and validate the track data
    tracks_encoded = aisdb.encode_greatcircledistance(
        track_segments,
        distance_threshold=distance_threshold,
        speed_threshold=speed_threshold,
        minscore=minscore,
    )
```
In this second example, artificial noise is introduced into the tracks as an exaggerated demonstration of the denoising capability. The resulting cleaned tracks are then displayed in the web interface.
```python
import os
from datetime import datetime

import aisdb
from dotenv import load_dotenv
from aisdb import DBQuery, DBConn
from aisdb.gis import DomainFromTxts

load_dotenv()
dbpath = os.environ.get('EXAMPLE_NOISE_DB', 'AIS.sqlitedb')
trafficDBpath = os.environ.get('AISDBMARINETRAFFIC', 'marinetraffic.db')
domain = DomainFromTxts('EastCoast', folder=os.environ.get('AISDBZONES'))

start = datetime(2021, 7, 1)
end = datetime(2021, 7, 2)

# Define the default geographical boundary for processing
default_boundary = {'xmin': -180, 'xmax': 180, 'ymin': -90, 'ymax': 90}

# Function to add random noise to track data within specified boundaries
def random_noise(tracks, boundary=default_boundary):
    for track in tracks:
        i = 1
        while i < len(track['time']):
            track['lon'][i] *= track['mmsi']
            track['lon'][i] %= (boundary['xmax'] - boundary['xmin'])
            track['lon'][i] += boundary['xmin']
            track['lat'][i] *= track['mmsi']
            track['lat'][i] %= (boundary['ymax'] - boundary['ymin'])
            track['lat'][i] += boundary['ymin']
            i += 2
        yield track

with DBConn('/home/test_database.db') as dbconn:
    vinfoDB = aisdb.webdata.marinetraffic.VesselInfo(trafficDBpath).trafficDB
    qry = DBQuery(
        dbconn=dbconn,
        dbpath=dbpath,
        start=start,
        end=end,
        callback=aisdb.database.sqlfcn_callbacks.in_bbox_time_validmmsi,
        **domain.boundary,
    )
    rowgen = qry.gen_qry(fcn=aisdb.database.sqlfcn.crawl_dynamic_static)
    tracks = aisdb.track_gen.TrackGen(rowgen, decimate=True)

    # Enrich track data with vessel information
    tracks = aisdb.webdata.marinetraffic.vessel_info(tracks, vinfoDB)

    # Add random noise to the tracks
    tracks = random_noise(tracks, boundary=domain.boundary)

    # Encode the track data using great circle distance encoding
    tracks = aisdb.encode_greatcircledistance(
        tracks, minscore=1e-5, speed_threshold=50, distance_threshold=50000)

    if __name__ == '__main__':
        aisdb.web_interface.visualize(
            tracks,
            domain=domain,
            visualearth=True,
            open_browser=True,
        )
```
Interpolating, geofencing, and filtering
Building on the above processing pipeline, the resulting cleaned trajectories can be geofenced and filtered for results contained by at least one domain polygon and interpolated for uniformity.
```python
# Define a domain with a central point and corresponding radial distances
domain = aisdb.DomainFromPoints(points=[(-63.6, 44.6)], radial_distances=[5000])

# Filter the encoded tracks to include only those within the specified domain
tracks_filtered = aisdb.track_gen.fence_tracks(tracks_encoded, domain)

# Interpolate the filtered tracks with a specified time interval
tracks_interp = aisdb.interp_time(tracks_filtered, step=timedelta(minutes=15))

for segment in tracks_interp:
    print(segment)
```
Additional processing functions can be found in the aisdb.track_gen module.
Exporting as CSV
The resulting processed voyage data can be exported in CSV format instead of being printed:
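As an illustrative sketch, track dictionaries can be flattened into CSV rows with the standard library alone. The `mmsi`/`time`/`lon`/`lat` keys assumed here follow the track format shown in earlier examples; recent versions of AISdb also provide their own CSV export helper, which should be preferred when available:

```python
import csv

def tracks_to_csv(tracks, outpath):
    ''' Write track dictionaries (such as those yielded by TrackGen)
        to a CSV file, one row per position report. '''
    with open(outpath, 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['mmsi', 'time', 'lon', 'lat'])
        for track in tracks:
            for i in range(len(track['time'])):
                writer.writerow([track['mmsi'], track['time'][i],
                                 track['lon'][i], track['lat'][i]])

# Example with a synthetic track dictionary
sample = [{'mmsi': 316000000, 'time': [0, 60],
           'lon': [-63.6, -63.5], 'lat': [44.6, 44.7]}]
tracks_to_csv(sample, 'voyages.csv')
```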
AISdb supports integrating external data sources such as bathymetric charts and other raster grids.
Bathymetric charts
To determine the approximate ocean depth at each vessel position, the aisdb.webdata.bathymetry module can be used.
```python
import aisdb

# Set the data storage directory
data_dir = './testdata/'

# Download bathymetry grid from the internet
bathy = aisdb.webdata.bathymetry.Gebco(data_dir=data_dir)
bathy.fetch_bathymetry_grid()
```
Once the data has been downloaded, the Gebco() class may be used to append bathymetric data to tracks in the context of a TrackGen() processing pipeline like the processing functions described above.
```python
tracks = aisdb.TrackGen(qry.gen_qry(), decimate=False)
tracks_bathymetry = bathy.merge_tracks(tracks)  # merge tracks with bathymetry data
```
AIS data from the database may be overlaid on an interactive map using the aisdb.web_interface.visualize() function. This function accepts a generator of track dictionaries such as those output by aisdb.track_gen.TrackGen(). The color of each vessel track is determined by vessel type metadata.
```python
from datetime import datetime, timedelta

import aisdb
import aisdb.web_interface
from aisdb import DomainFromPoints

dbpath = './YOUR_DATABASE.db'  # define the path to your database

# Set the start and end times for the query
start_time = datetime.strptime("2018-01-01 00:00:00", '%Y-%m-%d %H:%M:%S')
end_time = datetime.strptime("2018-01-02 00:00:00", '%Y-%m-%d %H:%M:%S')

# A circle with a 50 km radius around the location point
domain = DomainFromPoints(points=[(-63.6, 44.6)], radial_distances=[50000])
```
```python
maxdelta = timedelta(hours=24)  # the maximum time interval
distance_threshold = 200000     # the maximum allowed distance (meters) between consecutive AIS messages
speed_threshold = 50            # the maximum allowed vessel speed (knots) in consecutive AIS messages
minscore = 1e-6                 # the minimum score threshold for track segment validation

def color_tracks(tracks):
    ''' set the color of each vessel track using a color name or RGB value '''
    for track in tracks:
        track['color'] = 'blue'  # or an RGB value such as 'rgb(0,0,255)'
        yield track

with aisdb.SQLiteDBConn(dbpath=dbpath) as dbconn:
    qry = aisdb.DBQuery(
        dbconn=dbconn,
        dbpath='./YOUR_DATABASE.db',
        start=start_time,
        end=end_time,
        xmin=domain.boundary['xmin'], xmax=domain.boundary['xmax'],
        ymin=domain.boundary['ymin'], ymax=domain.boundary['ymax'],
        callback=aisdb.database.sqlfcn_callbacks.in_time_bbox_validmmsi,
    )
    rowgen = qry.gen_qry()
    tracks = aisdb.track_gen.TrackGen(rowgen, decimate=False)

    # Split the tracks into segments based on the maximum time interval
    track_segments = aisdb.split_timedelta(tracks, maxdelta)

    # Encode the track segments to clean and validate the track data
    tracks_encoded = aisdb.encode_greatcircledistance(
        track_segments,
        distance_threshold=distance_threshold,
        speed_threshold=speed_threshold,
        minscore=minscore,
    )
    tracks_colored = color_tracks(tracks_encoded)

    if __name__ == '__main__':
        aisdb.web_interface.visualize(
            tracks_colored,
            domain=domain,
            visualearth=True,
            open_browser=True,
        )
```
To use nightly builds (not mandatory), you can follow the tutorial to install AISdb from source.
If you want to create your own database using your data, we have you covered. We have a tutorial that shows you how to create an SQLite database from open-source data.