Abstract
Echosounders are employed in sea-based aquaculture
to estimate the biomass present in a net pen. Knowing
the position of the echosounder within the net pen over time
and monitoring its status are both required to assess the quality
of the biomass density measured by the instrument. This paper
studies how to track the position of an echosounder placed
inside a fish cage using the monocular camera of an ROV,
and presents a solution shown to work even with a small dataset.
The localization methodology is divided into two separate tasks:
detection with tracking, and pose estimation. A dataset of about
1000 labelled images was used to train the YOLO v5 detection
framework. Once the object is detected, DeepSort is used to track
the echosounder.
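DeepSort keeps each track alive between detections with a Kalman filter, which is what lets a track survive frames in which the detector sees nothing. The following is a simplified, hypothetical illustration of that idea, not DeepSort's implementation: DeepSort filters the box centre, aspect ratio and height jointly and also uses appearance features for data association, whereas this sketch gives each image axis of the box centre its own constant-velocity filter. The class names `AxisKF` and `CentreTrack`, the noise values `q` and `r`, the initial covariance and the `max_misses` limit are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class AxisKF:
    """Constant-velocity Kalman filter for one image axis (hypothetical helper)."""
    p: float            # position estimate (pixels)
    v: float = 0.0      # velocity estimate (pixels/frame)
    P00: float = 10.0   # entries of the symmetric 2x2 covariance matrix P
    P01: float = 0.0
    P11: float = 10.0
    q: float = 1.0      # process-noise intensity (assumed value)
    r: float = 4.0      # measurement-noise variance in px^2 (assumed value)

    def predict(self, dt: float = 1.0) -> None:
        # State transition F = [[1, dt], [0, 1]]
        self.p += dt * self.v
        # P <- F P F^T + Q, with Q from a white-noise acceleration model
        P00 = self.P00 + 2 * dt * self.P01 + dt * dt * self.P11
        P01 = self.P01 + dt * self.P11
        self.P00 = P00 + self.q * dt ** 3 / 3
        self.P01 = P01 + self.q * dt ** 2 / 2
        self.P11 = self.P11 + self.q * dt

    def update(self, z: float) -> None:
        # Measurement model H = [1, 0]: only the position is observed
        s = self.P00 + self.r                  # innovation variance
        k0, k1 = self.P00 / s, self.P01 / s    # Kalman gain
        innovation = z - self.p
        self.p += k0 * innovation
        self.v += k1 * innovation
        # P <- (I - K H) P
        P00, P01, P11 = self.P00, self.P01, self.P11
        self.P00 = (1 - k0) * P00
        self.P01 = (1 - k0) * P01
        self.P11 = P11 - k1 * P01


@dataclass
class CentreTrack:
    """Tracks the bounding-box centre; coasts on predictions when undetected."""
    x: AxisKF
    y: AxisKF
    misses: int = 0
    max_misses: int = 30  # frames to coast before dropping the track (assumed)

    def step(self, detection: Optional[Tuple[float, float]]) -> Tuple[bool, Tuple[float, float]]:
        self.x.predict()
        self.y.predict()
        if detection is None:          # e.g. full occlusion: prediction only
            self.misses += 1
        else:
            self.x.update(detection[0])
            self.y.update(detection[1])
            self.misses = 0
        return self.misses <= self.max_misses, (self.x.p, self.y.p)


# Usage: the centre moves 5 px/frame to the right, then is occluded for 5 frames.
track = CentreTrack(AxisKF(100.0), AxisKF(200.0))
for t in range(1, 11):
    alive, centre = track.step((100.0 + 5 * t, 200.0))
for _ in range(5):
    alive, centre = track.step(None)
print(alive, centre)
```

During the detected frames the velocity estimate converges towards the true motion, so the five prediction-only steps carry the centre forward along its trajectory. The covariance grows while no detections arrive, which is why a tracker of this kind drops a track after a bounded number of misses rather than coasting indefinitely.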
Furthermore, the combination of YOLO v5 and DeepSort is robust to
short periods (up to a few seconds) of full occlusion, because
DeepSort employs a Kalman filter. The pose estimation task
relies on prior knowledge (i.e., the dimensions of
the echosounder and the camera calibration parameters) as well
as on the position of the object in the image frame, as determined by
the detection and tracking systems. Standard image processing
techniques are used to estimate the elliptical outline of the
echosounder in the image. The ellipse parameters, the echosounder
dimensions, and the camera parameters are then used to estimate the
position of the echosounder with respect to the ROV camera.
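A minimal geometric sketch of this last step, under stated assumptions and not the authors' method, is shown here. The assumptions are: the visible face of the echosounder is a circle of known diameter D; the image has already been undistorted; the ellipse parameters come from a fitting routine such as OpenCV's `cv2.fitEllipse`, which returns the centre, the full axis lengths and a rotation angle; and a weak-perspective approximation applies, so the major axis of the projected ellipse is not foreshortened. Depth then follows from Z ≈ f·D/a, where a is the major axis in pixels, and the ellipse centre is back-projected through the pinhole model. The names `Intrinsics` and `echosounder_position` and all numbers in the usage line are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Intrinsics:
    """Pinhole camera parameters from calibration (pixels)."""
    fx: float
    fy: float
    cx: float
    cy: float


def echosounder_position(
    centre_px: Tuple[float, float],
    axes_px: Tuple[float, float],
    diameter_m: float,
    K: Intrinsics,
) -> Tuple[Tuple[float, float, float], float]:
    """Approximate (X, Y, Z) of the ellipse centre in the camera frame (metres)
    and the tilt of the circular face (degrees).

    Weak-perspective assumption: the major axis of the projected ellipse equals
    the projected diameter, so Z ~= f * D / major. The image is assumed undistorted.
    """
    u, v = centre_px
    major, minor = max(axes_px), min(axes_px)   # full axis lengths in pixels
    f = 0.5 * (K.fx + K.fy)                     # assumes nearly square pixels
    Z = f * diameter_m / major                  # depth along the optical axis
    X = (u - K.cx) * Z / K.fx                   # back-project through the pinhole model
    Y = (v - K.cy) * Z / K.fy
    # Under weak perspective the minor axis is foreshortened by cos(tilt)
    tilt_deg = math.degrees(math.acos(min(1.0, minor / major)))
    return (X, Y, Z), tilt_deg


# Usage with illustrative numbers (not from the paper):
K = Intrinsics(fx=800.0, fy=800.0, cx=640.0, cy=360.0)
position, tilt = echosounder_position((720.0, 360.0), (160.0, 80.0), 0.4, K)
print(position, tilt)  # (X, Y, Z) ≈ (0.2, 0.0, 2.0) m, tilt ≈ 60 degrees
```

Under full perspective the centre of the projected ellipse is slightly offset from the projection of the circle's centre, and the major axis also depends on where the object lies in the image, so this approximation works best near the image centre and at ranges that are large compared with the echosounder's diameter.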