Authors: Danylo Snahoshchenko, Volodymyr Korzun, Data Science researchers at Ciklum
Object detection is one of the most popular applications of artificial intelligence (AI) and machine learning. Training machine-learning algorithms to recognize and identify items within a photograph or video stream has far-reaching implications, making tools like augmented reality or facial recognition a reality.
While object detection models have proliferated in use, many data scientists and software developers continue to rely on the same tried-and-true methods of object identification. To try a different approach, Ciklum's R&D team set out to build a unique object detection model, opening new possibilities for future applications.
The Existing Object Detection Approach
Many of today’s object detection methods rely on traditional rectangular bounding boxes to locate objects in images. For many applications this capability is sufficient, but it comes with a number of significant drawbacks:
- The inability to provide the exact physical size of an object,
- Difficulty in distinguishing dense clusters of objects, and
- A lack of compensation for changes in camera orientation when the camera moves
In object detection models, orientation becomes an important factor when the camera views objects from overhead. From this viewpoint, the orientations of objects are arbitrary and more difficult to detect. We sought to solve this problem by developing a new model that compensates for any change in object orientation.
In a research paper by Lei Liu, Zongxu Pan, and Bin Lei, “Learning a Rotation Invariant Detector with Rotatable Bounding Box,” the trio identified the inherent difficulty behind detecting arbitrarily rotated objects. Existing models are not comprehensive enough to accurately locate multi-angle objects and effectively separate them from the background, because most models rely on a traditional bounding box, a rotation-variant structure poorly suited to identifying rotated objects.
The researchers proposed a new approach to object detection called the rotatable bounding box (RBBox). When the orientation angles of objects are arbitrary, their proposed detector (DRBox) can be trained to force detection networks to learn an object’s correct orientation angle and a correct rotation-invariant property. We used this research as the basis of our hypothesis for building our new object detection model.
Ciklum’s Research & Development
Upon concluding this research, our R&D team proposed an object detection hypothesis focused on the RBBox. The RBBox is a rectangle parameterized by four variables: the center point position (two variables), the width, and the height. The team also worked from the assumption that two more points — left and right shifts from the center point — could be predicted by customizing one of the most efficient deep learning architectures.
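The geometry of a rotatable box can be sketched as follows. The article does not publish the exact convention, so the function name, parameter layout, and the counter-clockwise angle convention below are illustrative assumptions, not the team's implementation:

```python
import numpy as np

def rbbox_corners(cx, cy, w, h, angle):
    """Return the four corners of a rotatable bounding box (RBBox).

    Assumed convention: the box is parameterized by its center (cx, cy),
    width w, height h, and a rotation angle in radians measured
    counter-clockwise from the x-axis.
    """
    # Axis-aligned corner offsets before rotation
    dx, dy = w / 2.0, h / 2.0
    offsets = np.array([[-dx, -dy], [dx, -dy], [dx, dy], [-dx, dy]])
    # 2x2 rotation matrix for the given angle
    c, s = np.cos(angle), np.sin(angle)
    rot = np.array([[c, -s], [s, c]])
    # Rotate the offsets, then translate to the box center
    return offsets @ rot.T + np.array([cx, cy])
```

At angle zero this reduces to an ordinary axis-aligned box, which is why the RBBox can be seen as a strict generalization of the traditional bounding box.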
With the hypothesis in place, we developed a robust research pipeline to kick off the development process.
One Week: Dataset Exploratory Data Analysis
The first step was to carry out an exploratory data analysis on the dataset. This step is essential to understanding the main characteristics of the data through visualizations, unearthing insights that become critical when building algorithmic models.
The analysis led us to move forward with data from the Kaggle Airbus Ship Detection Challenge, which contains satellite images in which participants must locate ships. Since many of the images contain no ships or contain multiple ships, this dataset is ideal for simulating difficult configurations. In fact, many of the ships in the dataset differ significantly in size and shape, and they appear in a wide variety of environments, including docks, marinas, and the open sea.
One Week: Dataset Preparation
With the dataset chosen and analyzed, our team began preparing the data for use by the eventual machine-learning model. One of the primary tasks was to convert image masks to RBBoxes, reorganizing them into a standard form.
In each image, the labelled data consists of image masks that align with bounding-box segments around each ship. Two steps were needed to prepare the data for use: converting the annotations from RLE to RBBox format, and splitting the dataset into training and validation sets using only images that contained ships.
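The first of these steps starts from the competition's run-length encoding. A minimal decoder for that format might look like this (the Kaggle Airbus masks are 768×768, encoded column-major as 1-indexed start/length pairs); fitting a rotated rectangle to the decoded mask could then be done with something like OpenCV's `cv2.minAreaRect`, though the team's exact conversion is not shown in the article:

```python
import numpy as np

def rle_decode(rle, shape=(768, 768)):
    """Decode a run-length-encoded mask in the Kaggle Airbus format:
    space-separated start/length pairs, 1-indexed, column-major order."""
    mask = np.zeros(shape[0] * shape[1], dtype=np.uint8)
    nums = [int(x) for x in rle.split()]
    for start, length in zip(nums[0::2], nums[1::2]):
        mask[start - 1:start - 1 + length] = 1  # runs are 1-indexed
    # Column-major ("Fortran") reshape, as specified by the competition
    return mask.reshape(shape, order="F")
```

Filtering to images that contain ships then amounts to keeping only rows of the annotation table whose RLE string is non-empty before splitting into training and validation sets.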
Three to Five Weeks: Creation of a RetinaNet Adaptation for RBBoxes
For further research and experimentation, Ciklum's R&D team implemented the open-source Keras RetinaNet object detector. We modified the source code to adapt the model to predicting the additional data points. This work required further investigation into how the model was built and how the existing architecture trained on the label format, leading the team to add new features that make the model predict six data points instead of four. It also led to the creation of additional parameters for future training of bounding boxes.
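The six-value regression target can be sketched as an extension of RetinaNet's standard four-delta anchor decoding. The exact encoding the team used is not published, so the layout below — four standard deltas plus two shift values scaled by anchor width — is an illustrative assumption only:

```python
import numpy as np

def decode_rbbox_deltas(anchors, deltas):
    """Decode 6-value regression outputs into RBBox parameters.

    Standard RetinaNet regresses 4 deltas per anchor; this sketch assumes
    the modified head emits 6: (dx, dy, dw, dh, dl, dr), where the last
    two encode the left/right shift points relative to the anchor width.

    anchors: (N, 4) array of (cx, cy, w, h); deltas: (N, 6) array.
    """
    ax, ay, aw, ah = anchors.T              # anchor centers and sizes
    dx, dy, dw, dh, dl, dr = deltas.T
    cx = ax + dx * aw                       # box center, offset from anchor
    cy = ay + dy * ah
    w = aw * np.exp(dw)                     # width/height via log-space deltas
    h = ah * np.exp(dh)
    left = dl * aw                          # left/right shifts from the center
    right = dr * aw
    return np.stack([cx, cy, w, h, left, right], axis=1)
```

With all-zero deltas this recovers the anchor itself with no shift, mirroring how the standard four-value decoding behaves.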
One to Two Weeks: Training, Testing, and Validation Strategies Development
For the next few weeks, we devoted significant effort to debugging persistent errors and running models to achieve successful training. Over time, we improved the post-processing results after recognizing that the network was better at predicting left shifts in the imagery. We used this shift, together with the orientation produced by the right shift, to plot more accurate bounding boxes.
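Recovering an orientation from a predicted shift point is a small geometric step. The article does not give the formula, so the function below is a hypothetical sketch: it assumes the shift point lies along the box's long axis relative to the center, so the angle is simply the direction from center to shift point:

```python
import numpy as np

def orientation_from_shift(cx, cy, sx, sy):
    """Orientation angle (radians) of a box from a predicted shift point.

    Assumption (not from the article): the shift point (sx, sy) lies on
    the box's long axis, so the orientation is the direction of the
    vector from the center (cx, cy) to the shift point.
    """
    return np.arctan2(sy - cy, sx - cx)
```

Combined with the center, width, and height, this angle is all that is needed to plot the final rotated box during post-processing.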
One to Two Weeks: Baseline Model Selection, Development, and Training
With properly tested and validated data, we were ready to select a machine-learning model and begin development and training. It was crucial to find the proper object detector for the project.
In general, object detectors are divided into one-stage and two-stage approaches. One-stage detectors are applied over a regular, dense sampling of possible object locations and are potentially the faster and simpler choice, but they often trail two-stage models in accuracy due to the extreme class imbalance encountered during training.
Nevertheless, we chose to focus on the RetinaNet one-stage detector, a single, unified network composed of a backbone network and two task-specific subnetworks. This model employs “focal loss,” a reshaping of cross-entropy loss that down-weights the loss assigned to well-classified examples. Focal loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training.
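The focal loss from the RetinaNet paper (Lin et al.) can be written in a few lines. This is a plain NumPy sketch of the binary form for illustration, not the team's training code:

```python
import numpy as np

def focal_loss(p, y, alpha=0.25, gamma=2.0):
    """Binary focal loss: FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t).

    p: predicted probabilities of the positive class; y: binary labels.
    gamma > 0 shrinks the loss for well-classified examples, so the many
    easy negatives contribute little and hard examples dominate training.
    """
    p_t = np.where(y == 1, p, 1 - p)             # probability of the true class
    alpha_t = np.where(y == 1, alpha, 1 - alpha)  # class-balancing weight
    return -alpha_t * (1 - p_t) ** gamma * np.log(p_t)
```

With gamma set to 0 this reduces to alpha-weighted cross-entropy; increasing gamma is what down-weights the easy examples.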
Two to Three Weeks: Model Improvement and Retraining to Achieve Optimal Performance
Upon the successful build of the model, Ciklum’s team carried out weeks of additional development work and made further improvements. These improvements ultimately led to better predictions in two areas:
- Predicting small objects
- Predicting densely clustered objects
As a result of Ciklum’s rotation detector project, the research and development team finally had a functional and robust object detection algorithm in place. The applications of such a tool are tremendous and wide-reaching; for instance, the same ship-identifying model could be used with satellite data to find vessels in areas designated off-limits to commercial fishing ventures, supporting sustainable fishing projects around the world.
Need machine learning expertise for your project? Drop us a message about your challenges and our experts will build sophisticated AI-powered algorithms for your business goals.