TailNet - Deep Learning for Marine Mammal Conservation
This is my passion project for a reason. I love the ocean. Many of the most magical moments in my life have been underwater encounters with wild marine mammals, such as dolphins. It’s on my bucket list to swim with a whale one day.
When I saw there was a Kaggle competition to identify humpback whales from photos, I knew immediately what my final project would be. Humpback whales are gentle giants that grow up to 50 ft long. The unique black and white patterns on their tails (called “flukes”) make it possible to distinguish between individuals. They used to be on the edge of extinction but are recovering slowly thanks to ocean conservation efforts. As data scientists, we can contribute to the understanding of their population dynamics.
The challenge
The biggest challenge comes from the lack of good data. There are more than 4,000 whale individuals, but only 9,800 labeled photos to learn from. Most individuals have only one photo on record, and even this single shot varies greatly in image quality. Imagine how difficult it is to go through another 15,000 unlabeled photos and try to figure out “have I seen you before?” - “yes, briefly and blurrily; guess who I am” (by the way, there are more than 4,000 of us!).
The brainstorm
I consider this a pattern-matching problem in computer vision and decided to use convolutional neural networks as the foundation. ConvNets have grown fast in recent years and are very good at mimicking how human vision works: instead of seeing just pixels, they try to understand the semantic structures in images.
I have also identified this specific challenge as a “one-shot” learning problem. I know that sounds unfamiliar, but I bet many of you have heard of, or even experienced yourself, the fancy Face ID on the new iPhone. Face verification, telling whether you are actually you, is the building block for the more complex problem of face recognition, finding out exactly who you are. When Apple developed this model (very likely a neural network, though of course they don’t make it open source), the developers could not have trained it on your face, or on any of ours here, yet when you bought the phone, it was able to identify you after just one initial setup. How does that work?
Here’s how I would construct a Face ID-style system from scratch, but let’s use the whale photos so you know I’m not going off track. After initial processing, pairs of images are fed through two identical ConvNets and become a pair of vectors at the end of the pipeline. This is called an “embedding” or “encoding” process, which condenses the dimensions down to the most meaningful features. From there, by looking at the distance between the two feature vectors, we decide whether it’s the same whale or not. The key is to train the network so that the features separate different individuals. Now that we have a blueprint, let’s go to the assembly line.
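To make the blueprint concrete, here is a minimal sketch of the verification idea: one shared encoder maps each image to a feature vector, and a distance threshold decides “same whale or not.” The encoder architecture, embedding size, and threshold below are illustrative assumptions, not the exact TailNet settings.

```python
# A minimal sketch of the verification blueprint: shared encoder + distance threshold.
# The layer sizes, embedding dimension, and threshold are illustrative assumptions.
import numpy as np
from tensorflow.keras import layers, models

def build_encoder(input_shape=(224, 224, 3), embedding_dim=128):
    """A small ConvNet that maps an image to a feature vector (embedding)."""
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(embedding_dim)(x)
    return models.Model(inputs, outputs, name="encoder")

encoder = build_encoder()

def same_whale(img_a, img_b, threshold=1.0):
    """Embed both images with the SAME network, then compare their distance."""
    vec_a = encoder.predict(img_a[np.newaxis], verbose=0)[0]
    vec_b = encoder.predict(img_b[np.newaxis], verbose=0)[0]
    distance = np.linalg.norm(vec_a - vec_b)  # Euclidean distance in feature space
    return distance < threshold
```

Training (covered in TailNet 2.0 below) is what makes this distance meaningful: embeddings of the same whale should end up close together, and those of different whales far apart.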
TailNet 1.0
Here’s TailNet 1.0. I took a basic approach for each step above. First, all training photos are processed and cropped using OpenCV, hoping to increase the signal-to-noise ratio.
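The post doesn’t spell out the exact preprocessing, so here is one plausible sketch: threshold the image to separate the (usually darker) fluke from sky and water, then crop to the largest foreground region. The thresholding strategy and output size are assumptions.

```python
# A hedged OpenCV cropping sketch (not the exact TailNet 1.0 pipeline):
# Otsu threshold + largest-contour bounding box to isolate the fluke.
import cv2

def crop_fluke(path, out_size=(299, 299)):
    img = cv2.imread(path)
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    # Otsu thresholding to separate the darker fluke from the bright background
    _, mask = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    # OpenCV 4.x returns (contours, hierarchy)
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if contours:
        x, y, w, h = cv2.boundingRect(max(contours, key=cv2.contourArea))
        img = img[y:y + h, x:x + w]  # crop to the largest foreground region
    return cv2.resize(img, out_size)
```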
To obtain the feature vectors, I experimented with a few pre-trained ConvNets with weights trained on ImageNet. The idea is that, because these advanced networks are able to distinguish such a diverse set of objects (cars, cats, dogs, humans, chairs, ...), if we remove the top layer that makes the 1000-category prediction and expose the layer underneath, where we can extract feature vectors, chances are these vectors capture a great deal of comprehensive knowledge of the patterns in the original images. Then we compute a pair-wise distance between an unknown whale and the known ones, and find nearest neighbors to make a prediction.
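Here is a sketch of that pre-trained-feature plus nearest-neighbor approach, using InceptionV3 (the best performer mentioned below). The pooling choice, distance metric, and placeholder data are assumptions for illustration.

```python
# Sketch of TailNet 1.0's idea: ImageNet features + nearest-neighbor lookup.
# Pooling, metric, and the placeholder arrays are illustrative assumptions.
import numpy as np
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.applications.inception_v3 import preprocess_input
from sklearn.neighbors import NearestNeighbors

# ImageNet weights with the 1000-class top removed; global average pooling
# exposes a 2048-dim feature vector per image.
extractor = InceptionV3(weights="imagenet", include_top=False, pooling="avg")

def embed(images):
    """images: array of shape (n, 299, 299, 3) with pixel values 0-255."""
    return extractor.predict(preprocess_input(images.astype("float32")), verbose=0)

# Placeholder data for illustration; replace with the real cropped photos and IDs.
known_images = np.random.randint(0, 255, size=(10, 299, 299, 3))
known_ids = ["whale_%d" % i for i in range(10)]
unknown_images = np.random.randint(0, 255, size=(3, 299, 299, 3))

known_vectors = embed(known_images)
nn = NearestNeighbors(n_neighbors=5, metric="euclidean").fit(known_vectors)

# For each unknown photo, predict the whale ID of its nearest labeled neighbor.
distances, indices = nn.kneighbors(embed(unknown_images))
predictions = [known_ids[i[0]] for i in indices]
```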
With TailNet 1.0, I’m currently at 12th place out of 139 teams (April 5th, 2018). This is three weeks after I started working on this project, while the competition has been running for three months. Not a bad start. The performance, though, still has a lot of room to improve. Right now I get about 62% accuracy on verification (same whale or not?) and 16% accuracy on recognition (which whale?) on a 1/5 validation set.
TailNet 2.0 (in progress)
This brings us to TailNet 2.0. This upgraded version is a Siamese neural network, which is basically a conjoined twin network. For the ConvNets, the identical twins, I selected the best performer from TailNet 1.0, InceptionV3, and made the last 5% of the weights trainable, hoping to get more meaningful tail features in the output vectors. The loss function is designed to minimize the distance between images of the same whale and do the opposite for different whales. To overcome the lack of data, I applied real-time image augmentation in Keras to simulate different conditions. The model is currently running on a GPU instance on AWS, and I hope to submit an update to Kaggle soon.
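Below is a hedged sketch of how such a setup can be wired together in Keras: twin InceptionV3 encoders sharing weights, a contrastive-style loss that pulls same-whale pairs together and pushes different-whale pairs apart, and real-time augmentation. The layer-freezing cutoff, margin, augmentation ranges, and image size are illustrative assumptions, not the exact TailNet 2.0 configuration.

```python
# A hedged sketch of a Siamese setup in the spirit of TailNet 2.0.
# Freezing fraction, margin, and augmentation parameters are assumptions.
from tensorflow.keras import layers, models
from tensorflow.keras import backend as K
from tensorflow.keras.applications import InceptionV3
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Shared encoder: ImageNet weights, with only the last ~5% of layers trainable.
base = InceptionV3(weights="imagenet", include_top=False, pooling="avg")
cutoff = int(len(base.layers) * 0.95)
for layer in base.layers[:cutoff]:
    layer.trainable = False

def contrastive_loss(y_true, distance, margin=1.0):
    """y_true = 1 for same whale, 0 for different whales.
    Pulls matching pairs together; pushes non-matching pairs beyond the margin."""
    y_true = K.cast(y_true, distance.dtype)
    return K.mean(y_true * K.square(distance) +
                  (1.0 - y_true) * K.square(K.maximum(margin - distance, 0.0)))

input_a = layers.Input(shape=(299, 299, 3))
input_b = layers.Input(shape=(299, 299, 3))
emb_a, emb_b = base(input_a), base(input_b)   # identical twins: same weights
distance = layers.Lambda(
    lambda t: K.sqrt(K.sum(K.square(t[0] - t[1]), axis=1, keepdims=True) + 1e-7)
)([emb_a, emb_b])

siamese = models.Model([input_a, input_b], distance)
siamese.compile(optimizer="adam", loss=contrastive_loss)

# Real-time augmentation to simulate varying shooting conditions.
augmenter = ImageDataGenerator(rotation_range=15, zoom_range=0.2,
                               width_shift_range=0.1, height_shift_range=0.1,
                               horizontal_flip=True)
```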
TailNet 3.0 (future work)
I plan on applying object detection to further cut out useless or distracting information from the original photos.
Summary
In summary, TailNet is an example of face recognition applied to a different domain. The basic approaches have already given me a good start, and please stay tuned, as there are still three months until the end of the competition. I envision TailNet being applied to a wide range of wildlife identification and monitoring tasks. More than that, it has great potential for any verification or recognition problem that suffers from a huge number of categories with limited information for each individual.