
magicplan AI: Part III of our journey into the world of deep learning

Previously in the wonderful journey of magicplan-AI…

In part 2, we showed how you can reach a good level of detection accuracy with a Deep Learning model running on a powerful GPU, provided you have the right expertise in Deep Learning training.

Unfortunately, this is not enough when you want to ship the feature on a smartphone and have to deal with very limited hardware resources, both in terms of memory and computing power.

Previous work allowed us to design and train a “good enough” door / window object detection model. However, even for the inference part alone, this architecture can only run on a powerful NVIDIA GPU. It is incompatible with a smartphone’s limited hardware resources, both in terms of memory requirements and processing time.
This is a huge problem that kept us stuck for a while.

The remote approach

At first, the contemplated solution was to offload the GPU computation to the cloud.

The main advantage of this approach is that the server has the GPU memory and computing power required to run the model properly, so this solution is available from day one.

However, 3 factors made this approach problematic:

  1. magicplan is designed to run both online and offline. Relying on a remote connection for door / window detection would change this paradigm,
  2. the upload and download times introduce an unwanted and unpredictable lag that can be really annoying during a real-time capture session,
  3. there was uncertainty about the cost of deploying GPU servers in the cloud to scale with the potential demand once in production.

Nevertheless, the approach was very useful for “simulating” the feature, locally on a private network and in real capture conditions, to assess the reliability of the model and validate that the user experience was acceptable.

Eventually, the combination of 3 solutions allowed us to overcome this critical barrier:

  1. Applying a quantisation approach to reduce the memory footprint by lowering the floating-point precision of the weights (sketched below),
  2. Applying the Teacher / Student approach to our model to “shrink” it to an acceptable memory size while keeping the same detection accuracy (see the distillation sketch below),
  3. Moving part of the model from the standard TensorFlow framework to Apple’s hardware-accelerated CoreML framework to optimise inference performance on the smartphone (see the conversion sketch below).
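
To illustrate the first point: the article does not say which tooling was used for quantisation, but here is a minimal sketch of post-training float16 quantisation with the TensorFlow Lite converter; the SavedModel path and output file name are placeholders.

```python
import tensorflow as tf

# Load the trained detector from a SavedModel directory (placeholder path).
converter = tf.lite.TFLiteConverter.from_saved_model("door_window_detector_savedmodel")

# Ask the converter to store weights in float16 instead of float32,
# roughly halving the model's memory footprint.
converter.optimizations = [tf.lite.Optimize.DEFAULT]
converter.target_spec.supported_types = [tf.float16]

tflite_model = converter.convert()
with open("door_window_detector_fp16.tflite", "wb") as f:
    f.write(tflite_model)
```

Going down to 8-bit integers divides the footprint by roughly four, at the cost of a small accuracy drop that has to be re-validated on the detection task.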
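
For the second point, the Teacher / Student approach (knowledge distillation) trains a small student network to reproduce the soft predictions of the large teacher model. Below is a minimal sketch of a distillation loss in TensorFlow; the temperature and weighting values are illustrative assumptions, not the values actually used for magicplan.

```python
import tensorflow as tf

def distillation_loss(teacher_logits, student_logits, labels,
                      temperature=4.0, alpha=0.5):
    """Blend the teacher's soft targets with the ground-truth hard labels."""
    # Soften both distributions with the temperature before comparing them.
    soft_teacher = tf.nn.softmax(teacher_logits / temperature)
    log_soft_student = tf.nn.log_softmax(student_logits / temperature)

    # Cross-entropy between teacher and student distributions
    # (scaled by T^2 to keep gradient magnitudes comparable).
    soft_loss = -tf.reduce_mean(
        tf.reduce_sum(soft_teacher * log_soft_student, axis=-1)
    ) * temperature ** 2

    # Standard cross-entropy against the ground-truth class labels.
    hard_loss = tf.reduce_mean(
        tf.nn.sparse_softmax_cross_entropy_with_logits(
            labels=labels, logits=student_logits
        )
    )
    return alpha * soft_loss + (1.0 - alpha) * hard_loss
```

The student architecture is chosen to fit the phone’s memory budget, and the soft targets help it recover most of the teacher’s detection accuracy.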
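
For the third point, here is a minimal sketch of converting a TensorFlow model to CoreML using the current coremltools API (the exact converter and versions used at the time may have differed); the model path, input shape and output name are placeholders.

```python
import coremltools as ct

# Convert the shrunk student model (placeholder SavedModel path) into a
# Core ML model that can be bundled in the iOS app.
mlmodel = ct.convert(
    "student_savedmodel",
    inputs=[ct.ImageType(shape=(1, 300, 300, 3))],
)
mlmodel.save("DoorWindowDetector.mlmodel")
```

At capture time the app loads the resulting .mlmodel, and the CoreML runtime dispatches inference to the GPU or Neural Engine when available instead of running it on the CPU.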

 
