InstaDeep delivers AI-powered decision-making systems for the Enterprise. With expertise in both machine intelligence research and concrete business deployments, we provide a competitive advantage to our customers in an AI-first world.
Founded in 2014, InstaDeep is today 80 people strong. The company is headquartered in London with additional offices in Paris, Tunis, Nairobi and Lagos. By combining the skills of our AI researchers, ML engineers, and hardware, software and visualisation experts, our in-house team harnesses the power of optimisation and deep reinforcement learning to create AI systems that can tackle the most demanding optimisation and automation problems in real-life environments. As an industry-agnostic business, our expertise covers industries including financial services, energy, logistics, manufacturing, retail and aviation, amongst others.
We're on a mission to accelerate the transition to an AI-first world that benefits everyone, and in our pursuit to stay ahead of the curve on both research and delivery, we are proud to partner with organisations such as Google DeepMind, Intel and Nvidia.
With a mission to help tackle clients' optimisation problems, one of InstaDeep's use cases addresses the Bin Packing problem: a set of boxes of different sizes must be packed efficiently into a container, minimising wasted space while satisfying operational constraints, e.g. preventing items from overlapping, the need for physical support, weight distribution, etc. The agent learned to solve the problem at a superhuman level without any prior human knowledge, and the results outperform, and scale better than, state-of-the-art optimisation solvers.
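The containment and overlap constraints mentioned above can be sketched as a simple feasibility check. This is a minimal illustration, not InstaDeep's actual solver; the box representation as (x, y, z, width, depth, height) tuples is an assumption made for the example:

```python
def overlaps(a, b):
    """Axis-aligned overlap test for two boxes given as
    (x, y, z, width, depth, height) tuples."""
    return all(
        a[i] < b[i] + b[i + 3] and b[i] < a[i] + a[i + 3]
        for i in range(3)
    )

def fits(box, container, placed):
    """True if `box` lies fully inside the container dimensions
    and overlaps no previously placed box."""
    x, y, z, w, d, h = box
    inside = (x >= 0 and y >= 0 and z >= 0
              and x + w <= container[0]
              and y + d <= container[1]
              and z + h <= container[2])
    return inside and not any(overlaps(box, p) for p in placed)
```

Note that the strict inequalities let boxes touch face-to-face without counting as overlapping, which is the behaviour a packing solver needs; constraints such as physical support and weight distribution would require additional checks on top of this.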
The reinforcement learning system was trained on an Intel multi-core system, which parallelised simulations and generated data for the agent, accelerating the learning process.
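The parallel data-generation pattern can be sketched with Python's standard multiprocessing module. The environment and policy below are placeholders for illustration; the actual training setup is not described here:

```python
from multiprocessing import Pool
import random

def rollout(seed):
    """Run one simulated episode with a stand-in random policy and
    return its (step, action, reward) transitions. The environment
    is a placeholder: real training would step a packing simulator."""
    rng = random.Random(seed)
    return [(t, rng.randrange(4), rng.random()) for t in range(10)]

def generate_batch(n_episodes, workers=4):
    """Farm independent simulations out across CPU cores, mirroring
    the parallel data-generation pattern described above."""
    with Pool(workers) as pool:
        return pool.map(rollout, range(n_episodes))
```

Because each episode is independent given its seed, the rollouts scale close to linearly with the number of cores, which is what makes multi-core data generation effective for this kind of training.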
The solution generalises and is applicable to a range of NP-hard problems. Read more about it in the white paper.
Read our most recent news and achievements from Q2 in the newest edition of the Quarterly Digest.
Learning a wide variety of skills that can be reused to learn more complex skills or solve new problems is one of the central challenges of AI. In this research project, in collaboration with Scott Reed and Nando de Freitas, we worked on answering the following question: how can knowledge acquired in one setting be transferred to other settings? A powerful way to achieve this goal is through modularity. Imagine you want to teach a robot how to cook your favourite meal. First, you have to teach it how to chop vegetables, how to fry them, and so on. Then, to chop vegetables, the robot must learn how to use a knife, how to use a chopping board, how to properly clean your vegetables, and so on. This paper aims to exploit modularity to learn to solve complex problems with neural networks by decomposing them into simpler ones to be solved first.
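The idea of composing simple skills into more complex behaviour can be illustrated with a toy pipeline. The function names and the composition scheme below are purely illustrative, echoing the cooking analogy; the paper works with neural sub-networks, not plain functions:

```python
def make_pipeline(modules):
    """Toy sketch of modular skill reuse: a complex behaviour is
    built by composing simpler, independently learned modules.
    Here modules are plain functions applied in sequence."""
    def run(x):
        for module in modules:
            x = module(x)
        return x
    return run

# Illustrative "skills" (hypothetical, for the cooking analogy):
wash = lambda items: [i.strip() for i in items]          # clean each item
chop = lambda items: [p for i in items for p in i.split()]  # split into pieces

prepare = make_pipeline([wash, chop])
```

The point of the sketch is that `prepare` never had to be written from scratch: it reuses `wash` and `chop`, just as a modular agent reuses previously learned sub-skills to solve a new task.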
Adversarial self-play in two-player games has delivered impressive results when used with reinforcement learning algorithms that combine deep neural networks and tree search. Algorithms like AlphaZero and Expert Iteration learn tabula rasa, producing highly informative training data on the fly. However, the self-play training strategy is not directly applicable to single-player games. Recently, several practically important combinatorial optimisation problems, such as the travelling salesman problem and the bin packing problem, have been reformulated as reinforcement learning problems, increasing the importance of enabling the benefits of self-play beyond two-player games. We present the Ranked Reward (R2) algorithm, which accomplishes this by ranking the rewards obtained by a single agent over multiple games to create a relative performance metric. Results from applying the R2 algorithm to instances of two-dimensional and three-dimensional bin packing problems show that it outperforms generic Monte Carlo tree search, heuristic algorithms and integer programming solvers. We also present an analysis of the ranked reward mechanism, in particular the effects of problem instances with varying difficulty and of different ranking thresholds.
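The ranking mechanism described above can be sketched as follows. The buffer size, the threshold parameter alpha, and the tie-breaking rule are illustrative assumptions for the example, not the paper's exact hyperparameters:

```python
import random
from collections import deque

class RankedReward:
    """Minimal sketch of the Ranked Reward (R2) idea: each episode's
    reward is ranked against a buffer of the agent's own recent
    results, turning a single-player score into a binary
    relative-performance signal, so the agent effectively plays
    against its past self."""

    def __init__(self, buffer_size=250, alpha=0.75):
        self.buffer = deque(maxlen=buffer_size)
        self.alpha = alpha  # fraction of past results to beat

    def threshold(self):
        """Reward value at the alpha-quantile of the recent buffer."""
        ordered = sorted(self.buffer)
        return ordered[int(self.alpha * (len(ordered) - 1))]

    def rank(self, episode_reward):
        """Record the episode and return +1 (win) or -1 (loss)
        relative to the agent's own recent performance."""
        self.buffer.append(episode_reward)
        t = self.threshold()
        if episode_reward > t:
            return 1.0
        if episode_reward < t:
            return -1.0
        return random.choice([-1.0, 1.0])  # tie-break at the threshold
```

As the agent improves, the threshold rises with it, so the binary signal stays informative throughout training instead of saturating once the agent beats a fixed baseline.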
Sequence similarity is a critical concept for comparing short- and long-term memory in order to identify hidden states in partially observable Markov decision processes. While connectionist algorithms can learn a range of ad hoc proximity functions, they do not reveal insights and generic principles that could improve overall algorithm efficiency. Our work uses the instance-based Nearest Sequence Memory (NSM) algorithm as a basis for exploring different explicit sequence proximity models including the original NSM proximity model and two new models, temporally discounted proximity and Laplacian proximity. The models were compared using three benchmark problems, two discrete grid world problems and one continuous space navigation problem. The results show that more forgiving proximity models perform better than stricter models and that the difference between the models is more pronounced in the continuous navigation problem than in the discrete grid world problems.
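Two of the proximity models can be sketched over observation histories. The exact definitions and the discount factor below are assumptions made for illustration; see the paper for the precise formulations:

```python
def nsm_proximity(seq_a, seq_b):
    """Sketch of the original NSM proximity: the length of the
    matching suffix of two histories, stopping at the first
    mismatch (a strict, all-or-nothing model)."""
    n = 0
    for a, b in zip(reversed(seq_a), reversed(seq_b)):
        if a != b:
            break
        n += 1
    return n

def discounted_proximity(seq_a, seq_b, gamma=0.9):
    """Sketch of a temporally discounted proximity: recent matches
    count more than distant ones, and a single mismatch no longer
    zeroes out everything before it, making the model more
    forgiving than strict suffix matching."""
    score = 0.0
    for k, (a, b) in enumerate(zip(reversed(seq_a), reversed(seq_b))):
        if a == b:
            score += gamma ** k
    return score
```

The contrast between the two functions mirrors the paper's finding: the strict suffix model discards all evidence beyond the first mismatch, while the forgiving discounted model still credits partial agreement deeper in the history.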