The Future of Computer Vision

Hugo Ponte
January 8, 2021 · AI & Machine Learning

In the next ten years, computer vision will make huge strides. In this article, we take a look at the trends and breakthroughs of the 2010s and what we can expect as we enter the 2020s.

I. A Short History of Computer Vision

Throughout the 80s, 90s, and 00s, computer vision was a notoriously difficult task. Even mediocre performance in a research lab was applauded in the community. And rightfully so – in those days the features used to train machine learning systems on visual tasks were manually designed in a process known as feature engineering.

What is feature engineering, you ask? It means we used our “expert” human intuition to design special tricks that would pick out specific patterns within an image and turn them into useful features for a learning computer. Over the years we accumulated many of these tricks, each with its own acronym: HOG, SIFT, ORB and even SURF. The unfortunate reality, however, is that solving real-world problems required a carefully curated blend of these tricks. What you used to detect the divider line on a road was not what you would use to recognize and distinguish faces. The allure of building general purpose systems remained a distant dream.
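To make that concrete, here is a minimal sketch of what one of these tricks looks like in practice, using OpenCV’s ORB extractor (the image path is a hypothetical placeholder):

```python
import cv2

# Load a grayscale image; the path here is just a placeholder.
img = cv2.imread("road.jpg", cv2.IMREAD_GRAYSCALE)
assert img is not None, "replace 'road.jpg' with a real image path"

# ORB detects corner-like keypoints and computes binary descriptors --
# exactly the kind of hand-designed feature discussed above.
orb = cv2.ORB_create(nfeatures=500)
keypoints, descriptors = orb.detectAndCompute(img, None)

print(f"extracted {len(keypoints)} keypoints")
```

Those keypoints and descriptors, not the raw pixels, were what a classical learning system would actually see.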

II. Moving Beyond Feature Engineering

This drastically changed in the early 2010s, when we saw the biggest revolution in computer vision since the invention of computers themselves.

In 2012, a computer vision algorithm known as AlexNet beat its competitors at the ImageNet Large Scale Visual Recognition Challenge by more than 10 percentage points. The world was shocked. The most amazing thing about it: the model used no hand-engineered features. Instead, it relied on a general purpose learning system known as a neural network. AlexNet’s breakthrough had been to use GPUs (Graphics Processing Units) to train the computer vision model significantly faster and for longer: AlexNet was trained over roughly six days on two consumer-grade GPUs. For comparison, OpenAI’s GPT-3, released in 2020, was trained on the equivalent of an estimated 355 GPU-years of compute, at a cost of roughly $4,600,000.
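For a sense of how commoditized that once-revolutionary model has become, here is a small sketch, assuming a recent PyTorch and torchvision install, that loads a pretrained AlexNet and runs an image-shaped tensor through it, with no feature engineering in sight:

```python
import torch
import torchvision.models as models
from torchvision.models import AlexNet_Weights

# The original AlexNet architecture now ships with torchvision.
model = models.alexnet(weights=AlexNet_Weights.DEFAULT).eval()

# A dummy 224x224 RGB image stands in for a real photo here.
dummy = torch.randn(1, 3, 224, 224)
with torch.no_grad():
    logits = model(dummy)

print(logits.shape)  # one score per ImageNet class: torch.Size([1, 1000])
```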

Since AlexNet, we have continued to add data points to a clear and obvious pattern: the bigger the dataset, the bigger the model, and the longer we train, the better the learned features become. For the first time, we can see a clear path to the general purpose intelligent systems we have always dreamed of.

III. The Roll-Out: Transformers, Mobilize

In the last couple of years, we have seen a new breakthrough in vision algorithms: the emergence of transformers over convolutions.

Transformers are a deep learning architecture built around an encoder and a decoder, and they have been popular in natural language processing (NLP) tasks for some time now. Papers such as DETR, out of Facebook’s AI Research group, made waves by showing how transformers could achieve state-of-the-art performance on vision tasks.
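As a rough illustration, and assuming PyTorch plus a network connection, the pretrained baseline model can be loaded through PyTorch Hub following the DETR repository’s README:

```python
import torch

# detr_resnet50 is the baseline model from the DETR paper.
model = torch.hub.load("facebookresearch/detr:main", "detr_resnet50",
                       pretrained=True).eval()

# DETR maps a batch of images to a fixed set of object queries:
# class logits plus predicted bounding boxes, with no hand-tuned
# anchors or non-maximum suppression.
dummy = torch.randn(1, 3, 800, 800)
with torch.no_grad():
    outputs = model(dummy)

print(outputs["pred_logits"].shape, outputs["pred_boxes"].shape)
```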

Transformers are simpler to implement than the currently popular computer vision algorithms (such as Mask R-CNN) and represent yet another step toward less human engineering in computer vision. The less time we spend developing and tuning these algorithms, the more we can tackle increasingly complex tasks, making computer vision accessible to more people.

A huge ramification of this, as we move into the next decade, will be the opportunity to create transformer-friendly hardware that works for both vision and NLP tasks. Right now, there is much debate as to whether intelligent agents (IoT cameras, Alexa and Google Home devices, etc.) will perform inference in the cloud or directly on the device itself. Is that little device just a dumb sensor sending signals to a specialized brain in the cloud, or is it a little general purpose learner using its own silicon to recognize your face and listen to your commands? The latter is perhaps preferable to privacy advocates, since the data never leaves the device. Moreover, a more homogeneous landscape of model architectures will have repercussions for whether the edge beats the cloud.

IV. Data Power and Synthetic Data for Computer Vision

We’ve talked about algorithms and hardware. We now arrive at the most important piece of the AI puzzle: data.

The historic trends show us two things: one, algorithms are becoming more generic, and two, the guard rails of human engineering are shrinking. The consequence is that the performance of a computer vision system depends more than ever on the data used to train it. This should not come as a surprise; we can all see the tech giants amassing huge datasets.

However, amassing huge datasets is not, by itself, the answer to more powerful AI. These datasets, whether scraped from the internet or painstakingly staged and captured in-house, are not the best way to train more generic autonomous algorithms. This “real data” lets all the bias of the real world inevitably creep into the computer vision algorithms trained on it. Further, real data is not easily fed into training: it needs to be cleaned, labelled, annotated, and fixed.

So, we find ourselves poised at the precipice of a technological turn as significant as the introduction of neural nets and transformers. Data is the big hurdle holding back computer vision, and the solution, we would argue, is synthetic data. A quick definition: synthetic data is data created and generated by a computer (think video games or the CGI you see in movies). Full control over this virtual world means pixel-perfect labels (think metadata such as which pixels correspond to a face in an image), including labels that would be impossible to produce for real-world datasets.
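Here is a deliberately tiny, self-contained sketch of that idea, assuming nothing beyond NumPy: because we render the scene ourselves, the segmentation label comes for free and is correct down to the pixel:

```python
import numpy as np

rng = np.random.default_rng(0)

def synth_sample(size=64):
    """Render a random bright circle on a noisy background and
    return (image, pixel-perfect segmentation mask)."""
    img = rng.uniform(0.0, 0.2, (size, size))      # noisy background
    yy, xx = np.mgrid[:size, :size]
    cy, cx, r = rng.integers(16, 48, size=3)       # random circle params
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2
    img[mask] = rng.uniform(0.7, 1.0)              # bright foreground
    return img, mask.astype(np.uint8)              # the label costs nothing

image, label = synth_sample()
print(image.shape, int(label.sum()), "foreground pixels labelled exactly")
```

Real-world annotators could never label a photograph this precisely; here the perfect label is simply a byproduct of generation.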

Synthetic data is still in its early days. Much like feature engineering before the 2010s, each synthetic dataset is currently designed by hand using human intuition. But as we speak (or read, as it were), startups (including us!) are building the systems that will let us generate infinite streams of synthetic data designed by learning systems themselves.

Automated synthetic data generation, or as we like to think of it, the advent of a generative platform for synthetic data sets, will be a game changer for computer vision. A decade from now, computer vision algorithms will be constantly improving through a process known as lifelong learning. The model will recognize its weaknesses, generate new synthetic data for that weakness, and train on that dataset. The best part: this will all be automated. An invisible process running on hordes of GPUs somewhere in the cloud.
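To make that loop concrete, here is a speculative sketch. Every helper in it (evaluate_model, generate_synthetic_set, train) is a hypothetical stand-in rather than a real API; the point is the shape of the loop, not the implementation:

```python
def evaluate_model(model):
    """Return per-category error rates on a held-out benchmark (stub)."""
    return {"pedestrian_at_night": 0.31, "pedestrian_at_noon": 0.04}

def generate_synthetic_set(category, n):
    """Render n synthetic samples targeting one weak category (stub)."""
    return [f"synthetic_{category}_{i}" for i in range(n)]

def train(model, dataset):
    """Fine-tune the model on the new data (stub)."""
    return model

def lifelong_learning_step(model, error_threshold=0.1):
    # 1. The model measures its own weaknesses...
    errors = evaluate_model(model)
    for category, err in errors.items():
        if err > error_threshold:
            # 2. ...generates targeted synthetic data for each one...
            data = generate_synthetic_set(category, n=10_000)
            # 3. ...and trains on it, with no human in the loop.
            model = train(model, data)
    return model

model = lifelong_learning_step(model=None)
```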

That’s what we can expect as we enter the 2020s: it is about data, and more specifically, synthetic data. That is what will unlock more complex computer vision.
