Building a Face Attributes Model Using Multi-Task Learning

Tanin Phongpandecha
November 14, 2018 AI & Machine Learning


When someone asks you to guess a person’s age, your mind starts answering all sorts of questions about the person’s demographics before you finally take a guess. Without even being asked, the human mind interrelates different abstract concepts that add up to answer a specific question. That is the essence of learning, and in Machine Learning, we call this a Multi-Task Learning problem.

In Multi-Task Learning, we train our machine to answer multiple questions simultaneously, so that features useful for multiple tasks are shared within the model. In a typical Convolutional Neural Network (CNN), an image is passed through a series of convolutions, nonlinearities, and pooling layers. This gives us abstract representations of the image, called feature maps. The feature maps are then optionally fed into a fully connected layer, and finally into a linear classifier that predicts the image’s class. For multi-task learning, the CNN must learn a feature map that is useful for solving all the given tasks. Because the parameters are shared in one network, multi-task models tend to generalize better, and you avoid the waste of training several similar models.

In a recent project, I’ve used multi-task learning to train a model to tell a person’s age, gender, and ethnicity. I’d like to share the important components that made this system work.

Datasets & Benchmarks

The dataset I used was the UTKFace dataset. Before trying to build any model, it was important to have a baseline/benchmark performance. I could have trained a separate model for each of age, gender, and ethnicity and used those as my benchmarks, but while googling around I found a nice blog post where the author uses a pretrained model to achieve 94.5% on gender classification and 86.4% on ethnicity classification. Great, let’s keep those numbers in mind and get on with training our multi-task learning model.

Data Preprocessing

A crucial step in making this system work is to crop and align the faces in the data. A popular way to do this is with a Histogram of Oriented Gradients (HOG), but I decided to use the Multi-Task Cascaded Convolutional Network (MTCNN), the state-of-the-art deep learning model for face detection and alignment. You can easily find a pretrained model online, loop through the folder containing the images, and crop the faces. Below are some sample cropped photos.
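
Here is a minimal sketch of that cropping loop, assuming the `mtcnn` pip package (one of several pretrained MTCNN implementations; the post doesn’t say which was used). The directory paths and the 160×160 output size are illustrative:

```python
import os

import cv2
from mtcnn import MTCNN

detector = MTCNN()

def crop_faces(src_dir, dst_dir, size=(160, 160)):
    """Detect, crop, and resize the first face found in each image."""
    os.makedirs(dst_dir, exist_ok=True)
    for fname in os.listdir(src_dir):
        bgr = cv2.imread(os.path.join(src_dir, fname))
        if bgr is None:
            continue
        rgb = cv2.cvtColor(bgr, cv2.COLOR_BGR2RGB)   # MTCNN expects RGB
        faces = detector.detect_faces(rgb)           # [{'box', 'confidence', 'keypoints'}, ...]
        if not faces:
            continue
        x, y, w, h = faces[0]['box']
        x, y = max(x, 0), max(y, 0)                  # boxes can extend slightly past the frame
        crop = cv2.resize(rgb[y:y + h, x:x + w], size)
        cv2.imwrite(os.path.join(dst_dir, fname),
                    cv2.cvtColor(crop, cv2.COLOR_RGB2BGR))
```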

Each image’s label is contained in its filename. As explained on the dataset’s homepage:

The labels of each face image are embedded in the file name, formatted as `[age]_[gender]_[race]_[date&time].jpg`:

- `[age]` is an integer from 0 to 116, indicating the age.
- `[gender]` is either 0 (male) or 1 (female).
- `[race]` is an integer from 0 to 4, denoting White, Black, Asian, Indian, and Others (e.g. Hispanic, Latino, Middle Eastern).
- `[date&time]` is in the format yyyymmddHHMMSSFFF, showing the date and time the image was collected for UTKFace.

For this project I used Keras with the TensorFlow backend. Since the labels are contained in the filenames, I created a custom image data generator. The generator accepts a list of filenames as input, parses the names, and yields a numpy array of images along with three arrays of labels: age, gender, and ethnicity.
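
A minimal sketch of such a generator follows; the batch size, image size, and normalization are illustrative rather than the exact values used:

```python
import os

import numpy as np
from keras.preprocessing.image import load_img, img_to_array
from keras.utils import to_categorical

def face_generator(filenames, batch_size=32, img_size=(160, 160)):
    """Yield (images, [age, gender, race]) batches parsed from the filenames."""
    while True:
        np.random.shuffle(filenames)
        for i in range(0, len(filenames) - batch_size + 1, batch_size):
            images, ages, genders, races = [], [], [], []
            for path in filenames[i:i + batch_size]:
                # filenames look like [age]_[gender]_[race]_[date&time].jpg
                age, gender, race = os.path.basename(path).split('_')[:3]
                images.append(img_to_array(load_img(path, target_size=img_size)) / 255.0)
                ages.append(int(age))
                genders.append(int(gender))
                races.append(int(race))
            # one-hot everything so each head can train with categorical cross-entropy
            yield (np.array(images),
                   [to_categorical(ages, 117),
                    to_categorical(genders, 2),
                    to_categorical(races, 5)])
```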


It is noteworthy that you could model age as a real number and treat one head of the model as a regression problem. I tried that, but the mean-squared error was orders of magnitude larger than the categorical cross-entropies of gender and ethnicity, which made it hard to sum the losses into a single objective. I also tried giving the losses different weights, but the model was very sensitive to them and the weight hyperparameters were hard to tune, so I settled back to treating age as categories. If you view a person’s age as a probability distribution (you could be age x with probability P(X=x)), then your estimated age is the expected value of that distribution.
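
In Keras, that weighting experiment can be expressed with the `loss_weights` argument to `compile`. The output names and weight values below are placeholders, assuming a three-head model like the one sketched under “The Model”:

```python
# The abandoned regression variant: age as a real number with MSE,
# down-weighted to bring its loss onto a scale comparable to the
# cross-entropies. The weights are placeholders -- the post notes
# they were hard to tune.
model.compile(optimizer='adam',
              loss={'age': 'mean_squared_error',
                    'gender': 'categorical_crossentropy',
                    'ethnicity': 'categorical_crossentropy'},
              loss_weights={'age': 0.01, 'gender': 1.0, 'ethnicity': 1.0})
```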

For this, I used a softmax so that all the age probabilities add up to 1. Given an image of a face, the model outputs 117 age probabilities, and taking their dot product with the array [0, 1, 2, …, 116] gives the expected age.

(Image source: https://data.vision.ee.ethz.ch/cvl/rrothe/imdb-wiki/)
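
Concretely, with dummy probabilities standing in for real model output:

```python
import numpy as np

# age_probs stands in for the model's 117 softmax outputs for one face
age_probs = np.random.dirichlet(np.ones(117))
expected_age = age_probs @ np.arange(117)  # dot product with [0, 1, ..., 116]
print(expected_age)
```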

The Model

Since the UTKFace dataset contains only about 20,000 images, it is necessary to start from a pretrained model for embedding faces. For this, I used FaceNet pretrained on MS-Celeb-1M. You can download the model here. I removed the last 3 layers from the pretrained model (the bottleneck, batch normalization, and dropout layers) because the model performed better without them. Next, I added 2 fully connected layers with LeakyReLU activations, which seemed to perform slightly better than regular ReLU. Finally, I added 3 MLP branches for the outputs. After removing those 3 layers, the final layer kept from FaceNet was AvgPool, a GlobalAveragePooling2D layer.
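
A hedged sketch of that architecture is below. `load_facenet()` stands in for however you load the pruned pretrained model, the `AvgPool` layer name and dense widths are assumptions, and each output branch is collapsed to a single layer for brevity (the post describes small MLP branches):

```python
from keras.models import Model
from keras.layers import Dense, LeakyReLU

base = load_facenet()                  # hypothetical loader for the pruned FaceNet
x = base.get_layer('AvgPool').output   # GlobalAveragePooling2D, the last layer kept

for units in (512, 256):               # two shared fully connected layers
    x = Dense(units)(x)
    x = LeakyReLU()(x)

age = Dense(117, activation='softmax', name='age')(x)  # ages 0-116
gender = Dense(2, activation='softmax', name='gender')(x)
ethnicity = Dense(5, activation='softmax', name='ethnicity')(x)

model = Model(inputs=base.input, outputs=[age, gender, ethnicity])
model.compile(optimizer='adam',
              loss='categorical_crossentropy',  # equal weight on all three heads
              metrics=['accuracy'])
```

Trained with a generator like the one sketched earlier, all three heads are optimized jointly against the sum of their losses.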

As discussed, all the output layers used softmax activations, and all contributed equally to the final loss. When tested on a held-out set of 2,100 images, the model achieved an age RMSE of 6.846, gender accuracy of 94.84%, and ethnicity accuracy of 87.97%.

Here are some predictions from the model:

Whoops, it even produced a pretty accurate result for Jeff Bezos, sunglasses and all! Another thing to consider when putting this model into production is its size. After some experimenting with the hyperparameters, I was able to shrink the model from 478 MB to 182 MB while still getting an age RMSE of 7.08, gender accuracy of 94.44%, and ethnicity accuracy of 86.46%, a very comparable result! Despite the size difference, both models take around 0.08 seconds on an i7 CPU to predict a single image. Imagine how painful it would be if you needed 3 separate models, one per prediction.

Summary

In this post, we walked through the important steps in building a multi-task learning model. As you may have noticed, most of the models used were pretrained on much larger datasets. In the world of Deep Learning, you should almost always start from a pretrained model of some sort (e.g. one trained on ImageNet), since these models are trained on datasets far too large to be practical to train on a personal computer. Pretraining also helps the model converge much faster and generalize better. Throughout the project, careful consideration of the architecture, hyperparameters, and loss function was extremely important. In the end, the hard work paid off, and we got an awesome model!
