Age and Gender Detection
Objective
To build a gender and age detector that can approximately guess the gender and age of the person (face) in a picture using Deep Learning on the Audience dataset.
- Detect faces Classify into Male/Female Classify into one of the 8 age ranges, put the results on the image and display it.
Introduction
Facial analysis from images has gained a lot of interest because it helps in several different problems like better ad targeting for customers, better content recommendation system, security surveillance, and other fields as well
The task of gender and age detection just from an image is not an easy task even for us humans because it is totally based on looks and sometimes it is not easy to guess it. People of the same age can look very different from what we can guess.
Quividi which is an AI software application which is used to detect age and gender of users who passes by based on online face analyses and automatically starts playing advertisements based on the targeted audience.
Another example could be AgeBot which is an Android App that determines your age from your photos using facial recognition. It can guess your age and gender along with that can also find multiple faces in a picture and estimate the age for each face.
Tools and Frameworks
There are many methods we can use to solve this problem. There are traditional algorithms like “Fisherfaces” and “Eigenface” which are created for face recognition and feature relation methods, but these do not work as well as needed. We can create solutions better than this using CNN (convolutional neural networks) which have emerged as the most preferred model for computer vision tasks. They have proven to be most effective when dealing with image datasets and are the heart of most machine learning computer vision models. We will be using OpenCV for this project
Regression vs Classification
Data-set and Training
- We will use the models trained by Tal Hassner and Gil Levi.
- The Dataset For this python project, we’ll use the Adience dataset; the dataset is available in the public domain and you can find it here.
- This dataset serves as a benchmark for face photos and is inclusive of various real-world imaging conditions like noise, lighting, pose, and appearance. The images have been collected from Flickr albums.
- It has a total of 26,580 photos of 2,284 subjects in eight age ranges (as mentioned above) and is about 1GB in size. The models we will use have been trained on this dataset.
- The predicted gender may be one of ‘Male’ and ‘Female’, and the predicted age may be one of the following ranges- (0 – 2), (4 – 6), (8 – 12), (15 – 20), (25 – 32), (38 – 43), (48 – 53), (60 – 100) (8 nodes in the final softmax layer).
- It is very difficult to accurately guess an exact age from a single image because of factors like makeup, lighting, obstructions, and facial expressions. And so, we make this a classification problem instead of making it one of regression.
- The CNN Architecture The convolutional neural network for this python project has 3 convolutional layers: Convolutional layer; 96 nodes, kernel size 7 Convolutional layer; 256 nodes, kernel size 5 Convolutional layer; 384 nodes, kernel size 3
- It has 2 fully connected layers, each with 512 nodes, and a final output layer of softmax type.
CNN
- In the past few decades, Deep Learning has proved to be a very powerful tool because of its ability to handle large amounts of data.
- The interest to use hidden layers has surpassed traditional techniques, especially in pattern recognition. One of the most popular deep neural networks is Convolutional Neural Networks.
- In the early days of AI, researchers have struggled to make a system that can understand visual data. In the following years, this field came to be known as Computer Vision.
What exactly is CNN?
- A convolutional neural network (CNN/ConvNet) is a class of deep neural networks, most commonly applied to analyze visual imagery.
- Now when we think of a neural network we think about matrix multiplications but that is not the case with ConvNet. It uses a special technique called Convolution.
- Now in mathematics convolution is a mathematical operation on two functions that produces a third function that expresses how the shape of one is modified by the other.
Working
An RGB image is nothing but a matrix of pixel values having three planes whereas a grayscale image is the same but it has a single plane.
For simplicity, let’s stick with grayscale images as we try to understand how CNNs work.
The above image shows what a convolution is. We take a filter/kernel(3×3 matrix) and apply it to the input image to get the convolved feature. This convolved feature is passed on to the next layer.
In the case of RGB color, channel take a look at this animation to understand its working
Convolutional neural networks are composed of multiple layers of artificial neurons. Artificial neurons, a rough imitation of their biological counterparts, are mathematical functions that calculate the weighted sum of multiple inputs and outputs an activation value. When you input an image in a ConvNet, each layer generates several activation functions that are passed on to the next layer.
The first layer usually extracts basic features such as horizontal or diagonal edges. This output is passed on to the next layer which detects more complex features such as corners or combinational edges. As we move deeper into the network it can identify even more complex features such as objects, faces, etc
Based on the activation map of the final convolution layer, the classification layer outputs a set of confidence scores (values between 0 and 1) that specify how likely the image is to belong to a “class.” For instance, if you have a ConvNet that detects cats, dogs, and horses, the output of the final layer is the possibility that the input image contains any of those animals.
Algorithm
Face Detection
-
For face detection, we have a .pb file- this is a protobuf file (protocol buffer); it holds the graph definition and the trained weights of the model.
-
We can use this to run the trained model. And while a .pb file holds the protobuf in binary format, one with the .pbtxt extension holds it in text format.
-
These are TensorFlow files.
Age, Gender Detection
-
For age and gender, the .prototxt files describe the network configuration and the .caffemodel file defines the internal states of the parameters of the layers.
-
We use the argparse library to create an argument parser so we can get the image argument from the command prompt.
-
We make it parse the argument holding the path to the image to classify gender and age for.
-
For face, age, and gender, initialize protocol buffer and model.
-
Initialize the mean values for the model and the lists of age ranges and genders to classify from.
Source code
To detect for an image:
python3 main.py --detect <image>
To detect for image from live camera feed:
python3 main.py
References
I. Rafique, A. Hamid, S. Naseer, M. Asad, M. Awais and T. Yasir, “Age and Gender Prediction using Deep Convolutional Neural Networks,” 2019 International Conference on Innovative Computing (ICIC), 2019, pp. 1-6, doi: 10.1109/ICIC48496.2019.8966704.Abstract: Age and gender identification have become a major part of the network, security and care. It has a common use in age specific content access for children. Social media uses it in delivering layered ads and marketing to extend it’s a reach. Face recognition has developed to a great extent that we have to map it further in getting more useful results having different approaches. In this paper, we propose deep CNN to improve age and gender prediction from significant results can be obtained and a significant improvement can be seen in various tasks such as face recognition. A simple convolutional network architecture is proposed to make a noticeable improvement in this field using existing methods. Using deep CNN, model is trained to an extent that accuracy of Age and Gender become 79% using HAAR Feature-based Cascade Classifiers is an effective method proposed by Paul Viola and Michael Jones. It is a machine learning based approach where a cascade function is trained from a lot of positive and negative images. It is then used to detect objects in other images.