Digital Attendance Tracking through Face Recognition

Maintaining a good attendance record exhibits the commitment and sincerity of an employee in an organization or a student in an educational institution, and a person's punctuality can also be evaluated from it. It is therefore essential to track attendance for each person at both the entry and exit points. At the same time, conventional tracking is time-consuming, because a person has to show an ID to register his or her attendance every time a checkpoint is crossed. This creates long queues and invites the use of someone else's ID, neither of which is advisable.

Various solutions have been proposed to improve the conventional attendance system and tackle the difficulties encountered while registering attendance. At the same time, closed-circuit television (CCTV) is prevalent for video surveillance and is already installed in most organizations and institutions. Putting those available resources to a useful application is a smart move: face recognition needs cameras to capture images for identification, which paves the way for developing digital attendance tracking through facial recognition.

In this digital era, the digitization of various activities is inevitable, given the availability of state-of-the-art technology and its applications. One familiar application is the face recognition we experience on Facebook and other platforms. Let us understand: what is face recognition?

Face recognition is the identification of one or more faces in the images or videos fed to a system. It identifies a person in less time than conventional methods such as fingerprint or iris recognition. Deep learning, a subset of machine learning, focuses on replicating the human ability to learn from experience.

A convolutional neural network (CNN), one of the deep learning algorithms, is widely used for extracting features from images, which is central to recognizing a face in a given image or video.

Face recognition is a straightforward method that consists of three elements: the first is detecting the face in the image or video, as shown in figure 1; the second is extracting the facial features of the detected face; and the third is mapping those features against the dataset to find out whether the detected face is a known or an unknown face, as shown in figure 2.

Face recognition differs from face detection: the latter only detects that a face is present in the image without recognizing the person, as shown in figure 1, while the former also tells whose face it is, as shown in figure 2.

Figure 1. Face detection shows only the detection of a face in the image.

Figure 2. Face recognition shows the person's name associated with the face embedding. It detects the face and also recognizes the person.

Different companies use different deep learning systems for face recognition; some of those systems are given below:

  • DeepFace, developed by Facebook, is a deep learning facial recognition system with an accuracy rate of 97.35% as per the Facebook researchers’ report.
  • FaceNet is a state-of-the-art face recognition system that uses a triplet loss function while extracting the features of the image, which results in more accurate output.

Let us discuss in detail how face recognition works. Face recognition is a systematic sequence of steps: detecting the face, aligning the face, extracting the features of the face and, finally, recognizing the detected face.

Detecting the face: This is the first step in face recognition, where the faces present in an image or video are detected and passed on as input to the next stage. Many classification algorithms can perform this detection based on facial features, such as Eigenface-based algorithms, neural networks, support vector machines (SVM), Naïve Bayes techniques, and so on.
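As an illustration, face detection alone takes only a few lines of OpenCV code. The following minimal sketch uses the Haar cascade classifier bundled with OpenCV; the image file name is a placeholder:

```python
# Minimal face-detection sketch using OpenCV's bundled Haar cascade.
import cv2

# Pre-trained frontal-face cascade shipped with opencv-python
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("group_photo.jpg")           # placeholder image file
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)  # detector works on grayscale

# One (x, y, width, height) box per detected face
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
```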

Face alignment: This is a normalization step that helps enhance the accuracy of the algorithms by centering the face, placing the eyes on a horizontal line, and scaling every face in the image to the identical size followed by all images in the dataset.
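A minimal alignment sketch is given below. It assumes the eye centres have already been obtained from a landmark detector and simply rotates the image so the eyes lie on a horizontal line:

```python
# Rotate a face image so that the eyes sit on a horizontal line.
# left_eye and right_eye are assumed (x, y) centres from a landmark detector.
import cv2
import numpy as np

def align_face(image, left_eye, right_eye):
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = np.degrees(np.arctan2(dy, dx))  # tilt of the eye line

    # Rotate about the midpoint between the eyes
    centre = ((left_eye[0] + right_eye[0]) / 2.0,
              (left_eye[1] + right_eye[1]) / 2.0)
    matrix = cv2.getRotationMatrix2D(centre, angle, scale=1.0)
    h, w = image.shape[:2]
    return cv2.warpAffine(image, matrix, (w, h))
```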

Feature Extraction: This is a very important step for recognizing the face. Each face has its own dimensions for elements such as the chin, left eyebrow, right eyebrow, left eye, right eye, top lip and bottom lip. These features are called face landmarks; they are extracted and stored in a variable for later recognition. Figure 3 shows the face landmarks of the face in the image used in figure 1.
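The face_recognition library exposes these landmarks directly. The short sketch below, with a placeholder file name, prints the landmark groups found for each face:

```python
# Extract face landmarks with the face_recognition library.
import face_recognition

image = face_recognition.load_image_file("person.jpg")  # placeholder file
landmarks = face_recognition.face_landmarks(image)      # one dict per face

# Each dict maps a feature name, such as 'chin', 'left_eyebrow',
# 'right_eyebrow', 'left_eye', 'right_eye', 'top_lip' or 'bottom_lip',
# to a list of (x, y) points.
for face in landmarks:
    print(list(face.keys()))
```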

Figure 3. Face landmarks of the face used in figure 1.

Recognizing Faces: Once the model has been built, we can check its performance by passing an image to it. The features of the input image are extracted and matched against the closest features among the images available in the database. If there is a match between the input image and an image present in the database, the system recognizes the person and displays the person's name along with the image; otherwise, it labels the face as unknown.
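A minimal matching sketch is shown below. The database of known encodings, the names list and the 0.6 distance threshold are illustrative assumptions:

```python
# Nearest-neighbour matching of an unknown face encoding against a database.
import numpy as np

def recognize(unknown_encoding, known_encodings, known_names, threshold=0.6):
    # Euclidean distance to every known encoding
    distances = np.linalg.norm(
        np.array(known_encodings) - unknown_encoding, axis=1)
    best = int(np.argmin(distances))
    # Below the threshold the face is treated as a known person
    return known_names[best] if distances[best] <= threshold else "Unknown"
```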

Let us now discuss the implementation of face recognition in the attendance system. The overall steps involved in the digital attendance tracking system break down as follows, with a high-level sketch in code after the list:

  • Every five minutes, the camera captures an image and passes it to the model.
  • The faces in the image are detected and their features are extracted.
  • The extracted features are matched against the database.
  • If a match is found, the person's name associated with that image is displayed.
  • The person's attendance is marked as present, along with the date and time.
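Here recognize_faces() and mark_attendance() are hypothetical helper functions standing in for the detection, matching and recording steps covered in the rest of this article:

```python
# Sketch of the overall tracking loop. recognize_faces() and
# mark_attendance() are hypothetical helpers, not library functions.
import time
from datetime import datetime

import cv2

video = cv2.VideoCapture(0)            # open the default camera
while True:
    ret, frame = video.read()          # capture one image
    if ret:
        for name in recognize_faces(frame):        # detect, extract, match
            mark_attendance(name, datetime.now())  # record name, date, time
    time.sleep(5 * 60)                 # wait five minutes before the next shot
```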

The OpenCV library (imported as cv2), a widely used computer vision library, is used to do the video capturing: it accesses the camera, grabs an image from the live video and passes it as input to the model. The code used to get an image from the video is shown in figure 4; that image is then used for further processing when recognizing the person's face.

Figure 4. cv2 code to capture an image from the video stream.
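A minimal equivalent of the capture code in figure 4 looks like this:

```python
# Grab a single frame from the camera with OpenCV.
import cv2

video_capture = cv2.VideoCapture(0)  # open the default camera
ret, frame = video_capture.read()    # ret is False if no frame was read
if ret:
    cv2.imwrite("frame.jpg", frame)  # save the frame for further processing
video_capture.release()
```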

There are two methods of image classification used in this article: one uses a CNN and the other uses the dlib library.

Method 1

First, TensorFlow should be imported so that Keras can run on top of it. For machine learning, the image data must be converted into tensors, which are multidimensional arrays. These tensors should be floating point, because optimization of the learned functions requires it, and the pixel values should also be normalized before further image processing.
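A minimal preprocessing sketch is given below. The 64 × 64 target size and the file name are illustrative, and note that in older TensorFlow versions these helpers live in tensorflow.keras.preprocessing.image instead:

```python
# Load an image, convert it to a float tensor and normalize it to [0, 1].
import numpy as np
from tensorflow.keras.utils import img_to_array, load_img

image = load_img("person.jpg", target_size=(64, 64))    # placeholder size
tensor = img_to_array(image).astype("float32") / 255.0  # scale pixel values
batch = np.expand_dims(tensor, axis=0)                  # add batch dimension
```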

The dataset consists of several photos of every member of the session whose attendance is to be registered. Then, the Sequential model is imported from keras.models; it ensures one input and one output tensor at each layer, and the different layers performing convolution, pooling and flattening are added to it. A Rectified Linear Unit (ReLU) is the activation function used for the layers. First, the convolution layer is added to produce the tensor outputs. Then comes max pooling, which reduces the dimensionality of the images by decreasing the number of pixels output from the previously introduced convolutional layer. Then, a dropout layer is incorporated to ignore a fraction of the units during training. Before passing the output to the fully connected neural network, it should be flattened with model.add(Flatten()).
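Put together, the layer stack described above looks roughly like the sketch below; the filter count, kernel size and input shape are illustrative assumptions, not values taken from the original figures:

```python
# Sequential CNN front end: convolution, pooling, dropout, flattening.
from tensorflow.keras.layers import Conv2D, Dropout, Flatten, MaxPooling2D
from tensorflow.keras.models import Sequential

model = Sequential([
    Conv2D(32, (3, 3), activation="relu", input_shape=(64, 64, 3)),
    MaxPooling2D(pool_size=(2, 2)),  # shrink the spatial dimensions
    Dropout(0.25),                   # ignore a fraction of units in training
    Flatten(),                       # flatten before the dense layers
])
```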

The CNN performs the convolution process to detect edges in images both vertically and horizontally, which is carried out through filtering, padding and strides. By convolving the filter with the input image, the vertical and horizontal edges can be obtained. Padding balances the information obtained from the centre of the image against that from the edges and helps keep the output size the same as the input size. The stride is the step by which the convolution filter moves across the input image.
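To make these terms concrete: for an n × n input, an f × f filter, padding p and stride s, the convolution output is ⌊(n + 2p - f)/s⌋ + 1 pixels along each side. For example, a 64 × 64 input with a 3 × 3 filter, no padding and stride 1 gives a 62 × 62 output, while a padding of 1 keeps the output at 64 × 64.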

Now, hidden layers with ReLU as the activation function are added before the output layer, which uses softmax as its activation function. When compiling the model, a loss function must be specified to measure the difference between the predicted and observed values, along with an optimizer (Adam).

Categorical cross-entropy (categorical_crossentropy) is the loss function used here. The fit of the model can be improved by increasing the number of epochs. Once the model is fitted, it can be evaluated with respect to its accuracy.
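A rough sketch of these final steps, continuing the model built above, is shown below; num_classes, the training and test arrays and the epoch count are illustrative assumptions:

```python
# Dense head, compilation, training and evaluation of the CNN sketched above.
from tensorflow.keras.layers import Dense

model.add(Dense(128, activation="relu"))             # hidden layer
model.add(Dense(num_classes, activation="softmax"))  # one unit per person

model.compile(loss="categorical_crossentropy",
              optimizer="adam",
              metrics=["accuracy"])
model.fit(x_train, y_train, epochs=25)           # more epochs can improve fit
loss, accuracy = model.evaluate(x_test, y_test)  # check the accuracy level
```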

Method 2

When the image is available for processing, the next step is to extract the features of that image. The extraction of measurements is done with the help of the dlib library. dlib.get_frontal_face_detector() is executed to detect the face in the image, and dlib.shape_predictor() is applied along with the face-landmark data file to obtain measurements of the important features of a face, such as the right eye, left eye, right eyebrow, left eyebrow, top lip and bottom lip. The face features are shown in figure 5.
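A minimal sketch of these dlib calls is given below. The 68-point landmark model file has to be downloaded separately, and the image name is a placeholder:

```python
# Detect a face with dlib and read off its 68 landmark points.
import cv2
import dlib

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

image = cv2.imread("person.jpg")               # placeholder image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

for face in detector(gray):                    # one rectangle per face
    shape = predictor(gray, face)              # 68 landmark points
    points = [(shape.part(i).x, shape.part(i).y) for i in range(68)]
```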

The face key points can be found using the face_recognition library, which gives the Euclidean distance between the face images already on record and a new unknown face with respect to their encodings. The face encoding is returned in list form, as shown in figure 6.
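A minimal encoding sketch looks like this; the file names are placeholders, and each indexing call assumes at least one face was found in the image:

```python
# Compute 128-dimensional face encodings and their Euclidean distance.
import face_recognition

known_image = face_recognition.load_image_file("known_person.jpg")
unknown_image = face_recognition.load_image_file("unknown_person.jpg")

known_encoding = face_recognition.face_encodings(known_image)[0]
unknown_encoding = face_recognition.face_encodings(unknown_image)[0]

# Smaller distance means more similar faces
distance = face_recognition.face_distance([known_encoding],
                                          unknown_encoding)[0]
```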

Figure 5. Face feature extraction.

Figure 6. Face encoding result from the face_recognition library.

A comparison of the known and unknown encodings should be done to check whether the two images match. Again, the face_recognition library is used to compare the encodings of the known and unknown images; the code used for this comparison is shown in figure 7. The match can be read off from the result: if the result is "True", the person in the known image and the person in the unknown image are the same; if the result is "False", they are two different people.

Figure 7. Face encoding comparison using the face_recognition library.
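A minimal comparison in the spirit of figure 7, reusing the encodings from the previous sketch, looks like this:

```python
# Compare a known encoding against an unknown one.
import face_recognition

results = face_recognition.compare_faces([known_encoding], unknown_encoding)
print(results)  # [True] if the two faces match, [False] otherwise
```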

Once the face is matched, attendance should be recorded. This can be done with the xlwt library: the current date, the time, the person's name and the attendance status are written to an Excel sheet using the code shown in figure 8.

Figure 8. Attendance update using the xlwt library.
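A minimal update sketch in the spirit of figure 8 is given below; the sheet layout, the person's name and the file name are illustrative assumptions:

```python
# Write one attendance row to an Excel sheet with xlwt.
from datetime import datetime

import xlwt

workbook = xlwt.Workbook()
sheet = workbook.add_sheet("Attendance")

# Header row
for col, title in enumerate(["Name", "Date", "Time", "Status"]):
    sheet.write(0, col, title)

now = datetime.now()
sheet.write(1, 0, "Recognized Person")       # placeholder name
sheet.write(1, 1, now.strftime("%d-%m-%Y"))
sheet.write(1, 2, now.strftime("%H:%M:%S"))
sheet.write(1, 3, "Present")

workbook.save("attendance.xls")              # xlwt writes legacy .xls files
```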

The Excel sheet in which the updated status is recorded is shown in figure 9.

Figure 9. Attendance is updated in the Excel sheet.

Hence, digital attendance tracking through facial recognition has been successfully developed.


This article is co-authored by Satheesh Kumar M and Dr. Hemachandran K, Woxsen University.