INFORMATION CONTENT DETECTION IN FACE PARTS USING DEEP LEARNING

The present paper introduces the use of a deep neural network to classify three categories of emotion: angry, happy and neutral. The database consisted of 48x48 pixel grayscale face images from the Face expression recognition dataset on Kaggle. Separate parts of the face (eyes, nose or mouth) were occluded by a manually inserted 48x15 pixel black rectangle to determine which part of the face carries the most significant information about the expressed emotion. Using the pretrained Inception network provided with Keras/TensorFlow, we found, to our surprise, that faces with the eyes covered were classified more accurately than intact faces. The results were replicated using augmented data.


INTRODUCTION
It is easy to imagine the brain as a super-efficient computer running algorithms, for example algorithms dedicated to interpreting people's facial expressions and emotions. This metaphor helps to explain what computational modelling attempts to do: decode and understand these algorithms [1]. Importantly, computational models do not reproduce the actual architecture of the brain and cannot explain the complex mechanisms it uses; nevertheless, researchers are able to design computer algorithms that achieve face and emotion recognition. For example, algorithms that automatically code facial actions in still images and video sequences have great potential in psychology and neuroscience [2], in security and forensic settings (e.g. as a polygraph aid during interrogations or interviews) [3,4], in robotics [5], and in retail or consumer science [6].
Facial expression recognition is a rapidly growing subfield of image processing [7]. A great number of computational models of emotion recognition have been compared and validated by examining their results, correctness and accuracy [8,9]. Researchers aim to develop the most efficient machine learning and computer vision methods and algorithms through further fine-tuning, modification and parameter optimization [10,11]. This process requires design choices that reflect the nature of the data. Strategies that have been used include reusing what worked in other settings and selecting models by trial-and-error search [12].
The quality and characteristics of the databases (background, distance from the camera, picture resolution, number of expressions, etc.) play an important role in facial expression recognition research. Under the intra-database protocol (training and testing data come from the same database), models achieve high accuracy on well-established databases such as the Extended Cohn-Kanade Database CK+ [13], MMI [14], the Radboud Faces Database RaFD [15] or the Karolinska Directed Emotional Faces KDEF [16]. However, when a cross-database protocol is adopted (one or more databases are used for training and other databases for evaluation), models achieve notably lower accuracy [11]. Furthermore, whichever databases are chosen, researchers must cope with intra-class variation, clutter, occlusion, changes in face pose, heavy make-up of participants and similar factors [17,18].
For over two millennia, research has been built on the hypothesis that changes in certain discrete and discernible facial muscles and muscle groups are linked to specific emotions [19,20,21]. Accordingly, Action Units (AUs), which are distinguished by facial muscle changes, are regarded as expression markers [22] in the most widely used and validated method for measuring and describing facial behaviour, the Facial Action Coding System (FACS) developed by Paul Ekman and Wallace V. Friesen [23]. Action Units specify the details of facial movements. For example, AU 12 is the lip corner puller that produces a smile (zygomatic major muscle), and AU 6 is the cheek raiser, which also appears as wrinkling lateral to the eyes (contraction of the orbicularis oculi, pars orbitalis); the combination of these two AUs corresponds to the emotion-specific expression of happiness. By assigning AUs, such as brow raising and lowering or nostril dilation, and their combinations, FACS thus defines seven emotions that may be observed at different intensities: happiness, sadness, anger, fear, surprise, disgust and neutrality.
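The idea of recognising an expression from a combination of AUs can be sketched as a set-inclusion test. This is a minimal Python illustration, not part of FACS itself: only the AU 6 + AU 12 rule for happiness is taken from the text above, and the names `AU_NAMES`, `HAPPY_PROTOTYPE`, `looks_happy` and `describe` are hypothetical.

```python
# Hypothetical encoding of one FACS-style rule: a face is represented by the
# set of Action Units (AUs) active in it. Only the AU 6 + AU 12 -> happiness
# combination comes from the text; all names here are illustrative.

AU_NAMES = {
    6: "cheek raiser",        # orbicularis oculi, pars orbitalis
    12: "lip corner puller",  # zygomatic major
}

# AUs that must all be present for the happiness prototype.
HAPPY_PROTOTYPE = frozenset({6, 12})


def looks_happy(active_aus: set[int]) -> bool:
    """Return True when every AU of the happiness prototype is active."""
    return HAPPY_PROTOTYPE <= set(active_aus)


def describe(active_aus: set[int]) -> list[str]:
    """Translate AU numbers into names (sorted by AU number), marking unlisted AUs."""
    return [AU_NAMES.get(au, f"AU {au} (not listed)") for au in sorted(active_aus)]
```

With this sketch, `looks_happy({6, 12})` is `True`, while a lip corner pull alone, `looks_happy({12})`, is `False`, mirroring the point that the combination rather than a single AU marks the expression.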
The problem of emotion recognition (and of object recognition in general) has typically been approached using parts-and-shape models, which represent not only the appearance of individual object components but also the spatial relations between them [1,24,25]. Similarly, it has been proposed that the brain or a computer needs to identify shape and shading features that are invariant to identity, pose and illumination, and to incorporate them into algorithms that determine which of these features best discriminate the muscles involved in each expression [26]. On the other hand, Benitez-Quiroz and colleagues argue that facial expressions of emotion and neutral faces can be discriminated using colour features alone, without the shading information produced by facial muscle movements.
Because faces [27] and emotional expressions are complex, this paper takes a closer look at the facial cues that may convey important information for emotion recognition, since these cues should be regarded as the cornerstone of computational models and approaches to emotion recognition. Similarly, Happy and Routray [28] examined "salient facial patches" around inner facial features such as the eyes, nose and mouth that discriminate between pairs of expression classes. In other words, this study explores how the reference setting (computational descriptors, facial landmarks or classifiers) and different feature extraction conditions affect the identification of the targeted emotions.
One may argue that the perception and expression of emotions vary across cultures [29]. For instance, there is evidence that people from more expressive, English-speaking Western cultures focus more on the mouth region than people from Asian countries, who pay more attention to the eyes [30]. This difference is also reflected in analyses of cross-cultural emoticon use on Twitter, which found that Easterners expressed their feelings and states with the eyes, whereas Westerners did so with the mouth [31]. However, Srinivasan and Martinez [32] identified 35 facial expression configurations shared across cultures and only 8 culture-specific ones (used in some, but not all, cultures).
In this study, we examine the information content that may be hidden in specific face parts when emotions are displayed. We use a deep learning framework to provide an objective measure of how much information these face parts carry.

METHODOLOGY
For the analysis, we used photographs from the Face expression recognition dataset on Kaggle [33]. All photographs are 48x48 pixel grayscale face images, and each image shows a facial expression belonging to one emotion category. For simplicity, this preliminary study uses only three emotion categories: angry, happy and neutral. Within every category, we split the data into four groups: the pictures were either left intact or had the eyes, nose or mouth covered by a manually inserted 48x15 pixel black rectangle. Figure 1 shows samples of the input data.
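The occlusion step can be sketched in Python with NumPy. This is an illustration under stated assumptions rather than the procedure used in the study: we read "48x15" as a band 48 pixels wide (the full image width) and 15 pixels tall, and the row offsets in the hypothetical `BAND_TOP` table are placeholders, because the rectangles in the study were placed by hand.

```python
import numpy as np

IMAGE_SIZE = 48   # images are 48x48 grayscale
BAND_HEIGHT = 15  # assumed: rectangle is 48 px wide (full width) and 15 px tall

# Hypothetical top row of the band for each face part. The study placed the
# rectangles manually, so these offsets are illustrative placeholders only.
BAND_TOP = {"eyes": 10, "nose": 20, "mouth": 31}


def occlude(image: np.ndarray, part: str) -> np.ndarray:
    """Return a copy of `image` with a full-width black band over `part`."""
    if image.shape != (IMAGE_SIZE, IMAGE_SIZE):
        raise ValueError(f"expected a {IMAGE_SIZE}x{IMAGE_SIZE} image, got {image.shape}")
    top = BAND_TOP[part]
    covered = image.copy()                   # leave the original picture untouched
    covered[top:top + BAND_HEIGHT, :] = 0    # 0 is black in 8-bit grayscale
    return covered


def make_conditions(image: np.ndarray) -> dict[str, np.ndarray]:
    """Build the four conditions: intact, eyes covered, nose covered, mouth covered."""
    conditions = {"intact": image.copy()}
    for part in BAND_TOP:
        conditions[part] = occlude(image, part)
    return conditions
```

For example, `make_conditions(np.full((48, 48), 128, dtype=np.uint8))` returns four 48x48 arrays, three of which contain a 48x15 block of zeros at the assumed rows.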
We employed the pretrained Inception network provided with the Keras/TensorFlow environment. After removing its final layer, we froze the network and added a softmax layer on top, so that only this new layer was trained on our data. For every emotion category, we used 40 pictures as a training set, 30 as a validation set and 30 as a test set. The network was trained for 50 epochs with the RMSprop optimizer and sparse categorical cross-entropy as the loss.
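The per-category 40/30/30 split can be sketched with the Python standard library. The random draw, the seed and the label order in `CLASS_INDEX` are assumptions, since the text does not say how pictures were assigned to the sets. Integer labels are used because sparse categorical cross-entropy expects class indices rather than one-hot vectors; `split_per_category` is a hypothetical helper.

```python
import random

# Per-category split sizes reported in the text.
SPLIT_SIZES = {"train": 40, "validation": 30, "test": 30}

# Assumed integer label for each category (sparse labels are class indices).
CLASS_INDEX = {"angry": 0, "happy": 1, "neutral": 2}


def split_per_category(images_by_label, seed=None):
    """Split each category's images into train/validation/test lists.

    `images_by_label` maps a label such as "angry" to a list of image ids.
    Returns a dict mapping each split name to (image_id, class_index) pairs,
    so every split holds the same number of pictures from every category.
    """
    rng = random.Random(seed)
    needed = sum(SPLIT_SIZES.values())
    splits = {name: [] for name in SPLIT_SIZES}
    for label, images in images_by_label.items():
        if len(images) < needed:
            raise ValueError(f"category {label!r} has {len(images)} images, need {needed}")
        # Draw without replacement so no picture appears in two sets.
        chosen = rng.sample(list(images), needed)
        start = 0
        for name, size in SPLIT_SIZES.items():
            splits[name].extend((img, CLASS_INDEX[label]) for img in chosen[start:start + size])
            start += size
    return splits
```

With exactly 100 pictures per category, every picture is used once, giving 120 training, 90 validation and 90 test pictures for each data set.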

RESULTS
For every data set, we ran 50 simulations to account for the stochastic nature of deep learning algorithms. For the sample without any part of the face covered, the average accuracy was 0.54 (average loss 1.48). For the sample with the nose covered, the average accuracy reached 0.58 (average loss 1.36). For the sample with the eyes covered, the average accuracy was 0.60 (average loss 1.24). For the sample with the mouth covered, the average accuracy was 0.44 (average loss 1.75).
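The averaging over repeated runs, together with two reference values for a three-class problem, can be sketched as follows; the helper names are hypothetical and the sketch does not reproduce the study's numbers. Sparse categorical cross-entropy for one example is -ln p_y, the negative log of the probability assigned to the true class, so a model that always outputs 1/3 for each class scores ln 3 ≈ 1.10, and uniform guessing has an expected accuracy of 1/3. For comparison, the average losses reported above range from 1.24 to 1.75. The sketch also returns the sample standard deviation across runs, which the text does not report.

```python
import math
import statistics

NUM_CLASSES = 3  # angry, happy, neutral

# Reference points for a 3-class problem: uniform random guessing has expected
# accuracy 1/3, and always predicting probability 1/3 for every class gives a
# sparse categorical cross-entropy of -ln(1/3) = ln 3 per example.
CHANCE_ACCURACY = 1 / NUM_CLASSES
UNIFORM_LOSS = math.log(NUM_CLASSES)


def sparse_categorical_crossentropy(probabilities, true_index):
    """Cross-entropy of one softmax output against an integer class label."""
    return -math.log(probabilities[true_index])


def summarize_runs(accuracies, losses):
    """Mean and sample standard deviation over repeated runs (needs at least 2 runs)."""
    return {
        "mean_accuracy": statistics.mean(accuracies),
        "sd_accuracy": statistics.stdev(accuracies),
        "mean_loss": statistics.mean(losses),
        "sd_loss": statistics.stdev(losses),
        "runs": len(accuracies),
    }
```

In use, the 50 per-run test accuracies and losses of one condition would be passed to `summarize_runs`, and the resulting means compared against `CHANCE_ACCURACY` and `UNIFORM_LOSS`.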
We find these results surprising, because the accuracy for pictures with the nose covered and with the eyes covered was higher than for uncovered faces. To check whether these results were caused by an artefact in the data, we repeated the analysis with augmented pictures. The pictures were modified using the following data augmentation parameters of the Keras data generator (datagen): rotation_range=20, width_shift_range=0.02, height_shift_range=0.02, horizontal_flip=True. Again, we ran 50 simulations and found an average accuracy of 0.52 for the sample without any part of the face covered. For the sample with the nose covered, the average accuracy was 0.50; with the eyes covered, 0.53; and with the mouth covered, 0.40.
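The augmentation settings can be written as a keyword-argument dictionary, sketched in plain Python below. The key names are those of Keras's `ImageDataGenerator`; the dictionary name `AUGMENTATION`, the helper `max_shift_pixels` and the idea of passing the dictionary as `ImageDataGenerator(**AUGMENTATION)` are illustrative assumptions. The helper makes the size of the shifts concrete: in Keras a shift range below 1 is a fraction of the image size, so on 48-pixel images a range of 0.02 moves the picture by at most 0.02 × 48 = 0.96 pixels.

```python
# Augmentation settings reported in the text, as keyword arguments.
# Key names match Keras's ImageDataGenerator; passing the dict as
# ImageDataGenerator(**AUGMENTATION) is an assumed usage, not shown here.
AUGMENTATION = {
    "rotation_range": 20,        # random rotation of up to +/-20 degrees
    "width_shift_range": 0.02,   # horizontal shift, as a fraction of image width
    "height_shift_range": 0.02,  # vertical shift, as a fraction of image height
    "horizontal_flip": True,     # random left-right mirroring
}

IMAGE_SIZE = 48  # images are 48x48 pixels


def max_shift_pixels(fraction: float, size: int = IMAGE_SIZE) -> float:
    """Largest translation in pixels for a shift range given as a fraction (< 1) of the image size."""
    if not 0 <= fraction < 1:
        raise ValueError("this helper only covers fractional shift ranges")
    return fraction * size
```

At this image size the shifts therefore stay below one pixel, so the rotations and flips change the pictures more visibly than the translations.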

CONCLUSIONS
Based on our findings and the previous literature, we conclude that the accuracy of emotion recognition varies depending on which face part is covered. This result is not unexpected. However, the deep network recognised emotions most accurately when the subjects' eyes were covered. This finding was replicated with the augmented data, although the differences were less pronounced. We can only hypothesize about what causes this result. An unknown artefact in the data may be responsible. Alternatively, the network may work more efficiently with less complex input. In future research, we plan to use larger samples to see whether these results still hold.

Figure 1. Example of input photographs