Hand gesture recognition using machine learning algorithms

Received Apr 24, 2020; Revised Jun 14, 2020; Accepted Jun 29, 2020

Gesture recognition is an emerging topic in today's technologies. Its main focus is to recognize human gestures using mathematical algorithms for human-computer interaction (HCI). Only a few modes of HCI exist today: keyboard, mouse, touch screens, and so on. Each of these devices has its own limitations when it comes to adapting more versatile hardware to computers. Gesture recognition is one of the essential techniques for building user-friendly interfaces. Gestures can originate from any bodily motion or state, but commonly originate from the face or hand. Gesture recognition enables users to interact with devices without physically touching them. This paper describes how hand gestures are trained to perform actions such as switching pages and scrolling up or down a page.


INTRODUCTION
Gesture recognition is a technique used to understand and analyze human body language and interact with the user accordingly. It builds a bridge that allows the machine and the user to communicate with each other, and it is useful for processing information that cannot be conveyed through speech or text. Gestures are the simplest means of communicating something meaningful. This paper presents the implementation of a vision-based hand gesture recognition system with a high correct detection rate and a high performance criterion, which can work in a real-time human-computer interaction (HCI) system without imposing limitations (gloves, uniform background, etc.) on the user environment. The system can be described by a flowchart with three main steps, Learning, Detection, and Recognition, as shown in Figure 1.
Learning involves two aspects:
- Training dataset: the dataset consisting of different types of hand gestures used to train the system, on the basis of which the system performs its actions.
- Feature extraction: determining the centroid that divides the image into two halves at its geometric centre.
Detection involves:
- Capture scene: images are captured through a web camera and used as input to the system.
- Preprocessing: the captured images are compared with the dataset to recognize the valid hand movements needed to perform the required actions.
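The feature-extraction step above, locating the geometric centre, can be sketched minimally. The helper below is a hypothetical illustration: it computes the centroid of the foreground pixels of a binary hand mask. In a real OpenCV pipeline this value would come from `cv2.moments` on the segmented hand contour.

```python
def mask_centroid(mask):
    """Centroid (x, y) of the foreground (1) pixels in a binary mask.

    Stands in for the geometric-centre computation of the
    feature-extraction step; returns None for an empty mask.
    """
    total = xs = ys = 0
    for y, row in enumerate(mask):
        for x, v in enumerate(row):
            if v:
                total += 1
                xs += x
                ys += y
    if total == 0:
        return None
    return (xs / total, ys / total)


# Tiny 4x3 mask with a 2x2 foreground blob in the top-left area.
mask = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 0, 0, 0],
]
print(mask_centroid(mask))  # (1.5, 0.5)
```

The centroid splits the hand region into two halves at its geometric centre, which the system then uses as a reference point for the gesture features.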


- Hand detection: hand detection requires the input image from the webcam. The image should be fetched at 20 frames per second, and the hand should be kept roughly 30 to 100 cm from the camera. The video input is stored frame by frame into a matrix after preprocessing.
Recognition involves:
- Gesture recognition: the number of fingers present in the hand gesture is determined using the defect points present in the gesture. The resulting gesture is fed through a 3D convolutional neural network (CNN) to recognize the current gesture.
- Performing action: the recognized gesture is used as input to perform the actions required by the user.
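The finger-counting idea based on defect points can be illustrated with a small sketch. The defect triples below are hypothetical stand-ins for what `cv2.convexityDefects` would return on a real hand contour; a defect whose angle at its deepest point is below 90 degrees is counted as a valley between two extended fingers.

```python
import math


def count_fingers(defects):
    """Estimate finger count from convexity-defect triples.

    Each defect is a (start, end, far) triple of pixel coordinates.
    Using the law of cosines, the angle at the far (deepest) point is
    computed; a sharp angle (< 90 degrees) marks the valley between
    two fingers, so fingers = valleys + 1.
    """
    valleys = 0
    for start, end, far in defects:
        a = math.dist(start, end)
        b = math.dist(start, far)
        c = math.dist(end, far)
        # Law of cosines: angle at the far point of the defect.
        angle = math.acos((b * b + c * c - a * a) / (2 * b * c))
        if angle < math.pi / 2:
            valleys += 1
    return valleys + 1 if valleys else 0


# One sharp valley between two fingertips -> two fingers.
print(count_fingers([((0, 0), (4, 0), (2, 3))]))  # 2
# One shallow dent (wide angle) -> no extended fingers detected.
print(count_fingers([((0, 0), (4, 0), (2, 1))]))  # 0
```

In the paper's pipeline this count is only an intermediate cue; the cropped gesture frames are still passed through the 3D CNN for the final classification.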

LITERATURE SURVEY
The implementation in [1] is divided into four main steps: 1. image enhancement and segmentation, 2. orientation detection, 3. feature extraction, 4. classification. This work focused on the above four categories, but its main limitation was that color changed very rapidly under different lighting conditions, which may cause errors or even failures. For example, under insufficient light the hand area is not detected, while non-skin regions of the same color are mistaken for the hand area. The system in [2] involves three main steps for hand gesture recognition: 1. segmentation, 2. feature representation, 3. recognition techniques. It is based on hand gesture recognition by modeling the hand in the spatial domain, using various 2D and 3D geometric and non-geometric models, and it applied the fuzzy c-means clustering algorithm, which resulted in an accuracy of 85.83%. The main drawbacks of the system are that it does not consider gesture recognition in the temporal space, i.e. the motion of gestures, and that it cannot classify images with complex backgrounds, i.e. scenes containing other objects along with the hand. The survey in [3] covers hand gesture recognition through steps such as data acquisition, pre-processing, and segmentation. A suitable input device must be selected for data acquisition; options include data gloves, markers, and hand images (from a webcam or Kinect 3D sensor). The limitations of this work were changes in illumination, rotation and orientation, scaling problems, and the need for special hardware, which is quite costly. The system implementation in [4] is divided into three phases: 1. hand gesture recognition using a Kinect camera, 2. algorithms for hand detection and recognition, 3. hand gesture recognition.
The limitation here is that the edge detection and segmentation algorithms used are not very efficient compared with neural networks, and the dataset considered is very small, so only a few sign gestures can be detected. The system architecture in [5] consists of: 1. image acquisition, 2. segmentation of the hand region, 3. a distance-transform method for gesture recognition. Its limitations are that: 1. the number of gestures recognized is small, 2. the recognized gestures were not used to control any applications. The implementation in [6] uses three main algorithms: 1. the Viola-Jones algorithm, 2. the convex hull algorithm, 3. the AdaBoost-based learning algorithm. The work was accomplished by training on a feature set, the local contour sequence. The limitation of this system is that it requires two sets of images for classification: a positive set containing the required images and a negative set containing contradicting images. The system in [7] consists of three components: 1. hand detection, 2. gesture recognition, 3. human-computer interaction (HCI). Its methodology is as follows: 1. the input image is preprocessed and the hand detector filters the hand out of the input image; 2. a CNN classifier recognizes gestures from the processed image, while a Kalman filter estimates the position of the mouse cursor; 3. the recognition and estimation results are submitted to a control centre, which decides the action to be taken. One limitation of this system is that it recognizes only static images. The implementation in [8] focuses on detecting hand gestures using Java and neural networks. It is divided into two phases: 1. a detection module in Java, in which the hand is detected using background subtraction and conversion of the video feed into an HSB feed, thus detecting skin pixels; 2. a prediction module, which uses a convolutional neural network. The input image is obtained from Java, fed into the neural network, and analyzed with respect to the dataset images. One limitation of this system is that it requires socket programming to connect the Java and Python modules.
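The cursor-position estimation surveyed above relies on a Kalman filter. A minimal one-dimensional sketch is shown below; the noise parameters are assumptions for illustration, not the cited system's actual implementation, which tracks the 2-D cursor position.

```python
def kalman_1d(measurements, q=1e-3, r=0.25):
    """Minimal 1-D Kalman filter smoothing noisy position readings.

    q is the assumed process-noise variance, r the assumed
    measurement-noise variance. Returns the filtered estimates,
    one per input measurement.
    """
    x, p = measurements[0], 1.0  # initial state and its variance
    out = [x]
    for z in measurements[1:]:
        p += q                   # predict: uncertainty grows
        k = p / (p + r)          # Kalman gain
        x += k * (z - x)         # update toward the measurement
        p *= (1 - k)             # uncertainty shrinks after update
        out.append(x)
    return out


# A jumpy cursor reading settles smoothly toward the true position.
print(kalman_1d([0.0, 10.0, 10.0, 10.0]))
```

The smoothed estimate lags the raw measurement slightly but suppresses jitter, which is why such filters are commonly paired with gesture-driven cursor control.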

IMPLEMENTATION
A hand gesture recognition system was developed to capture the hand gestures performed by the user and to control a computer system based on the incoming information. Many of the existing systems in the literature implement gesture recognition using only spatial modelling, i.e. recognition of a single gesture, and not temporal modelling, i.e. recognition of the motion of gestures. Moreover, the existing systems are not implemented in real time; they use a pre-captured image as input for gesture recognition. To overcome these problems, a new architecture has been developed that aims to provide a vision-based hand gesture recognition system with a high correct detection rate and a high performance criterion, able to work in a real-time HCI system without imposing the strict limitations mentioned above (gloves, uniform background, etc.) on the user environment. The design is a human-computer interaction system that uses hand gestures as input for communication, as shown in Figure 2. Input to the system comes from the web camera or a pre-recorded video sequence. The system then detects skin color using an adaptive algorithm over the initial frames; for the current user, the skin color has to be fixed based on the lighting, camera parameters, and conditions. Once it has been fixed, the hand is localized with a histogram clustering method. A machine learning algorithm is then used to detect the hand gestures in consecutive frames and distinguish the current gesture. These gestures are used as input for a computer application, as shown in Figure 3. The system is divided into three subsystems:
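The adaptive skin-color step can be sketched as histogram back-projection: build a hue histogram from a sample skin patch, then keep frame pixels whose hue bin is well represented in that sample. A real pipeline would use `cv2.calcHist` and `cv2.calcBackProject` on HSV images; the bin count and threshold below are assumptions for illustration.

```python
def hue_histogram(pixels, bins=18):
    """Histogram over hue values (0-179, the OpenCV hue convention)."""
    hist = [0] * bins
    for h in pixels:
        hist[h * bins // 180] += 1
    return hist


def backproject(frame_hues, hist, thresh=1):
    """Mark frame pixels whose hue bin appears in the skin sample."""
    bins = len(hist)
    return [1 if hist[h * bins // 180] >= thresh else 0
            for h in frame_hues]


skin_sample = [5, 6, 7, 12]          # hues sampled from the user's hand
hist = hue_histogram(skin_sample)
print(backproject([5, 90, 10, 170], hist))  # [1, 0, 1, 0]
```

The resulting binary map is what the histogram clustering step then groups into the hand region, adapting per user and per lighting condition since the sample is taken at startup.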

Hand and Motion Detection
The web camera captures the hand movement and provides it as input to the OpenCV and TensorFlow object detector. Edge detection and skin detection are performed to obtain the boundary of the hand, which is then sent to the 3D CNN.
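The edge-detection step can be sketched as a crude gradient-magnitude threshold, a stand-in for the Canny detector typically used with OpenCV (`cv2.Canny`); the grid values and threshold below are assumptions.

```python
def edge_mask(gray, thresh=50):
    """Mark pixels where the local intensity gradient exceeds thresh.

    gray is a 2-D list of intensities (0-255). Uses simple forward
    differences, so the last row and column are left unmarked.
    """
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            gx = gray[y][x + 1] - gray[y][x]   # horizontal gradient
            gy = gray[y + 1][x] - gray[y][x]   # vertical gradient
            if gx * gx + gy * gy >= thresh * thresh:
                edges[y][x] = 1
    return edges


# A vertical intensity step produces a vertical line of edge pixels.
gray = [[0, 0, 100],
        [0, 0, 100],
        [0, 0, 100]]
print(edge_mask(gray))  # edge at column 1 of the first two rows
```

Combined with the skin mask, these edges give the hand boundary that is cropped and passed on to the 3D CNN.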

Dataset
The dataset is used for training the 3D CNN. Two types of datasets are used: one for hand detection and the other for motion or gesture detection. Hand detection uses the EGO dataset, while motion or gesture recognition uses the Jester dataset.

3D CNN
CNNs are a class of deep learning neural networks used for analyzing videos and images. A CNN consists of several layers: an input layer, hidden layers, and an output layer. It performs backpropagation for better accuracy and efficiency. The network performs training and verification of the recognized gestures, and the human-computer interactions then take place: turning pages, zooming in, and zooming out. The interactions with the computer are carried out with the help of PyAutoGUI or system calls.
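The gesture-to-action mapping can be organized as a small dispatch table. The gesture names and stub return values below are hypothetical; in the real system each handler would issue a PyAutoGUI call (e.g. `pyautogui.scroll` or `pyautogui.press`) instead of returning a string.

```python
# Registry mapping recognized gesture labels to action handlers.
ACTIONS = {}


def register(name):
    """Decorator that registers a handler for a gesture label."""
    def wrap(fn):
        ACTIONS[name] = fn
        return fn
    return wrap


@register("swipe_left")
def next_page():
    return "hotkey: right"   # real system: pyautogui.press("right")


@register("two_fingers_up")
def scroll_up():
    return "scroll: +50"     # real system: pyautogui.scroll(50)


def dispatch(gesture):
    """Run the handler for a recognized gesture; ignore unknown ones."""
    fn = ACTIONS.get(gesture)
    return fn() if fn else None


print(dispatch("swipe_left"))      # hotkey: right
print(dispatch("two_fingers_up"))  # scroll: +50
```

Keeping the mapping in a table makes it easy to add gestures without touching the recognition code: the CNN outputs a label, and the dispatcher decides what the computer does with it.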

CONCLUSION
The importance of gesture recognition lies in building efficient human-machine interaction. This paper describes how the system is implemented based on the captured images. Hand detection is done using OpenCV and the TensorFlow object detector, and it is further enhanced so that the computer can interpret gestures to perform actions such as switching pages and scrolling up or down a page.