A Comparative Review on Applications of Different Sensors for Sign Language Recognition

Muhammad Saad Amin

1 Department of Computer Science, University of Turin, 10149 Turin, Italy

Syed Tahir Hussain Rizvi

2 Department of Electronics and Telecommunication (DET), Politecnico di Torino, 10129 Torino, Italy

Md. Murad Hossain

3 Department of Modelling and Data Science, University of Turin, 10149 Turin, Italy; [email protected]

Sign language recognition addresses the communication gap between hearing people and people affected by speaking or hearing disabilities, disabilities that carry many social and physiological impacts. A wide range of techniques has previously been proposed to bridge this gap. Sensor-based smart gloves for sign language recognition (SLR) have proved helpful for generating data from the various hand movements associated with specific signs. This article presents a detailed comparative review of the available techniques and sensors used for sign language recognition. Its focus is to explore emerging trends and strategies for sign language recognition and to point out deficiencies in existing systems. The paper is intended as a guide for other researchers to the materials and techniques used for sign language recognition to date, including flex-resistive sensor-based, vision sensor-based, and hybrid technologies.

1. Introduction

Speaking or hearing disabilities affect people either naturally or through accident. Of the world's population, approximately 72 million people are deaf-mute. There is a lack of communication between hearing people and deaf-mute people, and this communication gap affects their whole lives. A unique language based on hand gestures and facial expressions lets affected people interact with their environment and society; this language is known as sign language. Sign language varies according to region and native language, but American Sign Language (ASL) is treated as the standard for number and alphabet recognition. This standard, however, is a communication tool for affected people only: a healthy person with full ability to speak and hear has no need to learn it and is usually entirely unfamiliar with the signs. There are two ways to make communication feasible between a healthy and an affected person: either convince the healthy person to learn all the sign language gestures needed to communicate with the deaf-mute person, or enable the deaf-mute person to translate gestures into an ordinary spoken or written format that everyone can understand. The first option is almost impossible in practice, since few healthy people can be convinced to learn sign language for communication; this is also the main drawback of sign language. Technologists and researchers have therefore focused on the second option, making deaf-mute people capable of converting their gestures into meaningful voice or text information. For sign language recognition, smart gloves embedded with sensors have been introduced that can convert hand gestures into meaningful information easily understandable by ordinary people.

Smart technology-based sign language interpreters that remove the communication gap between normal and affected people use different techniques: image processing or vision-sensor based techniques, sensor fusion-based smart data glove techniques, or hybrid techniques. These technological interpreters largely avoid the usual limitations. Extracting the required features from an image usually creates problems due to foreground and background environmental conditions, but a well-designed vision-sensor based recognition system is not restricted by foreground or background in gesture recognition. Likewise, a sensor-based smart data glove places no restriction on the wearer, as it is mobile, lightweight, and flexible. Research has shown that many applications based on vision sensors, flex-based sensors, or hybrid combinations of sensors are currently used as communication tools. These applications also act as learning tools that help hearing people communicate comfortably with deaf or mute people. The latest technologies, such as robotics, virtual reality, visual gaming, intelligent computer interfaces, and health-monitoring components, use sign language-based applications. The goal of this article is to understand in depth the current state of the art and emerging techniques in sign language recognition systems. The article reflects on the evolution of gesture recognition-based systems and their performance, keeping in mind the limitations and the pros and cons of each module. The aim of this study is to identify technological gaps and provide analysis to researchers so they can address the highlighted limitations in future work. These aims and objectives are fulfilled by considering published articles based on the specified domain, the technology used, the gestures and hand movements recognized, and the sensor types and languages targeted for recognition. This paper also reflects on the methods of performance evaluation and the levels of effectiveness achieved by previously used sign language techniques.

For sign language recognition, finger bend and hand position are explicitly considered. Gesture location and the movement of the body and hands collectively determine any sign translation. For recognizing and identifying a sign made in a specific language, recognition factors play an important role. These factors include facial expressions, hand orientation, head and body movement types, finger configuration, and articulation point analysis, as shown in Table 1. Any sign made by a deaf-mute person is a combination of these factors, and automated intelligent systems use them for gesture recognition.

Table 1. Basic elements of sign language recognition.

People with speech and hearing disabilities use sign language based on hand gestures, communicating through specific finger motions that represent the language. A smart glove is an electronic device designed to create a communication link for individuals with speech disorders by translating sign language into text; its design calls for a close analysis of the engineering and scientific aspects of the system, with the social inclusion of such individuals as a fundamental consideration. The system is intended to make communication feasible between mute people and the public. Sign language recognition techniques consist of three main streams, listed below: vision-sensor based systems, flex sensor-based systems, and systems that fuse both vision and flex sensors.

  • Vision-sensor based SLR system
  • Sensor-based SLR system
  • Hybrid SLR system

In vision-sensor based systems, input data are captured using a high-definition camera, most often a Kinect camera, for input processing. The captured input image is processed and recognized by matching it against sign language images stored in a database. The main advantage of the vision-sensor based method is that no sensors need to be worn; the sensor-based model is costly compared with the vision-based model, which requires only an HD camera. Laptops' built-in cameras are avoided because they produce blurred images, and a blurred input image increases the pre-processing cost. Despite these advantages, the vision-sensor based approach has some disadvantages for sign recognition. In a camera-based model, the field of view for capturing input data is limited, and the number of cameras must be increased to deal with occlusion and depth-related issues in image-based models. More cameras also increase the computational cost of the overall system. A systematic model of the steps involved in the vision-sensor-based sign language recognition system is shown in Figure 1.


Figure 1. Processing steps for the vision sensor-based SLR system.
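
As an illustration of the capture, pre-processing, and matching steps outlined above, the following minimal Python sketch converts a camera frame to a normalized silhouette and assigns it to the nearest stored sign image. OpenCV, the template directory name, and the distance-based matching rule are assumptions for illustration only and do not correspond to any specific system reviewed here.

# Minimal sketch of the vision-based SLR pipeline: capture, pre-process,
# match against a database of sign images. The template folder and the
# nearest-template rule are illustrative assumptions.
import glob
import cv2
import numpy as np

def preprocess(frame, size=(64, 64)):
    """Convert a captured frame to a normalized binary silhouette."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    blur = cv2.GaussianBlur(gray, (5, 5), 0)
    _, mask = cv2.threshold(blur, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return cv2.resize(mask, size).astype(np.float32) / 255.0

def load_templates(folder="asl_templates"):
    """Load one reference image per sign; the file name encodes the label, e.g. A.png."""
    templates = {}
    for path in glob.glob(f"{folder}/*.png"):
        label = path.split("/")[-1].split(".")[0]
        templates[label] = preprocess(cv2.imread(path))
    return templates

def recognize(frame, templates):
    """Return the label whose template is closest to the input frame."""
    query = preprocess(frame)
    return min(templates, key=lambda lbl: np.sum((templates[lbl] - query) ** 2))

if __name__ == "__main__":
    templates = load_templates()
    cap = cv2.VideoCapture(0)          # HD camera, as in the vision-based setup
    ok, frame = cap.read()
    if ok and templates:
        print("Recognized sign:", recognize(frame, templates))
    cap.release()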

A smart glove embedded with different sensors is used for gesture recognition in the sensor-based model. These sensors include flex sensors for measuring finger bending, accelerometers for measuring angle and degree of movement, and abduction and proximity sensors. The sensors are attached to the user's fingers and to the upper side of the palm, so sensor length can vary according to finger length. The sensors provide outputs in analog form, and the value varies with the degree of finger bending. The values provided by the sensors are given to a controller, which processes the signals and performs analog-to-digital conversion. At this stage, the data are digital values specific to the sign being made. These sign values are collected and stored in a file for further processing. The major advantage of using such sensors is that values directly related to a specific sign (an alphabet or a digit) are obtained, so no pre-processing is needed as in the vision-sensor-based model. A systematic model of the steps involved in a sensor-based sign language recognition system is shown in Figure 2.


Figure 2. Processing steps for the sensor-based SLR system.
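
The acquisition and logging step described above can be sketched as follows, assuming the glove's microcontroller digitizes the sensor readings and streams them as comma-separated lines over a serial link. The serial port settings, the eight-value frame format, and the use of the pyserial package are illustrative assumptions rather than the protocol of any particular prototype.

# Minimal sketch of sensor-based acquisition: the glove streams digitized
# flex/accelerometer readings over a serial link, and the host logs labeled
# samples to a CSV file for later training. All settings below are assumed.
import csv
import serial  # pip install pyserial

PORT, BAUD = "/dev/ttyUSB0", 9600      # assumed serial settings
SAMPLES_PER_SIGN = 50                  # assumed number of repetitions per sign

def record_sign(label, out_file="glove_dataset.csv"):
    with serial.Serial(PORT, BAUD, timeout=2) as link, \
         open(out_file, "a", newline="") as f:
        writer = csv.writer(f)
        count = 0
        while count < SAMPLES_PER_SIGN:
            line = link.readline().decode("ascii", errors="ignore").strip()
            fields = line.split(",")
            if len(fields) != 8:       # e.g. 5 flex values + 3 accelerometer axes
                continue               # skip incomplete frames
            writer.writerow([label] + fields)
            count += 1

if __name__ == "__main__":
    record_sign("A")                   # collect one batch of samples for the sign 'A'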

A hybrid system for sign language combines the vision-sensor-based model with the sensor-based model, so glove and camera-based measurements are used together to collect raw gesture data. Data accuracy and precision are determined using a mutual error elimination method, which removes errors to improve data accuracy. Due to its complex, large computations and cost overhead, this method is not commonly used for data collection. However, hybrid data collection methods are used in augmented reality and virtual reality-based systems, where they produce promising results for sign language gesture data collection. Modern techniques are moving towards advanced sensor and vision-sensor-based sign recognition approaches. Processing steps for the hybrid recognition model are shown in Figure 3 below.


Figure 3. Processing steps for the hybrid SLR system.
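
A common way to combine the two modalities is decision-level fusion, in which each branch outputs per-class scores and the final label is taken from their weighted combination. The short sketch below illustrates this generic idea only; it is a stand-in, not the mutual error elimination method cited above, and the label set and weights are assumed.

# Illustrative decision-level fusion for a hybrid SLR system: each modality
# produces class probabilities, and the fused score is their weighted average.
import numpy as np

LABELS = ["A", "B", "C"]               # assumed label set

def fuse(p_vision, p_sensor, w_vision=0.5):
    """Weighted average of per-class probabilities from the two modalities."""
    p_vision, p_sensor = np.asarray(p_vision), np.asarray(p_sensor)
    fused = w_vision * p_vision + (1.0 - w_vision) * p_sensor
    return LABELS[int(np.argmax(fused))], fused

if __name__ == "__main__":
    # Vision is unsure between A and B; the glove strongly favours A.
    label, scores = fuse([0.45, 0.40, 0.15], [0.80, 0.10, 0.10])
    print(label, scores)               # -> A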

A comprehensive study of prior work on sign language recognition was also conducted. Review-based articles played a vital role in understanding sign language fundamentals. General review papers offer application-based discussion with various pros and cons; their focus is on emerging trends, developing technologies, and the characteristics of sign-detection devices. Sign language recognition-specific review articles cover most of the techniques used for gesture recognition, with a focus on the technology, the choice of the right sensory material, and the limitations of existing systems. These articles therefore provide a deep understanding of the recognition materials and methods used to obtain an efficient sign dataset with maximum accuracy. A detailed analysis of sensor-based, vision-based, framework-based, commercial, non-commercial, and hybrid systems, covering the conceptualization of review articles, is presented in Section 2 of this paper. Machine learning is an emerging technique with very high efficiency and good output response. It is mainly used to train systems to perform actions intelligently; in effect, a machine can be made intelligent with the help of machine learning, which falls under the umbrella of artificial intelligence. The data from which the algorithm learns to perform operations are the sensor data collected in a file and used for training purposes. In this way, a gesture is recognized, and the algorithm's efficiency can be computed through testing. With the help of this technique, the barrier faced by mute people in communicating with society can be reduced to a great extent. Since many people cannot understand sign gestures, the communication hindrance between the public and mute people is the main problem to be addressed.
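
As a concrete illustration of training on glove data collected in a file, the minimal sketch below loads labeled sensor samples from a CSV file, splits them into training and test sets, and evaluates a support vector machine. The file layout (label followed by eight sensor values) and the use of scikit-learn are assumptions for illustration; the reviewed systems use a variety of classifiers and data formats.

# Minimal sketch of the machine-learning step: sensor values logged to a file
# are used to train a classifier, and accuracy is measured on a held-out split.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

data = np.genfromtxt("glove_dataset.csv", delimiter=",", dtype=str)
labels, features = data[:, 0], data[:, 1:].astype(float)

X_tr, X_te, y_tr, y_te = train_test_split(
    features, labels, test_size=0.2, stratify=labels, random_state=0)

clf = SVC(kernel="rbf", C=10.0, gamma="scale").fit(X_tr, y_tr)
print(f"Test accuracy: {clf.score(X_te, y_te):.2%}")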

2. Literature-Based Sign Language Recognition Models

Many image processing and sensor-based techniques have been applied to sign language recognition, and recent studies have presented progressively newer frameworks for sign recognition. Detailed literature analysis and a deep understanding of sign language recognition allow this process to be categorized into different sub-sections. The further division is application-based, depending on the types of sensors used in the formation of data gloves. These subdivisions are based on non-commercial prototypes for data acquisition, commercial data acquisition prototypes, bi-channel systems, and hybrid systems. Non-commercial systems are self-made prototypes that use sensors to gather data; the sensor values are transmitted to another device, usually a processor or controller, which interprets the transmitted sensor data and converts them into the corresponding sign language format. Most of the sensor-based systems discussed in this literature review are non-commercial prototypes. In non-commercial systems, most authors worked on detecting finger bending for any sign made, so a large variety of individual sensors or combinations of different sensors were used to detect this bending. SLR models can therefore be further divided into non-commercial prototype-based and framework-based prototypes. The different literature review-based models are discussed in detail in the sections below.

2.1. Sensor-Based Models

An American Sign Language recognition system with flex and motion sensors was implemented in ref. [1]. This system produced better results than a cyber glove combined with a Kinect camera. The authors of ref. [2] succeeded in proposing a model that performs better sign recognition using different machine learning algorithms; response time and accuracy were increased by using better sensors and an efficient algorithm for the specified task. The traditional approach to sign language was changed by embedding sensor networks for recognition purposes. A proposed model combined an Artificial Neural Network (ANN) and a Support Vector Machine (SVM) for sign language recognition; this combined algorithm produced better results than the Hidden Markov Model (HMM) [3]. A smart data glove named "E-Voice" was developed for alphabetical gesture recognition of ASL. The prototype was designed using flex sensors and accelerometer sensors, and the data glove succeeded in recognizing sign gestures with improved accuracy and increased efficiency [4]. Sign language recognition is subject-dependent, so a new method of recognition was developed using surface electromyography (sEMG): sensors were connected to the subject's right forearm to collect data for training and testing, and a Support Vector Machine (SVM) was used for recognition, obtaining better results for real-time gesture recognition [5]. Another model was proposed as a combination of three types of sensors, flex, motion, and pressure, to determine the impact of SVM on sign language recognition [6]. Daily activity was also recognized using a smart data glove, with two basic techniques for gesture interpretation based on data glove interaction [7].

A more advanced approach including a deep learning model was implemented in ref. [8]. Static gestures were converted into American Sign Language alphabets using the Hough Transform; this technique was applied to 15 samples per alphabet and obtained 92% accuracy. A combination of a motion tracker and a smart sensor was used for sign language recognition, with an Artificial Neural Network approach implemented to obtain the desired results [9]. The Artificial Neural Network translated American Sign Language into alphabets. A sensory glove called a smart hand glove, with a motion detection mechanism, was used for data collection; the transmitter-receiver network processed the input data to control home appliances and generate recognition results [10]. Hand-body language-based data analysis was performed using a machine-learning approach: a sensor glove embedded with 10 sensors captured 22 different kinds of postures, and KNN, SVM, and PNN algorithms were applied to perform sign language posture recognition [11]. The authors in ref. [12] presented a device named the "Electronic Speaking Glove", developed using a combination of flex sensors. The flex sensor data were fed into a low-power, high-performance, fast 8-bit ATmega32L AVR microcontroller. This reduced instruction set computer (RISC)-based AVR microcontroller used the "template matching" algorithm for sign language recognition.

Another sign language recognition system was developed in ref. [13]. This virtual image interaction-based sensor system succeeded in recognizing six letters, i.e., "A, B, C, D, F, and K," and one digit, "8". The prototype was developed using two flex sensors attached to the index and middle fingers of the right hand, and the sensor data were transmitted to an Arduino Uno microcontroller. In this experimental setup, MATLAB-based fuzzy logic was implemented. A sign gesture recognition prototype was developed by the authors in ref. [14]. This prototype consisted of a smart glove embedded with five flex sensors; the acquired data were sent to an Arduino Nano microcontroller, and a template matching algorithm was used for gesture recognition. This experimental setup succeeded in recognizing four sign language gestures. A Liquid Crystal Display (LCD) and a speaker were used to display and speak the recognized gestures, respectively. The authors in ref. [15] developed an experimental model for standard American Sign Language (ASL) alphabet recognition. A programmable intelligent computer (PIC) was used to store the predefined ASL alphabet data, and this setup was also based on template matching. For data acquisition, a smart prototype based on three flex sensors along with an analog-to-digital converter (ADC), an LCD, and an INA126 instrumentation amplifier was utilized. In this setup, the 16F377A microcontroller was used, which succeeded in recognizing 70% of ASL alphabet gestures.
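
The "template matching" used by several of the microcontroller prototypes above amounts to comparing an incoming vector of sensor readings against one stored reference vector per sign and choosing the nearest one. The sketch below illustrates this idea in Python rather than in microcontroller firmware; the reference values, the five-sensor layout, and the distance threshold are invented for illustration.

# Generic template matching on flex-sensor readings: assign an incoming
# reading to the nearest stored reference vector. Reference values are made up.
import numpy as np

TEMPLATES = {                     # per-sign reference flex readings (5 sensors, 0-1023 ADC)
    "A": [850, 820, 830, 840, 300],
    "B": [200, 180, 190, 210, 250],
    "C": [500, 480, 510, 495, 470],
}

def match(reading, max_distance=200.0):
    """Return the closest template label, or None if nothing is close enough."""
    best, best_d = None, float("inf")
    for label, ref in TEMPLATES.items():
        d = float(np.linalg.norm(np.asarray(reading) - np.asarray(ref)))
        if d < best_d:
            best, best_d = label, d
    return best if best_d <= max_distance else None

print(match([845, 818, 825, 842, 310]))   # -> A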

A more advanced, intelligent, and smart system was implemented by the authors in ref. [16]. Their experimental setup included eight contact sensors and nine flex sensors, placed on the inside and outside of the fingers: the outer five sensors detected bending changes, and the inner four sensors measured hand orientation. This system was also based on a template matching algorithm, in which a unique 36-gesture standard ASL dataset was matched with the input data. An ATmega328P microcontroller was used for matching and succeeded in producing 83.1% and 94.5% accuracy for alphabets and digits, respectively.

In ref. [17], the authors built a standard sign language recognition prototype consisting of five flex sensors combined with an ATmega328-based microcontroller. The sensor-acquired data were compared with an already stored ASL dataset, and this experimental setup produced 80% overall accuracy. To facilitate the deaf-mute community, another important contribution was presented in ref. [18]. The authors developed a prototype that translates sign language into its best-matched alphabet or digit. Eleven resistive sensors were used to measure the bending of each finger, and two separate sensors were used to detect wrist bending. The developed smart device worked well on static gestures and produced good results in alphabet recognition, with an overall accuracy of 90%. A Vietnamese Sign Language recognition system was developed using six accelerometer sensors [19]. This prototype was designed for 23 local Vietnamese gestures, including two extra postures for "space" and "punctuation". Gesture classification was performed using fuzzy logic-based algorithms, and the device, named "AcceleGlove", achieved 92% overall accuracy in Vietnamese Sign Language recognition. A posture recognition system was developed in ref. [20]. This glove-based system was assembled using flex sensors, force sensors, and an MPU6050 accelerometer: five flex and five force sensors were attached to the fingers, and the accelerometer was attached to the wrist. The experimental setup comprised data from the flex, force, gyro, accelerometer, and IMU sensors, all connected to an Arduino Mega for data acquisition. Based on the data classification, the output was displayed on an LCD, and the system achieved 96% accuracy on average. A real-time sign-to-speech translator was developed to convert static signs into speech using the "Sign Language trainer & Voice converter" software [21]. Data were acquired using five flex sensors and a 3-axis accelerometer connected to an Arduino-based microcontroller.

A system for recognizing handmade signs was developed with the help of the LabVIEW software and an Arduino board. The user interacted with the environment through the Graphical User Interface (GUI) provided by LabVIEW, and recognition was performed with the help of the Arduino [22]. Another smart-sensor-embedded glove was developed in ref. [23]. A combination of flex sensors, contact sensors, and a 3-axis ADXL335 accelerometer was used for recognition: flex sensors were attached to each finger of the hand, and contact sensors were placed between consecutive fingers. Sign language gestures were obtained using the described smart glove, and the analog sign data were transferred to an Arduino Mega for recognition. Classified sign gestures were displayed on a 16 × 2 Liquid Crystal Display (LCD) and converted into speech through a speaker. A smart glove based on five flex sensors and an accelerometer was also designed for sign language recognition [24]. This data glove transferred analog signal data to a microcontroller for recognition, and the output was produced by playing a pre-recorded voice matched with the recognized sign. A sign language recognition system for numeric data was developed in ref. [25]. The authors used a combination of Hall sensors and a 3-axis accelerometer: the smart data glove was composed of four Hall sensors attached to the fingers only, with hand orientation measured by the accelerometer and finger bend detected by the Hall sensors. The analog sensor data were passed to MATLAB code to recognize the signs made with the smart glove. This experimental setup was tested only on the numbers 0 to 9 and succeeded in producing an accuracy of 96% in digit recognition.

Beyond traditional sensor-based smart data gloves, another advanced approach was utilized in ref. [26]. A smart glove for gesture recognition was created using LTE-4602 light-emitting diodes (LEDs), photodiodes, and polymeric fibers; this combination was used solely to detect finger bending, while hand motion was captured using a 3-axis accelerometer and a gyroscope. This portable smart glove succeeded in recognizing hand gestures made for sign language translation. Authors have also built regional sign language systems for different origins. An Urdu Sign Language-based system was developed in ref. [27]. The smart data glove was composed of five flex sensors, one attached to each finger, and a 3-axis accelerometer placed at the palm; a 16 × 2 liquid crystal display (LCD) was used to display the output. The authors created a dataset of 300 × 8 dimensions, and the Principal Component Analysis (PCA) technique was implemented in MATLAB to detect static sign gestures, achieving 90% accuracy. Another regional sign language recognition system was presented in ref. [28]. The authors built a prototype to convert Malaysian Sign Language: a smart data glove made of tilt sensors and a 3-axis accelerometer was developed for recognition, with a microcontroller and a Bluetooth module included to classify detected signs and transmit them to a smartphone. The microcontroller operated on template-matching principles and succeeded in recognizing several Malaysian Sign Language gestures, with overall system accuracy ranging from 78.33% to 95%. Flex sensor and accelerometer-based smart gloves can also perform alphanumeric classification: using such a prototype, 26 alphabets and ten digits can be recognized with a template-matching algorithm [29]. Five flex sensors, one attached to each finger, produced an analog signal for a performed gesture, which was transferred to an Arduino Uno microcontroller; with an accelerometer added for hand motion detection, the authors obtained an eight-valued data vector per sign gesture. In ref. [30], the authors developed a two-glove model containing ten flex sensors, one attached to each finger of both hands, and a 9 degree-of-freedom (DoF) accelerometer for motion detection. The two-glove system was tested on phonetic letters, including a, b, c, ch, and zh, and with the help of a matching algorithm the authors performed static sign recognition with approximately 88% accuracy.

An American Sign Language classification and recognition system based on probabilistic segmentation was presented in ref. [31]. The system was divided into two main modules: the first performed segmentation based on a Bayesian Network (BN), and the data obtained during this stage were used for training; the second performed classification using a Support Vector Machine (SVM) classifier combined with a multilayer Conditional Random Field (CRF). This system produced 89% accuracy on average. The authors in ref. [32] introduced some innovation into existing sign language recognition systems by combining data obtained from sensor gloves with data obtained from hand-tracking systems. The well-known Dempster-Shafer theory of evidence was applied to the fused data, and the fused system achieved 96.2% recognition accuracy on 100 two-handed ArSL signs. Hand motion and tilt sensor-based sign data were collected using a CyberGlove [33]. The classification of 27 hand shapes based on Signing Exact English (SEE) was performed using Fisher's linear discriminant embedded in a linear decision tree, with Vector Quantization Principal Component Analysis (VQPCA) used as a classification tool for sign language recognition. This system obtained 96.1% overall accuracy.

An Arabic Sign Language recognition deep learning framework focusing on a signer-independent isolated-sign model was discussed in ref. [34]. The main focus of this research was on regional sign gestures: among the wide variety of regional domains, the authors concentrated on Arabic sign gestures only and implemented deep learning-based approaches to achieve the desired results. Hand gesture recognition for posture classification was implemented in ref. [35]. The prototype was purely based on real-time hand gesture recognition and used an IMU-based data glove embedded with different sensors to achieve the desired results. Another advancement in sensor-based gesture recognition was implemented in ref. [36]: a dual leap motion controller (LMC)-based prototype was designed to capture and identify data, with Gaussian Mixture Model and Linear Discriminant-based approaches implemented to achieve results. A case study based on regional data was implemented in ref. [37], where the authors focused on Pakistani Sign Language models using Multiple Kernel Learning-based approaches. Work with signal-based sensor values for the classification of real-time gestures was presented in ref. [38]; the authors worked on wrist-worn, real-time hand and surface postures, with EMG and IMU-based sensors embedded to obtain the desired sign posture values. An armband EMG sensor-based approach was implemented by the authors in ref. [39]. The main focus was to classify finger language by utilizing ensemble-based artificial neural network learning, with the sensor values helping the ANN classify gestures accurately. A sign language interpretation smart glove was designed by the authors in ref. [40], in which a sensor-fused data glove was used to recognize and classify SL postures. Another novel approach to capturing sign gestures was discussed in ref. [41]: the developers named their smart data glove SkinGest, as it completely grips the skin with no detachment, and filmy stretchable strain sensors were used to capture gesture and posture data. Leap motion-based identification of sign gestures was implemented with the help of a modified LSTM model in ref. [42], where continuous sign gestures were accurately classified using the LSTM model to obtain the desired results. Another novel approach based on key frame sampling was implemented in ref. [43]; the authors also exploited skeletal features in an attention-based sign language recognition network. A Turkish sign language dataset was processed using baseline methods in ref. [44], where large-scale multimodal data were classified based on regional postures to achieve good recognition results. Similarly, the authors of ref. [45] used Multimodal Spatiotemporal Networks to classify sign language postures. The development of a low-cost model for translating sign gestures was targeted in ref. [46], with the main focus on developing a smart wearable device at a very reasonable price.

2.2. Vision Based Models

The authors in ref. [47] developed a model using vision-sensor-based techniques to extract temporal and spatial features from video sequences. A CNN was applied to the extracted features to identify the recognized activity, using an American Sign Language dataset for feature extraction and activity recognition. An Intel RealSense camera was used to translate American Sign Language (ASL) gestures into text; the proposed system included the RealSense camera-based setup and applied SVM and Neural Network (NN) algorithms to recognize sign language [48]. Due to the large set of classes, inter-class complexity increased considerably. This issue was resolved using a Convolutional Neural Network (CNN)-based approach: depth images were captured using a high-definition Kinect camera, and the obtained images were processed using a CNN to recognize alphabets [49]. Real-time sign language was interpreted using a CNN to perform real-time sign detection; this approach does not rely on outdated or predefined image datasets, as the authors implemented a real-time data analysis mechanism rather than the traditional approach of using predefined datasets in ref. [50]. In vision-sensor-based recognition, 20 alphabets and the numbers were recognized using a Neural Network-based Hough Transform [51]. A specific threshold value of 0.25 was used in the Canny edge detector to achieve efficiency on the image dataset, and the system achieved 92.3% accuracy. Fifty samples of alphabets and numbers were recognized by an Indian Sign Language system using a vision-sensor-based technique [52]; a Support Vector Machine (SVM)-based classifier with B-spline approximation achieved 91% accuracy on average. A hybrid pulse-coupled neural network (PCNN) combined with a nondeterministic finite automaton (NFA) algorithm was used to identify image-based gesture data [53]; this prototype achieved 96% accuracy based on best-match phenomena.
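
Several of the works above classify static sign images with a convolutional neural network. The following generic sketch shows the kind of small CNN this implies, written in PyTorch; the architecture, input size, and hyper-parameters are illustrative and are not taken from any of the cited references.

# Generic small CNN for static sign (alphabet) classification from fixed-size
# grayscale images. Purely illustrative architecture.
import torch
import torch.nn as nn

class SignCNN(nn.Module):
    def __init__(self, num_classes=26):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(x.flatten(1))

model = SignCNN()
dummy = torch.randn(1, 1, 64, 64)       # one grayscale 64x64 hand image
print(model(dummy).shape)               # -> torch.Size([1, 26])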

Principal component analysis (PCA) along with local binary patterns (LBP) extracted Hidden Markov Model (HMM) features with 99.97% accuracy in ref. [54]. In ref. [55], hand segmentation based on skin color detection was used, with a skin blob tracking system employed for hand identification and tracking; this system achieved 97% accuracy on 30 recognized words. In ref. [56], Arabic Sign Language recognition was performed using various transformation techniques such as the Log-Gabor, Fourier, and Hartley transforms. The Hartley transform together with Support Vector Machine (SVM) and K-Nearest Neighbor (KNN) classifiers produced 98% accuracy. Combined orientation histogram and statistical (COHST) features along with wavelet features were used in ref. [57]; these techniques succeeded in recognizing static signs for the numbers zero to nine in ASL, and the neural network produced efficient results based on the COHST, wavelet, and histogram feature values, with 98.17% accuracy. Static gesture recognition of alphabets was performed using a neural network-based wavelet transform, achieving 94.06% accuracy in recognizing Persian Sign Language [58]. Manual signs were detected using the fingers, palm, and place of articulation: the equipment arranged for manual signs extracted data from a video sequence and matched them with 2D images of standard American Sign Language alphabets, and the proposed setup resulted in accurate sign detection of alphabets [59].

Deep learning-based SLR models also build on vision-based approaches. The authors in ref. [60] focused on current deep learning-based techniques, trends, and issues in deep models for SL generation. Keeping standard American Sign Language models in mind, the authors in ref. [61] developed a deep image-based, user-independent approach; their main work was based on PCANet features with depth analysis. Another edge computing-based thermal image detection system was presented by the authors in ref. [62], who worked on a digit-based sign recognition model using deep learning approaches. Different computer vision-based techniques have been applied to SLR tasks. A camera sensor-based prototype was used by the authors in ref. [63] to correctly identify sign postures. A convolutional neural network-based approach using video sequences was implemented in ref. [64]: a three-dimensional attention-based model was designed for a very large vocabulary to acquire data from video sequences and classify them using a 3D-CNN model. Similarly, the same authors implemented a boundary-adaptive encoder using an attention-based method on a regional Chinese language dataset in ref. [65]. A novel key-frame-centered clip-based approach was implemented on the same Chinese Sign Language dataset in ref. [66]; the regional Chinese sign dataset was classified from video sequences in the form of images, and this vision-based approach produced strong results on CSL. Another fingerspelling-based smart model was developed by the authors in ref. [67], who built an Indian quiz portfolio based on camera-oriented posture classification; the main point of identification was based on ASLR models using a vision-based approach. A vision sensor-based three-dimensional approach was implemented by the authors in ref. [68]: three-dimensional sign language representations were classified with the help of spatial three-dimensional relational geometric features, and these 3-D data were classified and recognized quite efficiently with the S3DRGF-based technique. Another vision-based technique focusing on color mapping-based classification and recognition was developed by the authors in ref. [69]: a CNN-based deep learning model was trained on three-dimensional sign data, and color texture-coded joint angular displacement maps were classified efficiently with a 3-D deep CNN model. Another advanced approach based on three-dimensional data manipulation for sign gestures was implemented in ref. [70], where the authors focused on the classification and recognition of angular velocity maps with a deep ResNet model; a Connived Feature ResNet was deployed specifically to classify and recognize 3-D sign data. Another video sequence-based approach to classifying sign gestures was implemented in ref. [71], where a BiLSTM-based three-dimensional residual neural network was used to capture video sequences and identify posture data. A novel deep learning-based hand gesture recognition approach was implemented by the authors in ref. [72]: image-based fine postures were captured and recognized using a deep learning architecture. A virtual sign channel for visual communication was developed in ref. [73]; the authors' main focus was to create a virtual communication channel between deaf-mute and hearing individuals. Another three-dimensional data representation for Indian Sign Language was developed in ref. [74], where the authors used an adaptive kernel-based motionlets-matching technique to classify gesture data. A video sequence and text embedding-based continuous sign language model was implemented in ref. [75], in which joint latent space data were processed using cross-modal alignment in a continuous sign language recognition model.

2.3. Non-Commercial Models for Data Glove

In non-commercial systems, most authors worked on detecting the finger bending associated with any sign made, so a large variety of individual sensors or combinations of sensors were used to detect this bending. The authors in ref. [76] developed a non-commercial prototype for sign language recognition based entirely on the finger bending method. To detect finger bending, ten flex sensors were used, with a pair of sensors attached to the two joints of each finger. To handle the analog flex data, an MPU-506A multiplexer was used, and the selected data coming from the multiplexer were sent to an MSP430G2231 microcontroller. A Bluetooth module was used to transmit the data to a smartphone. The captured data were then compared with a sign language database, and the matched result was converted into speech using a text-to-speech converter. The authors in ref. [77] also succeeded in developing a non-commercial sign language recognition prototype. This prototype included five ADXL335 accelerometer sensors connected to an ATmega2560 microcontroller system. Based on axis orientation, the sign was identified and transmitted via a Bluetooth module to a mobile application for text-to-speech conversion. In ref. [78], a prototype was developed to help handicapped people by converting finger orientation into actions. For this purpose, five optical fiber sensors were used to collect finger bending data. These 8-bit data were used to train a multilayered neural network (NN) in MATLAB, and six hand gesture-based operations were performed using the backpropagation training algorithm. For data validation, tenfold cross-validation was applied to 800 sample records. Similarly, for sign language recognition, the authors of ref. [79] built a non-commercial prototype based on five flex sensors. An MSP430F149 microcontroller was used to classify the incoming analog data, which were compared with standard American Sign Language (ASL) data, and the output was displayed on a Liquid Crystal Display (LCD); using text-to-speech methodology, the recognized letter was spoken through a good quality speaker. The authors in ref. [80] developed the Sign-to-Letter (S2L) system. This system contained six flex sensors together with discrete-valued components and a microcontroller: five flex sensors were attached to the five fingers of the hand, and one sensor was attached to the wrist of the same hand. This combination of two differently placed bending sensors succeeded in converting signs into letters, with the output determined through programmed "IF-ELSE" conditions. A combination of Light Emitting Diode and Light Dependent Resistor (LED-LDR) sensors was used in ref. [81]. An MSP430G2553 microcontroller was used to detect signs made by finger bending: the analog data were converted into digital form and into ASCII codes corresponding to 10 sign language alphabets. The converted data were transmitted using a ZigBee wireless module, and the recognized ASCII code was displayed on a computer screen and also converted into speech.
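
The "IF-ELSE" style of recognition used by the Sign-to-Letter system can be pictured as reducing each flex reading to a bent/straight decision and looking the resulting pattern up in a table. The sketch below illustrates this idea only; the threshold value and the pattern-to-letter table are hypothetical.

# Illustrative threshold ("IF-ELSE") classification: each flex reading is
# reduced to bent (1) / straight (0), and the pattern selects a letter.
BENT_THRESHOLD = 600                      # ADC value above which a finger counts as bent

PATTERNS = {                              # (thumb, index, middle, ring, little)
    (0, 1, 1, 1, 1): "A",                 # four fingers bent, thumb straight
    (0, 0, 0, 0, 0): "B",                 # open hand
    (1, 0, 1, 1, 1): "D",                 # only the index finger extended
}

def classify(readings):
    key = tuple(int(r > BENT_THRESHOLD) for r in readings)
    return PATTERNS.get(key, "?")

print(classify([300, 700, 720, 690, 710]))   # -> A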

Another fingerspelling system was developed in ref. [82]. This prototype included four flex sensors and an accelerometer sensor. The main idea of the design was to translate handmade signs into their corresponding American Sign Language (ASL) alphabets. For data acquisition, four deaf-mute individuals were recruited, and the system succeeded in recognizing 21 gestures out of 26. A hand gesture recognition system was developed by measuring inertial and attitude values [83]. For data acquisition, six Inertial Measurement Units (IMUs) were used in this prototype: one IMU was attached to each finger and one to the wrist. This experimental setup collected hand gesture data from the accelerometer, gyroscope, and magnetometer values provided by each IMU. These values were refined using a Kalman filter and processed with the Linear Discriminant Analysis (LDA) algorithm. Overall, 85% accuracy in hand gesture recognition was achieved with this prototype.

2.4. Commercial Data Glove Based Models

Besides the traditional route of making inexpensive data gloves, some authors used a commercial data glove named "CyberGlove". This commercial glove was specifically designed for deaf-mute people, and many affected communities and research centers have used it for communication and research purposes. The CyberGlove is precisely manufactured with 22 sensors embedded in the glove: the basic structure contains four sensors placed between the fingers and three sensors on each finger, plus palm sensors and wrist bending measurement sensors. This smart, thin-layer, elastic fiber-based sensor glove cost approximately $40,000 per pair. Using this CyberGlove, the authors in ref. [84] applied a combination of neural network-based algorithms to measure the accuracy and efficiency of the system. Finger orientation and hand motion projection were captured with a CyberGlove combined with a 3D motion-tracker sensor, and the analog signal data were fed to a pair of algorithms, a word recognition network and a velocity network. These algorithms worked on 60 American Sign Language (ASL) combinations and obtained accuracies of 92% and 95%, respectively. A posture recognition system based on a 3D hand posture model was developed in ref. [85]: a Java 3D-based model helped in the classification and segmentation of real-time input posture data, which were compared with pre-recorded CyberGlove data using an index tree algorithm. Another CyberGlove, combined with a 3D motion tracker known as the Flock of Birds, was used for sign language recognition. The CyberGlove data, containing bend, axis, motion, and hand orientation, were fed into a multilayered neural network, and the Levenberg-Marquardt backpropagation algorithm was used for segmentation and sign classification. This prototype succeeded in producing 90% accuracy in American Sign Language (ASL) recognition [86].

In the sensor-based sign language recognition domain, another advancement came with a new commercial data glove from Fifth Dimension Technologies, commonly known as the 5DT data glove. This commercial glove is made in two variants, one with five fiber optic sensors and the other with fourteen; the manufacturer named this fiber optic smart data device the "ultra motion". Internationally, this data glove costs approximately $995. In the five-sensor data glove, one optical sensor is attached to each finger and one sensor detects hand orientation. In the fourteen-sensor version, two sensors are in contact with each finger, sensors are also placed between the fingers to check finger abduction, and two-axis measurement sensors determine axis and orientation, including the pitch and roll of the hand. These 5DT smart data gloves were used for Japanese Sign Language recognition [87]. The main idea of developing this system was to automate the learning process: a 3D model for simulating signs was built on the 5DT 14-sensor smart data glove, highlighting motion errors for beginners and helping them fully understand hand motion through the 3D model. To facilitate communication for deaf and mute people, a further advance used a combination of 5DT data gloves with five embedded sensors. The data obtained with the ultra motion glove were trained using the MATLAB simulator: a multilayered neural network with five inputs and 26 outputs was used as the training model for sign language recognition, with a series of NN training algorithms, including resilient, back, quick, and Manhattan propagation as well as scaled conjugate gradient [88].

Another advancement in sign language recognition appeared in ref. [89]. The authors used a DG5 VHand data glove for data acquisition. Internally, the DG5 VHand data glove contains five flex (bending) sensors, one three-axis accelerometer, and three contact sensors, and it can transmit acquired data wirelessly; the overall system was made remotely functional by using a battery. The DG5 VHand commercial data glove has been used for American and Arabic Sign Language recognition systems. The authors focused on Arabic Sign Language, whereas this data glove had already been used for American Sign Language; a single left-hand glove alone cost $750. A pair of DG5 VHand data gloves was used for Arabic Sign Language recognition: the two-glove model acquired data for 40 sentences, and this dataset was classified using a modified K-Nearest Neighbor (MKNN) algorithm, giving an overall accuracy of 98.9%. A hand gesture cannot be fully recognized without knowing hand orientation and posture. Therefore, an advancement over the traditional system was achieved by fusing electromyography and inertial sensors within the system [90]. Using a combination of an accelerometer (ACC) sensor with electromyography (EMG), the authors captured multiple degrees of freedom of hand movement. This setup was used for Chinese Sign Language recognition: the EMG sensors were attached at five muscle points on the forearm, and an MMA7361 3-axis accelerometer was attached at the wrist. A multi-layered Hidden Markov Model and decision tree algorithms were used for recognition, producing 72.5% accuracy.

The same setup of accelerometer and electromyography was used for German Sign Language. The authors used a single EMG and a single ACC sensor to recognize a small German vocabulary database. Training was performed on seven words with seventy samples per word, using K-Nearest Neighbor (KNN) and Support Vector Machine (SVM) classifiers. The system achieved an average accuracy of 88.75%, and 99.82% in the subject-dependent case [91]. A similar hybrid approach of accelerometer and electromyography was used for a Greek Sign Language recognition system. The experimental setup consisted of five-channel electromyography and an accelerometer sensor. The experiment was conducted on the signer using the intrinsic entropy mode, and experiments repeated ten times on three native signers produced the training data. The system was trained using the intrinsic entropy mode in MATLAB, and its overall accuracy was 93% collectively (without the personal effect of the native signers involved in data collection) [92].
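
A generic sketch of the kind of KNN classification applied to fused EMG and accelerometer features in these studies is shown below. The feature extraction (per-channel mean absolute value plus mean acceleration), the toy data, and the class names are invented purely for illustration.

# Generic KNN classification on fused EMG + accelerometer features.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def features(emg_window, acc_window):
    """Concatenate mean absolute EMG activity and mean accelerometer axes."""
    return np.concatenate([np.mean(np.abs(emg_window), axis=0),
                           np.mean(acc_window, axis=0)])

# Toy training data: 20 windows per class, 1 EMG channel + 3 ACC axes.
rng = np.random.default_rng(0)
X = np.vstack([features(rng.normal(m, 0.1, (50, 1)), rng.normal(m, 0.1, (50, 3)))
               for m in (0.2, 0.8) for _ in range(20)])
y = ["word_1"] * 20 + ["word_2"] * 20

knn = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print(knn.predict([features(rng.normal(0.8, 0.1, (50, 1)),
                            rng.normal(0.8, 0.1, (50, 3)))]))   # -> ['word_2']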

2.5. Hybrid Recognition Models

A vision-sensor-based component has also been adopted in hybrid sign language recognition. The previously used combination of electromyography and an accelerometer was replaced with a vision-sensor-based hybrid approach, in which the authors combined accelerometers with vision-sensor cameras. The purpose of a hybrid system is to enhance data acquisition and accuracy. The vision-sensor-based hybrid prototype contained red, green, and blue (RGB) color cameras, depth sensors, and accelerometer-based axis and orientation sensors, and this smart hybrid combination was used for gesture identification. The experimental setup included seven IMU accelerometer sensors attached to the arm, wrist, and fingers. For data acquisition, five sign language speakers from different age groups performed forty gestures, each repeated ten times. A Parallel Hidden Markov Model (PaHMM) succeeded in producing 99.75% accuracy [93]. Another combination of an accelerometer-based glove and a camera sensor was used for American Sign Language recognition. The experimental setup contained a camera attached to a hat for detecting correctly made signs, and nine accelerometer sensors were used for gesture formation: five attached to the fingers, two on the shoulder and arm to detect arm and shoulder movement, and two attached to the back of the palm for hand orientation measurement. This setup was tested on 665 gestures using the Hidden Markov Model (HMM) and produced a per-sign accuracy of 94% [94].

2.6. Framework-Based Recognition Models

Most of the articles [95,96,97,98] followed a predefined framework for sign language recognition, the main objective being to enhance data accuracy and dataset efficiency. The authors in ref. [99] developed a sign language system and implemented it using different classification and recognition algorithms. The authors in ref. [100] succeeded in creating a Vietnamese Sign Language framework that worked wirelessly. A two-handed wireless smart data glove was designed and developed using bend and orientation measurement: the experimental setup included MEMS accelerometer sensors attached as in the AcceleGlove, with one additional sensor attached to the palm of each hand for orientation measurement. Wireless communication was achieved through a Bluetooth module connected to a cellphone. The user-generated sign was compared with a standard sign language database, and the matched result was displayed on the cellphone screen; finally, a text-to-speech Google translator converted the recognized sign alphabet into speech. This sign language framework produced reasonable accuracy. Similarly, the authors in ref. [101] developed an Arabic Sign Language recognition system. The main purpose of developing another framework for static sign analysis was to minimize the number of sensors on the data gloves. The experiment was simulated in the Proteus software, and the two-handed glove system contained six flex sensors, four contact sensors, one gyroscope, and one accelerometer on each hand.

Another algorithm-based sign language recognition framework was designed in ref. [102]. Stream segmentation-based sign descriptors and a text auto-correction algorithm were utilized, and the system also provided a software architecture of descriptors for hand gesture recognition. A sign language interpreter that converted text into speech was also designed in ref. [103]. The overall system framework contained four basic modules: the smart data glove, training algorithms for the input sign dataset, a wirelessly accessible sign application, and a sign language database for matching the input sign against a standard repository. A very simple resistor-based framework was developed and implemented in ref. [104]. The authors used ten resistors and detected finger movement only; this was a medical application for finger flexion and extension, and a very simple, low-cost, efficient, reliable, and low-power trigger. The resistor-based data glove was directly connected to a microcontroller, which transmitted the captured data to a computer for finger movement analysis. Another simple gesture recognition framework was developed in ref. [105]. The smart spelling data glove consisted of three bending sensors attached to three fingers. The authors worked on only five gestures, including thumbs-up and rest; input gesture data were fed into a microcontroller for recognition, and the analyzed gestures were combined in a row to form meaningful data before being transmitted to the receiver. A detailed review of the frameworks based on Chinese Sign Language was given in ref. [106], discussing in detail the technical approaches related to regional Chinese Sign Language recognition and classification mechanisms. Another detailed review of wearable frameworks and prototypes for sign gesture classification was presented in ref. [107]; the authors covered most of the frameworks previously used in the field, providing good depth on SLR technologies and frameworks.

3. Components and Methods

This section emphasizes the different methods of prototype formation and the techniques used to perform sign recognition. In prototype formation, the developer's main focus is to design a system that effectively solves the problem at hand. There are two main classes of prototype: sensor-based and vision-sensor-based, with the hybrid approach being a combination of the two. In the vision-sensor-based approach, the prototype contains a camera as the sensor for gesture detection and a CPU for internal algorithm processing. In sensor-based systems, the input is captured with a sensor data glove for sign detection and processed by a CPU running machine learning-based algorithms. The output in both cases is the same and can be any monitor, liquid crystal display (LCD), or analysis window for operational observation. Sign language recognition systems can be divided into three main stages: first, the data acquisition process, in which sign data are obtained in some format, either by camera or by a combination of sensors; second, the processing of the acquired data, which involves a microcontroller board or processor; and third, the display of the results obtained after data processing, which can be any monitor, speaker, smartphone, or LCD showing the sign language translated into meaningful information. The whole system is represented in Figure 4 below.


Figure 4. Sign recognition components.

3.1. Data Acquisition Unit

The data acquisition unit is considered the most important unit of a sign language recognition system. Vision-sensor-based approaches use cameras for data acquisition, sensor-based approaches use combinations of sensors, and the hybrid approach uses both sensors and a camera. For the vision-sensor-based approach, high-definition Kinect cameras are mostly used, and for the sensor-based approach, different sensors such as flex, gyro, or tactile sensors are used.

Bending Sensor: Considering sign language systems, it is clearly observed that finger bending plays a vital role in gesture formation. This is the reason that sensors which provide efficient measurements of finger bending are most commonly used. The flex sensor shown in Figure 5 is the most helpful sensor in sign recognition as it captures finger bending well [ 12 , 14 , 26 ].

Figure 5. Flex sensor or bend sensor [ 12 , 14 , 26 ].

Considering the internal structure and working of the flex sensor, it is made of a resistive material such as carbon. Flex sensors are very flexible, lightweight, and easily attachable. Internationally, the flex sensor is made in two versions, one 2.2 inches long and the other 4.5 inches long, with market prices ranging from approximately $15 to $20. Both versions are application-dependent and are used accordingly. Initially, when no bending is performed, the sensor produces a nominal resistance value depending upon the material. The resistance increases as more bending is performed; thus, the degree of bend and the resistance value are directly proportional to each other. Typically, the resistance of the flex sensor lies in the range of 30 kΩ to 40 kΩ. The sensor works on the voltage division rule shown in Figure 5: the output voltage equals the circuit supply voltage multiplied by the flex sensor resistance divided by the sum of the flex sensor resistance and the fixed resistance embedded in the circuit.
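To make the voltage-divider relation concrete, the following minimal Python sketch computes the divider output and inverts it to recover the flex-sensor resistance from an ADC count. The supply voltage, fixed resistor value, and 10-bit ADC resolution are illustrative assumptions, not values taken from any of the reviewed prototypes.

```python
# Minimal sketch of the flex-sensor voltage divider described above.
# Assumed values: 5 V supply, a 10 kOhm fixed resistor, and a 10-bit ADC
# (e.g., an Arduino analog pin); actual circuits may differ.

V_SUPPLY = 5.0        # supply voltage in volts (assumption)
R_FIXED = 10_000.0    # fixed divider resistor in ohms (assumption)
ADC_MAX = 1023        # full-scale count of a 10-bit ADC

def divider_output(r_flex: float) -> float:
    """Voltage across the flex sensor: Vout = Vin * Rflex / (Rflex + Rfixed)."""
    return V_SUPPLY * r_flex / (r_flex + R_FIXED)

def flex_resistance_from_adc(adc_count: int) -> float:
    """Invert the divider to recover the flex-sensor resistance from an ADC reading."""
    v_out = V_SUPPLY * adc_count / ADC_MAX
    return R_FIXED * v_out / (V_SUPPLY - v_out)

if __name__ == "__main__":
    # A flat sensor (~30 kOhm) versus a fully bent one (~40 kOhm or more).
    for r in (30_000, 35_000, 40_000):
        print(f"Rflex = {r / 1000:.0f} kOhm -> Vout = {divider_output(r):.2f} V")
    print(f"ADC 512 -> Rflex ~ {flex_resistance_from_adc(512) / 1000:.1f} kOhm")
```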

Another important sensor in gesture recognition is the optical sensor, which converts incident light rays into electrical pulses or signals. The purpose of using this type of sensor is to detect finger bending angles. It is mainly used in situations where the affected person is unable to move their fingers fully, so that the generated electrical signal helps in completing specified tasks. Incident light produces different electrical signals for different bending orientations of the fingers. Like the flex sensor, this is also a material-dependent sensor; the material used in sensor construction determines its response to the amount of incident light [ 87 ]. An advanced approach to bend detection is obtained by using a light-emitting diode (LED) and a light-dependent resistor (LDR) together. A light-dependent resistor produces a variable resistance whenever light falls on it [ 81 ]. So, when there is no bending, a very low value is received, and when a finger is bent at some angle, the change is detected using this sensor combination.

Another resistive force sensor, known as the tactile sensor and shown in Figure 6, is also commonly used for sign recognition. This sensor is made of a thick polymer material whose resistance can vary, so whenever a force is applied to the surface of the tactile sensor, its resistivity changes due to its structural material. Similarly, whenever pressure is applied to the material surface, the applied force can be measured, reportedly ranging from 1000 N to 100,000 N. As sign language is a combination of different hand and finger orientations, pressure and force values vary accordingly. In some orientations, no force is applied to the tactile sensor, and in that case the circuit acts as an open circuit. When pressure on the tactile surface is increased, its resistance decreases due to the material properties of the sensor [ 16 ]. Considering the working mechanism of the sensor, one can easily detect finger bending from the pressure applied on the sensor and can also measure finger orientation and angle position [ 23 , 87 ]. Considering cost and size, the sensor is easily available globally at prices varying from $6 to $25; size varies according to the price and the experimental usage of the sensor. Tactile sensors come in different sizes, of which diameters of 1, 0.5, and 0.16 inches are normally available.

Figure 6. Tactile sensor or force resistive sensor [ 101 ].

Finger shape and orientation can also be detected using an applied magnetic field. This principle uses voltage variations measured across an electrically conductive material, and the resulting device is known as the Hall effect sensor. As the name suggests, the Hall Effect Magnetic Sensor (HEMS) shown below in Figure 7 uses an applied magnetic field to measure voltage variations across the conductor. Many versions of the Hall sensor are available, but the unipolar MH183 is the one most commonly used. This unipolar magnetic sensor responds to the south pole of a magnet to produce voltage variations. The lightweight sensor is placed on the tip of each finger, and to generate variations, a sufficiently strong magnet is placed on the palm with its south pole facing upward. When the Hall sensor is brought in front of the palm magnet, voltage variations ranging from 100 to 400 mV are generated. This prototype is known for its high level of accuracy.

Figure 7. Smart glove with Hall sensors connected to the fingertip [ 25 ].

Some of the articles used an accelerometer (ACC) sensor to determine the shape and orientation of the fingers [ 19 , 20 , 21 , 77 , 82 , 91 ]. If bending is the only quantity to be measured, the flex sensor alone can deal with the problem. However, in sign language, hand movement and orientation must also be considered to understand any sign perfectly, and the sensors discussed above cannot capture orientation information. Therefore, an accelerometer sensor is used to determine hand orientation for sign language recognition. The accelerometer works on degree-of-freedom (DoF) principles, providing axis information along with acceleration values. This sensor is also embedded in a smart glove to distinguish the signs made by the user. Many versions of accelerometer sensors are used depending upon the situation. Normally, for axis and orientation purposes, the ADXL335 version shown below in Figure 8, made by Adafruit Industries in New York, is used. The ADXL335 provides three-axis acceleration from which three orientation angles can be estimated. Physically, the device is lightweight, small, thin, and low in power consumption. Depending upon its functionality and size, its price ranges from $10 to $30. The ADXL335 supports tilt-sensing applications that measure the acceleration of gravity statically; dynamically, acceleration is measured from devices subject to vibration, shock, and motion. Structurally, the ADXL335 is a silicon-based microstructured component: it is built on a silicon chip where a silicon wafer provides resistance values corresponding to the applied acceleration forces. The accelerometer also provides deflection values; deflection is measured using a capacitor attached to a moving element, and the variation in deflection changes the capacitance. These changed capacitance values produce the output sensor values, i.e., the acceleration.

Figure 8. The ADXL335 3-axis ACC with three analog output pins x, y, and z [ 19 , 20 , 21 ].
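As a rough illustration of the tilt-sensing principle described above, the following Python sketch converts per-axis ADXL335 output voltages into static roll and pitch angles. The zero-g bias and sensitivity values are nominal datasheet figures used here as assumptions; a real glove would require per-axis calibration.

```python
# Illustrative conversion of raw ADXL335 readings to static tilt angles.
# The zero-g bias (1.65 V) and sensitivity (300 mV/g) are nominal values
# used as assumptions; real boards need per-axis calibration.
import math

ZERO_G_V = 1.65     # output voltage at 0 g (assumption, half of a 3.3 V supply)
SENS_V_PER_G = 0.3  # sensitivity in volts per g (nominal ADXL335 value)

def volts_to_g(v: float) -> float:
    """Convert the analog output voltage on one axis to acceleration in g."""
    return (v - ZERO_G_V) / SENS_V_PER_G

def tilt_angles(vx: float, vy: float, vz: float):
    """Static roll and pitch (degrees) derived from the gravity vector."""
    ax, ay, az = volts_to_g(vx), volts_to_g(vy), volts_to_g(vz)
    roll = math.degrees(math.atan2(ay, az))
    pitch = math.degrees(math.atan2(-ax, math.hypot(ay, az)))
    return roll, pitch

if __name__ == "__main__":
    # Hand lying flat: x and y near 0 g, z near +1 g -> roughly (0.0, 0.0).
    print(tilt_angles(1.65, 1.65, 1.95))
```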

As discussed above, the inertial measurement unit (IMU) works on six degrees of freedom (DoF), providing three-directional acceleration and three-directional orientation. In some applications, the accelerometer alone is not enough to cope with motion-related tasks; a gyroscope, which is unaffected by gravity, helps acquire better motion-related results [ 20 , 35 , 38 , 83 ]. Therefore, the gyroscope and accelerometer are embedded into a single chip for the most efficient results, as shown in Figure 9 below. For processing, a motion processing unit (MPU) is used. Among the variety of MPUs, the most popular is the MPU 6050, which is available internationally at prices ranging from $25 to $30. The MPU 6050 is a six-DoF IMU device that operates on 3 V to 5 V using I2C communication at up to 400 kHz. The sensor readings are digitized by a 16-bit analogue-to-digital converter and read by a microcontroller unit (MCU).

Figure 9. The six-DoF IMU MPU6050 chip consists of a 3-axis ACC and a 3-axis gyroscope [ 20 , 35 , 38 , 83 ].
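Accelerometer and gyroscope readings from such an IMU are commonly fused to obtain a stable orientation estimate. The following minimal complementary-filter sketch in Python shows one way this fusion can be done; the sample period, filter coefficient, and the omission of I2C register access are all simplifying assumptions rather than details taken from the reviewed prototypes.

```python
# Minimal complementary-filter sketch for fusing MPU6050 accelerometer and
# gyroscope readings into a pitch angle. The sample period and filter
# coefficient are assumptions; I2C register access is omitted for brevity.
import math

ALPHA = 0.98   # weight given to the integrated gyro rate (assumption)
DT = 0.01      # sample period in seconds, i.e., a 100 Hz loop (assumption)

def fuse_pitch(prev_pitch_deg: float,
               gyro_rate_dps: float,
               accel_x_g: float, accel_y_g: float, accel_z_g: float) -> float:
    """One filter step: integrate the gyro and correct drift with the accelerometer."""
    accel_pitch = math.degrees(
        math.atan2(-accel_x_g, math.hypot(accel_y_g, accel_z_g)))
    gyro_pitch = prev_pitch_deg + gyro_rate_dps * DT
    return ALPHA * gyro_pitch + (1.0 - ALPHA) * accel_pitch

if __name__ == "__main__":
    pitch = 0.0
    # Feed a short stream of (gyro, ax, ay, az) samples; here all stationary.
    for _ in range(100):
        pitch = fuse_pitch(pitch, gyro_rate_dps=0.0,
                           accel_x_g=0.0, accel_y_g=0.0, accel_z_g=1.0)
    print(f"estimated pitch: {pitch:.2f} deg")
```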

An updated version of the MPU 6050 was obtained by adding a three-axis magnetometer to the existing six-DoF IMU. This newer version is known as the MPU 9250, with prices ranging from $15 to $20. The device is based on a microelectromechanical system with the ability to detect and operate on magnetic fields. The MPU 9250, shown below in Figure 10, provides a three-axis gyroscope, a three-axis accelerometer, and a three-axis magnetometer. Using the 9-DoF IMU, hand orientation and movement for sign language gestures can easily be analyzed. The MPU 9250 version of the IMU proved better than its previous versions, including the MPU 9150, in terms of energy and power consumption.

Figure 10. The 9 DoF IMU, MPU-9250 breakout [ 30 ].

3.2. Processing Unit

Machines have a central module working as the brain of the system. The processing unit is the brain or mastermind of the system, performing all processing tasks from data acquisition to the display of results. Acquired data are processed and recognized by the processing unit, and the processed sign data are displayed as output using the output ports of the processing unit. Processing units play a vital role in the development of prototypes designed for sign language recognition. A variety of processing units, from small microcontroller chips to complete processing boards, has already been used. From the literature review, ref. [ 12 ] used an ATmega AVR-based microcontroller chip, shown in Figure 11 a below. The ATmega is a Reduced Instruction Set Computer (RISC)-based 8-bit AVR microcontroller; memory read and write operations are performed using 32 KB of flash memory. Similarly, ref. [ 37 ] used a microcontroller with an 8-channel, 10-bit analogue-to-digital converter (ADC), the MSP430G2553, shown in Figure 11 b below. Ref. [ 12 ] used ARM7 and ARM9-based microcontrollers for processing purposes. Refs. [ 13 , 14 , 20 , 21 , 22 , 23 ] used the most common open-source processing unit, the Arduino microcontroller board, shown in Figure 11 c below. Arduino boards come in many versions, among which the Arduino Nano, Uno, and Mega are the most used. Based on the literature review analysis, the ATmega328P-based Arduino Uno is the most common. This Arduino version communicates with the host system over USB and operates at a 16 MHz quartz crystal frequency. For reading sensor values and driving outputs, the Arduino Uno has six analogue input pins and 14 digital input/output pins. Ref. [ 17 ] used the four-core Samsung Exynos5 Octa 5410 processor, with each core consisting of a Cortex-A15. The board built around this processor is the Odroid XU4, shown in Figure 11 d below; it is produced by Hardkernel and manufactured in South Korea.

Figure 11. (a) ATmega microcontroller, (b) MSP430G2553 microcontroller, (c) Arduino Uno board, and (d) Odroid XU4 minicomputer [ 17 ].

3.3. Output Unit

The final section of the sign language recognition system is the monitoring or output analysis unit. The sign language recognition journey starts with acquiring sign data, continues with processing the acquired data to extract useful information, and ends with presenting the result on an appropriate output unit. For output analysis, researchers have used various kinds of display. Researchers including refs. [ 81 , 100 ] used computer or laptop screens for result analysis. Some authors and prototype designers, including refs. [ 14 , 15 , 20 , 23 ], used a liquid crystal display (LCD), and some used speakers for text-to-speech output [ 93 ]. The authors of ref. [ 27 ] used both an LCD and speakers. To deliver the recognized gestures, some authors used smartphones [ 28 ], transmitting data from the computer after algorithmic processing and receiving the recognized results on the smartphone screen. These are the output techniques that have been reported to date.

3.4. Gesture Classification Method

The role of software is very important in any field where data analysis or simulation-based results are needed. The main functionality of the software is to provide a platform where input data can be processed using the most efficient techniques available; these applied techniques help produce the desired results. Using suitable software, methods normally referred to as algorithms are developed. In sign language recognition (SLR), classification, template matching, static posture matching, long-term fuzzy logic, and artificial neural network (ANN)-based algorithms have typically been used. In static posture matching, new incoming data in the form of testing samples are matched against pre-defined or trained data using data statistics, and each sample is assigned to the class of its best match. This is also known as template matching because the data used by the classification algorithm form a classification template for new incoming testing data [ 16 , 25 ]. The main complication of template matching is that computation grows as the number of classes increases.
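The following short Python sketch illustrates the template-matching idea on glove feature vectors: each class is represented by the mean of its training samples, and a new reading is assigned to the nearest template. The sensor count and sample values are purely illustrative and do not correspond to any specific reviewed prototype.

```python
# Minimal template-matching sketch for glove readings: each class is
# represented by the mean of its training vectors, and a new reading is
# assigned to the nearest template. Sensor counts and values are illustrative.
import numpy as np

def build_templates(train_x: np.ndarray, train_y: np.ndarray) -> dict:
    """One mean template (feature vector) per gesture class."""
    return {label: train_x[train_y == label].mean(axis=0)
            for label in np.unique(train_y)}

def classify(sample: np.ndarray, templates: dict):
    """Return the label of the template closest to the sample (Euclidean distance)."""
    return min(templates, key=lambda lbl: np.linalg.norm(sample - templates[lbl]))

if __name__ == "__main__":
    # Toy data: 5 flex-sensor readings per sample, two gesture classes.
    x = np.array([[0.9, 0.8, 0.1, 0.1, 0.1],   # class "A"
                  [0.8, 0.9, 0.2, 0.1, 0.0],   # class "A"
                  [0.1, 0.1, 0.9, 0.9, 0.8],   # class "B"
                  [0.0, 0.2, 0.8, 0.9, 0.9]])  # class "B"
    y = np.array(["A", "A", "B", "B"])
    templates = build_templates(x, y)
    print(classify(np.array([0.85, 0.85, 0.1, 0.1, 0.05]), templates))  # -> "A"
```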

Introducing machine learning into the classification process results in a better recognition approach. Within artificial intelligence, neural networks are among the most effective classifiers. Artificial Neural Network (ANN)-based classifiers have succeeded in both static and dynamic gesture recognition [ 39 ]. Data acquisition using a data glove combined with classification by machine learning-based artificial neural networks has also produced good results in posture classification [ 84 ].

Within the machine learning domain, discriminant analysis based on dimensionality reduction has proven to produce simpler and more efficient results. Using improved clustering, linear discriminant analysis (LDA) succeeded in producing accurate classification results [ 41 ]. Another ML-based classification approach is the Hidden Markov Model (HMM), which has succeeded in speech recognition, gesture recognition, and posture recognition tasks [ 54 , 93 , 94 ]. In the sign language recognition domain, K-Nearest Neighbor (KNN) produced good results for American Sign Language (ASL) recognition. Integrating different ML-based algorithms, such as combining HMMs with KNN, produced accurate hand gesture classification results; similarly, combining KNN with the Support Vector Machine (SVM) produced better posture classification results [ 56 , 89 ]. Fuzzy logic is normally used for binary classification, but when human decision making is involved and many classes must be handled, as in sign language recognition, long-term fuzzy logic is used for broader classification [ 13 , 19 ].
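As a concrete illustration of how such classifiers are typically applied to glove data, the following Python sketch trains KNN and SVM classifiers with scikit-learn on synthetic feature vectors standing in for flex and accelerometer readings; the data, class count, and hyperparameters are assumptions for demonstration only.

```python
# Sketch of KNN and SVM classification of glove feature vectors with
# scikit-learn. The data here are synthetic stand-ins for flex/IMU readings;
# a real study would load its recorded sign dataset instead.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)
n_classes, per_class, n_features = 5, 40, 8      # e.g., 5 flex channels + 3 accel axes
centers = rng.uniform(0, 1, size=(n_classes, n_features))
X = np.vstack([c + 0.05 * rng.standard_normal((per_class, n_features))
               for c in centers])
y = np.repeat(np.arange(n_classes), per_class)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

for name, clf in [("KNN", KNeighborsClassifier(n_neighbors=3)),
                  ("SVM", SVC(kernel="rbf"))]:
    clf.fit(X_tr, y_tr)
    print(name, "accuracy:", accuracy_score(y_te, clf.predict(X_te)))
```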

3.5. Training Datasets

Classification is the process of training a model to classify testing data. The factors that affect the accuracy of any system include the amount of data available for training: a larger amount of training data generally yields greater classifier accuracy. Choosing a suitable training dataset is another challenge and is a subjective matter; a detailed analysis could be written on the types, sizes, properties, audiences, and specifications of datasets. In this section, a generalized overview of different datasets is given. To obtain the training data, a sensor-based glove is used. This is because sign language is a combination of different finger orientations, so any orientation made produces a set of sensor values, and these values collectively act as training data for the classifiers. In sign language recognition systems, training datasets fall into different domains. Some authors, including refs. [ 16 , 23 ], used only a small subset of sign language alphabets for recognition purposes. Different regions have different sign languages, so gesture variation by region produced different datasets; region-wise word-based signs were produced by the authors of refs. [ 19 , 27 , 32 , 76 ].

Some researchers extended this work to include the standard alphabets and numbers of sign language. Word-, number-, and alphabet-based datasets were used in ref. [ 37 ], which succeeded in developing a standard sign language system. Some authors used already available sign language datasets and extended existing work by purifying and classifying them [ 83 , 87 ]. Initially, researchers developed systems only to recognize sign language numbers [ 25 , 28 ].

Refs. [ 77 , 90 , 101 ] worked on numbers and alphabets and extended the training dataset. Mapping a standard sign language system to daily routine problems was another challenging task. Gestures based on region-wise spoken words, sentences, and phrases were converted into meaningful information; these real-time sentences covered the shopping, greetings, sports, and educational domains. Based on the literature review analysis, the studied literature is divided into categories according to experimental domain. Figure 12, given below, provides an analysis of the number of articles in each domain, such as numbers; numbers and alphabets; real-time postures; and alphabets and words or phrases with real-life spoken sentences.

Figure 12. Number of articles on each variety of gestures.

Lastly, considering the modern tools and technologies of today's era, new datasets have been used for the recognition of complex sign postures. Recent studies have clearly demonstrated the utilization and importance of these complex datasets for the understanding of complex postures. Beyond traditional regional, local, small, and standard ASL datasets, some other advanced datasets have been used by many researchers in recent times. The DEVISIGN-D dataset is one of these; it contains data for 500 daily vocabularies. The data cover eight different signers, with the vocabularies recorded twice for four of the signers, i.e., two male and two female. The total dataset includes 6000 videos [ 71 ]. Similarly, the SLR-DATASET contains 25,000 labeled video instances with more than 100 h of recording time. During collection, 50 different signers were involved, and the dataset was manually verified by Chinese Sign Language professionals [ 68 ].

For Chinese Sign Language, two isolated sign language word datasets were used [ 65 ]. ID1 is the public large-scale-vocabulary Chinese Sign Language (CSL) dataset, which contains 125k samples consisting of 500 sign words, each recorded five times by 50 signers. The larger-vocabulary dataset ID2 was collected by the authors using the Microsoft Kinect 2.0 device for CSL. ID2 is in turn divided into two parts, ID2-split1 and ID2-split2: ID2-split1 contains 50k samples consisting of 500 sign words, each recorded 10 times by 10 signers, and ID2-split2 contains 20k samples consisting of 2000 sign words, each recorded once by one signer. There is also one continuous sign language sentence dataset, CD, which contains 25k videos consisting of 100 different sign language sentences; each sentence was recorded five times by 50 sign speakers, and each video contains 4–8 sign words and lasts about 15 s.

Finger-spelled Indian Sign Language (ISL) signs were captured for training the model used for sign recognition. Capturing was carried out through mobile cameras, laptop cameras, and digital SLRs. Signs corresponding to 20 finger-spelled alphabets were captured with the help of 15 signers: six male and nine female. Close to 1500 images were collected for each sign, bringing the total number of images collected to about 20 × 1500. Among the captured signs, certain alphabets like "A" and "B" were double-handed and certain others like "C" were single-handed [ 67 ]. Other datasets used in the reviewed literature include the BVC3DSL 3-D Sign Language Dataset and other publicly available datasets such as HDM05, CMU, and NTU RGBD skeletal data.

4. Machine Learning

Machine learning is the most widely used application of AI, enabling systems to perceive their environment intelligently. ML gives a system the ability to learn and to improve its results through the learning process. Based on the learning process, ML has two types, i.e., unsupervised and supervised. In supervised machine learning, a training process is involved: the machine is first trained with labelled samples, and sample input and output data are provided during training. Supervised machine learning is used for regression and classification.

Algorithms like decision trees, SVM, and artificial neural networks are used to implement supervised machine learning approaches. In unsupervised machine learning, no labelled data are provided and no predefined knowledge is given. The main goal of unsupervised machine learning is to find specific patterns in the input data, and it is used for clustering. Algorithms like self-organizing maps (SOM), the Hidden Markov Model (HMM), and K-means clustering are used to implement unsupervised machine learning approaches. Table 2 lists and compares the advantages and disadvantages of ML approaches.

Table 2. Machine learning approaches: a comparison.
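As a small illustration of the unsupervised case, the sketch below clusters unlabeled synthetic glove readings with K-means from scikit-learn; the number of clusters, channel count, and data are assumptions for demonstration only.

```python
# Minimal sketch of unsupervised clustering of unlabeled glove readings with
# K-means. The number of clusters and the synthetic data are assumptions.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(2)
# Unlabeled readings from 3 (unknown) gestures, 5 flex-sensor channels each.
X = np.vstack([center + 0.05 * rng.standard_normal((30, 5))
               for center in rng.uniform(0, 1, size=(3, 5))])

kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("cluster sizes:", np.bincount(kmeans.labels_))
```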

For greater prediction accuracy, a more powerful class of machine learning is now in practice, i.e., deep learning. Neural networks with a greater ability to derive meaning from complex and unreliable data are used to detect structures and patterns in the input that humans cannot even detect. Adaptive learning, self-organization, and fault tolerance are the current benefits of deep learning. Although it requires larger datasets, greater power consumption (faster CPUs/GPUs), and higher computational cost (multi-layer operations), it remains highly effective in terms of accuracy.

Considering the sensor domain of sign language, primarily machine learning-based algorithms have been applied, among them the Support Vector Machine (SVM), K-Nearest Neighbor (KNN), Decision Tree, Discriminant Analysis, and Ensemble methods with all of their variants. Some of the experimental setups also applied Artificial Neural Networks (ANN) to sign language datasets. Table 3, given below, lists the most frequently used machine learning algorithms described above with all of their variants. Typically, variants of existing algorithms are formed by increasing layer dimensions or by increasing the number of neighboring nodes considered for feature extraction.

Table 3. Machine learning algorithms with variants.

All algorithms mentioned in Table 3 relate to the sensor-based domain of sign language recognition systems. Sign language data originating from the sensor-embedded smart data glove were analyzed with these algorithms, which were used for training recognition models. The overall acquired dataset was divided into two main streams.

  • Training Data;
  • Testing Data;

The dataset obtained was first used for training: almost 80% of the dataset was utilized for training the recognition models. After that, training accuracy was measured, i.e., how accurately the dataset had trained the recognition models. Most modern simulation tools have built-in classification learner modules that help train the data model and display the training accuracy. After the training procedure, testing was performed; the remaining 20% of the dataset was utilized for testing purposes. Dataset samples differed almost every time a sign was captured using sensors, so the chance of repetition was minimal, and testing data points were most likely different from training data points. The recognition ability of the trained model lies in predicting new incoming data accurately, and the efficiency of the system depends entirely on this prediction accuracy: the more accurately the system can classify new incoming data, the more accurate and efficient it will be. Another challenge in the sign recognition domain is to improve system efficiency by improving the recognition and response time of sign language classifiers; in their limitation sections, authors often complain about the slow processing of their sign classifiers. A slow response can have two causes in the sensor-based domain. One is the involvement of a large number of sensors for capturing input sign data: embedding many sensors results in slow system performance, because multidimensional irrelevant data burden the recognition system. To remove this redundancy, Principal Component Analysis (PCA) is used for dimensionality reduction; PCA compresses multidimensional data and extracts only the components that help perform recognition efficiently. The second cause of slow data processing is the wrong choice of training algorithm, which also results in poor system accuracy and reduced performance. So, before passing the sign language dataset to recognition algorithms, researchers should know the working capability, response time, calculation parameters, and data dimensionality of the algorithm to be utilized for sign language recognition.
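The following Python sketch puts the 80/20 split and the PCA-based dimensionality reduction described above into a single scikit-learn pipeline. The synthetic data, channel count, and the 95% retained-variance threshold are assumptions chosen purely for illustration.

```python
# Sketch of the 80/20 split plus PCA dimensionality reduction described above,
# using scikit-learn. Feature counts and the retained-variance threshold are
# assumptions chosen for illustration, not values from the reviewed studies.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
# 200 synthetic gesture samples: 10 gesture classes, 20 sensor channels each.
centers = rng.uniform(0, 1, size=(10, 20))
X = np.vstack([c + 0.05 * rng.standard_normal((20, 20)) for c in centers])
y = np.repeat(np.arange(10), 20)

# 80% training / 20% testing split, as described in the text.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

# Keep enough principal components to explain 95% of the variance, then classify.
model = make_pipeline(StandardScaler(), PCA(n_components=0.95), SVC())
model.fit(X_tr, y_tr)
print("components kept:", model.named_steps["pca"].n_components_)
print("test accuracy:", model.score(X_te, y_te))
```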

Vision-sensor-based sign language recognition is considered the second most important domain in posture recognition. The fundamental components for capturing user sign postures are a high-definition camera and a processing unit, normally a computer or laptop. This is not considered a good real-time recognition method, as a user cannot carry a camera and a processing unit around at all times to capture and recognize sign postures. The vision-sensor-based approach is therefore mostly used in experimental analyses in which researchers work in the lab on real-time or static sign postures.

Deep learning-based algorithms with a multi-layered structure are used in the vision-sensor-based approach for training purposes. The most commonly used vision-based sign language recognition techniques include neural networks, the Artificial Neural Network, the Convolutional Neural Network, the Hidden Markov Model, the Support Vector Machine, the Recurrent Neural Network, Principal Component Analysis (generally utilized for dimensionality reduction), the Fourier Transform, and the Scale-Invariant Feature Transform (SIFT). These algorithmic approaches are utilized to cope with emerging issues in the sign language recognition domain; individual algorithms or combinations of two or more techniques are used to make the system work efficiently and improve overall accuracy.
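To give a sense of the convolutional models typically used in the vision-based setting, the sketch below defines a small CNN with Keras for classifying sign images. The input size, number of classes, and layer sizes are assumptions for illustration and are not taken from any specific study reviewed here.

```python
# Minimal sketch of a CNN for vision-based sign classification, written with
# Keras. The input size (64x64 grayscale) and the number of classes (26) are
# assumptions; a real system would train on its own recorded image dataset.
import tensorflow as tf
from tensorflow.keras import layers, models

def build_sign_cnn(num_classes: int = 26) -> tf.keras.Model:
    model = models.Sequential([
        layers.Conv2D(16, 3, activation="relu", input_shape=(64, 64, 1)),
        layers.MaxPooling2D(),
        layers.Conv2D(32, 3, activation="relu"),
        layers.MaxPooling2D(),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    build_sign_cnn().summary()
```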

Lastly, the hybrid approach can utilize both algorithmic domains, as it is the combination of the vision-sensor-based and sensor-based approaches. Hybrid systems are also used mainly for experimental purposes, because carrying all of the equipment needed for gesture translation is cumbersome for users. The main purpose of all machine learning-based algorithmic techniques is to provide the best recognition results for the deaf-mute community.

5. Article Filtration and Distribution Analysis

To perform the literature review, a wide variety of online research article databases were consulted. Among these databases, Web of Science (WOS), Science Direct, the Institute of Electrical and Electronics Engineers (IEEE), and the Multidisciplinary Digital Publishing Institute (MDPI) were the focus. Several journal, conference, and review articles related to the field of focus were collected. Analyzing domain specifications, data filtration was performed, and duplicate and irrelevant research articles were removed. Further filtration was performed after studying the abstracts of the articles. The main goal of this study was sensor-based sign language recognition using smart data gloves. The methodology sections of the articles led to further filtration, as vision-sensor-based and hybrid technology-based articles were not in focus apart from a few. The last filtration step included only articles published in the English language. This whole filtration method was suggested by ref. [ 66 ]. Figure 13, given below, demonstrates the filtration procedure in a straightforward way.

Figure 13. Article filtration procedure for literature review.

A complete prototype or sensor-based system embedded with the latest modules developed for the deaf-mute community was the focus of this study, so the literature review was divided into categories. A major portion of this study was based on prototype development: almost three-quarters (75%) of the reviewed articles were based on the development of systems that could translate sign language into text format. These articles were based on prototypes that had been fully designed and implemented. Some articles focused on proposing methodologies for developing new sign language translation and recognition systems. Some of the remaining articles focused on existing systems: these authors highlighted shortcomings and disadvantages of pre-developed prototypes and proposed new prototypes that were enhanced versions of existing products.

Based on this division of systems, the sign language literature is considered to have four main categories: development- or prototype-based articles, articles suggesting or enhancing designs based on existing prototypes, hybrid technology (sensors combined with vision-sensor-based methodology) articles, and literature survey or review articles. Figure 14 below provides an analysis of the filtered articles, further classified according to metadata and the approaches used in each article. The filtered articles were categorized by the number of articles per domain; as most articles are based on prototype development, that category covers the largest area on the left. Similarly, proposed frameworks, survey and review articles, and hybrid system-based articles were categorized according to the number of articles used for this paper.

Figure 14. Categorical division of filtered articles used in this paper.

Results are considered the most critical parameter for analyzing the accuracy and efficiency of any system. According to the literature review, a wide range of sensor combinations succeeded in achieving good results. Different databases were used for gathering articles related to the sign language domain; the whole literature review was based on data collected from these databases, which included IEEE Xplore, Web of Science (WoS), Elsevier, and Science Direct. Almost half of the papers were collected from IEEE Xplore, and the remaining papers were collected from Elsevier, WoS, and Science Direct. A general search related to sign gesture recognition returned many articles, owing to the inclusion of vision-sensor-based and hybrid-sensor-based techniques. Since the focus of this article is a literature review of sensor-based sign gesture recognition, the number of articles was reduced and limited to almost one hundred. Figure 15 names the databases and highlights the process of gathering the data reviewed in this article.

Figure 15. Databases used for sign language recognition-based articles.

These articles have been cited many times, and their citation counts and the impact factors of their venues reflect the quality of information and the innovation presented in them. Table 4 lists the publisher of each article, its citations, and the corresponding impact factor. The impact factor of each article was extracted from the website of the specific publisher, and the citation count of each article was obtained from Google Scholar.

Table 4. Impact factor and citation analysis of reviewed articles.

Articles were also filtered and analyzed based on three further parameters (shown in Figure 16): region-based sign language recognition; the type of gesture made, i.e., static or dynamic; and whether the hardware module was equipped on one hand or two hands.

Figure 16. Article distribution based on region, gesture type, and number of hands used.

Considering the first parameter of analysis (region-based sign language recognition), most articles were based on American Sign Language recognition; almost half of the authors in this literature review focused on American Sign Language for gesture recognition. In contrast, only three or four articles each were found for Arabic Sign Language, Indian SL, Taiwanese SL, and Malaysian SL. The remaining regions, such as Pakistan, Japan, Greece, Germany, France, China, and Australia, contributed only one or two articles each to the sign language recognition domain. Based on the gesture-type analysis for sign language, two main domains were identified.

Sign language contains letters, words, and sentences just like any other language, so the gestures made for letters, words, or sentences divide the recognition process into two main streams: static gesture recognition and dynamic gesture recognition. Static gesture recognition proved easier than dynamic gesture recognition; accordingly, almost 48 researchers worked on static gesture recognition and almost 15 worked on dynamic gesture recognition, while another 18 articles did not mention either domain. Finally, articles were analyzed based on the number of hands used for gesture recognition. Almost half of the researchers used one hand for gesture recognition; the others used two hands or did not describe the prototype in their articles.

6. Analysis

Sign language recognition is one of the emerging trends of today's modern era. Much research has already been conducted, and many researchers are currently working in this domain. The focus of this article is to provide a brief analysis of all related work done to date. For this purpose, a complete breakdown of research activities was developed. Some authors worked on a general discussion of sign language; most of their work was based on introductions and hypotheses for dealing with sign language scenarios, with no practical implementation of the proposed hypotheses, so these authors fall into the general article category. Another group of authors worked on developing systems that were able to recognize sign gestures; this group is categorized in the developer domain. A good combination of sensors was used to develop sign language recognition systems, and most of these authors used sensor-based gloves to recognize sign gestures. Yet another group of authors worked on existing sensor-based models and improved the accuracy and efficiency of the system; their focus was to use a good combination of machine learning and neural network-based algorithms to achieve accuracy. Considering the authors' intentions, machine learning-based algorithms were used by authors working on sensor-based models like sensory gloves, whereas authors working on vision-sensor-based models used neural network-based models for sign gesture recognition.

Considering the literature, some trends were also observed. Most authors in the sign language domain preferred to develop their own sensor-based models; their focus was to build a cheap and efficient model that could detect and recognize gestures easily, and these models were not made for commercial use. Another trend among authors was to develop commercial gloves; these gloves contained a large number of sensors, e.g., 18–20, to detect sign gestures, and cost and efficiency were their main problems. From analyzing these research articles, the advantages and disadvantages of vision-sensor-based, sensor-based, and hybrid recognition models were listed. The last trend in sign language articles, including this article, is the group of authors who wrote surveys and review articles on sign recognition. These authors provided a deep understanding of previous research along with detailed knowledge of hardware modules, sensor performance, efficiency analyses, and accuracy comparisons. The advantage of review and survey articles over general and development research articles is the filtered knowledge gathered in one article; survey-based articles have proved a good help for learners and newcomers to a specific topic, and they also provide researchers with upcoming challenges, trends, motivations, and future recommendations. A detailed comparative study helps determine the uses, limitations, benefits, and advancements in the sign language domain.

Based on the results analysis of the reviewed literature, a detailed list of accuracy achievements is displayed in Table 5. It contains information about the algorithms, or combinations of algorithms, used to gain maximum accuracy in sign language recognition techniques. Algorithmic combinations are completely case-dependent; to achieve maximum accuracy or enhanced system efficiency, authors use combinations of algorithms. The table lists the accuracy results obtained with vision-sensor-based, sensor-based, and hybrid-sensor-based posture recognition techniques; most of the results shown are for sensor-based recognition algorithms. Furthermore, real-time gestures and static gestures are highlighted with respect to their results in the form of accuracy, efficiency, or outcome in Table 5.

Table 5. Results analysis of algorithms used for classification and recognition.

7. Motivations

Motivation is considered to be the group of parameters that have driven continued work in the sign language domain using different algorithmic or sensor-based designs; it can also relate to accuracy and efficiency improvements. Whenever two people communicate, signs and body language help a great deal. Two ordinary people use speech and body gestures to communicate, but deaf or mute individuals must rely on handmade signs or gestures to communicate with one another. Therefore, sign language is a strong source of motivation for researchers. The parameters that affect motivation directly or indirectly are shown in Figure 17 below. These motivational parameters include advancement or improvement of existing sign language modules, overcoming the limitations highlighted in gesture recognition, and the benefits and uses of sign gesture recognition modules.

Figure 17. Motivational domains of sign language.

7.1. Technological Advancement

Technological advancement is one of the core motivations for developers of sign language recognition systems. Unlike ordinary people, affected individuals use sensor-based wearables to communicate with others, and today's advanced world is leaning towards more and more smart sensors every day, creating a clear opportunity for developers to enhance their recognition models. Initially designed recognition models produced low accuracy and efficiency in recognizing static or dynamic gestures. This communication gap can only be minimized by using more efficient and accurate sensors that can easily identify and recognize the signs made by a mute person; embedding more advanced and precise sensors in existing systems enhances their recognition ability. In the beginning, only wired, high-power, and slow-operating sensor-based systems were developed. Sensor automation then replaced the entirely wired models with wireless modules, and most recent sign language systems are completely wireless: they capture data from the hand device, then collect and transmit the data towards the receiver end for gesture recognition. Previously used models had computer monitor-based display units in a cluttered wired environment, but this gap was also closed with modern LCDs and display units. The most recent studies have worked on power consumption analysis: microchips and nanochips are now being designed to cope with energy management, which is one of the developers' most significant challenges and a core motivation to design models that utilize low power. Early work had an analog front end with a very complex interface that was difficult for new users to understand, and people only became comfortable with the wearable device after proper training on the developed systems. In today's models, however, a touch screen-based graphical user interface (GUI) with proper directions and instructions is used, and this modern input device has replaced old button-based models. Improvements in accuracy, bend detection, hand movement, rotation measurement, hand orientation-based degrees of freedom (DoF), and distance measurement are good sources of motivation for upcoming developers. Smart, fast, efficient, and low-power models help deaf-mute individuals communicate with others easily. This is also a strong motivation for developers to renovate existing models with newly equipped smart sensors that may produce the best results. Systems that come close to the ideal will give deaf and mute people a better living environment; this technology-based world will help them improve their lifestyle and become confident in communicating with other people.

7.2. Daily Usage

Daily usage of the sensor-based smart device covers a large domain, including use as an educational tool for deaf and mute people. Outside the developed countries, there are several countries where deaf and mute people are completely unable to acquire any sort of education, and this lack of education also increases the rate of unemployment. Therefore, there is a pressing need to build a cost-effective, smart, feasible, and accurate system for individuals who are unable to speak or hear. With these smart sensor-based systems, learning from any source would become very easy and comfortable. Many developed countries are already using online learning tools, mobile applications, gesture-based animations, and video clips that help users correctly make and understand sign gestures. Successful implementation of a sign language recognition environment will remove the hindrance to communication between normal and deaf-mute people. It will also act as a self-learning tool for beginners to help them train, and instructors and sign language teachers can use the sensor device for educational purposes. Affected individuals can learn sign language by observing simulators performing hand, wrist, and arm movements. Gesture recognition as an educational tool is both a strong motivation and a challenge for researchers working on, or willing to work on, this domain.

7.3. Benefits

The benefits of using a glove-based system are its ease, comfort, lightweight nature, and good man-machine interaction (MMI). Hand movements are considered the fundamental entities for performing any task or operation, and research shows that a large number of articles have been published on human body movement, especially hand, wrist, and arm movements. Most sensor-based models concerning hand motion are closely related to biomedical science or human engineering applications; thus, the man-machine interface involving biomedical science and human engineering has been a hot and emerging domain for the last few decades. To capture hand and wrist movements, sensor-based smart gloves are always important. Considering the evolution of technology, it is evident that sensor advancement, efficient materials, updated computing scenarios, and modern algorithmic techniques will make recognition systems more accurate and powerful. This lightweight, easy-to-use, efficient, and user-friendly smart equipment will be helpful in many daily life applications, and a smart sensor-based module can act as an assistant for deaf-mute individuals. Another advantage of smart data gloves is their ability to move easily with the user. Today's technology is moving towards nano- and microchips, which have made data gloves completely independent of computer connections, and the integration of wireless modules has made the mobility of data gloves more precise. Users can move with ease and comfort while wearing a sensor-based glove. The technological advancements of the modern era have made the Man-Machine Interface (MMI) a need of the time. Gesture and movement capturing environments mostly operate within the MMI domain, whose main purpose is to capture and analyze the signals generated by an activity such as hand, wrist, or arm movement; the captured signals are then converted into performance, behavioral, and physiological interpretations of the performed motion using computer systems. Motion or gesture recognition systems based on vision-sensor, hybrid, or sound-based techniques have contributed modestly to traditional capture and recognition prototypes. The wide involvement of gestures in daily activities has made sign language one of the most rapidly emerging fields of today's era [ 82 ]. Gesture recognition-based applications such as remote controls, immersive gaming, assistive robots, virtual objects, medical health care, and substitute computer interfaces have made this field a most challenging and advanced area for researchers.

7.4. Limitations

Limitations in existing systems are challenges for new researchers. Nothing is made perfect at once; rather, perfection comes with experimentation. Considering the sign language recognition environment, two basic techniques come to mind: sensor-based and vision-sensor-based. The sensor-based smart glove contains an array of sensors to capture sign data for analysis and recognition, whereas in the vision-based approach, cameras are needed to capture the sign gestures made by deaf-mute people. Many researchers have worked on vision-sensor-based techniques using different types of cameras, such as Kinect or Leap Motion, or sometimes a color-coded glove for sign recognition. The main advantage of the vision-sensor-based approach is relief from wearing a glove full of sensors. Besides hand orientation for sign language, the vision-sensor-based approach also captures facial expressions, object orientation, object colors, and depth knowledge. However, there are also many challenges in the vision-sensor-based approach. Very complex computation is required to extract gesture information from the scene, and neural networks are mostly used to cope with these computations. Object extraction from the background is another challenge, as complex and cluttered backgrounds create many complications in extracting the required object, making it difficult to discern the signs made against irregular backgrounds. Lighting conditions, luminance, and brightness also affect system accuracy and efficiency. All processing is performed on a computer with good specifications and processing ability.

Most importantly, individuals with speech and hearing disabilities will always have to carry a camera with them for gesture capturing purposes, along with a display unit to show the recognized results. This sort of experimental arrangement will significantly affect the daily life activities of the deaf-mute community. A quick overview of the detailed discussion of the motivational domains of the sign language recognition system is given in Figure 18; the flow-based list in Figure 18 summarizes the whole discussion in one format for a quick understanding of the motivational domains.

Figure 18. Overview of key points regarding motivational domains of sign language.

8. Challenges

Challenges in sign language recognition systems have played a vital role in increasing system accuracy. Any challenge faced by researchers receives the attention of other researchers working in the same domain, and successfully addressing a challenge helps improve system accuracy and may also highlight further challenges. From the literature review, a deep analysis of the challenges in sign language recognition systems was obtained, and the issues and difficulties faced by developers were analyzed thoroughly. This section covers challenges regarding sensor-based, vision-sensor-based, and hybrid systems equipped for SL recognition. Based on the literature review, the challenge domain was categorized into four main streams: the nature and orientation of signs, the system characteristics of sign language recognizers, challenges related to the user, and sensor and device characteristics. An overview of the challenging domains for sign language recognition systems is given below in Figure 19.

Figure 19. Challenging domains of sign language.

8.1. Sign Nature

Sign nature and hand orientation are primary considerations in making sign language gestures. For any regional sign language, the domain is divided into two basic streams: static signs and dynamic signs. Most of the reviewed literature concerns the development and analysis of static signs, and very few researchers and developers have worked on dynamic gestures for any sign language. This is because static sign gestures are comparatively easier than dynamic ones, and the accuracy of static signs, owing to better classification, is always greater than that of dynamic sign gestures. Insufficient training and the absence of movement history tracking cause the low accuracy and higher error rate of dynamic sign gestures. The similarity of some gestures is another challenge of the sign language domain; considering American Sign Language alphanumeric gestures, some alphabets such as M, N, S, and T have nearly the same posture.

All of the signs mentioned above are initially made with a closed fist, producing only a very small change in orientation. The alphabets U and V also fall into this category, as their finger and palm orientations are almost identical. These similar gestures become a cause of misclassification and lead to low system accuracy. Another challenge in sign recognition relates to the continuous sign domain. Continuous signs are performed by individuals with specific delays and transition times, so these delays must be accounted for before concatenating the whole gesture. Words and phrases are mostly gestured using continuous signs, and concatenating words into speech without any discontinuity is the main issue faced by dynamic sign gesture recognition systems. Similarly, the transition from one gesture to another consumes some time; this transition delay is also a big challenge, as meaningless movement frames are created within gesture sequences and consume space. This is also considered a classification issue, and these transition frames are normally termed "movement epenthesis". Region-based recognition systems are not identical, and every region has its own gestures for its sign language, so researchers have not listed this issue among global challenges. In a nutshell, there is no universally ideal sign language system that can cope with the sign gestures of all regions. Apart from the standard static alphanumeric American Sign Language system, dynamic sign language systems differ worldwide owing to the different languages around the globe; therefore, there can never be a single standard sign language system for the whole world.

8.2. User Interactions

User interaction-related challenges arise while performing gestures. The difference between a beginner and an expert must be kept under consideration: a beginner can never perform sign postures as well as an expert, and even if a beginner makes roughly correct orientations for sign postures, it may still be difficult for observers to understand those postures perfectly. When the system is used as a learning tool, hand orientation, finger movement, and wrist and joint movements must be considered carefully. In an experimental setting, one signer is asked to make a single gesture multiple times because the same individual cannot reproduce the same posture identically in multiple attempts. This is known as signer dependency, since a single individual cannot place all wrists, joints, and fingers in the same position every time; therefore, the values vary, and a range of different values is recorded for a single posture during the training phase. Along with user and gesture differences, another problem is the size and shape of the user's hands. Typically, data gloves are made in a specific size; leather sensor-embedded data gloves are used universally for static or dynamic gesture capturing. Tall or short, slim or fat, and small or big hands will also create problems in data acquisition. These user interaction-based issues affect the training process of the acquired data and reduce overall system efficiency by lowering system accuracy.

8.3. Device Infrastructure

Device infrastructure and prototype-related challenges are core factors in the human engineering development model. Among hardware challenges, price and power are the most crucial. A large number of companies have already developed smart sensor-based gloves that can capture and display signs made by deaf-mute individuals; however, the price of these international prototypes ranges from USD 1000 to 20,000. This price range is beyond the reach of most of the community. The facts show that deaf-mute individuals mostly belong to middle-income families within the affected community, so using these smart devices for communication remains little more than a dream for many of these people.

Portability is another challenge in sign language recognition. Most postures are very complex and need computational time for gesture recognition, and multiple gestures are combined to make words or sentences. In that case, the computational work grows even further and cannot be processed without the help of a computer. Practically, it is impossible to carry a computer everywhere for SLR, which limits the portability of the overall prototype. Considering only American Sign Language (ASL) alphabets and numbers, most of them are posed with palm and finger orientations, but combining alphabets into words or sentences is tricky to capture with finger and palm movement alone; for this purpose, the movement of the hands, wrists, arms, and elbows is also involved, and sometimes the movement of the lips is captured as well for recognizing sign gestures.

Sensor-based gloves are capable of capturing only finger and palm data. Data related to the elbow, wrist, and arm are not captured, which remains a challenge for researchers. These missing data also create problems in recognizing motion-related gestures. For example, the alphabets 'J' and 'Z' involve arm motion, so a data glove alone cannot capture these two alphabets, as arm and other body-part data are not captured by a sensor-based glove. Prototype performance reporting is another emerging challenge: most sensor-based smart prototypes provide a detailed analysis of sensor efficiency and accuracy, but overall recognition accuracy and prototype efficiency are not described. There is no common standard for reporting the accuracy of data obtained from gestures, and systems lack methodologies for standardized performance and accuracy analysis. Working toward truthful data and genuine reporting of system efficiency would help both society and researchers.

Another challenge in the SLR domain is to achieve the best and desired results at minimum cost. To minimize system cost, the quality of the hardware used in prototypes is often reduced. These low-quality sensors introduce noise into the data and distort the actual readings, decreasing overall system accuracy. Noise removal is therefore another big challenge in acquiring authentic and accurate data. Human physique also varies widely: people may be heavy, tall, thin, or small, and hand size, finger shape, and hand thickness vary accordingly. Size variation affects recognition performance directly, as sensors attached to the fingers overlap and the actual readings of the gestures become faulty. Therefore, sign data gloves must be calibrated for users of different ages and hand sizes. User training and testing are also performed to cope with this challenge: during the training session, the user is directed to perform specific sign postures to calibrate the equipment used for recognition.
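As a minimal illustration of such per-user calibration, the following Python sketch maps raw flex-sensor readings into a normalized range using two reference poses recorded during the training session. The channel count, ADC range, and all numeric values are hypothetical assumptions, not taken from any specific prototype.

```python
import numpy as np

# Hypothetical per-user calibration of flex-sensor readings from a data glove.
# Assumptions: five flex channels read through a 10-bit ADC (values 0-1023);
# the user records a flat-hand pose and a closed-fist pose during setup.

def calibrate(flat_hand_samples, closed_fist_samples):
    """Return per-finger (min, max) ranges derived from the two reference poses."""
    lo = np.min(flat_hand_samples, axis=0)    # fully extended fingers
    hi = np.max(closed_fist_samples, axis=0)  # fully bent fingers
    return lo, hi

def normalize(raw_reading, lo, hi):
    """Map raw readings to [0, 1] so different hand sizes give comparable features."""
    return np.clip((raw_reading - lo) / (hi - lo + 1e-6), 0.0, 1.0)

# Illustrative calibration samples for the two reference poses.
flat = np.array([[210, 198, 205, 215, 220], [208, 200, 207, 213, 222]])
fist = np.array([[820, 790, 805, 830, 810], [815, 795, 800, 825, 812]])
lo, hi = calibrate(flat, fist)
print(normalize(np.array([510, 640, 300, 780, 415]), lo, hi))
```

A recalibration of this kind before each session keeps readings comparable across users with different hand sizes and across sensor batches.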

Among the described challenges, the real challenge in sign language recognition is to use the minimum number of sensors that can still perform with maximum accuracy. Too few sensors will lose posture information made by the signing hand and thus lower system accuracy. Another consideration in prototype design is to keep track of the number of sensors used: the prototype must not include so many sensors that they burden the processor and reduce its task execution capability. Sensor selection is another challenging task in the posture recognition domain. Many sensors on the market have been used for hand posture recognition, but not all of them perform well in this application; every sensor has its own positive and negative impact on overall system accuracy. Across this wide range, each sensor has its own way of measuring finger bend, and all of these ways are acceptable but not efficient in every setting. The challenge, then, is to find the sensor that gives the best bending measurements for the recognition system. Another important challenge is placing the chosen sensor components on the data glove, as sign language is based entirely on the movement of the hand, wrist, palm, elbow, and arm. Correct placement of bend-detection and finger orientation sensors is therefore critical; sensors that work well and are placed correctly will enhance system efficiency and produce maximum recognition accuracy.

8.4. Accuracy

Accuracy-based challenges are prevalent in any research domain. Initially, sign language recognition systems, whether static or dynamic, suffered from poor accuracy, and as in other research domains, this was the main challenge to overcome. For this purpose, instead of covering complete body movement including the arm, wrist, elbow, and fingers at once, partial systems were introduced to capture posture movements in chunks. Moreover, the existing glove-based model was revised to increase overall system accuracy. Sign language has been analyzed in different formats from the very beginning. Many techniques focus on static sign language recognition, but finding the best portable solution for a real-time gesture recognition system is still a challenge.

Real-time recognition systems must be precise, fast, efficient, and accurate enough to cope with the speed and processing demands of real-time gesture translation. Unlike static postures, dynamic or real-time signing must deal with the hand's many degrees of freedom, and spurious, unwanted data make mapping the incoming input challenging. Although modern processors are very fast, modeling real-time input data on the fly is still demanding. Dataset generation and acquisition is another major challenge, and although it is one of the most important considerations, it is often ignored. Accurate datasets are hard to obtain in the sign recognition domain; the number of available datasets is small, and those that exist are often not up to the mark. A readily available, accurate dataset saves researchers the time needed for data generation and validation; however, reusing a low-quality existing dataset may result in poor system accuracy.

Another major challenge in sign language recognition is to design two-way communication systems. Most designs in the sign language domain are educational models for teaching sign languages, and the international market is full of such prototypes. However, two-way, real-time sign language translators are very few in number, and prototypes designed as educational tools are not suitable for two-way communication. These prototypes, with all types of sensors, do boost processing speed and system accuracy, which is valuable work by researchers. A few sign recognition models avoid embedded flex, pressure, and contact sensors altogether and instead take a different, simulation-based approach, which has produced good efficiency and accuracy. All discussed challenges are listed in Figure 20.

Figure 20. Key points of challenging domains of sign language models.

9. Recommendations

Recommendations play an essential role in increasing system efficiency and demand worldwide. They are suggestions that help improve system design, accuracy, the prototype's physical appearance, and the graphical user interface. These improvements are based on two inputs: user or public feedback, and analysis provided by individuals or organizations. In the context of sign language recognition, many people live with hearing and speech disabilities, which creates a communication gap between society and the affected community and is thus a challenge for society as a whole. To overcome this hindrance, many systems have been developed and are used globally. Based on user and organizational reviews, recommendations and suggestions are categorized into three domains, as shown in Figure 21 below: recommendations to developers, to public or private sector organizations, and to researchers. To cope with emerging challenges and user expectations, every developer in the sign language domain should keep these recommendations in mind when aiming for high accuracy and efficiency.

Figure 21. Recommendation perspective of sign language.

9.1. Developers

Developers are the main contributors in designing sign recognition prototypes. A large number of parameters affect system performance directly or indirectly, and all of them fall under the development wing handled by the system's developers; therefore, suggestions to developers come first in this recommendation section. Cost is one of the core challenges for developers. According to the literature, most of the affected population that is unable to speak or hear belongs to the middle or lower income class, so cost control is highly recommended. Sign recognition devices must be cheap, reliable, and efficient at performing accurate sign recognition. Another important suggestion concerns system reliability and accuracy: the core function of the recognition device is to capture and recognize sign gestures correctly and accurately. A more accurate system will help people understand gesture-based language and bridge the communication gap between society and affected individuals, so reliability and high accuracy are the most demanded attributes. If overall system performance is enhanced, the error ratio is minimized automatically.

Good-quality, high-accuracy sensors can increase system reliability; the sensors mainly used in sign language recognition prototypes are flex sensors, three-axis accelerometers, and gyroscopes. A system built with high-quality versions of these sensors will produce maximally accurate results. A high-quality output module is also recommended for a good user experience. A sign recognition prototype includes a speaker and a graphical user interface (GUI). The purpose of the speaker is to render the recognized posture as speech so the other person can understand its meaning; for this purpose, the speaker must be loud and clear.

Regarding device usability, the recognition device interface must be user-friendly. The signer must not face any difficulty in using the prototype, and the interface should also allow verification of the gestures the signer makes; this helps in training deaf-mute individuals who are new to the recognition device. GUIs must also be of good quality: alphabets, words, or phrases recognized from gestures should be displayed correctly and promptly. A good GUI improves perceived output performance and increases user engagement. 3D sign animation is now attracting market attention; for this purpose, an animation module grounded in proper market analysis is required to fulfill user requirements without adding much cost. Introducing wireless display modules in the sign recognition domain has opened up new display options for developers. The most commonly used wireless module is the smartphone, so when a smartphone serves as the wireless output display, an attractive graphical interface, ease of use, friendly interactivity, and smart applications are required. Successful implementation of these features will effectively boost system performance. The future of the sign language domain lies in real-time gesture recognition, and many authors have already worked on it. It is also recommended that developers design real-time systems that can interact with other people without any delay; a real-time system must assist the user and provide immediate feedback to the other person. Most sign language postures, including alphabets and numbers, can be made easily with one hand, but many regional sign language postures involve two hands. It is therefore both a challenge and a recommendation for researchers to expand the sign language domain by introducing an efficient two-hand data glove for sign recognition.

Considering the material and sensor attachment of the data glove, it is also recommended to use comfortable, easily stretchable material. As deaf-mute individuals may wear data gloves almost all the time, the material must be comfortable to use. Data gloves contain sensors on each finger and on the palm, so the glove should be waterproof to prevent short circuits caused by rain, water, or sweat. Glove size is another issue to be addressed: people with different hand and finger sizes use the gloves, so it is recommended to offer gloves in a full range of sizes and adjust sensor placement accordingly. Material and data glove calibration must be performed on real-time gestures, as values differ considerably between real-time operation and simulation; real-time sensors continuously emit values and are also affected by the environment, so the data glove should be calibrated across all angles and sensor values. Smart data gloves for sign translation are worn everywhere by affected individuals, so their design must be modern and up to date. People working in public and commercial places should not feel embarrassed while wearing data gloves, so the material, design, style, and technology involved must be up to the mark. Lastly, it is recommended to develop portable recognition devices that assist deaf-mute individuals in their daily routine activities without being tethered to a physically available computer.

9.2. Organization

Organizations are a critical factor in developing prototypes used for sign language recognition, which is one of the most rapidly emerging domains. Many researchers are already working in this sector, and most research organizations are working on real-time applications related to sign language recognition. These real-time applications are relevant to many settings directly or indirectly associated with sign recognition, and organizational involvement across different sectors will collectively benefit the deaf-mute community and the general public. By targeting public places such as bus stations, railway stations, hotels and restaurants, public offices and banks, airports, and hospitals, where person-to-person communication is at its peak, organizations could help by providing sign language recognition devices that enable gesture-based communication. Organizations in the medical and surgical sector can also use these sign recognition devices to facilitate their staff, for example during surgical operations and staff communication. The automation and industrial sectors can utilize sign recognition devices as operational tools, which could also help in machine and equipment maintenance and robotic operation. Organizations and public sectors working on virtual reality-based interactive environments can utilize sign recognition data gloves to interact with the environment remotely: such gloves can control computer games and home appliances, act as a mouse for laptops and personal computers, and operate virtual keyboards and musical instruments. Most importantly, educational sectors and training institutes can use sign recognition devices as a learning tool for deaf-mute communities, especially children, helping them learn words, alphabets, numbers, prepositions, and sentences. Smart sensor-based data gloves can help them communicate with the public and effectively reduce social distance.

9.3. Researchers

Researchers are the main working body of any development team, and several considerations must be kept in mind so that near-perfect systems can be developed. Among these, the first and foremost recommendation to researchers concerns dataset generation. A vast collection covering all dataset variants, including numbers, alphabets, and words, is required for good research; from a machine learning perspective, a larger dataset provides better training results than a smaller one. It is also essential to collect high-quality data. Working on high-accuracy, high-quality datasets therefore remains both a big challenge and a future recommendation for researchers. Dataset availability is another consideration in gesture recognition systems.

People working in the sign language recognition domain have access to very few datasets. Most of the time, researchers working in sensor-based environments create their own datasets, whereas researchers working on vision-sensor-based approaches mostly use predefined, publicly provided datasets. Because datasets are limited, their accuracy is also a big challenge. It is therefore recommended to create high-quality datasets and make them publicly available to other researchers. The public datasets that are available are not adequately classified.

Datasets are available only in digit or alphabet format, and very few researchers have worked on word- and phrase-level posture recognition. It is therefore recommended to extend sign language datasets to cover numbers, alphabets, words, and sentences. Like spoken language, sign language has a series of rules and gesture-based procedures for forming phrases and sentences; in building a sign language recognition system, these rules must be followed, as they help in developing an efficient and reliable translation system. Another important recommendation is to give due consideration to the different regional sign languages: each region's everyday sign language differs from those of other areas, so developing region-specific sign language translation systems is both a recommendation and a challenge. In this perspective, one researcher recommended developing a standard universal translation system with specific rules that could be accessed and used by any deaf-mute individual around the world. It is also recommended to introduce hybrid sign language recognition and translation systems. Almost all researchers have worked on either sensor-based or vision-sensor-based SLR approaches, each with its own merits and demerits; combining sensor-based and vision-sensor-based techniques can overcome the demerits of both.

A hybrid approach applied to both hands, combined with facial expression recognition, will provide better results, and sensor fusion-based prototypes help capture all movements performed during sign formation. A flex sensor on each finger, together with a gyroscope and an accelerometer, provides finger bending and the axis orientation of hand movements. To reduce the ambiguity between similar postures and avoid misspelling, a filtering step using a pressure sensor attached to the middle finger can be employed. Moreover, facial expressions and body motion are also captured for proper language translation.
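A minimal sketch of this kind of glove-side sensor fusion is shown below. The channel counts, scaling constants, and threshold are illustrative assumptions about a hypothetical glove with five flex sensors, a 3-axis accelerometer, a 3-axis gyroscope, and one pressure sensor.

```python
import numpy as np

# Illustrative fusion of glove sensor streams into one feature vector per frame.
# Assumed hardware (hypothetical): 5 flex sensors, a 3-axis accelerometer,
# a 3-axis gyroscope, and 1 pressure sensor on the middle finger.

def fuse_frame(flex, accel, gyro, pressure):
    """Concatenate normalized sensor readings into a single 12-D feature vector."""
    flex = np.asarray(flex, dtype=float) / 1023.0   # ADC counts -> [0, 1]
    accel = np.asarray(accel, dtype=float) / 9.81   # m/s^2 -> g units
    gyro = np.asarray(gyro, dtype=float) / 250.0    # deg/s -> fraction of full scale
    contact = 1.0 if pressure > 0.5 else 0.0        # pressure used as a binary filter
    return np.concatenate([flex, accel, gyro, [contact]])

frame = fuse_frame(flex=[512, 300, 780, 640, 410],
                   accel=[0.2, -9.6, 1.1],
                   gyro=[12.0, -3.5, 0.8],
                   pressure=0.7)
print(frame.shape)  # (12,)
```

Such a fused vector can then be fed to any of the classifiers discussed elsewhere in this review.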

To capture complete posture information for sign language, the elbow and shoulder joints are also equipped with sensors to acquire movement data. Finer details, such as facial expressions, fusion of data between two sensor gloves, and head and body movement during posture formation, are captured as well. Threshold values play a considerable role in vision-sensor-based recognition systems: thresholds act as a filter on the input data, keeping the desired data and discarding false inputs. In multi-feature environments, a double threshold can also be applied to capture real sign data and discard unwanted, inaccurate input.
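One possible reading of such double-threshold filtering is sketched below: frames are scored with a simple motion measure, and only frames whose score lies between a lower and an upper bound are passed on. The choice of score and both threshold values are assumptions for illustration only, not taken from any cited system.

```python
import numpy as np

# Minimal sketch of a double-threshold filter on a per-frame motion score
# (e.g., mean optical-flow magnitude). Feature choice and thresholds are
# hypothetical and would need tuning on real data.

LOW, HIGH = 0.05, 0.30  # assumed lower and upper thresholds

def gate_frames(motion_scores):
    """Keep frames whose motion score lies between the two thresholds.

    Frames below LOW are treated as idle (no sign) and frames above HIGH as
    fast transition movement; only the band in between is passed on."""
    scores = np.asarray(motion_scores)
    keep = (scores >= LOW) & (scores <= HIGH)
    return np.flatnonzero(keep)

print(gate_frames([0.01, 0.08, 0.12, 0.45, 0.22, 0.02]))  # -> [1 2 4]
```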

Most communication errors are removed through this method, so it is recommended to give careful consideration to threshold selection. Using a reasonable and feasible number of sensors affects system accuracy, while using too many sensors increases system complexity. Many sensor combinations are used to capture hand motion; normally, flex, contact, pressure, accelerometer, and gyroscope sensors are used, and this combination can capture very minute changes in hand posture. It is therefore recommended to utilize these sensors to obtain good recognition results with improved accuracy and efficiency. Real-time sign recognition systems require more sensors to capture finer details, including head movements, facial expressions, body postures, and hand motions, and the involvement of more sensors in real-time systems helps increase accuracy and efficiency. It is also recommended to merge all sensor data into one format for easy processing by the recognition algorithms.

It is also recommended to use the minimum number of sensors that can cover all system functionalities, which helps reduce system complexity and hardware fusion issues. Figure 22 provides an overview of the recommendation domains of sign language recognition. These recommendations to developers, organizations, and researchers will play a vital role in improving overall system accuracy and in increasing user demand by meeting public needs.

Figure 22. Overview of recommendation domains of the sign language model.

10. Conclusions

Developing an automatic machine-based SL translation system that transforms SL into speech and text, or vice versa, is particularly helpful in improving intercommunication. Progress in pattern recognition promises automated translation systems, but many complex problems need to be solved before they become a reality. Several aspects of SLR technology, particularly SLR using the glove sensor approach, have been explored and investigated by researchers. In this paper, an in-depth comparative analysis of different sensors was presented, addressing and describing the challenges, benefits, and recommendations related to SLR. The paper discussed the literature of other researchers, mainly targeting the available glove types, the sensors used for capturing data, the techniques adopted for recognition, the dataset used in each article, and the processing unit and output devices of the recognition systems. This comparative analysis should help in exploring and developing a translation system capable of interpreting different sign languages. Finally, the datasets generated from these sensors can be used for classification and segmentation tasks to assist continuous gesture recognition.

Author Contributions

Conceptualization, M.S.A. and S.T.H.R.; formal analysis, M.S.A.; investigation, M.S.A. and S.T.H.R.; writing—original draft preparation, M.S.A., S.T.H.R. and M.M.H.; writing—review and editing, M.S.A., S.T.H.R. and M.M.H.; visualization, M.S.A.; supervision, S.T.H.R.; project administration, S.T.H.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Sign Language Recognition Using the Electromyographic Signal: A Systematic Literature Review


The analysis and recognition of sign languages are currently active fields of research focused on sign recognition. Various approaches differ in terms of analysis methods and the devices used for sign acquisition. Traditional methods rely on video analysis or spatial positioning data calculated using motion capture tools. In contrast to these conventional recognition and classification approaches, electromyogram (EMG) signals, which measure muscle electrical activity, offer potential technology for detecting gestures. These EMG-based approaches have recently gained attention due to their advantages. This prompted us to conduct a comprehensive study on the methods, approaches, and projects utilizing EMG sensors for sign language handshape recognition. In this paper, we provided an overview of the sign language recognition field through a literature review, with the objective of offering an in-depth review of the most significant techniques. These techniques were categorized in this article based on their respective methodologies. The survey discussed the progress and challenges in sign language recognition systems based on surface electromyography (sEMG) signals. These systems have shown promise but face issues like sEMG data variability and sensor placement. Multiple sensors enhance reliability and accuracy. Machine learning, including deep learning, is used to address these challenges. Common classifiers in sEMG-based sign language recognition include SVM, ANN, CNN, KNN, HMM, and LSTM. While SVM and ANN are widely used, random forest and KNN have shown better performance in some cases. A multilayer perceptron neural network achieved perfect accuracy in one study. CNN, often paired with LSTM, ranks as the third most popular classifier and can achieve exceptional accuracy, reaching up to 99.6% when utilizing both EMG and IMU data. LSTM is highly regarded for handling sequential dependencies in EMG signals, making it a critical component of sign language recognition systems. In summary, the survey highlights the prevalence of SVM and ANN classifiers but also suggests the effectiveness of alternative classifiers like random forests and KNNs. LSTM emerges as the most suitable algorithm for capturing sequential dependencies and improving gesture recognition in EMG-based sign language recognition systems.

Keywords: electromyographic signal; sEMG; sign language recognition; systematic review.


A Comprehensive Review of Sign Language Recognition: Different Types, Modalities, and Datasets

A machine's ability to understand human activities and the meaning of signs can help overcome the communication barriers between hearing-impaired and ordinary people. Sign Language Recognition (SLR) is a fascinating research area and a crucial task in computer vision and pattern recognition. Recently, SLR usage has increased in many applications, but the environment, background, image resolution, modalities, and datasets strongly affect performance. Many researchers have been striving to develop generic real-time SLR models. This review paper provides a comprehensive overview of SLR and discusses the needs, challenges, and problems associated with it. We study related works on manual and non-manual signing, various modalities, and datasets. Research progress and existing state-of-the-art SLR models over the past decade are reviewed. Finally, we identify the research gaps and limitations in this domain and suggest future directions. This review will be helpful for readers and researchers seeking complete guidance on SLR and the progressive design of state-of-the-art SLR models.


I Introduction

According to the WHO (World Health Organization) report, over 466 million people are speech or hearing impaired, and 80% of them are semi-illiterate or illiterate [1]. Sign language conveys and communicates views, emotions, and thoughts visually in a non-verbal manner. Compared to spoken language, sign language grammar is quite different. A sign comprises specific hand shapes or signals produced at a particular location on or around the signer's body, combined with a specific movement.

Hand gestures, signals, body movements, facial expressions, and lip movements are the visual means of communication used by the hand-talk community and ordinary people to convey meaning; we recognize this language as sign language. Sign language recognition (SLR) is challenging and complex, and many research opportunities are available with present artificial intelligence technology. A taxonomy of SLR is shown in Figure 1. It comprises datasets, input modality, features, classification, computational resources, and applications. Datasets are further classified into isolated sign datasets and continuous sign datasets. Vision-based and sensor-based modalities are the general types of input modality. Hand movement, facial expression, and body movement are the major features of concern in SLR. Classification methods are typified as traditional methods (HMM, RNN, etc.), deep learning (CNN), and hybrid methods (a combination of traditional and deep learning methods, or of deep learning and optimization algorithms).


SLR aims to understand gestures using suitable techniques, which requires identifying features and classifying the sign. In the literature, there is no comprehensive review paper addressing modality (vision and sensor), the different types (isolated and continuous, both manual and non-manual), the various sign language datasets, and studies of state-of-the-art methods. This review focuses on SLR-based research work, recent trends, and barriers concerning sign language. Different sign languages, modalities, and datasets are discussed and presented in tabular form for better understanding. From databases such as the IEEE Xplore digital library, ScienceDirect, Springer, Web of Science, and Google Scholar, we used the keywords "sign language recognition" to identify significant related works from the past two decades for inclusion in this review. We excluded papers that were out of the scope of sign language recognition or not written in English. The contributions of this comprehensive SLR review paper are as follows:

Carried out a review of the past two decades of published related work on isolated manual SLR, isolated non-manual SLR, continuous manual SLR, and continuous non-manual SLR.

Discussed different sensing approaches for sign language recognition and their modalities.

Presented SLR datasets for isolated and continuous signs in various sign languages and discussed the complexity of these datasets.

Discussed the framework of SLR and provided insightful guidance on SLR.

Pointed out the limitations related to datasets, current trends in SLR, and potential applications of SLR in human-computer interaction.

Studied the results of current state-of-the-art SLR models on various benchmark SLR datasets for isolated and continuous SLR.

Analyzed current SLR issues and suggested future SLR research directions.

I-A Need of SLR

As per WHO statistics, around 5% of the world's population suffers from hearing loss. According to a United Nations prediction, the number of deaf people will reach 900 million by 2050 [1]. Hence, SLR is receiving a lot of attention at present. SLR can eliminate the communication gap between the hand-talk community (deaf and mute people) and the rest of society. SLR also helps to improve communication in the following ways.

It reduces the frustration of the hand-talk community.

Overcoming the communication barrier through SLR leads to effective communication.

Much research has endeavored to develop high-performance SLR. Nevertheless, it remains challenging and is a recent research field with enormous scope.

I-B Challenges

SLR comprises numerous gestures and facial expressions, making it complex and challenging. In addition to the manual components, lip shapes and eyebrow positions distinguish similar signs; e.g., many manual signs appear to have a similar pose, but they can be differentiated with the help of facial expression and lip movement. Sign language comprises hand movement, shape, position, orientation, palm posture, finger movement, facial expression, and body movements, and these components highly influence the performance of SLR. Some of the barriers and problems of SLR are tabulated in Table I. With advances in hardware, efficient algorithms can improve processing speed. The scaling and image orientation problems can be resolved with recent deep learning techniques. The illumination problem can be overcome if RGB is converted to HSV (hue, saturation, value) or YCbCr (luminance-chrominance). Dynamic and non-uniform background problems can be resolved using skin-region detection and background subtraction methods.
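The following OpenCV sketch illustrates two of the remedies named above: converting frames to the YCbCr (YCrCb in OpenCV) space to reduce illumination sensitivity, and subtracting a non-uniform background. The video file name, skin-tone bounds, and background-model parameters are assumptions chosen for illustration.

```python
import cv2
import numpy as np

cap = cv2.VideoCapture("signer.mp4")  # hypothetical input video
bg_subtractor = cv2.createBackgroundSubtractorMOG2(history=200, detectShadows=False)

# Commonly used, approximate skin bounds in the YCrCb space (assumed values).
SKIN_LO = np.array([0, 133, 77], dtype=np.uint8)
SKIN_HI = np.array([255, 173, 127], dtype=np.uint8)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    ycrcb = cv2.cvtColor(frame, cv2.COLOR_BGR2YCrCb)   # illumination-robust space
    skin_mask = cv2.inRange(ycrcb, SKIN_LO, SKIN_HI)   # keep skin-colored pixels
    fg_mask = bg_subtractor.apply(frame)               # drop the static background
    mask = cv2.bitwise_and(skin_mask, fg_mask)
    hand_region = cv2.bitwise_and(frame, frame, mask=mask)
    # hand_region can now be passed to the segmentation / feature extraction stages
cap.release()
```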


I-C Procedure involved in SLR

The SLR process involves data collection, preprocessing, feature extraction, and classification phases. The block diagram of SLR and its general process is demonstrated in Figure 2. These stages are discussed in the following; a minimal end-to-end sketch is given after the list of stages. Note that, for the sensor-based approach, preprocessing and segmentation are optional.

Data Collection: In SLR, data acquisition is performed using one of two modes: vision or sensor. In the vision-based approach, the input is an image or video [2], [3]. A single camera is used to collect standard signs, while multiple cameras and active or invasive devices help collect depth information. A video camera, webcam, or smartphone [4], [5], [6], [7] captures continuous motion. The sensor-based approach collects signals using sensors [8], [9], [10], [11].

Image Preprocessing: The performance of the SLR system can be improved by preprocessing methods such as dimension reduction, normalization, and noise removal [12].

Segmentation: The segmentation stage splits the image into parts or regions of interest (ROIs) [13]; methods include skin colour segmentation [14], hands tracking and segmentation (HTS) [15], and entropy analysis with the picture information measure (PIM) [16]. A cluttered background requires the hand gesture to be extracted effectively through segmentation and tracking.

Tracking: Tracking of the hand position and facial expression from the acquired image/video can be performed using CamShift (continuously adaptive mean shift, used to track the head position) [17], AdaBoost with HOG (histogram of oriented gradients) [18], or particle filtering (KPF, Kalman particle filter) [19].

Feature Extraction: Transforming preprocessed input data into the feature space is known as feature extraction. It is discussed in detail in Section 2.

Database: The acquired data (images/videos) are stored in a database and split into two sets, namely training and testing datasets [20]. The classifier learns from the training dataset, and performance is evaluated on the testing data.

Classification: The classifiers use the extracted features to classify the sign gesture. The Hidden Markov Model (HMM) [9], [21], Long Short-Term Memory (LSTM) [22], deep learning networks [23], and hybrid classifiers [2], [24] are used to recognize sign language.

Evaluation Stage: The performance of a trained classifier is validated with a testing dataset (data unseen during training) [25]. The error incurred during classification gauges sign recognition performance.
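The minimal sketch below ties these stages together in runnable form. It uses randomly generated images as stand-ins for a real sign dataset, HOG descriptors for feature extraction, and an SVM for classification; the image size, class count, and HOG parameters are illustrative assumptions, so the printed accuracy will be near chance until a real dataset is substituted.

```python
import numpy as np
from skimage.feature import hog
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score

# Data collection stand-in: 200 fake grayscale "sign" frames with 5 fake classes.
rng = np.random.default_rng(0)
images = rng.random((200, 64, 64))
labels = rng.integers(0, 5, size=200)

# Feature extraction: one HOG descriptor per frame.
features = np.array([hog(img, pixels_per_cell=(16, 16), cells_per_block=(2, 2))
                     for img in images])

# Database split, classification, and evaluation.
X_train, X_test, y_train, y_test = train_test_split(features, labels,
                                                    test_size=0.25, random_state=0)
clf = SVC(kernel="rbf").fit(X_train, y_train)
print("test accuracy:", accuracy_score(y_test, clf.predict(X_test)))
```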

Although there are a few review papers in the literature [26], [12], they lack focus and a complete treatment of SLR. This paper provides a comprehensive SLR preamble, recent research progress, barriers and limitations, research gaps, and future research directions and scope. The rest of the review is organized as follows. Section 2 presents sign language modality, preprocessing, and the various feature extraction methods in SLR. Section 3 carries out a literature review concerning the manual and non-manual aspects of SLR; Section 4 discusses and illustrates the classification architecture of SLR. Section 5 presents the various types of SLR, SLR datasets, and work related to the modalities and current state-of-the-art SLR models. The recent trends, challenges, and limitations are highlighted in Section 6. Sections 7 and 8 present the future research discussion and the conclusion, respectively.

II Modalities of SLR

SLR is one of the most prominent research areas in computer vision and natural language processing. With respect to the acquisition process, SLR systems are classified into sensor-based and vision-based approaches. Both approaches are further classified as manual and non-manual, and further still as isolated and continuous. Figure 3 illustrates the SLR types. Much research work has focused on isolated manual SLR, while only a little has addressed continuous non-manual SLR.


Sensor-based approach: Physically attached sensors acquire the trajectories of the signer's head, fingers, and motion. Sensor-equipped gloves track the signer's hand articulations and recognize the sign. The comparison of SLR methods shown in Table II clarifies the vision- and sensor-based approaches. In contrast with vision-based SLR, sensor-based SLR provides efficient performance.

Vision-based approach: Gestures captured by one or more cameras (or a webcam) are recognized using the vision/image-based approach, which extracts palm, finger, and hand movement features from the captured image/video; classification is performed with the help of these extracted features. Poor illumination or lighting, a noisy background, and blurring in the image result in misclassification. Although vision-based SLR is suitable for real-time conditions, it requires careful preprocessing, feature extraction, and classification.

II-A Preprocessing

The computational burden of data processing can be reduced by preprocessing methods. Image reduction and conversion methods perform size reduction and color-to-grayscale conversion, lowering the burden of data processing. Unwanted objects can be suppressed using histogram equalization [39]. Noise in the image is removed using filters such as the median or moving average filter [40]. Gaussian averaging methods are used to remove the image background component [41]. Filters remove unwanted components and reduce the data size with the help of image edge detection algorithms [42]. Filtering can be sped up using the fast Fourier transform, because it operates in the frequency domain rather than directly on the image [43]. The image is split into possible segments [44]; masking is used in segmentation to improve processing. Eliminating background effects using binarization and histogram equalization aids image contrast. Normalization methods can effectively handle the variance in the data [45].
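An illustrative preprocessing chain combining several of the steps named above (grayscale conversion, size reduction, median filtering, histogram equalization, binarization, and edge detection) is sketched below with OpenCV. The input file name, image size, kernel size, and Canny thresholds are assumptions.

```python
import cv2

img = cv2.imread("hand_gesture.png")               # hypothetical input frame
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)       # color -> grayscale
small = cv2.resize(gray, (128, 128))               # image size reduction
denoised = cv2.medianBlur(small, 5)                # median filter for noise removal
equalized = cv2.equalizeHist(denoised)             # histogram equalization for contrast
_, binary = cv2.threshold(equalized, 0, 255,
                          cv2.THRESH_BINARY + cv2.THRESH_OTSU)  # binarization
edges = cv2.Canny(denoised, 50, 150)               # edge detection to trim the data
cv2.imwrite("preprocessed.png", binary)
```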

II-B Feature Extraction

In SLR, extracting relevant features plays a vital role; it is crucial for sign language, as irrelevant features lead to misclassification [46]. Feature extraction aids both accuracy and speed [47]. Feature extraction methods include SURF (speeded-up robust features, which approximates the Laplacian of Gaussian with box filters) [34], SIFT (scale-invariant feature transform) [33], PCA (principal component analysis) [37], [4], LDA (linear discriminant analysis) [48], convexity defects and k-curvature [49], time-domain to frequency-domain transforms [31], [35], local binary patterns, etc. The feature extraction methods used in SLR studies are tabulated in Table III, and various feature extraction methods are shown in Figure 4. Feature vector dimension reduction performed by PCA, LDA, etc. reduces the computational burden on the classifiers: dimensionality pruning keeps the significant high-variance features and minimizes the remaining ones, thus reducing training complexity. Fourier descriptors are noise-resistant, invariant to scale and orientation, and easy to normalize. Principal component analysis is the process of transforming correlated values into uncorrelated ones; the original data are linearly transformed, and the feature vectors are reduced.

The preprocessing and feature extraction methods aid the classifier; they also reduce the computational burden and help avoid overfitting and misrecognition. SIFT's merit is invariance to lighting, orientation, and scale; however, its performance is not always satisfactory [33]. Using the histogram of oriented gradients (HOG) [28], unwanted information is removed while the significant features are kept, easing image processing. The feature vectors are obtained by computing the gradient magnitude and angle. As the HOG cell size and the number of bins increase, the number of extracted features also increases. Larger subdivisions furnish global information, and smaller subdivisions give worthwhile local information. The demerit of both methods is that they require more memory. SURF is invariant to image transformations and is a faster feature extractor than SIFT; still, it requires the camera to be set up horizontally for better performance, and its disadvantages are that it is illumination-dependent and not rotation-invariant. Location and frequency are captured using the discrete wavelet transform (DWT); temporal resolution is the critical merit of DWT [31], [32].
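The dimensionality pruning described above can be illustrated with a short PCA sketch. The feature matrix is synthetic and merely stands in for descriptors (e.g., HOG vectors) extracted from sign frames; the variance-retention target is an assumption.

```python
import numpy as np
from sklearn.decomposition import PCA

# Sketch of PCA-based dimensionality reduction on a stand-in feature matrix.
rng = np.random.default_rng(1)
features = rng.random((300, 1764))        # 300 frames x 1764-D descriptors (synthetic)

pca = PCA(n_components=0.95)              # keep components explaining 95% of variance
reduced = pca.fit_transform(features)
print(features.shape, "->", reduced.shape)  # the high-variance directions are retained
```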

III Literature studies about SLR

Sign language is not generic; it varies according to region and country [1]. There are over 300 sign languages worldwide, such as ASL, BSL, and ISL. According to Ethnologue 2014 [50], ASL is a native language for around 250,000 to 500,000 people in the United States. Chinese Sign Language is used in China by approximately 1 million to 20 million deaf people. Approximately 150,000 people in the United Kingdom use British Sign Language (BSL). In Brazil, approximately 3 million signers use Brazilian Sign Language, which is related to Portuguese Sign Language and French Sign Language. According to Ethnologue 2008, approximately 1.5 million signers in India use Indo-Pakistani Sign Language.

SLR is not only meant for deaf and mute people; ordinary people can also use it to communicate in noisy public places or in libraries without disturbing others. Manual (communication by hands) and non-manual (communication by body posture or facial expression) media are used in sign language. People sometimes use finger spelling, in which a word is split into letters and each letter is spelled using the fingers. Manual and non-manual SLR are discussed in detail in the following subsections.

III-A Manual SLR

Hand motion, hand posture, hand shape, and hand location are the manual sign components, shown in Figure 5. The signer usually communicates with one hand or two hands. Manual SLR is classified into isolated and continuous.


III-A 1 Isolated Manual SLR

The literature on isolated manual SLR is as follows:

Classical methods: Ong et al. [51] suggested a Sequential Pattern Tree-based multi-class classifier for German Sign Language (DGS, Deutsche Gebärdensprache) and Greek Sign Language (GSL) recognition; their SP-Tree boosting-based recognition model performs better than the Hidden Markov Model. Chansri and Srinonchat [28] proposed a data fusion-based ANN model for Thai SLR; they extracted hand features using histograms of oriented gradients and performed classification using an ANN trained with back-propagation. Yin et al. [52] performed hand gesture recognition using a joint algorithm combining back-propagation and template matching; the joint algorithm required a computation time of 0.0134 and achieved an accuracy of 99.8% for isolated hand gesture recognition. Jane and Sasidhar [53] combined an ANN classifier with data fusion, using a three-hidden-layer artificial neural network with wavelet denoising and the Teager-Kaiser energy operator (TKEO) for Signing Exact English (SEE); this approach reached a recognition rate of 93.27%. A Korean finger language recognition model was developed based on an ensemble of ANNs [11]; the performance was analyzed by varying the dataset size (50 to 1500) and the number of classifiers (1 to 10), and the comparative analysis of eight ANN-classifier ensemble models identified 300 training samples as the optimal configuration, leading to 97.4% recognition accuracy for Korean finger language.

Almeida et al. [54] extracted seven vision-based features using an RGB-D sensor and recognized Brazilian Sign Language with an average accuracy of 80% using SVM; they decomposed and extracted signs based on phonological structure, so the model is also suitable for other SLR purposes. Fatmi et al. [9] performed SLR based on ANN and SVM and compared their performance with HMM; among the compared machine learning techniques on ASL words, the proposed ANN achieved higher accuracy. Lee and Lee [55] developed an SVM-based sign language interpretation device with 98.2% recognition accuracy. Wei et al. [10] presented a CSL sign recognition model using a code matching method combined with a fuzzy K-means algorithm; subclasses were determined by the fuzzy K-means algorithm, and classification was done by code matching. Li et al. [56] suggested an ASL recognition prototype based on KNN, LDA, and SVM classifiers, using a firmly stretchable strain sensor for ASL 0-9 number sign recognition; the authors reported that the model achieved an average accuracy of 98%.

Yang et al. [57] built a Chinese Sign Language (CSL) recognition model based on sensor fusion with a decision tree and Multi-Stream Hidden Markov Models. They developed a wearable-sensor-based Chinese SLR model for both user-dependent and user-independent settings, and the search range was improved by optimized tree-structure classification. Dawod and Chakpitak [58] worked on a real-time recognition model for ASL alphabet and number signs, using Random Decision Forest (RDF) and Hidden Conditional Random Field (HCRF) classifiers and a Microsoft Kinect v2 sensor for data collection. The HCRF-based model achieved a mean accuracy of 99.99% for number signs and 99.9% for alphabet signs, while the RDF-based model achieved 96.3% and 97.7%, respectively; hence, the HCRF-based model performs better than RDF for both ASL numbers and alphabets. Hrúz et al. [59] presented a Hidden Markov Model-based Czech SLR model integrated into a kiosk that also performed automatic speech recognition and sign language synthesis.

Mummadi et al. [60] proposed a French Sign Language (LSF) model based on IMU sensors embedded in a wearable glove with various classifiers such as naïve Bayes, MLP, and RF. A real-time wearable IMU glove-based recognition model was developed for LSF; replacing the complementary filter with an advanced fusion strategy and an advanced classifier could improve the accuracy rate. Botros et al. [8] presented a comparative analysis of forearm- and wrist-level gesture recognition using EMG signals. Gupta and Kumar [61] developed a wearable sensor-based multi-class, multi-label SLR model; the LP-based SLR model has lower error and computation time than the tree-based, binary relevance (BR), and classifier chain (CC) based sign recognition models, and compared to the classic tree classification model, the suggested model performs well with minimal classification errors. Hoang [62] presented a new vision-based ASL alphabet sign dataset (HGM-4) and, using a classifier on this dataset, developed a contactless SLR system.

Deep learning approaches: Al-Hammadi et al. [46] performed sign-dependent and sign-independent SLR on three datasets using single and fused parallel 3DCNNs; the proposed model achieves a better recognition rate than the six existing methods considered. Sincan and Keles [27] built a CNN- and LSTM-based model for Turkish SLR; feature extraction was improved by a Feature Pooling Module (FPM), and convergence was sped up using an attention model. Yuan et al. [24] presented a deep convolutional neural network (DCNN) and long short-term memory (LSTM) based model for hand gesture recognition; a residual module overcomes the gradient vanishing and overfitting problems, and the long-distance dependency problem of complex hand gestures is addressed by an improved deep feature fusion network (DFFN). Compared to Bayes, KNN, SVM, CNN, LSTM, and CNN-LSTM, the DFFN-based model performs well on ASL and CSL datasets.

Aly and Aly [2] designed an Arabic SLR model using a deep bi-directional long short-term memory recurrent neural network (BiLSTM); a convolutional self-organizing map extracts hand shape features, and DeepLabv3+ extracts hand regions. The suggested model proved valid for signer-independent real Arabic SLR; it is suited to isolated signs, and continuous sign-based analysis could be a future direction. Rastgoo et al. [3] carried out work on a multi-modal, multi-view hand skeleton-based SLR model, analyzing performance with feature fusion and single-view versus multi-view projections of the hand skeleton; deep pipeline architectures based on a Single Shot Detector (SSD), 2DCNN, 3DCNN, and LSTM were proposed to recognize hand sign language automatically. Lee et al. [22] designed an American SLR model combining the k-nearest-neighbour method with a long short-term memory (LSTM) recurrent neural network, using a Leap Motion controller to acquire the sign data; compared to SVM, RNN, and LSTM models, the proposed LSTM-with-KNN model performs best, at 99.44%.

For a clear understanding, the research work related to isolated manual SLR is tabulated in Table IV, and a graphical representation is shown in Figure 6. These recognition models achieve good accuracy for isolated sign recognition, but this is not guaranteed to generalize to continuous sign recognition with comparable precision.

III-A 2 Continuous Manual SLR

Processing one-dimensional data is simpler than handling a high-dimensional dataset like video [63]. Continuous SLR in an uncontrolled environment is quite complex, as there is no clear pause after each gesture.


This leaves SLR performance well behind that of speech recognition. The existing research on continuous manual SLR is as follows:

Traditional methods: Nayak et al. [64] presented a feature extraction approach for continuous signs; relational distributions are captured from the face and hands present in the images. The parameters are optimized by ICM, so convergence is sped up, and dynamic time warping is used for distance computation between two sub-strings. The recurrent features of a continuous sign sentence are extracted using RD, DWT, and ICM based approaches. Kong and Ranganath [65] performed continuous SLR by merging a conditional random field (CRF) and SVM in a Bayesian network framework, using a semi-Markov CRF decoding scheme for independent continuous SLR. Tripathi and Nandi [4] built a gesture recognition model for continuous Indian Sign Language: they extracted meaningful gesture frames using a key-frame extraction method, extracted the relevant features of each gesture with the orientation histogram technique, used principal component analysis to reduce the feature dimension, and performed classification with distance classifiers; compared with the other classifiers considered, the correlation and Euclidean distance-based classifiers achieve a better recognition rate. Gurbuz et al. [37] developed an ASL model using an RF sensing-based feature fusion approach with LDA, SVM, KNN, and random forest classifiers; the random forest-based model achieves 95% recognition accuracy for five signs and 72% for 20 signs, and deep learning classifiers could be used in the future to improve accuracy. Hassan et al. [5] proposed Modified k-Nearest Neighbor (MKNN) and Hidden Markov Models for continuous Arabic SLR; window-based statistical features and a 2D DCT transformation extract the features, and the proposed model, analyzed on sensor-based, vision-based, and motion-tracker datasets, leads to a better recognition rate. For sentence recognition, MKNN yields a better recognition rate than the HMM-based toolkits, while for word recognition, RASR performs better, with a higher recognition rate than MKNN and GT2K.

CNN, LSTM, and cross-modal related work on continuous manual SLR: Ye et al. [66] proposed a 3D convolutional neural network (3DCNN) with a fully connected recurrent neural network (FC-RNN) to localize temporal boundaries in continuous video and recognize sign actions using an SVM classifier; they designed an integrated convolutional-3D and recurrent neural network-based SLR model for continuous ASL sign recognition. Al-Hammadi et al. [23] presented a single-modality, feature fusion-based 3DCNN model for dynamic hand gesture recognition, capturing hand features with the OpenPose framework; the 3DCNN model with MLP and autoencoder-based feature extraction, combined with the OpenPose-based hand sign capturing model, results in good recognition accuracy on the KSU-SSL (King Saud University Saudi Sign Language) dataset with a batch size of 16. Gupta and Rajan [67] examined the performance of three models, namely modified time-LeNet, t-LeNet (time-LeNet), and MC-DCNN, for continuous Indian SLR using a sensor-based dataset. Pan et al. [68] developed an SLR model based on a bidirectional long short-term memory network with fused spatial and temporal attention; they detected key actions in the captured video with optimKCC and used Multi-Plane Vector Relation (MPVR) to obtain skeletal features, and analyses on two datasets prove the validity of continuous Chinese SLR for both sign-independent and sign-dependent cases. Papastratis et al. [69] suggested a cross-modal learning-based continuous SLR model and proved its validity on three public datasets, namely RWTH-Phoenix-Weather-2014, RWTH-Phoenix-Weather-2014T, and CSL; performance was improved by considering additional modalities. Table V and Figure 7 provide a better understanding of the literature on continuous manual SLR.


III-B Non-Manual SLR

Facial expressions, head movement, mouth movement, eye movement, eyebrow movement, and body posture are the non-manual sign parameters; the non-manual sign components are shown in Figure 8. Facial expressions such as lowering and raising the eyebrows express grammatical information and emotions. Signers are attentive observers and maintain eye contact. Signs with similar hand poses can be distinguished by considering non-manual features. Isolated and continuous are the two types of non-manual SLR models.

III-B 1 Isolated Non-Manual SLR

Related research work on isolated non-manual SLR is as follows:

HMM based work: Von Agris et al. [70] designed a Hidden Markov Model-based British SLR system with manual and non-manual features. Aran et al. [7] built a Turkish SLR model using a cluster-based Hidden Markov Model and proved its validity by cross-validation with eight folds (sign independent) and five folds (sign dependent). Sarkar et al. [71] presented an isolated American SLR model using Hidden Markov Models and improved the segmentation process with a dynamic programming-based approach. Fagiani et al. [72] carried out Hidden Markov Model-based isolated Italian sign recognition in a signer-independent setting; the suggested model obtains better accuracy than a support vector machine-based recognition model. Zhang et al. [73] suggested a Hidden Markov Model with adaptive hidden states for Chinese SLR; the fusion of trajectories and hand shapes leads to better recognition. Kumar et al. [74] built an Indian SLR model based on a decision fusion approach with two modalities (facial expression and hand gesture), using an HMM-based classifier for recognition and IBCC for decision fusion; using advanced classifiers and feature extraction algorithms could further improve recognition accuracy.

Logistic regression and CNN-based work: Sabyrov et al. [ 75 ] developed a K-RSL (Kazakh-Russian Sign Language) human-robot interpretation model using Logistic Regression with non-manual components. Mukushev et al. [ 76 ] performed Logistic Regression-based SLR using both manual and non-manual features, extracting key points from the captured video with OpenPose. Kishore et al. [ 77 ] proposed an Adaptive Kernels Matching algorithm for a 3D Indian SLR model and reported improved classification accuracy compared with state-of-the-art methods; the 3D motion-capture models achieved better classification accuracy than Microsoft Kinect and Leap Motion sensor-based models. Liu et al. [ 78 ] introduced ST-Net (Spatial-Temporal Net) within a self-boosted interactive gesture system for Hong Kong SLR; compared to a Kinect-based system, the suggested approach achieved a better recognition rate. Albanie et al. [ 6 ] proposed a spatio-temporal convolutional neural network-based British SLR model, in which pretraining was improved by a new larger-scale dataset, BSL-1K. We provide a comprehensive study of recent developments in non-manual SLR; Table 6 and Figure 9 summarize the literature on isolated non-manual SLR.


III-B 2 Continuous Non-Manual SLR

Continuous non-manual SLR is highly complex because the sign-context sequence must be handled appropriately to obtain effective performance and high accuracy [ 79 ] . The problem of determining temporal boundaries makes continuous SLR a complex and arduous task. The related research work is discussed as follows:

Classical methods: Farhadi and Forsyth [ 80 ] carried out an HMM-based model for aligning continuous ASL with English subtitles, performing word spotting with simple HMMs built on a discriminative word model. Infantino et al. [ 81 ] developed a self-organizing map (SOM) neural network-based SLR model integrated with a common-sense engine for LIS (Italian Sign Language). Sarkar et al. [ 71 ] performed HMM-based continuous ASL recognition and used a dynamic programming-based approach to improve segmentation. Forster et al. [ 21 ] investigated a German SLR model using multi-stream HMMs with several combination methods; synchronous and asynchronous combination-based models outperformed system-combination and feature-combination approaches. Yang and Lee [ 82 ] presented a continuous ASL model combining CRF and SVM with both manual and non-manual features: hand shapes were verified with BoostMap embeddings, segmentation was done by a hierarchical CRF, and recognition was performed with an SVM. Zhang et al. [ 83 ] suggested a linear SVM-based automatic ASL model fusing five modalities; a large-scale dataset-based investigation could be future work to improve recognition accuracy.


CNN and hybrid methods: Koller et al. [ 87 ] designed a continuous German SLR model using an iterative Expectation Maximization procedure with a CNN, training the classifier on over a million hand-shape images. Brock et al. [ 84 ] performed continuous Japanese SLR using a CNN with frame-wise binary Random Forests for segmentation; improving reliability, accuracy, and robustness on large-scale datasets remains a future research direction. Zhou et al. [ 85 ] carried out a continuous SLR model using STMC (Spatial-Temporal Multi-Cue network) to overcome the vision-based sequence-learning problem. Koller et al. [ 86 ] proposed a hybrid CNN-LSTM-HMM continuous German SLR model that learns sequential parallelism in sign language videos and validated it on three public sign language datasets.

The continuous non-manual SLR-related research works are presented in tabular form in Table 7 and as a graphical chart in Figure 10 for better understanding. Research on continuous signs with signer-independent generic models is important because very little work on continuous SLR has been carried out in the past decade.

IV Classification Architectures

Classification is the brain of the SLR model. It aims to classify the sign accurately with minimum error. Researchers use various classifiers, e.g., traditional machine learning-based, deep learning-based, and hybrid approaches.

ANNs such as back-propagation, multi-layer, and recurrent neural networks are employed as classifiers, but they handle large datasets poorly, and machine learning-based approaches require enormous amounts of training data to learn challenging problems. The complications associated with HMMs are: 1. computing the likelihood of an observation sequence, 2. decoding the best hidden-state sequence, and 3. estimating the HMM parameters. The 2DCNN requires excessively many parameters, which makes the design process complex; this is its major drawback. In a 3DCNN, spatio-temporal data is represented directly and hierarchically, which is one of its unique features, but a 3DCNN cannot guarantee robustness when capturing long-term temporal dependencies in signs. LSTM eliminates the long-term dependence problem. Hybrid approaches are therefore adopted as classifiers to improve accuracy.


IV-A Traditional Architectures

The Artificial Neural Network (ANN), Hidden Markov Model (HMM), and Recurrent Neural Network (RNN) are the most widely used classifiers in SLR due to their ability to process sequential data. Fatmi et al. [ 9 ] , Lee et al. [ 22 ] , and Von Agris et al. [ 70 ] carried out general ANN-, RNN-, and HMM-based SLR work.

IV-B Deep Learning Architectures

Deep learning has driven massive growth in SLR recently. Spatial and temporal features are easy to handle with deep learning models, and LSTM can handle long-term dependence. Figure 11 highlights the deep learning-based SLR architectures, namely the Deep CNN and LSTM-CNN architectures.
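As a concrete illustration of the LSTM-CNN architecture, the sketch below uses a 2D CNN backbone to extract per-frame features and an LSTM to model the temporal dimension of an isolated sign clip, written in PyTorch. The backbone choice, layer sizes, and input shape are illustrative assumptions, not a reproduction of any specific model surveyed here.

import torch
import torch.nn as nn
from torchvision.models import resnet18

class CnnLstmSLR(nn.Module):
    def __init__(self, num_classes, hidden_size=256):
        super().__init__()
        backbone = resnet18(weights=None)
        backbone.fc = nn.Identity()                 # keep the 512-d per-frame feature
        self.cnn = backbone
        self.lstm = nn.LSTM(512, hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_classes)

    def forward(self, clips):                       # clips: (batch, frames, 3, H, W)
        b, t = clips.shape[:2]
        feats = self.cnn(clips.flatten(0, 1))       # spatial features for every frame
        feats = feats.view(b, t, -1)                # restore the time axis
        _, (h_n, _) = self.lstm(feats)              # temporal modelling over the frames
        return self.classifier(h_n[-1])             # per-clip sign logits

# Example usage: logits = CnnLstmSLR(num_classes=100)(torch.randn(2, 16, 3, 224, 224))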

IV-C Evaluation Metrics

SLR models' performance is evaluated by computing the word error rate, accuracy, and recognition rate. The formulations used for evaluation are as follows:
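The standard definitions of these metrics, stated here for completeness (individual papers may use slightly different notation), are:

\mathrm{WER} = \frac{S + D + I}{N}, \qquad \mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Recognition\ rate} = \frac{N_{\mathrm{correct}}}{N_{\mathrm{total}}} \times 100\%,

where S, D, and I are the numbers of substituted, deleted, and inserted words with respect to the reference gloss sequence, N is the number of words in the reference, TP/TN/FP/FN are true/false positives and negatives, and N_correct and N_total are the numbers of correctly recognized and total test signs.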

Cross-validation schemes, namely leave-one-subject-out (LOSO) and k-fold cross-validation, are used to validate an SLR model's effectiveness. The Area Under the Curve (AUC) of the ROC (Receiver Operating Characteristic) curve, which shows the trade-off between the true positive rate and the false positive rate, is used to measure classifier performance. The Bilingual Evaluation Understudy (BLEU) score is used to measure translation quality.
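The following is a minimal sketch of how these validation tools are typically applied, assuming feature vectors with per-sample signer identifiers; it uses scikit-learn for leave-one-subject-out cross-validation and NLTK for a sentence-level BLEU score. The classifier and data are placeholders rather than any surveyed system.

import numpy as np
from sklearn.model_selection import LeaveOneGroupOut
from sklearn.svm import SVC
from nltk.translate.bleu_score import sentence_bleu

def loso_accuracy(features, labels, signer_ids):
    # Train on all signers but one, test on the held-out signer, and average accuracy.
    scores = []
    for train_idx, test_idx in LeaveOneGroupOut().split(features, labels, groups=signer_ids):
        clf = SVC().fit(features[train_idx], labels[train_idx])
        scores.append(clf.score(features[test_idx], labels[test_idx]))
    return float(np.mean(scores))

# BLEU for one translated sentence against a reference gloss sequence.
reference = [["my", "name", "is", "john"]]
hypothesis = ["my", "name", "john"]
bleu = sentence_bleu(reference, hypothesis)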

V Different Types of Sensing Approach

According to the acquisition device, SLR is classified into two types: vision-based and sensor-based approaches. Much research on both vision-based and sensor-based SLR has been conducted to help the hand-talk community. Table 8 summarizes the sensing-approach-based SLR work in the literature.

Vision-based sensing devices: The types of cameras used in vision-based approaches are as follows. Invasive devices (body-marker method): for example, LED lights, wrist bands, and colored gloves. Active devices: the Kinect sensor and the Leap Motion sensor. Stereo (depth) cameras: capture depth information. Single cameras: smartphone, video camera, webcam, thermal camera, etc. LMC (Leap Motion Controller): the LMC comprises three infrared LEDs and two cameras; it tracks infrared light at a wavelength of 850 nanometers with a range of about 60 cm (2 feet). It detects hand movements and converts them into a form suitable for the computer (commands). Raw images are acquired in grayscale format through the Leap Motion service software. Demerit: its accuracy is limited. Kinect sensor: a skeleton (depth) image and movement model are created from three-dimensional image data; a multi-array microphone, a depth sensor, and an RGB camera are the components comprised in the Kinect sensor. Demerit: it requires more space, with a distance of 6 to 10 feet between the sensor and the signer.

Sensor-based sensing devices: Inexpensive and wearable sensor devices such as the accelerometer (ACC), gyroscope (Gyro), and surface electromyogram (sEMG) make sensor-based SLR a prominent tool. Data gloves (sensor-based): the analog signal is converted into digital form by an ADC, and the glove detects hand gestures and signs with the help of various sensors, typically an accelerometer and flex sensors (bend-signal detection). The gyroscope provides orientation and angular-rate information, the accelerometer provides acceleration information, and the flex sensors capture finger bending. IMUs (inertial measurement units) are used for hand-movement estimation. EMG (electromyography): electrodes are attached to or inserted into the human muscle, and muscle movement is recorded as an electrical signal; sEMG is used to capture and distinguish finger movements. RF sensors possess salient features that make them attractive, as they can acquire the sign contact-free and in a dark environment. Radar and Wi-Fi: the motion of hand movements is collected from multiple antennas, and features are extracted from the Doppler shift and magnitude differences; an advantage is flexibility in orientation and position. Ambient sensors capture the interaction between the environment background and the signer performing the sign; examples are 1. temperature sensors, 2. radar sensors, 3. pressure sensors, and 4. sound sensors.
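As an illustration of how a data glove's readings reach the recognition pipeline, the sketch below reads flex-sensor and accelerometer values streamed over a serial link with the pyserial library. The port name, baud rate, and the comma-separated line format emitted by the glove firmware are assumptions for illustration only, not the protocol of any specific glove discussed above.

import serial

def read_glove_samples(port="/dev/ttyUSB0", baudrate=115200):
    # Yield (flex, accel) tuples: five finger-bend readings and three acceleration axes
    # per sample, assuming the glove firmware prints one comma-separated line per sample.
    with serial.Serial(port, baudrate, timeout=1) as link:
        while True:
            line = link.readline().decode("ascii", errors="ignore").strip()
            if not line:
                continue
            values = [float(v) for v in line.split(",")]
            yield values[:5], values[5:8]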


In Figure 12, some SLR modalities are shown.

V-A Various Datasets in SLR

The available datasets are limited because most public datasets lack quality and quantity. Datasets are collected from native signers and ordinary people, and imitated data is acquired to augment them. We summarize the datasets available for SLR in Table 9; the benchmark dataset details, including URL links, are given there.

V-A 1 Datasets Vs. Modality

Datasets and modalities largely affect the performance of SLR models. Many researchers have implemented SLR models using various methods and datasets. Table 10 presents a dataset-versus-modality study of SLR. The RGB, depth, and dynamic modalities facilitate better performance; hence, modality-based fusion leads to enhanced performance.
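A minimal sketch of score-level (late) fusion of two modality streams, e.g., an RGB classifier and a depth classifier, assuming each stream outputs per-class probabilities; the equal weighting is an illustrative assumption rather than a prescription from any surveyed work.

import numpy as np

def late_fusion(rgb_probs, depth_probs, weight=0.5):
    # Weighted average of per-class probabilities from the two modalities, then argmax.
    fused = weight * np.asarray(rgb_probs) + (1.0 - weight) * np.asarray(depth_probs)
    return int(np.argmax(fused))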

V-A 2 The Complexities of SLR Datasets

Sign language is acquired using cameras and sensor-based sensing devices such as Kinect and Leap Motion controllers, armbands, gloves, electrodes (EMG), etc. Each dataset has its specific capturing format, modalities, mapping, environment, and illumination, specific to the regional or national sign language among the 300 sign languages. Some of the complexities of these datasets are as follows:

Acquiring sign language with its various linguistic components using multiple sensing devices is an arduous and tedious process that requires much time and effort.

Redundant and blurred frames in the collected data are a significant complexity that lowers the SLR model's recognition rate.

Complex backgrounds and varied lighting are not considered in most datasets, which were recorded under constraints; models trained on them therefore cannot recognize accurately in real-time applications.

Most datasets were collected from only a few native signers with only a few repetitions; therefore, they cannot guarantee signer-independent recognition performance.

Signers wearing long sleeves, occlusion, and object interaction during data collection make preprocessing and recognition challenging.

Handling real-time applications is problematic because most datasets were acquired with a constant background and illumination in a controlled environment.

Recognizing unseen sign words or sentences is difficult because the limited vocabulary and sentences present in the data make the models incompatible with real-time use cases.

V-A 3 The Solutions to Overcome the SLR Datasets Complexities

The solutions to overcome the SLR datasets complexities are as follows:

During sign data acquisition, samples should be collected in various environments and lighting conditions at multiple times, with the same words performed by different signers; this leads to a signer-independent SLR model and improves its generic ability.

The distance between the signers and the recording device should be feasible in order to overcome the blurring issue.

A hand-shape-based modality alone is not good enough to recognize signs; a non-manual feature-based dataset is also required to capture the grammar of the sign language. Isolated and continuous words/sentences involving many signers and more repetitions, with a larger number of cues and a bigger corpus, improve accuracy, robustness, and generalization.

A versatile and massive SLR corpus that addresses all sign components, captured with multi-modal sensing and covering complex, large-scale isolated and continuous signs without constraints, would serve as a benchmark for validating SLR models in SLR research.

V-B Study of Current State-of-the-Art Models for Sign Language Recognition

This paper further explores the state-of-the-art models presented for sign language recognition as follows. Ravi et al. [ 131 ] performed Indian sign language recognition from RGB-D data using CNN models; they trained on four input streams, tested on two streams (RGB spatial and temporal), and obtained a recognition accuracy of 89.69% on the BVCSL3D dataset. Gökçe et al. [ 132 ] carried out isolated Turkish sign language recognition using 3D residual CNNs with score-level fusion and achieved a top-1 accuracy of 94.94% on the BosphorusSign22k dataset. Li et al. [ 133 ] presented isolated sign language recognition using TK-3D ConvNet (a transferring cross-domain knowledge-based 3D convolution network), achieving recognition accuracies of 77.55% on WLASL 100, 68.75% on WLASL 200, 83.91% on MSASL 100, and 81.14% on MSASL 200. Camgoz et al. [ 134 ] verified the applicability of SLRT (sign language recognition and translation using transformers) on the RWTH-PHOENIX-Weather 2014-T dataset, achieving a BLEU-4 score of 21.80. Li et al. [ 135 ] suggested TSPNet, a temporal semantic pyramid network with hierarchical feature learning for continuous sign language recognition, resulting in a BLEU-4 of 13.41 on the RWTH-PHOENIX-Weather 2014-T dataset.

Zheng et al. [ 136 ] suggested a non-independent multi-stream convolutional architecture and an RoI-based multi-region convolutional architecture for sign language translation and obtained BLEU-4 scores of 10.89 (RoI) and 10.73 (stream) on RWTH-PHOENIX-Weather 2014-T. Ahmed et al. [ 137 ] presented a Wi-Fi CSI (channel state information) dataset and developed device-free Wi-Fi-based sign language recognition; the augmented SVM-based model achieved accuracies of 98.5% for dynamic signs and 99.9% for static signs. Zhou et al. [ 85 ] designed continuous sign language recognition based on STMC (spatial-temporal multi-cue network) and obtained WERs (word error rates) of 2.1, 28.6, 20.7, and 21.0 on the Continuous SLR 100 dataset Split I case, Split II case, RWTH-PHOENIX-Weather 2014, and RWTH-PHOENIX-Weather 2014-T datasets, respectively. Slimane and Bouguessa [ 138 ] performed continuous sign language recognition with a self-attention network (SAN); they used a 2D CNN with self-attention, taking both hand crops and full frames as inputs and combining them to obtain the final word glosses, and achieved a WER of 29.78% on the RWTH-PHOENIX-Weather 2014 dataset. Töngi [ 139 ] suggested an inflated deep CNN for isolated SLR, using the MSASL dataset to transfer ASL knowledge for recognizing GSL (German Sign Language) on the SIGNUM dataset, and achieved an accuracy of 0.75 with high target data. Hu et al. [ 140 ] introduced the non-manual feature-aware GLEN (global-local enhancement network) SLR model, achieving top-1 accuracies of 69.9% on the NMFs-CSL dataset and 96.8% on the isolated SLR 500 dataset. De Coster et al. [ 141 ] proposed isolated sign language recognition using a video transformer network with pose flow and hand cropping; the VTN-PF model achieved an accuracy of 92.92% on the AUTSL dataset.

Jiang et al. [ 142 ] devised SAM-SLR (a skeleton-aware multimodal framework for sign language recognition) for isolated sign language recognition; the skeleton-aware multi-modal model with SSTCN (separable spatial-temporal convolution network) achieves better accuracy on the AUTSL dataset, with top-1 accuracies of 98.42% for RGB and 98.53% for RGB-D. Papastratis et al. [ 143 ] performed continuous sign language recognition with a generative adversarial network and a transformer, validating SLRGAN (sign language recognition generative adversarial network) on four datasets. SLRGAN Deaf-to-Deaf achieves a WER of 36.05 for GSL SD and 2.26 for GSL SI, while SLRGAN achieves WERs of 2.98 for GSL SI, 37.11 for GSL SD, 23.4% for RWTH-PHOENIX-Weather 2014-T, and 2.1% for Continuous SLR 100. Min et al. [ 144 ] conducted continuous sign language recognition with a VAC (visual alignment constraint) on a ResNet-18 backbone, validated on the Continuous SLR 100 and RWTH-PHOENIX-Weather 2014 datasets with WERs of 1.6% and 22.3%, respectively.

Jiang et al. [ 145 ] designed SAM-SLR-v2 (a skeleton-aware multimodal framework with a global ensemble model) for isolated sign language recognition; it achieved a top-1 accuracy of 98.53% on AUTSL (RGB-D, all), top-1 accuracies of 59.39% per instance and 56.63% per class on the WLASL2000 dataset, and a top-1 accuracy of 99% on the isolated SLR 500 dataset. Meng and Li [ 146 ] presented a GCN (graph convolutional network)-based dual sign language recognition network (SLR-Net); the fusion of the two streams, SLR-Net-J+B, results in top-1 accuracies of 98.08% on the isolated SLR 500 dataset and 64.57% on the DEVISIGN-L dataset. Pereira-Montiel et al. [ 147 ] devised automatic Colombian sign language recognition using an SVM (support vector machine) with an RBF kernel (radial basis function kernel) on four channels of sEMG (surface electromyography) and a three-axis accelerometer, achieving an accuracy of 96.66% for 12-word recognition. Boháček and Hrúz [ 148 ] performed isolated sign language recognition using SPOTER (a sign pose-based transformer), validated on the LSA64 and WLASL datasets, resulting in 100% accuracy on LSA64 and 63.18% and 43.78% accuracy on WLASL 100 and WLASL 300, respectively. The current state-of-the-art SLR models are summarized in Table 11 for a better understanding. We hope this review paper sets a baseline for future, advanced research in the SLR domain.

VI Discussion

Sign language involves dynamic gestures, trajectory properties, and multi-dimensional feature vectors; these factors make sign language challenging to recognize. Still, many researchers are attempting to develop a generalized, reliable, and robust SLR model. Multi-dimensional features are a novel approach that leads to a better recognition rate. This review paper aims to provide an easy understanding and helpful guidance to the research community. Developing an effective SLR model to assist the hand-talk community is one of the prominent research domains in computer vision, pattern recognition, and natural language processing.

VI-A Limitation of Current Datasets and their sizes

Ambiguities and the lack of training data make SLR vulnerable; therefore, standardized and large-scale datasets with manual and non-manual features are important. The limitations of the current datasets and their sizes are as follows. Barriers concerning the recording/collection/measuring equipment:

Poor camera quality affects the clarity of the sign in vision-based systems; reduced resolution leads to decreased accuracy.

Improper camera setup is another barrier because it leads to the loss of important sign information while the signer performs a dynamic or static sign.

If a multi-camera setup is used to acquire the signer data, a lack of synchronization leads to information loss and therefore poor performance.

Devices should be reliable, cost-effective, and easy to maintain.

The environment, background, and illumination profoundly affect dataset preparation.

When the background contains noise, it creates misclassification and reduces the recognition rate, so it should be dealt with properly to overcome this barrier.

Improper lighting and illumination reduce the clarity of the sign and also affect accuracy.

The distance between the camera and the signer should be maintained within a nominal, workable range; a distance that is too close or too far affects performance.

VI-B Limitation of Current Trends

The limitations of the current trends in SLR are as follows:

Barriers regarding different signers that affect accuracy:

Breaks between letters/signs and sped-up signing: fast, continuous, and frequent signing by the signer creates challenges for segmentation and feature extraction.

Blockage from overlapping and occlusion of hand-face and hand-hand.

Wearing a dress with long sleeves or wearing colored gloves also affects the sign recognition process.

High interpersonal variation: signs vary between signers and between instances.

Barriers concerning the video domain:

Handling video data within limited GPU memory is not tractable. Most CNN techniques are image-based, while videos have an additional temporal dimension; a simple resizing process, or performing fine-tuning and classification on each frame independently, may lose crucial temporal information.

Barriers concerning network design in machine learning:

The recognition and classification ability is affected by location, illumination, and so on.

A higher batch size can cause the model to fall into local convergence instead of global convergence, while smaller batch sizes lead to more iterations and a rise in training expense.

The selection of loss functions during training also adds to the expense.

Selection of optimal hyperparameters.

Active research domains include AI-based realistic SLR translation and avatar-based production modeling (manual and non-manual). Developing AI sign language learning and translation applications (web-based or smartphone) is another current trend. Although the advent of deep learning networks has improved SLR accuracy, the limitations mentioned above still need to be addressed in the SLR domain.

VI-C Other Potential Applications of SLR with Human-Computer Interaction

Some potential applications of SLR with human-computer interaction are as follows:

Virtual reality: With the help of electronic equipment, the user experiences an artificial simulation of the real world.

Smart home: Home attributes are monitored, accessed, and controlled using artificial intelligence and electronic devices, including security and alarm alert systems.

Health care: Intended to assist the patients in a better quality of life and good health care service.

Social safety: To ensure safe and social engagement and to minimize social threats.

Telehealth: Remotely accessing clinical contacts and care services to enhance patients’ health care.

Virtual shopping: To provide hassle-free, more comfortable shopping with virtual stores.

Digital signature: To transfer information in the form of an electronic signature.

Gaming and playing: To provide users with a more entertaining gaming experience.

Text and voice assistance: To provide better communication using technology and ease of user comfort.

Education: To facilitate enhanced learning skills using advanced techniques.

VII Future Direction and Research Scope

Compared to the recent developmental achievements in automatic speech recognition, SLR still lags behind with a vast gap and remains at an early development stage. According to the literature study, a good amount of research exists in SLR, and much of it strives for a high-performance SLR model by exploring advanced techniques such as deep learning, machine learning, optimization, and advanced hardware and sensor experimentation. Finally, a thorough exploration is needed to solve the following issues in SLR:

Distinctiveness/contrast of sign handling problems.

Multiple sensors/camera fusion problems.

Multi-modalities data handling issues.

Computation problem.

Consistency issues.

Difficulty handling a large vocabulary.

Requirement of standard datasets.

Future Directions

Future directions for SLR are as follows:

SLR model design needs a better understanding of optimal hyperparameter estimation strategy.

Building SLR models for uncontrolled surroundings/environments is a thrust area because most existing models in the literature were developed on lab-environment-based datasets; hence, it is in demand.

Designing a user-friendly, realistic, and robust sign language model is one of the high-scope domains of SLR.

Design a high-precision sign language capturing device (sensor and camera).

Devise a novel training strategy to reduce computational training difficulty.

The lightweight CNN model for SLR is another research scope.

Develop an SLR model that leverages multi-modal association to improve recognition accuracy.

Devising a generic automatic SLR model.

This review paper provides a complete guide to the research and allows the reader to learn about existing SLR works. It demonstrates challenging problems, research gaps, future research directions, and dataset resources, so readers and researchers can move forward towards developing novel models and products that assist the hand-talk community and contribute to social benefits.

VIII Conclusion

There are several review papers on hand gestures and SLR, but existing reviews do not comprehensively discuss facial expression-, modality-, and dataset-based sign language and lack in-depth discussion. With this motivation, this review paper studied different types of SLR, various sensing approaches, modalities, and various SLR datasets, and listed the issues of SLR and its future directions. Complete guidance of this kind provides a more precise understanding and awareness of the problem's complexity, state-of-the-art models, and challenges in SLR.

This comprehensive review paper will help guide upcoming researchers through the introduction, needs, applications, and processes involved in SLR. It discussed various manual and non-manual SLR models, reviewed the isolated and continuous variants of each type, and provided easy understanding to the reader with the help of tables and diagrams. The manual and non-manual SLR types were presented first, followed by an examination of works related to the various modalities and datasets. Finally, we reviewed recent research progress, challenges, and barriers of existing SLR models, organized in an informative and valuable manner with respect to the various types, modalities, and datasets. Improving the accuracy of vision-based SLR is an ongoing and hot research topic. The sensor-based approach is highly suitable for laboratory-based experimentation but is not an appropriate choice for practical real-time applications. The vision-based SLR model's accuracy is lower than that of the sensor-based approach and much lower than that of speech recognition models. A robust and sophisticated method is essential for extracting manual and non-manual features and overcoming these barriers; therefore, much scope remains in the SLR domain. We hope this review provides insight for readers and researchers to propose state-of-the-art methods that facilitate better communication and improve the hand-talk community's life.

Declaration

Funding- No Funding was received for this work. Competing interests- The authors declare no conflict of interest. Availability of data and materials- Not Applicable. Code availability- Not Applicable.

Acknowledgment

This work of Dr. M. Madhiarasan was supported by the MHRD (Grant No. OH-31-24-200-328) project.

  • Organization [2018] W. H. Organization. (2018) Deafness and hearing loss. [Online]. Available: https://www.who.int/news-room/fact-sheets/detail/deafness-and-hearing-loss
  • Aly and Aly [2020] S. Aly and W. Aly, “Deeparslr: A novel signer-independent deep learning framework for isolated arabic sign language gestures recognition,” IEEE Access , vol. 8, pp. 83 199–83 212, 2020.
  • Rastgoo et al. [2020a] R. Rastgoo, K. Kiani, and S. Escalera, “Hand sign language recognition using multi-view hand skeleton,” Expert Systems with Applications , vol. 150, p. 113336, 2020.
  • Tripathi and Nandi [2015] K. Tripathi and N. B. G. Nandi, “Continuous indian sign language gesture recognition and sentence formation,” Procedia Computer Science , vol. 54, pp. 523–531, 2015.
  • Hassan et al. [2019] M. Hassan, K. Assaleh, and T. Shanableh, “Multiple proposals for continuous arabic sign language recognition,” Sensing and Imaging , vol. 20, no. 1, p. 4, 2019.
  • Albanie et al. [2020] S. Albanie, G. Varol, L. Momeni, T. Afouras, J. S. Chung, N. Fox, and A. Zisserman, “Bsl-1k: Scaling up co-articulated sign language recognition using mouthing cues,” in European Conference on Computer Vision .   Springer, 2020, pp. 35–53.
  • Aran et al. [2009] O. Aran, I. Ari, L. Akarun, B. Sankur, A. Benoit, A. Caplier, P. Campr, A. H. Carrillo et al. , “Signtutor: An interactive system for sign language tutoring,” IEEE MultiMedia , vol. 16, no. 1, pp. 81–93, 2009.
  • Botros et al. [2020] F. S. Botros, A. Phinyomark, and E. J. Scheme, “Electromyography-based gesture recognition: Is it time to change focus from the forearm to the wrist?” environments , vol. 14, p. 15, 2020.
  • Fatmi et al. [2019] R. Fatmi, S. Rashad, and R. Integlia, “Comparing ann, svm, and hmm based machine learning methods for american sign language recognition using wearable motion sensors,” in 2019 IEEE 9th Annual Computing and Communication Workshop and Conference (CCWC) .   IEEE, 2019, pp. 0290–0297.
  • Wei et al. [2016a] S. Wei, X. Chen, X. Yang, S. Cao, and X. Zhang, “A component-based vocabulary-extensible sign language gesture recognition framework,” Sensors , vol. 16, no. 4, p. 556, 2016.
  • Kim et al. [2018a] S. Kim, J. Kim, S. Ahn, and Y. Kim, “Finger language recognition based on ensemble artificial neural network learning using armband emg sensors,” Technology and Health Care , vol. 26, no. S1, pp. 249–258, 2018.
  • Cheok et al. [2019] M. J. Cheok, Z. Omar, and M. H. Jaward, “A review of hand gesture and sign language recognition techniques,” International Journal of Machine Learning and Cybernetics , vol. 10, no. 1, pp. 131–153, 2019.
  • Kim et al. [2018b] S. Kim, Y. Ji, and K.-B. Lee, “An effective sign language learning with object detection based roi segmentation,” in 2018 Second IEEE International Conference on Robotic Computing (IRC) .   IEEE, 2018, pp. 330–333.
  • Paulraj et al. [2010] M. P. Paulraj, S. Yaacob, M. S. bin Zanar Azalan, and R. Palaniappan, “A phoneme based sign language recognition system using skin color segmentation,” in 2010 6th International Colloquium on Signal Processing & its Applications .   IEEE, 2010, pp. 1–5.
  • Ghotkar and Kharate [2013] A. S. Ghotkar and G. K. Kharate, “Vision based real time hand gesture recognition techniques for human computer interaction,” Int J Comput Appl , vol. 70, no. 16, pp. 1–8, 2013.
  • Shin et al. [2006] J.-H. Shin, J.-S. Lee, S.-K. Kil, D.-F. Shen, J.-G. Ryu, E.-H. Lee, H.-K. Min, and S.-H. Hong, “Hand region extraction and gesture recognition using entropy analysis,” IJCSNS International Journal of Computer Science and Network Security , vol. 6, no. 2A, pp. 216–222, 2006.
  • Akmeliawati et al. [2009] R. Akmeliawati, F. Dadgostar, S. Demidenko, N. Gamage, Y. C. Kuang, C. Messom, M. Ooi, A. Sarrafzadeh, and G. SenGupta, “Towards real-time sign language analysis via markerless gesture tracking,” in 2009 IEEE Instrumentation and Measurement Technology Conference .   IEEE, 2009, pp. 1200–1204.
  • Wang et al. [2012] X. Wang, M. Xia, H. Cai, Y. Gao, and C. Cattani, “Hidden-markov-models-based dynamic hand gesture recognition,” Mathematical Problems in Engineering , vol. 2012, 2012.
  • Li et al. [2003] P. Li, T. Zhang, and A. E. Pece, “Visual contour tracking based on particle filters,” Image and Vision Computing , vol. 21, no. 1, pp. 111–123, 2003.
  • Kadhim and Khamees [2020] R. A. Kadhim and M. Khamees, “A real-time american sign language recognition system using convolutional neural network for real datasets,” TEM Journal , vol. 9, no. 3, p. 937, 2020.
  • Forster et al. [2013] J. Forster, C. Oberdörfer, O. Koller, and H. Ney, “Modality combination techniques for continuous sign language recognition,” in Iberian Conference on Pattern Recognition and Image Analysis .   Springer, 2013, pp. 89–99.
  • Lee et al. [2021] C. K. Lee, K. K. Ng, C.-H. Chen, H. C. Lau, S. Chung, and T. Tsoi, “American sign language recognition and training method with recurrent neural network,” Expert Systems with Applications , vol. 167, p. 114403, 2021.
  • Al-Hammadi et al. [2020a] M. Al-Hammadi, G. Muhammad, W. Abdul, M. Alsulaiman, M. A. Bencherif, T. S. Alrayes, H. Mathkour, and M. A. Mekhtiche, “Deep learning-based approach for sign language gesture recognition with efficient hand gesture representation,” IEEE Access , vol. 8, pp. 192 527–192 542, 2020.
  • Yuan et al. [2020] G. Yuan, X. Liu, Q. Yan, S. Qiao, Z. Wang, and L. Yuan, “Hand gesture recognition using deep feature fusion network based on wearable sensors,” IEEE Sensors Journal , vol. 21, no. 1, pp. 539–547, 2020.
  • Vamplew and Adams [1996] P. Vamplew and A. Adams, “Recognition of sign language gestures using neural networks,” in European Conference on Disabilities, Virtual Reality and Associated Technologies , 1996.
  • Kudrinko et al. [2020] K. Kudrinko, E. Flavin, X. Zhu, and Q. Li, “Wearable sensor-based sign language recognition: A comprehensive review,” IEEE Reviews in Biomedical Engineering , vol. 14, pp. 82–97, 2020.
  • Sincan and Keles [2020] O. M. Sincan and H. Y. Keles, “Autsl: A large scale multi-modal turkish sign language dataset and baseline methods,” IEEE Access , vol. 8, pp. 181 340–181 355, 2020.
  • Chansri and Srinonchat [2016] C. Chansri and J. Srinonchat, “Hand gesture recognition for thai sign language in complex background using fusion of depth and color video,” Procedia Computer Science , vol. 86, pp. 257–260, 2016.
  • Pansare and Ingle [2016] J. R. Pansare and M. Ingle, “Vision-based approach for american sign language recognition using edge orientation histogram,” in 2016 International Conference on Image, Vision and Computing (ICIVC) .   IEEE, 2016, pp. 86–90.
  • Singha and Das [2013] J. Singha and K. Das, “Indian sign language recognition using eigen value weighted euclidean distance based classification technique,” arXiv preprint arXiv:1303.0634 , 2013.
  • Ahmed et al. [2016] W. Ahmed, K. Chanda, and S. Mitra, “Vision based hand gesture recognition using dynamic time warping for indian sign language,” in 2016 International Conference on Information Science (ICIS) .   IEEE, 2016, pp. 120–125.
  • Prasad et al. [2016] M. Prasad, P. Kishore, E. K. Kumar, and D. A. Kumar, “Indian sign language recognition system using new fusion based edge operator.” Journal of Theoretical & Applied Information Technology , vol. 88, no. 3, 2016.
  • Gurjal P [2012] K. K. Gurjal P, “Real time hand gesture recognition using sift.” Int J Electron Electr Eng , vol. 2, no. 3, 2012.
  • Yao and Li [2012] Y. Yao and C.-T. Li, “Hand posture recognition using surf with adaptive boosting,” in British Machine Vision Conference , 2012.
  • Kumar [2017] N. Kumar, “Sign language recognition for hearing impaired people based on hands symbols classification,” in 2017 International Conference on Computing, Communication and Automation (ICCCA) .   IEEE, 2017, pp. 244–249.
  • Shukla et al. [2015] P. Shukla, A. Garg, K. Sharma, and A. Mittal, “A dtw and fourier descriptor based approach for indian sign language recognition,” in 2015 Third International Conference on Image Information Processing (ICIIP) .   IEEE, 2015, pp. 113–118.
  • Gurbuz et al. [2021] S. Z. Gurbuz, A. C. Gurbuz, E. A. Malaia, D. J. Griffin, C. Crawford, M. M. Rahman, E. Kurtoglu, R. Aksu, T. Macks, and R. Mdrafi, “American sign language recognition using rf sensing,” IEEE Sensors Journal , 2021.
  • Dour and Sharma [2016] G. Dour and S. Sharma, “Recognition of alphabets of indian sign language by sugeno type fuzzy neural network,” Pattern Recognit Lett , vol. 30, pp. 737–742, 2016.
  • Sethi et al. [2012] A. Sethi, S. Hemanth, K. Kumar, N. Bhaskara Rao, and R. Krishnan, “Signpro-an application suite for deaf and dumb,” IJCSET , vol. 2, no. 5, pp. 1203–1206, 2012.
  • Lahiani et al. [2015] H. Lahiani, M. Elleuch, and M. Kherallah, “Real time hand gesture recognition system for android devices,” in 2015 15th International Conference on Intelligent Systems Design and Applications (ISDA) .   IEEE, 2015, pp. 591–596.
  • Pansare et al. [2012] J. R. Pansare, S. H. Gawande, and M. Ingle, “Real-time static hand gesture recognition for american sign language (asl) in complex background,” J Signal Inf Process , vol. 3, pp. 364–367, 2012.
  • Lionnie et al. [2012] R. Lionnie, I. K. Timotius, and I. Setyawan, “Performance comparison of several pre-processing methods in a hand gesture recognition system based on nearest neighbor for different background conditions,” Journal of ICT Research and Applications , vol. 6, no. 3, pp. 183–194, 2012.
  • Zhang et al. [2016a] Z. Zhang, X. Qin, X. Wu, F. Wang, and Z. Yuan, “Recognition of chinese sign language based on dynamic features extracted by fast fourier transform,” in Pacific Rim Conference on Multimedia .   Springer, 2016, pp. 508–517.
  • Zorins and Grabusts [2016] A. Zorins and P. Grabusts, “Review of data preprocessing methods for sign language recognition systems based on artificial neural networks.” Information Technology & Management Science (Sciendo) , vol. 19, no. 1, 2016.
  • Tsagaris and Manitsaris [2013] A. Tsagaris and S. Manitsaris, “Colour space comparison for skin detection in finger gesture recognition,” International Journal of Advances in Engineering & Technology , vol. 6, no. 4, p. 1431, 2013.
  • Al-Hammadi et al. [2020b] M. Al-Hammadi, G. Muhammad, W. Abdul, M. Alsulaiman, M. A. Bencherif, and M. A. Mekhtiche, “Hand gesture recognition for sign language using 3dcnn,” IEEE Access , vol. 8, pp. 79 491–79 509, 2020.
  • Khalid et al. [2014] S. Khalid, T. Khalil, and S. Nasreen, “A survey of feature selection and feature extraction techniques in machine learning,” in 2014 science and information conference .   IEEE, 2014, pp. 372–378.
  • Jiang et al. [2018] S. Jiang, B. Lv, W. Guo, C. Zhang, H. Wang, X. Sheng, and P. B. Shull, “Feasibility of wrist-worn, real-time hand, and surface gesture recognition via semg and imu sensing,” IEEE Transactions on Industrial Informatics , vol. 14, no. 8, pp. 3376–3385, 2018.
  • Tariq et al. [2012] M. Tariq, A. Iqbal, A. Zahid, Z. Iqbal, and J. Akhtar, “Sign language localization: Learning to eliminate language dialects,” in 2012 15th International Multitopic Conference (INMIC) .   IEEE, 2012, pp. 17–22.
  • ethnologue [2014] ethnologue. (2014) ethnologue. [Online]. Available: https://www.ethnologue.com/
  • Ong et al. [2012] E.-J. Ong, H. Cooper, N. Pugeault, and R. Bowden, “Sign language recognition using sequential pattern trees,” in 2012 IEEE Conference on Computer Vision and Pattern Recognition .   IEEE, 2012, pp. 2200–2207.
  • Yin et al. [2018] S. Yin, J. Yang, Y. Qu, W. Liu, Y. Guo, H. Liu, and D. Wei, “Research on gesture recognition technology of data glove based on joint algorithm,” in 2018 International Conference on Mechanical, Electronic, Control and Automation Engineering (MECAE 2018) .   Atlantis Press, 2018, pp. 41–50.
  • Jane and Sasidhar [2018] S. P. Y. Jane and S. Sasidhar, “Sign language interpreter: Classification of forearm emg and imu signals for signing exact english,” in 2018 IEEE 14th International Conference on Control and Automation (ICCA) .   IEEE, 2018, pp. 947–952.
  • Almeida et al. [2014] S. G. M. Almeida, F. G. Guimarães, and J. A. Ramírez, “Feature extraction in brazilian sign language recognition based on phonological structure and using rgb-d sensors,” Expert Systems with Applications , vol. 41, no. 16, pp. 7259–7271, 2014.
  • Lee and Lee [2018] B. G. Lee and S. M. Lee, “Smart wearable hand device for sign language interpretation system with sensors fusion,” IEEE Sensors Journal , vol. 18, no. 3, pp. 1224–1232, 2018.
  • Li et al. [2018] L. Li, S. Jiang, P. B. Shull, and G. Gu, “Skingest: artificial skin for gesture recognition via filmy stretchable strain sensors,” Advanced Robotics , vol. 32, no. 21, pp. 1112–1121, 2018.
  • Yang et al. [2017] X. Yang, X. Chen, X. Cao, S. Wei, and X. Zhang, “Chinese sign language recognition based on an optimized tree-structure framework,” IEEE Journal of Biomedical and Health Informatics , vol. 21, no. 4, pp. 994–1004, 2017.
  • Dawod and Chakpitak [2019] A. Y. Dawod and N. Chakpitak, “Novel technique for isolated sign language based on fingerspelling recognition,” in 2019 13th International Conference on Software, Knowledge, Information Management and Applications (SKIMA) .   IEEE, 2019, pp. 1–8.
  • Hrúz et al. [2009] M. Hrúz, P. Campr, and A. Karpov, “Input and output modalities used in a sign-language-enabled information kiosk,” SCREEN , vol. 1, no. C3, p. C2, 2009.
  • Mummadi et al. [2018] C. K. Mummadi, F. P. P. Leo, K. D. Verma, S. Kasireddy, P. M. Scholl, J. Kempfle, and K. V. Laerhoven, “Real-time and embedded detection of hand gestures with an imu-based glove,” in Informatics , vol. 5, no. 2.   Multidisciplinary Digital Publishing Institute, 2018, p. 28.
  • Gupta and Kumar [2020] R. Gupta and A. Kumar, “Indian sign language recognition using wearable sensors and multi-label classification,” Computers & Electrical Engineering , p. 106898, 2020.
  • Hoang [2020] V. T. Hoang, “Hgm-4: A new multi-cameras dataset for hand gesture recognition,” Data in Brief , vol. 30, p. 105676, 2020.
  • Elakkiya [2020] R. Elakkiya, “Machine learning based sign language recognition: A review and its research frontier,” Journal of Ambient Intelligence and Humanized Computing , pp. 1–20, 2020.
  • Nayak et al. [2012] S. Nayak, K. Duncan, S. Sarkar, and B. Loeding, “Finding recurrent patterns from continuous sign language sentences for automated extraction of signs,” Journal of Machine Learning Research , vol. 13, pp. 2589––2615, 2012.
  • Kong and Ranganath [2014] W. Kong and S. Ranganath, “Towards subject independent continuous sign language recognition: A segment and merge approach,” Pattern Recognition , vol. 47, no. 3, pp. 1294–1308, 2014.
  • Ye et al. [2018] Y. Ye, Y. Tian, M. Huenerfauth, and J. Liu, “Recognizing american sign language gestures from within continuous videos,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops , 2018, pp. 2064–2073.
  • Gupta and Rajan [2020] R. Gupta and S. Rajan, “Comparative analysis of convolution neural network models for continuous indian sign language classification,” Procedia Computer Science , vol. 171, pp. 1542–1550, 2020.
  • Pan et al. [2020] W. Pan, X. Zhang, and Z. Ye, “Attention-based sign language recognition network utilizing keyframe sampling and skeletal features,” IEEE Access , vol. 8, pp. 215 592–215 602, 2020.
  • Papastratis et al. [2020] I. Papastratis, K. Dimitropoulos, D. Konstantinidis, and P. Daras, “Continuous sign language recognition through cross-modal alignment of video and text embeddings in a joint-latent space,” IEEE Access , vol. 8, pp. 91 170–91 180, 2020.
  • Von Agris et al. [2008] U. Von Agris, J. Zieren, U. Canzler, B. Bauer, and K.-F. Kraiss, “Recent developments in visual sign language recognition,” Universal Access in the Information Society , vol. 6, no. 4, pp. 323–362, 2008.
  • Sarkar et al. [2011] S. Sarkar, B. Loeding, R. Yang, S. Nayak, and A. Parashar, “Segmentation-robust representations, matching, and modeling for sign language,” in CVPR 2011 WORKSHOPS .   IEEE, 2011, pp. 13–19.
  • Fagiani et al. [2015] M. Fagiani, E. Principi, S. Squartini, and F. Piazza, “Signer independent isolated italian sign recognition based on hidden markov models,” Pattern Analysis and Applications , vol. 18, no. 2, pp. 385–402, 2015.
  • Zhang et al. [2016b] J. Zhang, W. Zhou, C. Xie, J. Pu, and H. Li, “Chinese sign language recognition with adaptive hmm,” in 2016 IEEE International Conference on Multimedia and Expo (ICME) .   IEEE, 2016, pp. 1–6.
  • Kumar et al. [2018] P. Kumar, P. P. Roy, and D. P. Dogra, “Independent bayesian classifier combination based sign language recognition using facial expression,” Information Sciences , vol. 428, pp. 30–48, 2018.
  • Sabyrov et al. [2019] A. Sabyrov, M. Mukushev, and V. Kimmelman, “Towards real-time sign language interpreting robot: Evaluation of non-manual components on recognition accuracy.” in CVPR Workshops , 2019.
  • Mukushev et al. [2020] M. Mukushev, A. Sabyrov, A. Imashev, K. Koishybay, V. Kimmelman, and A. Sandygulova, “Evaluation of manual and non-manual components for sign language recognition,” in Proceedings of The 12th Language Resources and Evaluation Conference , 2020, pp. 6073–6078.
  • Kishore et al. [2018] P. Kishore, D. A. Kumar, A. C. S. Sastry, and E. K. Kumar, “Motionlets matching with adaptive kernels for 3-d indian sign language recognition,” IEEE Sensors Journal , vol. 18, no. 8, pp. 3327–3337, 2018.
  • Liu et al. [2018] Z. Liu, X. Qi, and L. Pang, “Self-boosted gesture interactive system with st-net,” in Proceedings of the 26th ACM International Conference on Multimedia , 2018, pp. 145–153.
  • Wilbur [2013] R. B. Wilbur, “Phonological and prosodic layering of nonmanuals in american sign language,” in The signs of language revisited .   Psychology Press, 2013, pp. 196–220.
  • Farhadi and Forsyth [2006] A. Farhadi and D. Forsyth, “Aligning asl for statistical translation using a discriminative word model,” in 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06) , vol. 2.   IEEE, 2006, pp. 1471–1476.
  • Infantino et al. [2007] I. Infantino, R. Rizzo, and S. Gaglio, “A framework for sign language sentence recognition by commonsense context,” IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) , vol. 37, no. 5, pp. 1034–1039, 2007.
  • Yang and Lee [2013] H.-D. Yang and S.-W. Lee, “Robust sign language recognition by combining manual and non-manual features based on conditional random field and support vector machine,” Pattern Recognition Letters , vol. 34, no. 16, pp. 2051–2056, 2013.
  • Zhang et al. [2016c] C. Zhang, Y. Tian, and M. Huenerfauth, “Multi-modality american sign language recognition,” in 2016 IEEE International Conference on Image Processing (ICIP) .   IEEE, 2016, pp. 2881–2885.
  • Brock et al. [2020] H. Brock, I. Farag, and K. Nakadai, “Recognition of non-manual content in continuous japanese sign language,” Sensors , vol. 20, no. 19, p. 5621, 2020.
  • Zhou et al. [2020] H. Zhou, W. Zhou, Y. Zhou, and H. Li, “Spatial-temporal multi-cue network for continuous sign language recognition,” in Proceedings of the AAAI Conference on Artificial Intelligence , vol. 34, no. 07, 2020, pp. 13 009–13 016.
  • Koller et al. [2020] O. Koller, N. C. Camgoz, H. Ney, and R. Bowden, “Weakly supervised learning with multi-stream cnn-lstm-hmms to discover sequential parallelism in sign language videos,” IEEE Transactions on Pattern Analysis and Machine Intelligence , vol. 42, no. 9, pp. 2306–2320, 2020.
  • Koller et al. [2016] O. Koller, H. Ney, and R. Bowden, “Deep hand: How to train a cnn on 1 million hand images when your data is continuous and weakly labelled,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2016, pp. 3793–3802.
  • Sridhar et al. [2020] A. Sridhar, R. G. Ganesan, P. Kumar, and M. Khapra, “Include: A large scale dataset for indian sign language recognition,” in Proceedings of the 28th ACM International Conference on Multimedia , 2020, pp. 1366–1375.
  • Joze and Koller [2018] H. R. V. Joze and O. Koller, “Ms-asl: A large-scale data set and benchmark for understanding american sign language,” arXiv preprint arXiv:1812.01053 , 2018.
  • Özdemir et al. [2020] O. Özdemir, A. A. Kındıroğlu, N. Cihan Camgoz, and L. Akarun, “BosphorusSign22k Sign Language Recognition Dataset,” in Proceedings of the LREC2020 9th Workshop on the Representation and Processing of Sign Languages: Sign Language Resources in the Service of the Language Community, Technological Challenges and Application Perspectives , 2020.
  • Ko et al. [2019] S.-K. Ko, C. J. Kim, H. Jung, and C. Cho, “Neural sign language translation based on human keypoint estimation,” Applied Sciences , vol. 9, no. 13, p. 2683, 2019.
  • Elboushaki et al. [2020] A. Elboushaki, R. Hannane, K. Afdel, and L. Koutti, “Multid-cnn: A multi-dimensional feature learning approach based on deep convolutional networks for gesture recognition in rgb-d image sequences,” Expert Systems with Applications , vol. 139, p. 112829, 2020.
  • Köpüklü et al. [2019] O. Köpüklü, A. Gunduz, N. Kose, and G. Rigoll, “Real-time hand gesture detection and classification using convolutional neural networks,” in 2019 14th IEEE International Conference on Automatic Face & Gesture Recognition (FG 2019) .   IEEE, 2019, pp. 1–8.
  • Lim et al. [2019] K. M. Lim, A. W. C. Tan, C. P. Lee, and S. C. Tan, “Isolated sign language recognition using convolutional neural network hand modelling and hand energy image,” Multimedia Tools and Applications , vol. 78, no. 14, pp. 19 917–19 944, 2019.
  • Chen et al. [2019] Y. Chen, L. Zhao, X. Peng, J. Yuan, and D. N. Metaxas, “Construct dynamic graphs for hand gesture recognition via spatial-temporal attention,” arXiv preprint arXiv:1907.08871 , 2019.
  • Ferreira et al. [2019] P. M. Ferreira, J. S. Cardoso, and A. Rebelo, “On the role of multimodal learning in the recognition of sign language,” Multimedia Tools and Applications , vol. 78, no. 8, pp. 10 035–10 056, 2019.
  • Gomez-Donoso et al. [2019] F. Gomez-Donoso, S. Orts-Escolano, and M. Cazorla, “Accurate and efficient 3d hand pose regression for robot hand teleoperation using a monocular rgb camera,” Expert Systems with Applications , vol. 136, pp. 327–337, 2019.
  • Spurr et al. [2018] A. Spurr, J. Song, S. Park, and O. Hilliges, “Cross-modal deep variational hand pose estimation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 89–98.
  • Kazakos et al. [2018] E. Kazakos, C. Nikou, and I. A. Kakadiaris, “On the fusion of rgb and depth information for hand pose estimation,” in 2018 25th IEEE International Conference on Image Processing (ICIP) .   IEEE, 2018, pp. 868–872.
  • Li et al. [2019] Y. Li, Z. Xue, Y. Wang, L. Ge, Z. Ren, and J. Rodriguez, “End-to-end 3d hand pose estimation from stereo cameras.” in BMVC , vol. 1, 2019, p. 2.
  • Mueller et al. [2018] F. Mueller, F. Bernard, O. Sotnychenko, D. Mehta, S. Sridhar, D. Casas, and C. Theobalt, “Ganerated hands for real-time 3d hand tracking from monocular rgb,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 49–59.
  • Victor [2017] D. Victor, “Handtrack: A library for prototyping real-time hand trackinginterfaces using convolutional neural networks,” GitHub repository , 2017.
  • Baek et al. [2018] S. Baek, K. I. Kim, and T.-K. Kim, “Augmented skeleton space transfer for depth-based hand pose estimation,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 8330–8339.
  • Moon et al. [2018] G. Moon, J. Y. Chang, and K. M. Lee, “V2v-posenet: Voxel-to-voxel prediction network for accurate 3d hand and human pose estimation from a single depth map,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2018, pp. 5079–5088.
  • Ge et al. [2018] L. Ge, H. Liang, J. Yuan, and D. Thalmann, “Robust 3d hand pose estimation from single depth images using multi-view cnns,” IEEE Transactions on Image Processing , vol. 27, no. 9, pp. 4422–4436, 2018.
  • Ge and et. al [2017] L. Ge and et. al, “3d convolutional neural networks for efficient and robust hand pose estimation from single depth images,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2017, pp. 1991–2000.
  • Dibra et al. [2017] E. Dibra, T. Wolf, C. Oztireli, and M. Gross, “How to refine 3d hand pose estimation from unlabelled depth data?” in 2017 International Conference on 3D Vision (3DV) .   IEEE, 2017, pp. 135–144.
  • Sinha et al. [2016] A. Sinha, C. Choi, and K. Ramani, “Deephand: Robust hand pose estimation by completing a matrix imputed with deep features,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition , 2016, pp. 4150–4158.


Sign language detection using convolutional neural network

  • Original Research
  • Published: 26 March 2024

Authors: Pranati Rakshit, Sarbajeet Paul and Shruti Dey

Sign language recognition addresses an important social need: it can benefit the deaf and hard-of-hearing community by enabling easier and faster communication. Some previous studies on sign language recognition have relied on complex input modalities and feature-extraction methods, which limits their practical applicability. This research compares two custom-made convolutional neural network (CNN) models for recognizing American Sign Language (ASL) letters from A to Z and determines which model performs better. The proposed models combine a CNN with a Softmax activation function, a powerful and widely used classification approach in computer vision. The study compares the performance of the two models on 26 distinct hand signs representing the 26 letters of the English alphabet. Model_2 achieved better overall performance than Model_1, with an accuracy of 98.44% and an F1 score of 98.41%. However, the performance of each model varied by label, suggesting that the choice of model may depend on the specific use case and the labels of interest. This research contributes to the growing field of sign language recognition using deep learning techniques and highlights the importance of designing custom models.
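
To make the CNN-plus-Softmax pipeline concrete, the following is a minimal sketch of a 26-class ASL letter classifier in Keras. It is not the architecture reported by the authors: the input resolution (64×64 grayscale), the number and width of the layers, the dropout rate, and the optimizer are illustrative assumptions.

```python
# Minimal sketch of a CNN + Softmax classifier for 26-class ASL letter
# recognition. Layer sizes, input shape, and hyperparameters are assumed
# for illustration; they are not the models compared in the paper.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 26            # ASL letters A-Z
INPUT_SHAPE = (64, 64, 1)   # assumed grayscale input resolution

def build_asl_cnn():
    model = models.Sequential([
        layers.Input(shape=INPUT_SHAPE),
        # Two convolution + pooling stages extract spatial hand-shape features
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        # Flatten and classify with a Softmax output over the 26 letters
        layers.Flatten(),
        layers.Dense(128, activation="relu"),
        layers.Dropout(0.5),
        layers.Dense(NUM_CLASSES, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

if __name__ == "__main__":
    model = build_asl_cnn()
    model.summary()
    # Training would use one-hot encoded labels, e.g.:
    # model.fit(train_images, train_labels, epochs=10, validation_split=0.1)
```

In this pattern, the Softmax layer converts the final feature vector into a probability distribution over the 26 letters, and the predicted letter is the arg-max of that distribution.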





About this article

Rakshit, P., Paul, S. & Dey, S. Sign language detection using convolutional neural network. J Ambient Intell Human Comput (2024). https://doi.org/10.1007/s12652-024-04761-7

Received: 19 May 2023

Accepted: 24 January 2024

Published: 26 March 2024

DOI: https://doi.org/10.1007/s12652-024-04761-7


Keywords:

  • Sign language
  • Convolutional Neural Network

