Global Journal of Computer Science and Technology: F
Graphics & Vision
Volume 17 Issue 2 Version 1.0 Year 2017
Type: Double Blind Peer Reviewed International Research Journal
Publisher: Global Journals Inc. (USA)
Online ISSN: 0975-4172 & Print ISSN: 0975-4350

Towards Arabic Alphabet and Numbers Sign Language Recognition
By Ahmad Hasasneh & Sameh Taqatqa
Palestine Ahliya University

Abstract- This paper proposes a new Arabic sign language recognition system using Restricted Boltzmann Machines and a direct use of tiny images. Restricted Boltzmann Machines are able to code images as a superposition of a limited number of features taken from a larger alphabet. Repeating this process in a deep architecture (Deep Belief Networks) leads to an efficient sparse representation of the initial data in the feature space. A complex classification problem in the input space is thus transformed into an easier one in the feature space. After appropriate coding, a softmax regression in the feature space should be sufficient to recognize a hand sign from the input image. To our knowledge, this is the first attempt to use tiny-image feature extraction with a deep architecture, a simpler alternative approach for Arabic sign language recognition that deserves to be considered and investigated.

Keywords: arabic sign language recognition, restricted boltzmann machines, deep belief networks, softmax regression, classification, sparse representation.

GJCST-F Classification: I.5, I.7.5

© 2017. Ahmad Hasasneh & Sameh Taqatqa. This is a research/review paper, distributed under the terms of the Creative Commons Attribution-Noncommercial 3.0 Unported License (http://creativecommons.org/licenses/by-nc/3.0/), permitting all non-commercial use, distribution, and reproduction in any medium, provided the original work is properly cited.
Author α σ: Information Technology Department, Palestine Ahliya University, Bethlehem, West Bank, Palestine. E-mails: ahasasneh@paluniv.edu.ps, sameh@paluniv.edu.ps

I. Introduction

Sign language continues to be the best method of communication among the deaf and hearing-impaired. Hand gestures, rather than speech, enable communication between deaf people in their daily lives. In our society, Arabic Sign Language (ArSL) is known only to deaf people and specialists, so the community of deaf people is narrow. To help people with normal hearing communicate effectively with the deaf and the hearing-impaired, numerous systems have been developed for translating diverse sign languages from around the world. Several review papers discussing such systems have been published; they can be found in [1]–[7].

Generally, the process of ArSL recognition (ArSLR) can be achieved through two main phases: detection and classification. In the detection stage, each given image is pre-processed and enhanced, and then the region of interest (ROI) is segmented using a segmentation algorithm. The output of the segmentation process can then be used to perform the sign recognition process. Indeed, the accuracy and speed of detection play an important role in obtaining an accurate and fast recognition process. In the recognition stage, a set of features (patterns) for each segmented hand sign is first extracted and then used to recognize the sign. These features can be used as a reference to understand the differences among the classes.

The recognition and documentation of ArSL have received attention only recently, and few attempts have investigated and addressed this problem; see for example [8]–[11]. ArSL recognition is therefore a major requirement for the future of ArSL. It facilitates communication between deaf and hearing people by translating the alphabet and number signs of Arabic sign language into text or speech. To achieve that goal, this paper proposes a new Arabic sign recognition system based on new machine learning methods and a direct use of tiny images.

The rest of the paper is organized as follows. Section 2 presents the current approaches to Arabic alphabet sign language recognition (ArASLR). Section 3 describes the proposed model for ArASLR. Conclusions and future work are presented in Section 4.

II. Current Approaches

Studies in Arabic sign language recognition, although not as advanced as those devoted to other scripts (e.g. Latin), have recently attracted interest [8]–[11]. Current research in ArSLR has been satisfactory only for alphabet recognition, with accuracy exceeding 98%. Isolated Arabic word recognition has been successful only with medium-size vocabularies (fewer than 300 signs). Continuous ArSLR, on the other hand, is still in its early stages and works only under very restrictive conditions.

Current approaches to sign language recognition usually fall into two major categories. The first is sensor-based approaches, which employ sensors attached to a glove. Look-up table software is usually provided with the glove to be used for hand gesture recognition. Recent sensor-based approaches can be found, for instance, in [11]–[14]. The second category, vision-based analysis, is based on the use of video cameras to capture the movement of the hand. This is sometimes aided by having the signer wear a glove with painted areas indicating the positions of the fingers and the wrist; those measurements are then used in the recognition process. Image-based techniques face a number of challenges, including lighting conditions, image background, face and hands segmentation, and different types of noise.

Among image-based approaches, some authors [15] introduced a method for automatic recognition of the Arabic sign language alphabet. For feature extraction, Hu's moments were used, followed by support vector machines (SVMs) to perform the classification process. A correct recognition rate of 87% was achieved. Other authors in [16] developed a neuro-fuzzy system. The proposed system includes five main steps: image acquisition, filtering, segmentation, and hand outline detection, followed by feature extraction. Bare hands were considered in the experiments, achieving a recognition accuracy of 93.6%. In [17], the authors proposed an adaptive neuro-fuzzy inference system for alphabet sign recognition. A colored glove was used to simplify the segmentation process, and geometric features were extracted from the hand region. The recognition rate was improved to 95.5%. In [18], the authors developed an image-based ArSL system that does not use visual markings. The images of bare hands are processed to extract a set of features that are translation, rotation, and scaling invariant. A recognition accuracy of 97.5% was achieved on a database of 30 Arabic alphabet signs. In [19], the authors used recurrent neural networks for alphabet recognition. A database of 900 samples, covering 30 gestures performed by two signers, was used in their experiments. The Elman network achieved an accuracy rate of 89.7%, while a fully recurrent network improved the accuracy to 95.1%. The authors extended their work by considering the effect of different artificial neural network structures on the recognition accuracy. In particular, they extracted 30 features from colored gloves and achieved an overall recognition rate of 95% [20].

A recent review of the different systems and methods for the automatic recognition of Arabic sign language can be found in [7]. It highlights the main challenges characterizing Arabic sign language as well as potential future research directions. Recent works on image-based recognition of the Arabic sign language alphabet can be found in [9], [10], [21]–[25]. In particular, Naoum et al. [9] propose an ArSLR system using KNN. To achieve good recognition performance, they proposed combining this algorithm with a glove-based analysis technique. The system starts by computing histograms of the images. Profiles extracted from these histograms are then used as input to a KNN classifier. Mohandes [10] proposes a more sophisticated recognition algorithm to achieve high ArSLR performance. This work is the first attempt to recognize two-handed signs from the Unified Arabic Sign Language Dictionary, using the CyberGlove and SVMs to perform the recognition process. PCA is used for feature extraction. The authors in [21] proposed an Arabic sign language alphabet recognition system that converts signs into voice. The technique is much closer to a real-life setup; however, recognition is not performed in real time. The system focuses on static and simple moving gestures. The inputs are color images of the gestures. To extract the skin blobs, the YCbCr space is used. The Prewitt edge detector is used to extract the hand shape. To convert the image area into feature vectors, principal component analysis (PCA) is used, with a K-Nearest Neighbor (KNN) algorithm in the classification stage. Furthermore, the authors in [22] and [23] proposed a pulse-coupled neural network (PCNN) ArSLR system able to compensate for lighting nonhomogeneity and background brightness. The proposed system showed invariance under geometrical transforms, bright background, and lighting conditions, achieving a recognition accuracy of 90%. Moreover, the authors in [24] introduced an Arabic Alphabet and Numbers Sign Language Recognition (ArANSLR) system. The phases of the proposed algorithm consist of skin detection, background exclusion, face and hands extraction, feature extraction, and classification using a Hidden Markov Model (HMM). The proposed algorithm divides the rectangle surrounding the hand shape into zones. The best number of zones is 16. The HMM observation is created by sorting zone numbers in ascending order according to the number of white pixels in each zone. Experimental results showed that the proposed algorithm achieves a 100% recognition rate.

On the other hand, new systems for facilitating human-machine interaction have been introduced recently. In particular, the Microsoft Kinect and the Leap Motion Controller (LMC) have attracted special attention. The Kinect system uses an infrared emitter and depth sensors, in addition to a high-resolution video camera. The LMC uses two infrared cameras and three LEDs to capture information within its interaction range. However, the LMC does not provide images of detected objects. The LMC has recently been used for Arabic alphabet sign recognition with promising results [25].

Having presented the existing image-based approaches that have been used to achieve ArASLR, we note that these approaches generally include two main phases: coding and classification. We have also seen that most of the coding methods are based on hand-crafted feature extractors, which are empirical detectors. By contrast, a set of recent methods based on deep architectures of neural networks makes it possible to build the coding from theoretical considerations.

ArSLR therefore requires projecting images onto an appropriate feature space that allows accurate and rapid classification. In contrast to the empirical methods mentioned above, new machine learning methods have recently emerged that are strongly related to the way natural systems code images [26]. These methods are based on the consideration that natural image statistics are not Gaussian, as they would be if images had a completely random structure [27]. The self-similar structure of natural images allowed evolution to build optimal codes. These codes are made of statistically independent features, and many different methods have been proposed to construct them from image datasets. Imposing locality and sparsity constraints on these features is very important. This is probably because simple algorithms based on such constraints can produce linear signatures similar to the receptive fields of natural systems. Recent years have seen growing interest in computer vision algorithms that rely on local sparse image representations, especially for the problems of image classification and object recognition [28]–[32]. Moreover, from a generative point of view, the effectiveness of local sparse coding, for instance for image reconstruction [33], is justified by the fact that a natural image can be reconstructed from the smallest possible number of features. It has been shown that Independent Component Analysis (ICA) produces localized features. It is also efficient for distributions with high kurtosis, which are representative of natural image statistics dominated by rare events such as contours; however, the method is linear and not recursive. These two limitations are lifted by Deep Belief Networks (DBNs) [34], which introduce nonlinearities in the coding scheme and exhibit multiple layers. Each layer is made of a Restricted Boltzmann Machine (RBM), a simplified version of a Boltzmann machine proposed by Smolensky [35] and Hinton [36]. Each RBM is able to build a generative statistical model of its inputs using a relatively fast learning algorithm, Contrastive Divergence (CD), first introduced by Hinton [36]. Another important characteristic of the codes used in natural systems, the sparsity of the representation [26], is also achieved in DBNs. Moreover, it has been shown that these approaches remain robust in extracting efficient local sparse features from tiny images [37]. This model has been successfully used in [32] to achieve semantic place recognition. The hope is to demonstrate that DBNs coupled with tiny images can also be successfully used in the context of ArASLR.

III. Proposed Model

The methodology of this research mainly includes four stages (see Figure 1), which can be summarized as follows: 1) data collection and image acquisition, 2) image pre-processing, 3) feature extraction, and finally 4) gesture recognition.

Figure 1: Proposed model

a) Description of the Database

The alphabet used for Arabic sign language, displayed in Figure 2 (left) [38], will be used to investigate the performance of the proposed model. In this database, the signer performs each letter separately. Letters are mostly represented by a static posture, and the vocabulary size is limited. In this section, several methods for image-based Arabic sign language alphabet recognition are discussed. Even though the Arabic alphabet consists of only 28 letters, Arabic sign language uses 39 signs. The 11 additional signs represent basic signs combining two letters. For example, the two letters “ال” are quite common in Arabic (similar to the article “the” in English). Therefore, most literature on ArASLR uses these basic 39 signs.

b) Image Pre-processing

The typical input dimension for a DBN is approximately 1000 units (e.g. 30x30 pixels). Dealing with smaller patches could make the model unable to extract interesting features. Using larger patches can be extremely time-consuming during feature learning. Additionally, the multiplication of the connection weights adversely affects the convergence of the CD algorithm. The question is therefore how to scale realistic images (e.g. 300x300 pixels) down to a size appropriate for DBNs.
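To make the scaling question from the image pre-processing subsection concrete, the following is a minimal sketch of one simple way to reduce a 300x300 grayscale image to a 30x30 tiny image (900 values, close to the roughly 1000 input units mentioned for a DBN). It assumes block averaging over non-overlapping 10x10 blocks; this is an illustrative choice, not necessarily the reduction used by the authors. The function names `downscale_by_block_mean` and `flatten` and the synthetic test image are hypothetical.

```python
def downscale_by_block_mean(image, factor):
    """Shrink a 2-D grayscale image (list of rows) by averaging
    non-overlapping factor x factor blocks. Illustrative assumption only."""
    height = len(image)
    width = len(image[0])
    if height % factor or width % factor:
        raise ValueError("image sides must be multiples of factor")
    tiny = []
    for top in range(0, height, factor):
        row = []
        for left in range(0, width, factor):
            # Sum all pixels of the current block, then take the mean.
            block_sum = sum(
                image[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            )
            row.append(block_sum / (factor * factor))
        tiny.append(row)
    return tiny


def flatten(image):
    """Row-major flattening: one value per input (visible) unit."""
    return [pixel for row in image for pixel in row]


# Synthetic 300x300 image: left half dark (0), right half bright (255).
realistic = [[0 if x < 150 else 255 for x in range(300)] for _ in range(300)]

tiny = downscale_by_block_mean(realistic, factor=10)
visible_units = flatten(tiny)
print(len(tiny), len(tiny[0]), len(visible_units))  # 30 30 900
```

Block averaging acts as a crude low-pass filter before subsampling, which limits aliasing compared with simply keeping every tenth pixel. In practice a library resize routine with area interpolation would play the same role.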