
1 Introduction

Sign language (SL) is the primary mode of communication used by the Deaf community (signers). It involves gestural components such as arm and hand movements, and non-manual signals are also used to convey messages. Because it combines these different components, sign language is difficult to comprehend on its own. We situate this work in the Philippine setting, where statistics show that there is only one (1) interpreter for every 53,000 Filipino signers. Human interpreters mediate communication between signers and non-signers, and sufficient knowledge of sign language is necessary to communicate with signers directly. This means there is a gap that needs to be addressed for the Deaf community. Several attempts have been made to develop interpreter systems for American Sign Language (ASL), such as [12, 13]. However, these studies are limited by their visual boundaries and confining environments. We believe that interpreter systems will be more usable if we enable a free, seamless approach to understanding sign language gestures that is not entirely vision-based. In this paper, we investigate whether the Myo armband can serve as an alternative device for recognizing sign language. This way, gesture recognition is not restricted by vision-based constraints such as lighting and camera angles, since the inputs are muscle- and motion-based rather than visual. We believe that such an interaction design would be considered more usable and would support our proposed framework; this is supported by a discussion of the user insights acquired from the user study in the latter parts of this paper. In the long run, we envision signers communicating directly with non-signers by gesturing freely while wearing their armbands, with the message sent through a mobile device and received by non-signers. Through this, they would be able to seamlessly send their intended message without the need for an interpreter, in an approach empowered by technology (Fig. 1).

Fig. 1. MyoSL Methodology Framework showing the different components

In this work-in-progress, we discuss our attempts to gather user insights on the use of two EMG-based gestural armbands by signers, and we attempt to model and identify the scenarios and expressions that are most convenient and natural. While we understand that the domain at hand is large, we limited our library of words to a specific scenario. We also discuss the data collection process and what the EMG data from the two gestural devices look like; these data will then be used to generate a model that can identify words. In this study, we used both American Sign Language (ASL) and Filipino Sign Language (FSL) in order to see whether there are differences in terms, gestures, and translation. In the succeeding sections we discuss our findings and the next steps needed to complete the study.

2 Related Work

Several systems have been built for sign language interpretation, each focusing on different aspects of user needs. [11] created an extendable system which uses a camera placed on the brim of a cap to track the hands of the user. The system can track the hands with or without a glove, although wearing gloves affects its accuracy. For natural scenarios, however, the authors stated that the system would be unpleasant for signers, since head movements are part of conversational sign language, and signers cannot be expected to wear a baseball cap wherever they go. Like the work of [11], SignSpeak, an EU-funded project, focuses on recognition and translation of sign language through vision-based input. However, according to [3], difficulties arise for vision-based input because of different environment assumptions: SignSpeak was only developed for closed-world environments with simple backgrounds, coupled with special gloves for tracking.

Another vision-based system makes use of the Kinect, which features a camera with a depth sensor capable of tracking body movements [6]. Complex backgrounds and illumination conditions affect hand tracking, which makes sign language translation through visual input difficult. Thanks to the Kinect’s depth sensor, hand and body actions can be tracked more easily and accurately without the need for special backgrounds [9]. However, practical use of the Kinect is only a partial solution: in terms of portability, its dimensions and the need to be plugged in do not allow it to be conveniently carried by the user [5]. To build a portable system, [2] made the Sensory Glove, designed to translate the ASL alphabet into text on a mobile phone. The glove transmits data to an Android phone over a Bluetooth connection, and the phone displays the translated text. Since the system uses only one glove, few words beyond the ASL alphabet can be translated. Unfortunately, the glove also has an obtrusive design with exposed wires.

3 Methods

3.1 Participants

Two (2) study groups participated in this study. The first group took part in a series of user research studies aimed at a thorough understanding of user-signer needs; the second group took part in the data collection used for the initial model. Eleven (11) signers aged 18–24 were recruited through snowball sampling to take part in the user study component of our framework. The impairments of the participants varied among complete deafness, partial deafness, complete muteness, and partial muteness. They participated in focus group discussions, which helped pinpoint the pains and gains of each user type. Additionally, these participants gave insights on the different activities they usually do and the corresponding struggles they experience every day. From these needfinding activities, we were able to derive personas, which are discussed in the succeeding sections.

Ten (10) signers aged 21–31, gathered through snowball sampling, took part in the initial usability testing of the two Myo armbands. Their average experience in using sign language is 9.22 years, with expertise ranging from beginner (0–10 years) and intermediate (10–20 years) to expert (20 years and above). It is important to note that all of the participants are the target end-users of the product. See Table 1 for the demographic information of the usability testing participants.

Table 1. Signer demographics (Usability testing)
Table 2. Framework elements

3.2 Study Design

Our methodology in this study is divided into four (4) major components that define the framework towards a more accessible and usable product; see Table 2 for the specific elements. In short, we refer to these elements as (E1) User Study, (E2) Data Modeling, (E3) Model Implementation, and (E4) Usability Testing. In addition, a user-centric design was employed at every stage of this framework, enabling constant communication and collaboration with our participants from the Deaf community. As seen in Fig. 5, the third and fourth parts of the framework are performed iteratively; this is designed to continuously develop and improve the system based on the results gathered from the participants’ usability tests. It is also important to note that only the first three parts are covered in this paper.

User study is an important aspect of the design thinking process and can greatly improve the user experience of a product [8]. In achieving a user-centered design, it is essential to first empathize with the target users. This is done to gain an understanding of the users’ needs and preferences as well as their tasks within the context of our system. The three (3) main goals of our user study are:

  1. To know more about the users and determine what is important for them
  2. To know the way they do things and why
  3. To understand the difficulties and pain points in interpreting Sign Language

In this stage, we conducted Focus Group Discussions (FGD) between eleven (11) signers and two (2) interpreters. An FGD is a research strategy where people from similar backgrounds are gathered to discuss a specific topic. The participants are chosen purposively based on their common characteristics, and a facilitator guides them to express their feelings freely so that a natural discussion emerges among themselves [1]. The format also allows each participant to agree or disagree with the others, revealing the range of opinions the group holds on a certain issue in terms of their experiences and beliefs. From this stimulation of ideas among participants, a general view can be established [10]. The main focus of this FGD was to gather opinions on how the participants feel and think about the context of our study.

3.3 Data Modeling

To create the machine learning model using EMG, ten (10) participants took part in the model building stage. Their average experience in sign language is 9.22 years; most of them started signing as early as birth. Each participant wore a Myo armband on each forearm and was asked to perform different FSL signs to produce the gesture data. Each sign was performed up to five (5) times to provide more robust data. The EMG, acceleration, and orientation data of each sign were captured by each armband’s eight EMG channels (Fig. 3) and 3D IMU sensors, and all of the captured data were transmitted to a laptop via Bluetooth. The streams from the two armbands will be synchronized using timestamps to match data captured at the same time, at a sampling rate of 50 Hz; the provided SDK supports this procedure.

Once the data are gathered, the first and last second of each sign will be trimmed to remove noise, and a rest position will mark the start and end of each gesture, making it easier to apply Dynamic Time Warping (DTW) to the sample. After trimming, DTW will be applied to standardize the length of each gesture: the same gesture can be performed at varying speeds, and DTW allows the system to identify which gestures are the same while letting them spread over a variable length of time. With a standardized length, the data will then be normalized by scaling all values to the 0–1 range, spreading the data across a smaller feature space; tools such as RapidMiner can automate this step.

The collected data will then undergo feature extraction. The study adapts well-established features from previous studies, using both EMG features and accelerometer and gyroscope features. Extracted features will be placed under the InfoGain attribute evaluator along with the Ranker search technique to determine the features that are most useful for the dataset (Fig. 2).
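As a concrete illustration of these preprocessing steps, the following is a minimal sketch in Python, assuming each recording is a NumPy array sampled at 50 Hz; the function names and the per-channel treatment are illustrative assumptions, not our final implementation.

```python
import numpy as np

SAMPLE_RATE = 50  # Hz, the armband sampling rate described above

def trim_edges(signal, seconds=1.0, rate=SAMPLE_RATE):
    """Drop the first and last second of a recording to remove the
    noise captured around the rest position."""
    margin = int(seconds * rate)
    return signal[margin:-margin]

def normalize(signal):
    """Scale a channel to the 0-1 range to spread the data across a
    smaller feature space."""
    lo, hi = signal.min(), signal.max()
    return (signal - lo) / (hi - lo) if hi > lo else np.zeros_like(signal)

def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences, so the
    same sign performed at different speeds can still be matched."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # deletion
                                 cost[i, j - 1],      # insertion
                                 cost[i - 1, j - 1])  # match
    return cost[n, m]
```

In practice each sign yields eight EMG channels plus IMU streams, so these steps would be applied per channel (or DTW generalized to a multivariate distance) before feature extraction.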

Fig. 2. Data modeling process

Fig. 3. EMG visualization of the signs ‘Ano’ (What) and ‘Oras’ (Time)
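To make the feature extraction step concrete, below is a sketch using time-domain EMG features that are well established in prior EMG work (mean absolute value, root mean square, zero crossings, waveform length) together with simple IMU statistics. The exact feature set we adapt comes from previous studies, so the choices and the zc_threshold value here should be read as representative assumptions.

```python
import numpy as np

def emg_features(channel, zc_threshold=0.01):
    """Time-domain EMG features: mean absolute value (MAV), root mean
    square (RMS), zero-crossing count (ZC), and waveform length (WL)."""
    mav = np.mean(np.abs(channel))
    rms = np.sqrt(np.mean(channel ** 2))
    signs = np.sign(channel)
    zc = np.sum((signs[:-1] * signs[1:] < 0) &
                (np.abs(np.diff(channel)) > zc_threshold))
    wl = np.sum(np.abs(np.diff(channel)))
    return [mav, rms, float(zc), wl]

def feature_vector(emg_window, imu_window):
    """Concatenate features from all 8 EMG channels with mean/std
    statistics of the IMU (accelerometer and gyroscope) streams.

    emg_window: array of shape (samples, 8)
    imu_window: array of shape (samples, k)
    """
    feats = []
    for channel in emg_window.T:
        feats.extend(emg_features(channel))
    feats.extend(imu_window.mean(axis=0))
    feats.extend(imu_window.std(axis=0))
    return np.asarray(feats)
```

The resulting vectors are what the InfoGain attribute evaluator and Ranker search would then score to select the most useful features.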

3.4 Model Implementation

The model was initially proposed to be implemented on the Android operating system with the help of the Myo SDK (Fig. 4).

The system involves four processes: processing input, recording input, applying recognition techniques, and projecting output. It utilizes two Myo armbands connected to a host computer (originally envisioned as a smartphone, as discussed below). First, the Myo’s sensors capture the data and transmit them to the computer via Bluetooth. The computer then records the input. Subsequently, recognition techniques are employed to identify the gestures against the built-in model. Once the application identifies a match in the model, the translation is voiced out through the computer’s speakers for the non-signers and is also shown on the screen.
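A schematic of the recognition step might look like the sketch below, which matches a recorded gesture against the built-in model by nearest DTW distance, reusing the dtw_distance sketch from Sect. 3.3; the template table, the distance threshold, and the single-sequence matching are simplifying assumptions.

```python
def recognize(recorded_gesture, templates, max_distance=25.0):
    """Return the word whose stored template is nearest to the recorded
    gesture, or None if no template is close enough.

    templates: dict mapping each word to a reference gesture sequence.
    Uses dtw_distance as sketched in Sect. 3.3.
    """
    best_word, best_dist = None, float("inf")
    for word, reference in templates.items():
        dist = dtw_distance(recorded_gesture, reference)
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word if best_dist <= max_distance else None
```

A matched word would then be handed to the output stage, which voices it through the speakers and displays it on screen.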

Fig. 4. System architecture

So far we have only started building the dataset of words; we still need to train the model to recognize them. Concerns were also raised regarding the processing power and speed of a smartphone, so we decided to first build the model to run on a computer, where processing time can be handled better. Faster translation and processing will improve the user experience and make the interaction smoother. With regard to translation, the words will first be collected as text, and NLP techniques will then be applied to smooth out the sentences and make them easier on the ears. For example, when the user performs the gestures “what time”, “store”, “open”, the system outputs the phrase “What time will the store open?”.
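As a toy illustration of that smoothing step, the following sketch maps recognized gloss sequences to natural sentences through a phrase-template lookup; the table entries are illustrative stand-ins for the NLP techniques we plan to use.

```python
# Hypothetical phrase templates keyed by recognized gloss sequences.
PHRASE_TEMPLATES = {
    ("what time", "store", "open"): "What time will the store open?",
    ("store", "where"): "Where is the store?",
}

def smooth(glosses):
    """Turn a sequence of recognized glosses into a natural sentence,
    falling back to joining the raw glosses when no template matches."""
    return PHRASE_TEMPLATES.get(tuple(glosses), " ".join(glosses))

print(smooth(["what time", "store", "open"]))
# -> What time will the store open?
```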

3.5 Usability Testing

Usability testing can be done in several ways, but each of them shares these five common characteristics [4]:

  1. The objective of each test is to improve the usability of a product
  2. The participants are the end-users
  3. The participants do real tasks that are associated with the system
  4. All actions of the participants are observed and recorded
  5. The results are analyzed to determine the real problems and to recommend solutions to fix them

Fig. 5. Usability testing framework

This is done to learn more about the system, specifically its strengths and weaknesses, and to solicit feedback from the end-users. Along the way, both high and low fidelity feedback will be considered and immediately integrated into the system. Here, usability testing was conducted by having the deaf signers wear the two armbands and run through a low fidelity prototype of the interpreter system. Questions were raised to better understand whether the armband was intrusive to the signers’ gestures. Questions about its level of comfort, weight, and position gave us insights on how to make the experience more enjoyable for the users. The questions are answered on a 1–4 scale, 1 being the lowest (strongly disagree) and 4 the highest (strongly agree). See Table 3 for the list of questions about the Myo armband.

Table 3. Myo Armband Related Questions

3.6 Experiment Design

To validate our framework, we proposed three different experiment setups, as seen in Fig. 6: (1) a signer/non-signer conversation with a human interpreter, (2) a signer/non-signer conversation using only one Myo armband, and (3) a signer/non-signer conversation using two Myo armbands. These experiments will be done in a closed environment with camera recordings to preserve artifacts of the study. With this approach we can benchmark both traditional and existing frameworks against our proposed framework. Furthermore, the Software Usability Measurement Inventory (SUMI) [7] and cognitive tools will be incorporated to quantitatively measure the impact of the changes we made in the proposed framework. Additionally, a series of interviews and questionnaires will be conducted with the participants so that more comprehensive feedback is captured.

Fig. 6. Experiment design

In the experiments, the participants are given a set of expressions in purchasing scenarios. We ensure that each participant starts on equal footing and aim for a smooth flow of conversation. While the participants communicate with each other, we observe the time it takes them to respond to each expression, the frustrations they have while conversing, and the total time it takes them to finish the set of expressions. After the experiment, we ask them how the experience of conversing with each other was.

4 Results

In this preliminary study, three results were produced: the results of the user study, the initial machine learning model, and the user feedback/insights collected during usability testing.

Based on the results, we discovered that the majority of the signers are reliant on interpreters. They stated that interpreters can shorten their sentences, which is faster than writing or typing what they want to say. Furthermore, sign language has a different syntax, which makes the grammar of their written translations uncommon. The signers also stated that interpreters are most needed in hospital, employment, and purchasing situations. With regard to interpreter systems, they are aware that such systems already exist, and some of them have tried a few. One such system is a video relay service: a signer connects with the service, and an interpreter relays the interpreted message to the intended person over a telephone. However, the majority of the signers stated that these systems are either expensive or only available in more developed countries, so they end up relying on interpreters, writing, or online messaging. The results of the focus group discussion highlight the existing gap for the Deaf community. As stated earlier, there is only one (1) interpreter for every 53,000 Filipino signers. The signers’ expressed need for interpreters shows how a sign language interpretation system would be able to help them. Although interpretation systems currently exist, their pricing or availability is the problem. Our proposed system uses the Myo armband, which is comparatively affordable and commercially available.

Fig. 7. User feedback

Fig. 8. The aesthetic of the armband is appealing.

Fig. 9. The armband is not intrusive when I try to perform my gestures.

We also conducted usability tests on a group of nine deaf students, with the meeting facilitated by an interpreter. The goal of the survey was to find out whether the signers were comfortable with the armband and whether it limited their movement when performing signs. We also wanted to know whether they found the aesthetic pleasing and the vibrations of the armband intuitive. The results of the initial usability tests showed that 90% of respondents found the armband comfortable. All participants said that the vibrations were intuitive and that the placement of the armband was comfortable. All participants also responded that the armband was not intrusive when performing gestures. 80% of the participants stated that the aesthetic of the armband was pleasing and would usually go well with what they wear, despite the fact that the armband cannot be worn over clothes. They also said that the weight of the armband was comfortable. Regarding comfort, 10% did not find the armband comfortable because of their body size. Since the armband uses rubber sections to keep it snug on the wearer, its comfort varies with body type: on small frames the armband may slide up and down the arm, while on large frames it may be too tight (Fig. 7).

The two armbands are worn on the upper forearms of the user. This position keeps the armbands from being intrusive while signers gesture, and the weight and slimness of the armband permit a full range of motion without tiring the arms. However, 20% did not find the armband aesthetically pleasing because of its futuristic look, which may clash with their fashion style. If a signer is wearing long sleeves, the armband produces a bulge in the middle of the arm, which may also have made it less aesthetically pleasing to them (Figs. 8 and 9).

5 Conclusion

Before any development or data collection was done, we first conducted user research to better understand the pain points of our users. This part of the framework is especially effective in determining how to properly address the problems users encounter and the situations we want to address. Through this portion of the research, we were able to find out in which specific situations the signers most wanted to be able to interact with the hearing.

Ten (10) participants took part in the model building stage to create our machine learning model. Participants wore two Myo armbands on their forearms, and each was asked to perform sixty (60) different FSL signs five (5) times each to produce the gesture data. The EMG, acceleration, and orientation data of each sign were captured and transmitted to a laptop via Bluetooth.

We then evaluated the user experience by testing the model. The participants wore the armband and calibrated it so that it could read their gestures. They were then asked to perform some gestures to see whether the software could detect which gesture they were performing. After the activity, they were asked to answer a survey regarding the ease of use of the system, including questions about how comfortable or intrusive the armband was and whether they found the system intuitive. Based on the results of the survey, the majority of users found the system easy to use and the armband comfortable enough to wear for extended periods of time.

6 Future Work

The Myo armband is a promising interface for collecting hand gestures. However, hand gestures alone are not enough to interpret an entire signed conversation: facial expressions and body movement are important for conveying context in a given sign language. A way to capture facial expressions and integrate them into the system would greatly improve this work.

Our data processing was also limited to techniques such as Dynamic Time Warping and Support Vector Machines. Although these techniques both yielded quality results, other variations might provide higher accuracy or faster processing times. In addition to Filipino Sign Language recognition, the system can be modified for other sign languages such as American Sign Language, Chinese Sign Language, and others.
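For reference, a minimal sketch of training an SVM classifier on extracted feature vectors, assuming scikit-learn and stand-in random data in place of the real feature vectors from Sect. 3.3:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in data: in the real pipeline, X holds one feature vector per
# recorded sign (Sect. 3.3) and y the corresponding FSL word labels.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 40))     # 300 recordings, 40 features
y = rng.integers(0, 60, size=300)  # a 60-word vocabulary

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```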

The study could also be extended to include a deeper natural language processing component in which the output grammar for different spoken languages could be selected. For example, taking FSL as input but outputting spoken English would be a welcome quality-of-life improvement. For the system, this would mean that the stage where processed data are compared against grammar templates would have to be redone for each target language.