Video conferencing should be open to everyone, including users who communicate in sign language. However, because most video conferencing systems automatically spotlight the window of whoever is speaking aloud, it is difficult for sign language users to communicate easily and effectively.
Real-time sign language detection in video conferencing is therefore challenging: the system must classify from large amounts of video input, which makes the task computationally heavy. To some extent, these challenges also explain why research on sign language detection has been scarce.
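One common way to keep such a classifier lightweight is to reduce each video frame to a small set of body-pose landmarks and classify landmark motion rather than raw pixels. The sketch below illustrates this idea; the landmark layout, the `movement_feature` helper, and the fixed threshold are illustrative assumptions, not the model's actual design (the real system feeds motion features into a learned classifier):

```python
import math

# Each frame is assumed to be reduced by a pose estimator to a list of
# (x, y) landmarks with coordinates normalized to [0, 1].
Frame = list[tuple[float, float]]

def movement_feature(prev: Frame, curr: Frame) -> float:
    """Mean landmark displacement between two consecutive frames.

    Working on a handful of landmark coordinates instead of raw pixels
    is what keeps per-frame cost low enough for real time.
    """
    dists = [math.dist(p, c) for p, c in zip(prev, curr)]
    return sum(dists) / len(dists)

def is_signing(frames: list[Frame], threshold: float = 0.02) -> bool:
    """Hypothetical decision rule: sustained landmark motion above a
    fixed threshold suggests active signing."""
    feats = [movement_feature(a, b) for a, b in zip(frames, frames[1:])]
    return sum(feats) / len(feats) > threshold
```

A real detector would replace the fixed threshold with a trained model, but the input-reduction step is the key to the low compute cost discussed above.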
Recently, at ECCV 2020 and the SLRTP 2020 workshop, Google's research team presented a real-time sign language detection model and described in detail how it can be used to identify the "active speaker" in video conferencing systems.
2. Proof of concept
In practice, having a working sign language detection model is only the first step. The team also needed a way to trigger the active-speaker feature of video conferencing systems. They developed a lightweight online sign language detection demo that can connect to any video conferencing system and set sign language users as the "speaker".
When the detection model determines that a user is communicating in sign language, it transmits ultrasonic audio through a virtual audio cable, which any video conferencing system can pick up, as if the signer were "talking". The audio is transmitted at 20 kHz, generally above the range of human hearing. Because video conferencing systems typically use audio volume, rather than speech recognition, to decide who is speaking, the application treats the sign language user as the active speaker.
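The ultrasonic trigger can be sketched as follows. This is a minimal illustration, not the demo's actual code: the sample rate, amplitude, burst length, and the `on_detection` hook are all assumptions, and routing the samples into a virtual audio cable is left out.

```python
import math

SAMPLE_RATE = 44100   # Hz; a standard audio sample rate
TONE_FREQ = 20000     # Hz; at the upper edge of human hearing

def ultrasonic_tone(duration_s: float, amplitude: float = 0.8) -> list[float]:
    """Generate a 20 kHz sine wave as float samples in [-1, 1].

    Played through a virtual audio cable, a tone like this registers
    as "speech" volume to a conferencing client without being audible.
    """
    n = int(SAMPLE_RATE * duration_s)
    return [amplitude * math.sin(2 * math.pi * TONE_FREQ * i / SAMPLE_RATE)
            for i in range(n)]

def on_detection(confidence: float, threshold: float = 0.5) -> list[float]:
    """Hypothetical per-frame hook: emit a short ultrasonic burst
    whenever the model's signing confidence crosses the threshold."""
    if confidence >= threshold:
        return ultrasonic_tone(0.1)  # 100 ms burst
    return []
```

Note that at a 44.1 kHz sample rate, 20 kHz sits just under the Nyquist limit of 22.05 kHz, so the tone can be represented, though only coarsely sampled.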
The source code of this online video demo has been published on GitHub.
GitHub portal: https://github.com/AmitMY/sign-language-detector
3. Demonstration process
In the video, the research team demonstrates how to use the model. The yellow chart in the video reflects the model's confidence value when sign language communication is detected: while the user is signing, the chart value rises to nearly 100, and when the user stops signing, it falls back to 0.
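The rise-and-decay behavior of that chart can be reproduced by exponentially smoothing the per-frame confidence score. The sketch below is an assumption about how such a display could work, not the demo's actual implementation; the smoothing factor `alpha` is an illustrative choice.

```python
def smooth_chart(confidences: list[float], alpha: float = 0.3) -> list[int]:
    """Smooth per-frame signing confidences (0..1) into a 0..100 chart value.

    Higher alpha tracks the raw score faster; lower alpha gives a
    steadier display that climbs while signing and decays afterwards.
    """
    value = 0.0
    chart = []
    for c in confidences:
        value = alpha * c + (1 - alpha) * value  # exponential moving average
        chart.append(round(100 * value))
    return chart
```

With confidence near 1 the displayed value climbs toward 100, and once the confidence drops to 0 it decays back toward 0, matching the chart behavior shown in the video.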
To further validate the model, the team also ran a user-feedback study. Participants were asked to use the model during video conferences and to sign as they normally would; they were also asked to sign to one another to test whether the active-speaker view would switch to them. The feedback confirmed that the model detected signing, treated it as audible speech, and successfully surfaced the signing participants.
Looked at today, both the motivation for this attempt and the practicality of the methods it adopts are grounded in real-world deployment. Practical use will surface unexpected needs at scale, such as the large differences between sign languages across countries and regions. How to generalize these capabilities to serve more people, so that the work can truly land in production, is the next step and deserves serious thought.
Reference link: https://ai.googleblog.com/2020/10/developing-real-time-automatic-sign.html