Stanford Students Develop Speech-to-Text Glasses

A team of students at Stanford University has developed a pair of glasses that converts spoken language into written text in real time for deaf and hard-of-hearing users. The innovation is part of a broader trend in emerging assistive technologies designed for educational and everyday use. It follows other recent advancements such as OpenBMB's multimodal LLM, which can perform complex vision-language tasks.

- The project, known as TranscribeGlass, was co-founded by Stanford Master's student Tom Pritsky and Yale student Madhav Lavakare. Lavakare was inspired to create the device after a friend in high school dropped out due to communication difficulties related to hearing loss.
- The glasses function by receiving a Bluetooth signal from existing speech-to-text mobile applications, such as Google's Live Transcribe or Otter.ai, and then projecting the text as an augmented reality display onto the inside of the lens.
- The initial beta version of TranscribeGlass was priced at $55, with the final consumer version expected to cost around $95, presenting a significantly more affordable alternative to traditional hearing aids that can cost thousands of dollars.
- Early funding for the project was secured in 2020 through support from the Indian Institute of Technology in Delhi, as well as grants from both the Indian and U.S. governments.
- This innovation is part of a growing field of haptic and sensory assistive technologies, which includes developments like the SoundShirt, a garment that allows deaf and hard-of-hearing users to experience the feeling of music through vibrations.
- The technology builds on advancements in Artificial Intelligence that are also being integrated directly into modern hearing aids. These new devices use AI algorithms and advanced computer chips to mimic how the human brain processes sound, allowing them to better separate speech from background noise.
- While current assistive applications often rely on single-function AI, the development of multimodal large language models (LLMs) points toward future devices that can interpret a combination of text, audio, and visual inputs to better understand user intent and environmental context.

