Amirreza Ahmadnejad, Ahmad Mahmoodian

B.Sc., University of Esfahan

Smartphones That Read Minds

Abstract

Speech communication in acoustic environments with more than one speaker can be extremely challenging for hearing-impaired listeners. Assistive hearing devices have made substantial progress in suppressing background noises that are acoustically different from speech, but they cannot enhance a target speaker without knowing which speaker the listener is conversing with. Recent studies of speech representation in the human auditory cortex have revealed an enhanced representation of the attended speaker relative to unattended sources. These findings have motivated the prospect of a brain-controlled assistive hearing device that constantly monitors the brainwaves of a listener and compares them with the sound sources in the environment to determine the talker the listener is most likely attending to. The device can then amplify the attended speaker relative to the others, making it easier to hear that speaker in a crowd. This process is termed auditory attention decoding (AAD). Multiple challenging problems must be resolved to realize a brain-controlled assistive hearing device, including nonintrusive methods for neural data acquisition and optimal decoding methods for accurate and rapid detection of attentional focus. In addition, realistic situations provide only a mixture of the sound sources, recorded with one or more microphones. Because the attentional focus of the subject is determined by comparing the listener's brainwaves with each sound source, a practical AAD system must automatically separate the sound sources in the environment in order to detect the attended source and subsequently amplify it. One proposed solution is beamforming, in which neural signals are used to steer a beamformer toward the location of the target speaker and amplify the sounds arriving from that direction. However, this approach requires multiple microphones and is beneficial only when ample spatial separation exists between the target and interfering speakers. An alternative and possibly complementary method is to leverage the recent success of automatic speech separation algorithms based on deep neural network models.
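To make the decoding step concrete, the following is a minimal sketch of correlation-based attention decoding via stimulus reconstruction, not the actual implementation described here. It assumes a linear decoder has already been trained (for example, by regularized regression) to map multichannel EEG to an estimate of the attended speech envelope; the names eeg, decoder, and source_envelopes are illustrative.

```python
import numpy as np

def decode_attention(eeg, decoder, source_envelopes):
    """Guess which sound source the listener is attending to.

    eeg              : (frames, channels) array of neural recordings
    decoder          : (channels,) linear stimulus-reconstruction weights,
                       assumed to be trained beforehand (e.g., ridge regression)
    source_envelopes : list of (frames,) speech envelopes, one per sound source,
                       time-aligned with the EEG frames
    """
    reconstructed = eeg @ decoder                       # envelope decoded from brain activity
    scores = [np.corrcoef(reconstructed, env)[0, 1]     # Pearson correlation with each source
              for env in source_envelopes]
    return int(np.argmax(scores)), scores               # index of the most likely attended source
```

In a real device, this comparison would be repeated over short sliding windows, trading off decoding accuracy against how quickly a switch of attention can be detected.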
In one such separation-based approach, deep neural networks were trained in advance to separate a closed set of known speakers from the mixed audio. The separated speech streams were then compared with the neural responses to determine the attended speaker, who was amplified and added back to the mixture. Although this method can help a subject interact with known speakers, such as family members, it does not generalize to new, unseen speakers, making it ineffective when the subject converses with a new person, and it is difficult to scale to a large number of speakers.
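As an illustration of how such a separate-then-select pipeline fits together, the sketch below builds on the decoder sketch above. The separation_model callable is a hypothetical stand-in for a pre-trained speaker-separation network, and the gain and normalization choices are arbitrary; none of this is the system's actual implementation.

```python
import numpy as np

def envelope_at_eeg_rate(waveform, n_frames):
    """Crude amplitude envelope, block-averaged down to the EEG frame count."""
    blocks = np.array_split(np.abs(waveform), n_frames)
    return np.array([block.mean() for block in blocks])

def enhance_attended(mixture, eeg, decoder, separation_model, gain_db=9.0):
    """Separate known speakers, find the attended one via EEG, and boost it.

    separation_model : hypothetical pre-trained network mapping a single-channel
                       mixture to a list of per-speaker waveforms (closed speaker set)
    decoder          : linear stimulus-reconstruction weights, as in the sketch above
    """
    sources = separation_model(mixture)                      # one waveform per known speaker
    reconstructed = eeg @ decoder                            # envelope decoded from the EEG
    envelopes = [envelope_at_eeg_rate(s, len(reconstructed)) for s in sources]
    scores = [np.corrcoef(reconstructed, env)[0, 1] for env in envelopes]
    attended = int(np.argmax(scores))                        # most likely attended talker
    gain = 10.0 ** (gain_db / 20.0)                          # e.g., a +9 dB boost
    enhanced = mixture + (gain - 1.0) * sources[attended]    # amplify attended speaker in the mix
    return enhanced / np.max(np.abs(enhanced))               # normalize to avoid clipping
```

Because the separation network is trained only on the closed speaker set, this pipeline fails exactly where noted above: when the listener converses with a speaker the network has never seen.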