Speech Recognition Techniques For Robustness In Adverse Environments, E.g., In Noise, Of Stress Induced Speech, Etc. (epo) Patents (Class 704/E15.039)
-
Patent number: 11984110
Abstract: A device operates to perform acoustic echo cancellation. The device includes a speaker to output a far-end signal at the device, a microphone to receive at least a near-end signal and the far-end signal from the speaker to produce a microphone output, and an AI accelerator operative to perform neural network operations according to a first neural network model and a second neural network model to output an echo-suppressed signal. The device further includes a digital signal processing (DSP) unit. The DSP unit is operative to perform adaptive filtering to remove at least a portion of the far-end signal from the microphone output to generate a filtered near-end signal, and perform Fast Fourier Transform (FFT) and inverse FFT (IFFT) to generate input to the first neural network model and the second neural network model, respectively.
Type: Grant
Filed: March 7, 2022
Date of Patent: May 14, 2024
Assignee: MEDIATEK SINGAPORE PTE. LTD.
Inventors: Xiaoxi Yu, Hantao Huang, Ziang Yang, Chia Hsin Yang, Li-Wei Cheng
-
Patent number: 11917384
Abstract: Disclosed herein are systems and methods for processing speech signals in mixed reality applications. A method may include receiving an audio signal; determining, via first processors, whether the audio signal comprises a voice onset event; in accordance with a determination that the audio signal comprises the voice onset event: waking a second one or more processors; determining, via the second processors, that the audio signal comprises a predetermined trigger signal; in accordance with a determination that the audio signal comprises the predetermined trigger signal: waking third processors; performing, via the third processors, automatic speech recognition based on the audio signal; and in accordance with a determination that the audio signal does not comprise the predetermined trigger signal: forgoing waking the third processors; and in accordance with a determination that the audio signal does not comprise the voice onset event: forgoing waking the second processors.
Type: Grant
Filed: March 26, 2021
Date of Patent: February 27, 2024
Assignee: Magic Leap, Inc.
Inventors: David Thomas Roach, Jean-Marc Jot, Jung-Suk Lee
-
Patent number: 11863938
Abstract: The present application relates to a hearing aid adapted to be worn in or at an ear of a hearing aid user and/or to be fully or partially implanted in the head of the hearing aid user.
Type: Grant
Filed: May 27, 2022
Date of Patent: January 2, 2024
Assignee: Oticon A/S
Inventors: Thomas Lunner, Lars Bramsløw
-
Patent number: 11790935
Abstract: In some embodiments, a first audio signal is received via a first microphone, and a first probability of voice activity is determined based on the first audio signal. A second audio signal is received via a second microphone, and a second probability of voice activity is determined based on the first and second audio signals. Whether a first threshold of voice activity is met is determined based on the first and second probabilities of voice activity. In accordance with a determination that a first threshold of voice activity is met, it is determined that a voice onset has occurred, and an alert is transmitted to a processor based on the determination that the voice onset has occurred. In accordance with a determination that a first threshold of voice activity is not met, it is not determined that a voice onset has occurred.
Type: Grant
Filed: April 6, 2022
Date of Patent: October 17, 2023
Assignee: Magic Leap, Inc.
Inventors: Jung-Suk Lee, Jean-Marc Jot
-
Patent number: 11765501
Abstract: Methods and systems for identifying abnormal sounds in a particular environment. A normal audio stream obtained in the absence of abnormal sounds may be used as a baseline for subsequently processing an incoming audio stream with a processor to determine whether the incoming audio stream from the microphone in the particular environment includes an abnormal audio event for the particular environment. When it is determined that the incoming audio stream includes an abnormal audio event for the particular environment an electronic database may be accessed to determine a location of the abnormal audio event in the particular environment. A video camera with a field of view that includes the location of the abnormal audio event in the particular environment may be identified and the video stream from the identified video camera retrieved and displayed.
Type: Grant
Filed: March 10, 2021
Date of Patent: September 19, 2023
Assignee: HONEYWELL INTERNATIONAL INC.
Inventors: Lalitha M. Eswara, Syed Omar Khaiyam, Siddharth Sonkamble, Deepak Kaul, K Karthikeyan
-
Patent number: 11721334
Abstract: A method and apparatus for controlling a device according to an embodiment of the present disclosure may be based on a speech feature of a user reflecting the Lombard effect so as to operate a device located far away from the user, among a plurality of electronic devices. As such, even when the user calls a device located far away from the user without any separate context information, speech recognition neural networks and weight calculation neural networks may be selected and used to operate the device located far away from the user, and reception of a speech signal of the user calling a device located far away from the user may be performed in an Internet of Things (IoT) environment using a 5G network.
Type: Grant
Filed: March 5, 2020
Date of Patent: August 8, 2023
Assignee: LG ELECTRONICS INC.
Inventors: Jong Hoon Chae, Minook Kim, Yongchul Park, Sungmin Han, Siyoung Yang, Sangki Kim, Juyeong Jang
-
Patent number: 11683638
Abstract: A modular speaker system, comprising an exoskeleton, configured to mechanically support and quick attach and release at least one functional panel and an electrical interface provided within the exoskeleton, configured to mate with a corresponding electrical connector of the functional panel. An optional endoskeleton is provided to support internal components. The system preferably provides a digital electronic controller, and the electrical interface is a digital data and power bus, with multiplexed communications between the elements of the system. The elements of the system preferably include at least one speaker, and other audiovisual and communications components. Multiple modules may be interconnected, communicating through the electrical interface. A base module may be provided to provide power and typical control, user and audiovisual interface connectors.
Type: Grant
Filed: July 4, 2022
Date of Patent: June 20, 2023
Assignee: Sonic Blocks, Inc.
Inventors: Scott D. Wilker, Jordan D. Wilker
-
Patent number: 11631404
Abstract: Audio distortion compensation methods to improve accuracy and efficiency of audio content identification are described. The method is also applicable to speech recognition. Methods to detect the interference from speakers and sources, and distortion to audio from environment and devices, are discussed. Additional methods to detect distortion to the content after performing search and correlation are illustrated. The causes of actual distortion at each client are measured and registered and learnt to generate rules for determining likely distortion and interference sources. The learnt rules are applied at the client, and likely distortions that are detected are compensated or heavily distorted sections are ignored at audio level or signature and feature level based on compute resources available. Further methods to subtract the likely distortions in the query at both audio level and after processing at signature and feature level are described.
Type: Grant
Filed: August 12, 2021
Date of Patent: April 18, 2023
Assignee: ROKU, INC.
Inventors: Jose Pio Pereira, Sunil Suresh Kulkarni, Mihailo M. Stojancic, Shashank Merchant, Peter Wendt
-
Patent number: 11605379
Abstract: Disclosed is an artificial intelligence server. The artificial intelligence server includes a communicator in communication with at least one electronic device and a processor for receiving input data from a specific electronic device, applying personalized information corresponding to the specific electronic device to a recognition model, inputting the input data into the recognition model to which the personalized information is applied to obtain a final result value, and transmitting the final result value to the specific electronic device.
Type: Grant
Filed: July 11, 2019
Date of Patent: March 14, 2023
Assignee: LG ELECTRONICS INC.
Inventor: Jongwoo Han
-
Patent number: 11587563
Abstract: A method of presenting a signal to a speech processing engine is disclosed. According to an example of the method, an audio signal is received via a microphone. A portion of the audio signal is identified, and a probability is determined that the portion comprises speech directed by a user of the speech processing engine as input to the speech processing engine. In accordance with a determination that the probability exceeds a threshold, the portion of the audio signal is presented as input to the speech processing engine. In accordance with a determination that the probability does not exceed the threshold, the portion of the audio signal is not presented as input to the speech processing engine.
Type: Grant
Filed: February 28, 2020
Date of Patent: February 21, 2023
Assignee: Magic Leap, Inc.
Inventors: Anthony Robert Sheeder, Colby Nelson Leider
-
Patent number: 11363367
Abstract: A dual-microphone arrangement (300) provides improved voice performance in a wireless headset (12). A vibration sensor (1130) is used for voice pickup and will add low-frequency voice audio content in windy conditions. An equalizer (810) is used to restore low-frequency voice audio content in wind-free conditions. Depending on the measured wind power, the output will derive more signal from the equalizer (810) or more signal from the vibration sensor (1130).
Type: Grant
Filed: November 30, 2020
Date of Patent: June 14, 2022
Assignee: Dopple IP B.V.
Inventors: Jacobus Cornelis Haartsen, Aalbert Stek
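The wind-dependent blending described in this abstract can be illustrated with a simple crossfade. The sketch below is only an illustration and not the patented method: the function name `mix_by_wind`, the linear mapping from wind power to a mixing weight, and the `low` and `high` bounds are all assumptions made for the example.

```python
def mix_by_wind(equalized, vibration, wind_power, low=0.1, high=1.0):
    """Crossfade between equalizer output and vibration-sensor output.

    equalized, vibration: equal-length lists of samples.
    wind_power: measured wind power (arbitrary units).
    low, high: hypothetical bounds; at or below `low` only the equalizer
    signal is used, at or above `high` only the vibration sensor is used.
    """
    # Map wind power linearly onto a mixing weight and clamp it to [0, 1].
    w = (wind_power - low) / (high - low)
    w = min(1.0, max(0.0, w))
    # More wind -> more vibration-sensor signal, less equalizer signal.
    return [(1.0 - w) * e + w * v for e, v in zip(equalized, vibration)]
```

With no wind, the call returns the equalizer samples unchanged; with strong wind it returns the vibration-sensor samples. A real device would likely smooth the weight over time to avoid audible switching.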
-
Publication number: 20150149167
Abstract: Aspects of this disclosure are directed to accurately transforming speech data into one or more word strings that represent the speech data. A speech recognition device may receive the speech data from a user device and an indication of the user device. The speech recognition device may execute a speech recognition algorithm using one or more user and acoustic condition specific transforms that are specific to the user device and an acoustic condition of the speech data. The execution of the speech recognition algorithm may transform the speech data into one or more word strings that represent the speech data. The speech recognition device may estimate which one of the one or more word strings more accurately represents the received speech data.
Type: Application
Filed: September 30, 2011
Publication date: May 28, 2015
Applicant: GOOGLE INC.
Inventors: Françoise Beaufays, Johan Schalkwyk, Vincent Olivier Vanhoucke, Petar Stanisa Aleksic
-
Patent number: 8953812
Abstract: Improvements in voice signals transmitted within communication systems are obtained by use of adaptive filters, front and rear microphones, noise cancelling systems and other means and methods. Disclosed embodiments include the use of directional microphones, primary inputs, secondary inputs, adaptive weight generators, canceller outputs to improve signal to noise ratios and other communication attributes.
Type: Grant
Filed: July 20, 2013
Date of Patent: February 10, 2015
Inventor: Alon Konchitsky
-
Publication number: 20140074464
Abstract: Some embodiments of the inventive subject matter may include a method for detecting speech loss and supplying appropriate recollection data to the user. The method can include detecting a speech stream from a user. The method can include converting the speech stream to text. The method can include storing the text. The method can include detecting an interruption to the speech stream, wherein the interruption to the speech stream indicates speech loss by the user. The method can include searching a catalog using the text as a search parameter to find relevant catalog data. The method can include presenting the relevant catalog data to remind the user about the speech stream.
Type: Application
Filed: September 12, 2012
Publication date: March 13, 2014
Applicant: International Business Machines Corporation
Inventor: Scott H. Berens
-
Publication number: 20140067387
Abstract: Scalar operations for model adaptation or feature enhancement may be utilized for recognizing an utterance during automatic speech recognition in a noisy environment. An utterance including distorted speech generated from a transmission source for delivery to a receiver, may be received by a computer. The distorted speech may be caused by the noisy environment and channel distortion. Computations using scalar operations in the form of an algorithm may then be performed for recognizing the utterance. As a result of performing all of the computations with scalar operations, computational complexity is very small in comparison to matrix and vector operations. Vector Taylor Series with diagonal Jacobian approximation may also be utilized as a distortion-model-based noise robust algorithm with scalar operations.
Type: Application
Filed: September 5, 2012
Publication date: March 6, 2014
Applicant: MICROSOFT CORPORATION
Inventors: Jinyu Li, Michael Lewis Seltzer, Yifan Gong
-
Publication number: 20140012573
Abstract: A signal processing apparatus includes a speech recognition system and a voice activity detection unit. The voice activity detection unit is coupled to the speech recognition system, and arranged for detecting whether an audio signal is a voice signal and accordingly generating a voice activity detection result to the speech recognition system to control whether the speech recognition system should perform speech recognition upon the audio signal.
Type: Application
Filed: September 13, 2012
Publication date: January 9, 2014
Inventors: Chia-Yu Hung, Tsung-Li Yeh, Yi-Chang Tu
-
Publication number: 20130311176
Abstract: A wireless headset capable of receiving audio signals transmitted wirelessly and compatible for use in an MRI scanner is disclosed. The headset includes a first wireless module connected to the first earphone and a second wireless module connected to the second earphone. Each wireless module is electrically connected to a speaker in the respective earphone. The first wireless module receives the audio signal from a remote source and coordinates transmission of the audio signal to each of the speakers. The compact nature of each earphone minimizes the length of wire runs. In addition, the headset is made of materials having low magnetic susceptibility such that they will not be affected by the magnetic field from the MRI scanner.
Type: Application
Filed: June 8, 2012
Publication date: November 21, 2013
Inventors: Brian Brown, Manuel J. Ferrer Herrera, Richard J. Smaglick
-
Publication number: 20130304463
Abstract: An embodiment of the invention provides a noise cancellation method for an electronic device. The method comprises: receiving an audio signal; applying a Fast Fourier Transform operation on the audio signal to generate a sound spectrum; acquiring a first spectrum corresponding to a noise and a second spectrum corresponding to a human voice signal from the sound spectrum; estimating a center frequency according to the first spectrum and the second spectrum; and applying a high pass filtering operation to the sound spectrum according to the center frequency.
Type: Application
Filed: May 14, 2012
Publication date: November 14, 2013
Inventors: Lei Chen, Yu-Chieh Lai, Chun-Ren Hu, Hann-Shi Tong
-
Publication number: 20130297305
Abstract: A non-spatial speech detection system includes a plurality of microphones whose output is supplied to a fixed beamformer. An adaptive beamformer is used for receiving the output of the plurality of microphones and one or more processors are used for processing an output from the fixed beamformer and identifying speech from noise through the use of an algorithm utilizing a covariance matrix.
Type: Application
Filed: May 2, 2012
Publication date: November 7, 2013
Applicant: GENTEX CORPORATION
Inventors: Robert R. Turnbull, Michael A. Bryson
-
Publication number: 20130297306
Abstract: An adaptive equalization system that adjusts the spectral shape of a speech signal based on an intelligibility measurement of the speech signal may improve the intelligibility of the output speech signal. Such an adaptive equalization system may include a speech intelligibility measurement module, a spectral shape adjustment module, and an adaptive equalization module. The speech intelligibility measurement module is configured to calculate a speech intelligibility measurement of a speech signal. The spectral shape adjustment module is configured to generate a weighted long-term speech curve based on a first predetermined long-term average speech curve, a second predetermined long-term average speech curve, and the speech intelligibility measurement. The adaptive equalization module is configured to adapt equalization coefficients for the speech signal based on the weighted long-term speech curve.
Type: Application
Filed: May 4, 2012
Publication date: November 7, 2013
Applicant: QNX Software Systems Limited
Inventors: Phillip Alan Hetherington, Xueman Li
-
Publication number: 20130246062
Abstract: Method and system for tracking fundamental frequencies of pseudo-periodic signals in the presence of noise that include receiving a time-frequency representation of signals measured in a predefined environment; estimating and tracking a fundamental frequency of a respective pseudo-periodic signal at each time frame of the time-frequency representation by tracking detections of harmonious frequencies in the time-frequency representation over time; and outputting each respective estimated fundamental frequency associated with the pseudo-periodic signal of each respective time frame.
Type: Application
Filed: March 19, 2012
Publication date: September 19, 2013
Applicant: VOCALZOOM SYSTEMS LTD.
Inventors: Yekutiel Avargel, Tal Bakish
-
Publication number: 20130226581
Abstract: A communication method includes: capturing analog sound signals output by the audio output unit, and analyzing the captured analog sound signals to obtain corresponding digital audio information; comparing the obtained digital audio information with digital feature information stored in a storage unit to determine whether the obtained digital audio information includes the stored digital feature information; and playing reply information stored in the storage unit if the obtained digital audio information includes the stored digital feature information.
Type: Application
Filed: September 26, 2012
Publication date: August 29, 2013
Applicants: HON HAI PRECISION INDUSTRY CO., LTD., HONG FU JIN PRECISION INDUSTRY (Shenzhen) CO., LTD.
Inventors: HONG FU JIN PRECISION INDUSTRY (Shenzhen, HON HAI PRECISION INDUSTRY CO., LTD.
-
Publication number: 20130211832
Abstract: A method of speech recognition in a vehicle. Audio including noise and a speech signal representative of an utterance from a user is received via a microphone, and a signal-to-noise ratio (SNR) for the received audio is calculated using a processor. It is determined whether the calculated SNR is greater than a predetermined SNR. If so, then a noise distribution is identified for addition to the received audio, and noise corresponding to the identified noise distribution is injected into the received audio to produce noise-injected audio including the speech signal.
Type: Application
Filed: February 9, 2012
Publication date: August 15, 2013
Applicant: GENERAL MOTORS LLC
Inventors: Gaurav Talwar, Robert D. Sims
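The decision logic in this abstract (measure the SNR, compare it with a predetermined value, and inject noise only when the audio is cleaner than that value) can be sketched as follows. This is an illustrative sketch, not the claimed method: the abstract does not say how the SNR is estimated or which noise distribution is chosen, so the separate speech and noise segments, the zero-mean Gaussian noise, and the names `snr_db`, `maybe_inject_noise`, `threshold_db`, and `sigma` are all assumptions.

```python
import math
import random


def snr_db(speech, noise):
    """Return the SNR in dB from separate speech and noise sample lists.

    Assumption: a noise-only segment is available for the noise estimate.
    """
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_speech / p_noise)


def maybe_inject_noise(audio, measured_snr_db, threshold_db=20.0,
                       sigma=0.01, seed=0):
    """Add noise only when the measured SNR is above the threshold.

    threshold_db, sigma and the Gaussian distribution are hypothetical
    choices; the abstract only says a noise distribution is identified.
    """
    if measured_snr_db <= threshold_db:
        # Audio is already noisy enough: return an unchanged copy.
        return list(audio)
    rng = random.Random(seed)  # seeded so the example is reproducible
    return [x + rng.gauss(0.0, sigma) for x in audio]
```

A plausible motivation for such a step is to bring unusually clean audio closer to the noise conditions a recognizer was trained on, although the abstract itself does not state the reason.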
-
Publication number: 20130191117
Abstract: In speech processing systems, compensation is made for sudden changes in the background noise in the average signal-to-noise ratio (SNR) calculation. SNR outlier filtering may be used, alone or in conjunction with weighting the average SNR. Adaptive weights may be applied on the SNRs per band before computing the average SNR. The weighting function can be a function of noise level, noise type, and/or instantaneous SNR value. Another weighting mechanism applies a null filtering or outlier filtering which sets the weight in a particular band to be zero. This particular band may be characterized as the one that exhibits an SNR that is several times higher than the SNRs in other bands.
Type: Application
Filed: November 6, 2012
Publication date: July 25, 2013
Applicant: Qualcomm Incorporated
Inventor: Qualcomm Incorporated
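The null (outlier) filtering idea in this abstract can be shown with a small weighted average. The sketch below is a minimal illustration under stated assumptions, not the applicant's algorithm: the function name `average_snr`, the use of the median of the other bands as the reference, and the `outlier_factor` of 3 (standing in for "several times higher") are all hypothetical.

```python
from statistics import median


def average_snr(band_snrs, outlier_factor=3.0):
    """Average per-band SNRs, giving zero weight to outlier bands.

    band_snrs: linear (not dB) SNR values, one per frequency band.
    outlier_factor: hypothetical threshold; a band is dropped when its SNR
    exceeds outlier_factor times the median SNR of the other bands.
    """
    band_snrs = list(band_snrs)
    weights = []
    for i, snr in enumerate(band_snrs):
        others = band_snrs[:i] + band_snrs[i + 1:]
        # Null filtering: weight 0 for a band far above the rest.
        is_outlier = bool(others) and snr > outlier_factor * median(others)
        weights.append(0.0 if is_outlier else 1.0)
    total = sum(weights)
    if total == 0:
        return 0.0
    return sum(w * s for w, s in zip(weights, band_snrs)) / total
```

For the band values `[2.0, 2.0, 2.0, 20.0]`, the last band is zero-weighted and the result is `2.0`. The adaptive weights mentioned in the abstract (functions of noise level or noise type) could replace the fixed weight of 1.0 used here.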
-
Patent number: 8494174
Abstract: A clear, high quality voice signal with a high signal-to-noise ratio is achieved by use of an adaptive noise reduction scheme with two microphones in close proximity. The method includes the use of two omnidirectional microphones in a highly directional mode, and then applying an adaptive noise cancellation algorithm to reduce the noise.
Type: Grant
Filed: June 14, 2010
Date of Patent: July 23, 2013
Inventor: Alon Konchitsky
-
Publication number: 20130185065
Abstract: An audio signal may be received, in a processor associated with a vehicle. Sound related vehicle information representing one or more sounds may be received by the processor. The sound related vehicle information may or may not include an audio signal. A speech recognition process or system may be modified based on the sound related vehicle information.
Type: Application
Filed: January 17, 2012
Publication date: July 18, 2013
Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
Inventors: Eli TZIRKEL-HANCOCK, Omer Tsimhoni
-
Publication number: 20130185066
Abstract: Sound related vehicle information representing one or more sounds may be received in a processor associated with a vehicle. The sound related vehicle information may or may not include an audio signal. An audio signal output to a passenger may be modified based on the sound related vehicle information.
Type: Application
Filed: January 17, 2012
Publication date: July 18, 2013
Applicant: GM GLOBAL TECHNOLOGY OPERATIONS LLC
Inventors: Eli TZIRKEL-HANCOCK, Omer Tsimhoni
-
Publication number: 20130179163
Abstract: An In-Car Communication (ICC) system supports the communication paths within a car by receiving the speech signals of a speaking passenger and playing it back for one or more listening passengers. Signal processing tasks are split into a microphone related part and into a loudspeaker related part. A sound processing system suitable for use in a vehicle having multiple acoustic zones includes a plurality of microphone In-Car Communication (Mic-ICC) instances coupled and a plurality of loudspeaker In-Car Communication (Ls-ICC) instances. The system further includes a dynamic audio routing matrix with a controller and coupled to the Mic-ICC instances, a mixer coupled to the plurality of Mic-ICC instances and a distributor coupled to the Ls-ICC instances.
Type: Application
Filed: January 10, 2012
Publication date: July 11, 2013
Inventors: Tobias Herbig, Markus Buck, Meik Pfeffinger
-
Publication number: 20130144618
Abstract: A disclosed embodiment provides a speech recognition method to be performed by an electronic device. The method includes: collecting user-specific information that is specific to a user through the user's usage of the electronic device; recording an utterance made by the user; letting a remote server generate a remote speech recognition result for the recorded utterance; generating rescoring information for the recorded utterance based on the collected user-specific information; and letting the remote speech recognition result rescored based on the rescoring information.
Type: Application
Filed: March 12, 2012
Publication date: June 6, 2013
Inventors: Liang-Che Sun, Yiou-Wen Cheng, Chao-Ling Hsu, Jyh-Horng Lin
-
Publication number: 20130138437
Abstract: A speech recognition apparatus, includes a reliability estimating unit configured to estimate reliability of a time-frequency segment from an input voice signal; and a reliability reflecting unit configured to reflect the reliability of the time-frequency segment to a normalized cepstrum feature vector extracted from the input speech signal and a cepstrum average vector included for each state of an HMM in decoding. Further, the speech recognition apparatus includes a cepstrum transforming unit configured to transform the cepstrum feature vector and the average vector through a discrete cosine transformation matrix and calculate a transformed cepstrum vector. Furthermore, the speech recognition apparatus includes an output probability calculating unit configured to calculate an output probability value of time-frequency segments of the input speech signal by applying the transformed cepstrum vector to the cepstrum feature vector and the average vector.
Type: Application
Filed: July 25, 2012
Publication date: May 30, 2013
Applicant: Electronics and Telecommunications Research Institute
Inventors: Hoon-Young Cho, Youngik Kim, Sanghun Kim
-
Publication number: 20130132077
Abstract: Systems and methods for semi-supervised source separation using non-negative techniques are described. In some embodiments, various techniques disclosed herein may enable the separation of signals present within a mixture, where one or more of the signals may be emitted by one or more different sources. In audio-related applications, for instance, a signal mixture may include speech (e.g., from a human speaker) and noise (e.g., background noise). In some cases, speech may be separated from noise using a speech model developed from training data. A noise model may be created, for example, during the separation process (e.g., “on-the-fly”) and in the absence of corresponding training data.
Type: Application
Filed: May 27, 2011
Publication date: May 23, 2013
Inventors: Gautham J. Mysore, Paris Smaragdis
-
Publication number: 20130103397
Abstract: Exemplary embodiments provide systems, devices and methods that allow creation and management of lists of items in an integrated manner on an interactive graphical user interface. A user may speak a plurality of list items in a natural unbroken manner to provide an audio input stream into an audio input device. Exemplary embodiments may automatically process the audio input stream to convert the stream into a text output, and may process the text output into one or more n-grams that may be used as list items to populate a list on a user interface.
Type: Application
Filed: October 21, 2011
Publication date: April 25, 2013
Applicant: WAL-MART STORES, INC.
Inventors: Dion Almaer, Bernard Paul Cousineau, Ben Galbraith
-
Publication number: 20130096915
Abstract: A speech processing method and arrangement are described. A dynamic noise adaptation (DNA) model characterizes a speech input reflecting effects of background noise. A null noise DNA model characterizes the speech input based on reflecting a null noise mismatch condition. A DNA interaction model performs Bayesian model selection and re-weighting of the DNA model and the null noise DNA model to realize a modified DNA model characterizing the speech input for automatic speech recognition and compensating for noise to a varying degree depending on relative probabilities of the DNA model and the null noise DNA model.
Type: Application
Filed: October 17, 2011
Publication date: April 18, 2013
Applicant: NUANCE COMMUNICATIONS, INC.
Inventors: Steven J. Rennie, Pierre Dognin, Petr Fousek
-
Publication number: 20130085753
Abstract: A computing device is able to use an embedded speech recognizer and a network speech recognizer for speech recognition. In response to detecting speech in the captured audio, the computing device may forward the captured audio to its embedded speech recognizer and to a speech client for the network speech recognizer. The embedded speech recognizer provides an embedded-recognizer result for the captured audio. If a network-recognition criterion is met, the speech client forwards the captured audio to the network speech recognizer and receives a network-recognizer result for the captured audio from the network speech recognizer. A speech recognition result for the captured audio is forwarded to at least one application, wherein the speech recognition result is based on at least one of the embedded-recognizer result and the network-recognizer result.
Type: Application
Filed: August 15, 2012
Publication date: April 4, 2013
Applicant: GOOGLE INC.
Inventors: Bjorn Erik Bringert, Johan Schalkwyk, Michael J. LeBeau, Richard Zarek Cohen, Luca Zanolin, Simon Tickner
-
Publication number: 20130060567
Abstract: VoIP phones according to the present invention include a microphone, which may be internal or external, and allow the user to communicate unobtrusively, check voice mail and conduct other activities in an environment which can be noisy in general and extremely noisy sometimes. Speech recognition functionality may also be used to generate and send touch tone or DTMF tones such as in response to call trees or voice recognition functionality used by airlines, credit card companies, voice mail systems, and other applications. A system and method of audio processing which provides enhanced speech recognition is provided. Audio input is received at the microphone which is processed by adaptive noise cancellation to generate an enhanced audio signal. The operation of the speech recognition engine and the adaptive noise canceller may be advantageously controlled based on Voice Activity Detection (VAD).
Type: Application
Filed: October 31, 2012
Publication date: March 7, 2013
Inventor: Alon Konchitsky
-
Publication number: 20130054236
Abstract: A method for the detection of noise and speech segments in a digital audio input signal, the input signal being divided into a plurality of frames including a first stage in which a first classification of a frame as noise is performed if the mean energy value for this frame and the previous N frames is not greater than a first energy threshold, N>1, a second stage in which for each frame that has not been classified as noise in the first stage it is decided if the frame is classified as noise or as speech based on combining at least a first criterion of spectral similarity of the frame with acoustic noise and speech models, a second criterion of analysis of the energy of the frame and a third criterion of duration, and of using a state machine for detecting the beginning of a segment as an accumulation of a determined number of consecutive frames with acoustic similarity greater than a first threshold and for detecting the end of the segment; a third stage in which the classification as speech or as noise
Type: Application
Filed: October 7, 2010
Publication date: February 28, 2013
Applicant: TELEFONICA, S.A.
Inventors: Carlos Garcia Martinez, Helenca Duxans Barrobés, Mauricio Sendra Vicens, David Cadenas Sanchez
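The first stage described in this abstract (a frame is classified as noise when the mean energy over that frame and the previous N frames does not exceed an energy threshold) can be sketched as below. This is only an illustration of that one stage: the function name `first_stage_noise_flags`, the default values, and the handling of the first few frames (averaging over however many frames are available) are assumptions, and the second and third stages are not modelled.

```python
def first_stage_noise_flags(frame_energies, n_prev=2, energy_threshold=1.0):
    """Flag frames as noise using a running mean of frame energies.

    frame_energies: one energy value per frame.
    n_prev: number of previous frames included in the mean (N > 1).
    energy_threshold: hypothetical first energy threshold.
    Returns a list of booleans, True where the frame is classified as noise.
    """
    flags = []
    for i in range(len(frame_energies)):
        # Current frame plus up to n_prev earlier frames.
        window = frame_energies[max(0, i - n_prev): i + 1]
        mean_energy = sum(window) / len(window)
        # "Not greater than" the threshold means noise.
        flags.append(mean_energy <= energy_threshold)
    return flags
```

Because the mean includes earlier frames, a quiet frame that directly follows loud frames is not immediately flagged as noise; it is left for the later stages to decide.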
-
Publication number: 20130046536
Abstract: Methods and apparatuses for performing song detection on an audio signal are described. Clips of the audio signal are classified into classes comprising music. Class boundaries of music clips are detected as candidate boundaries of a first type. Combinations including non-overlapped sections are derived. Each section meets the following conditions: 1) including at least one music segment longer than a predetermined minimum song duration, 2) shorter than a predetermined maximum song duration, 3) both starting and ending with a music clip, and 4) a proportion of the music clips in each of the sections is greater than a predetermined minimum proportion. In this way, various possible song partitions in the audio signal can be obtained for investigation.
Type: Application
Filed: July 26, 2012
Publication date: February 21, 2013
Applicant: DOLBY LABORATORIES LICENSING CORPORATION
Inventors: Lie Lu, Claus Bauer
-
Publication number: 20130035935 Abstract: The present invention allows a man to recognize a location of a sound source in a three-dimensional space using two ears and applies a method of separating a sound source in a certain orientation to improve the performance of an application technology using a speech in a noisy environment. The present invention acquires a speech signal using two sensors and determines an orientation angle of a sound source in a zero-crossing point step with respect to a frequency separated signal with a band pass filter bank. An object of the present invention is to obtain excellent sound source orientation detection and division performance which is difficult to be obtained in an existing crossing correlation method calculated in units of time frames in a noisy environment with a plurality of sound sources. Type: Application Filed: May 1, 2012 Publication date: February 7, 2013 Applicant: Electronics and Telecommunications Research Institute Inventors: Young Ik KIM, Hoon Young Cho, Sang Hun Kim
-
Patent number: 8359020 Abstract: In one implementation, a computer-implemented method includes detecting a current context associated with a mobile computing device and determining, based on the current context, whether to switch the mobile computing device from a current mode of operation to a second mode of operation during which the mobile computing device monitors ambient sounds for voice input that indicates a request to perform an operation. The method can further include, in response to determining whether to switch to the second mode of operation, activating one or more microphones and a speech analysis subsystem associated with the mobile computing device so that the mobile computing device receives a stream of audio data. The method can also include providing output on the mobile computing device that is responsive to voice input that is detected in the stream of audio data and that indicates a request to perform an operation. Type: Grant Filed: August 6, 2010 Date of Patent: January 22, 2013 Assignee: Google Inc. Inventors: Michael J. Lebeau, John Nicholas Jitkoff, Dave Burke
-
Publication number: 20130006624 Abstract: An apparatus and a method that achieve physical separation of sound sources by pointing directly a beam of coherent electromagnetic waves (i.e. laser). Analyzing the physical properties of a beam reflected from the vibrations generating sound source enable the reconstruction of the sound signal generated by the sound source, eliminating the noise component added to the original sound signal. In addition, the use of multiple electromagnetic waves beams or a beam that rapidly skips from one sound source to another allows the physical separation of these sound sources. Aiming each beam to a different sound source ensures the independence of the sound signals sources and therefore provides full sources separation. Type: Application Filed: September 12, 2012 Publication date: January 3, 2013 Applicant: AUDIOZOOM LTD Inventor: Tal Bakish
-
Publication number: 20120330657 Abstract: A speech feature extraction apparatus, speech feature extraction method, and speech feature extraction program. A speech feature extraction apparatus includes: first difference calculation module to: (i) receive, as an input, a spectrum of a speech signal segmented into frames for each frequency bin; and (ii) calculate a delta spectrum for each of the frame, where the delta spectrum is a difference of the spectrum within continuous frames for the frequency bin; and first normalization module to normalize the delta spectrum of the frame for the frequency bin by dividing the delta spectrum by a function of an average spectrum; where the average spectrum is an average of spectra through all frames that are overall speech for the frequency bin; and where an output of the first normalization module is defined as a first delta feature. Type: Application Filed: September 6, 2012 Publication date: December 27, 2012 Applicant: International Business Machines Corporation Inventors: Takashi Fukuda, Osamu Ichikawa, Masafumi Nishimura
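The normalized delta feature in this abstract can be illustrated with a few lines of code. The sketch rests on assumptions the abstract does not settle: the delta is taken as a simple difference between adjacent frames (zero for the first frame), the "function of an average spectrum" is taken to be the average itself, all frames are treated as speech, and the spectra are assumed to be strictly positive (for example, power spectra). The function name is hypothetical.

```python
def normalized_delta_spectrum(spectrum):
    """Compute a per-bin delta spectrum divided by the per-bin average spectrum.

    `spectrum` is a list of frames, each a list of positive per-bin values.
    Assumptions: adjacent-frame difference, identity normalizing function,
    every frame counted as speech.
    """
    n_frames = len(spectrum)
    n_bins = len(spectrum[0])

    # Average spectrum over all frames, one value per frequency bin.
    average = [sum(frame[k] for frame in spectrum) / n_frames for k in range(n_bins)]

    features = []
    for t in range(n_frames):
        previous = spectrum[t - 1] if t > 0 else spectrum[t]
        features.append(
            [(spectrum[t][k] - previous[k]) / average[k] for k in range(n_bins)]
        )
    return features


# Hypothetical 3-frame, 2-bin spectrum.
delta_features = normalized_delta_spectrum([[1.0, 2.0], [3.0, 2.0], [2.0, 2.0]])
```

Dividing by the average spectrum makes the delta a relative change per bin, so the feature does not scale with the overall level of the signal in that bin.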
-
Publication number: 20120330655 Abstract: A voice recognition device includes a voice recognition dictionary in which a word which is recognized as a result of voice recognition on an inputted voice is registered, a reply voice data storage unit for storing recorded voice data about words registered in the voice recognition dictionary, a dialog control unit for, when a word registered in the voice recognition dictionary is recognized, acquiring recorded voice data corresponding to the word from the reply voice data storage unit, a reproduction noise reduction unit for carrying out a process of reducing noise included in the recorded voice data, an amplitude adjusting unit for adjusting an amplitude of the recorded voice data in which the noise has been reduced to a predetermined amplitude level, and a voice reproduction unit for reproducing a voice from the amplitude-adjusted recorded voice data. Type: Application Filed: June 28, 2010 Publication date: December 27, 2012 Inventors: Masanobu Osawa, Kazuyuki Nogi
-
Publication number: 20120330651 Abstract: A voice data transferring device intermediates between an in-vehicle terminal and a voice recognition server. In order to check a change in voice recognition performance of the voice recognition server, the voice data transferring device performs a noise suppression processing on a voice data for evaluation in a noise suppression module; transmits the voice data for evaluation to the voice recognition server; and receives a recognition result thereof. The voice data transferring device sets a value of a noise suppression parameter used for a noise suppression processing or a value of a result integration parameter used for a processing of integrating a plurality of recognition results acquired from the voice recognition server, at an optimum value, based on the recognition result of the voice recognition server. This makes it possible to set a suitable parameter even if the voice recognition performance of the voice recognition server changes. Type: Application Filed: June 22, 2012 Publication date: December 27, 2012 Inventors: Yasunari Obuchi, Takeshi Homma
-
Publication number: 20120330656 Abstract: Discrimination between two classes comprises receiving a set of frames including an input signal and determining at least two different feature vectors for each of the frames. Discrimination between two classes further comprises classifying the two different feature vectors using sets of preclassifiers trained for at least two classes of events and from that classification, and determining values for at least one weighting factor. Discrimination between two classes still further comprises calculating a combined feature vector for each of the received frames by applying the weighting factor to the feature vectors and classifying the combined feature vector for each of the frames by using a set of classifiers trained for at least two classes of events. Type: Application Filed: September 4, 2012 Publication date: December 27, 2012 Applicant: INTERNATIONAL BUSINESS MACHINES CORPORATION Inventor: Zica Valsan
-
Publication number: 20120316872 Abstract: Embodiments of the present invention provide an adaptive noise canceling system. The adaptive noise canceling system may be used in a handset to cancel background noise by generating an anti-noise signal. The adaptive noise canceling system may include first input to receive a first signal from a feedforward microphone; a second input to receive a second signal from an error microphone; a controller coupled to the inputs, the controller configured to adaptively generate an anti-noise signal according to the received signals, wherein the controller derives a profile of the anti-noise signal from the first signal and derives a magnitude of the anti-noise signal from both first and second signal; and an output to transmit the anti-noise signal to a speaker. Type: Application Filed: June 7, 2011 Publication date: December 13, 2012 Applicant: ANALOG DEVICES, INC. Inventors: Thomas Stoltz, Kim Spetzler Berthelsen, Robert Adams
-
Publication number: 20120310641 Abstract: In accordance with an example embodiment of the invention, there is provided an apparatus for detecting voice activity in an audio signal. The apparatus comprises a first voice activity detector for making a first voice activity detection decision based at least in part on the voice activity of a first audio signal received from a first microphone. The apparatus also comprises a second voice activity detector for making a second voice activity detection decision based at least in part on an estimate of a direction of the first audio signal and an estimate of a direction of a second audio signal received from a second microphone. The apparatus further comprises a classifier for making a third voice activity detection decision based at least in part on the first and second voice activity detection decisions. Type: Application Filed: August 13, 2012 Publication date: December 6, 2012 Inventors: Riitta Elina Niemistö, Päivi Marianna Valve
-
Publication number: 20120310640 Abstract: A personal audio device, such as a wireless telephone, includes noise canceling circuit that adaptively generates an anti-noise signal from a reference microphone signal and injects the anti-noise signal into the speaker or other transducer output to cause cancellation of ambient audio sounds. An error microphone may also be provided proximate the speaker to estimate an electro-acoustical path from the noise canceling circuit through the transducer. A processing circuit uses the reference and/or error microphone, optionally along with a microphone provided for capturing near-end speech, to determine whether one of the reference or error microphones is obstructed by comparing their received signal content and takes action to avoid generation of erroneous anti-noise. Type: Application Filed: September 30, 2011 Publication date: December 6, 2012 Inventors: Nitin Kwatra, Jeffrey Alderson, Jon D. Hendrix
-
Patent number: 8326328 Abstract: In one implementation, a computer-implemented method includes detecting a current context associated with a mobile computing device and determining, based on the current context, whether to switch the mobile computing device from a current mode of operation to a second mode of operation during which the mobile computing device monitors ambient sounds for voice input that indicates a request to perform an operation. The method can further include, in response to determining whether to switch to the second mode of operation, activating one or more microphones and a speech analysis subsystem associated with the mobile computing device so that the mobile computing device receives a stream of audio data. The method can also include providing output on the mobile computing device that is responsive to voice input that is detected in the stream of audio data and that indicates a request to perform an operation. Type: Grant Filed: September 29, 2011 Date of Patent: December 4, 2012 Assignee: Google Inc. Inventors: Michael J. LeBeau, John Nicholas Jitkoff, Dave Burke
-
Publication number: 20120303367 Abstract: An enhancement system improves the estimate of noise from a received signal. The system includes a spectrum monitor that divides a portion of the signal at more than one frequency resolution. Adaptation logic derives a noise adaptation factor of the received signal. A plurality of devices tracks the characteristics of an estimated noise in the received signal and modifies multiple noise adaptation rates. Weighting logic applies the modified noise adaptation rates derived from the signal divided at a first frequency resolution to the signal divided at a second frequency resolution. Type: Application Filed: August 13, 2012 Publication date: November 29, 2012 Applicant: QNX Software Systems Limited Inventor: Phillip A. Hetherington
-
Publication number: 20120303366 Abstract: A system detects a speech segment that may include unvoiced, fully voiced, or mixed voice content. The system includes a window function that passes signals within a programmed aural frequency range while substantially blocking signals above and below the programmed aural frequency range. A frequency converter converts the signals passing within the programmed aural frequency range into a plurality of frequency bins. A background voice detector estimates the strength of a background speech segment relative to the noise of selected portions of the aural spectrum. A noise estimator estimates a maximum distribution of noise to an average of an acoustic noise power of some of the plurality of frequency bins. A voice detector compares the strength of a desired speech segment to a maximum of an output of the background voice detector and an output of the noise estimator. Type: Application Filed: August 3, 2012 Publication date: November 29, 2012 Inventors: Phillip Alan Hetherington, Mark Ryan Fallat