WO2020206579A1 - Smart device input method based on facial vibration - Google Patents

Smart device input method based on facial vibration

Info

Publication number
WO2020206579A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
vibration signal
facial
threshold
hidden markov
Prior art date
Application number
PCT/CN2019/081676
Other languages
English (en)
French (fr)
Inventor
伍楷舜
关茂柠
Original Assignee
深圳大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳大学
Priority to US17/051,179 priority Critical patent/US11662610B2/en
Priority to PCT/CN2019/081676 priority patent/WO2020206579A1/zh
Publication of WO2020206579A1 publication Critical patent/WO2020206579A1/zh

Links

Images

Classifications

    • GPHYSICS
    • G02OPTICS
    • G02CSPECTACLES; SUNGLASSES OR GOGGLES INSOFAR AS THEY HAVE THE SAME FEATURES AS SPECTACLES; CONTACT LENSES
    • G02C11/00Non-optical adjuncts; Attachment thereof
    • G02C11/10Electronic devices other than hearing aids
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2413Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on distances to training or reference patterns
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/24Speech recognition using non-acoustical features
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • G10L25/84Detection of presence or absence of voice signals for discriminating voice from noise
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01HMEASUREMENT OF MECHANICAL VIBRATIONS OR ULTRASONIC, SONIC OR INFRASONIC WAVES
    • G01H1/00Measuring characteristics of vibrations in solids by using direct conduction to the detector
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01HMEASUREMENT OF MECHANICAL VIBRATIONS OR ULTRASONIC, SONIC OR INFRASONIC WAVES
    • G01H11/00Measuring mechanical vibrations or ultrasonic, sonic or infrasonic waves by detecting changes in electric or magnetic properties
    • G01H11/06Measuring mechanical vibrations or ultrasonic, sonic or infrasonic waves by detecting changes in electric or magnetic properties by electric means
    • G01H11/08Measuring mechanical vibrations or ultrasonic, sonic or infrasonic waves by detecting changes in electric or magnetic properties by electric means using piezoelectric devices
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2218/00Aspects of pattern recognition specially adapted for signal processing
    • G06F2218/12Classification; Matching
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/08Speech classification or search
    • G10L15/14Speech classification or search using statistical models, e.g. Hidden Markov Models [HMMs]
    • G10L15/142Hidden Markov Models [HMMs]
    • G10L15/144Training of HMMs
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • G10L2025/783Detection of presence or absence of voice signals based on threshold decision
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum

Definitions

  • the invention relates to the field of text input, and in particular to a method for inputting a smart device based on facial vibration.
  • the traditional smart device input method is typing on a keyboard or speech-recognition input, but with the development of wearable devices the limitations of this method have gradually become apparent.
  • the smart watch input method is to type on a virtual keyboard on the touch screen, but because the smart watch screen is too small it is difficult for the user to type; for example, when the user wears gloves, typing input is not possible at all.
  • the purpose of the present invention is to overcome the above-mentioned defects of the prior art and provide a method for inputting a smart device based on facial vibration.
  • a smart device input method based on facial vibration which includes the following steps:
  • Step S1 Collect facial vibration signals generated when the user performs voice input
  • Step S2 Extract the Mel frequency cepstrum coefficient from the facial vibration signal
  • Step S3 Taking the Mel frequency cepstrum coefficient as an observation sequence, and using the trained hidden Markov model to obtain the text input corresponding to the facial vibration signal.
  • step S1 the facial vibration signal is collected by a vibration sensor provided on the glasses.
  • step S2 the following processing is performed on a vibration signal: amplify the collected facial vibration signal; send the amplified facial vibration signal to the smart device via a wireless module; the smart device intercepts a section of the received facial vibration signal as an effective part and extracts the Mel frequency cepstrum coefficients from the effective part.
  • intercepting the effective part from the facial vibration signal includes:
  • intercepting the effective part from the facial vibration signal further includes: for a vibration signal, setting a maximum interval threshold maxInter and a minimum length threshold minLen between signal peaks; if the interval between two signal peaks of the vibration signal is less than the maximum interval threshold maxInter, the two signal peaks are regarded as one signal peak of the vibration signal; if the length of a signal peak of the vibration signal is less than the minimum length threshold minLen, the signal peak is discarded.
  • training the hidden Markov model includes:
  • each observation sequence in the training sample set is composed of the Mel frequency cepstrum coefficients of one facial vibration signal
  • the hidden Markov model that is most likely to generate the pronunciation represented by the observation sequence is evaluated as the trained hidden Markov model.
  • step S3 further includes: using the Viterbi algorithm to calculate the output probability of the test sample for the plurality of hidden Markov models; based on the output probabilities, displaying the key type corresponding to the test sample and the candidate key types.
  • step S3 further includes: judging whether the classification result is correct according to the key selected by the user; adding test samples with correct classification results to the training sample set, the corresponding classification label being the classification result; adding test samples with wrong classification results to the training sample set, the corresponding classification label being the category determined according to the user's selection.
  • the present invention has the advantage of using the facial vibration signal generated when a person speaks for text input on a smart device, which solves the difficulty of typing on a smart device caused by a screen that is too small or by the user's hands being occupied;
  • text input based on facial vibration signals avoids the impact of ambient noise, and also avoids the impact of replay attacks and imitation attacks; in addition, the present invention also proposes a real-time correction and adaptation mechanism for correcting incorrect recognition results and updating the training sample set, which improves the recognition accuracy and robustness of the input text.
  • Fig. 1 shows a flowchart of a method for inputting a smart device based on facial vibration according to an embodiment of the present invention
  • Fig. 2 shows a schematic diagram of a smart watch input method based on facial vibration according to an embodiment of the present invention
  • FIG. 3 shows a signal sensing device of a smart watch input method based on facial vibration according to an embodiment of the present invention
  • Figure 4 shows a schematic circuit diagram of a signal amplifier according to an embodiment of the present invention
  • Fig. 5 shows a schematic diagram of a segment of vibration signal according to an embodiment of the present invention.
  • a method for inputting a smart device based on facial vibration includes collecting facial vibration signals generated when a user speaks; extracting information from the vibration signals that can reflect signal characteristics Mel frequency cepstrum (MFCC) coefficients; take Mel frequency cepstrum coefficients as the observation sequence and use the pre-generated hidden Markov model (HMM) to obtain the text input that the user expects.
  • the known Mel frequency cepstrum coefficient and the corresponding button type are used as the training sample set and obtained through training.
  • the input method of the embodiment of the present invention can be applied to wearable devices or other types of smart devices. In the following, a smart watch will be used as an example.
  • the method for inputting a smart device based on facial vibration in an embodiment of the present invention includes the following steps:
  • Step S110 Collect facial vibration signals generated when the user speaks.
  • the facial vibration signal generated when the user speaks is collected.
  • Figure 2 shows the principle of the input method of the smart watch.
  • a vibration signal is generated.
  • the vibration signal is transmitted to the smart watch through wireless transmission.
  • the smart watch further processes the vibration signal, extracts the characteristics of the vibration signal from it, and then recognizes different vibration signals Corresponding button category.
  • a signal sensing module installed on the glasses is used to collect facial vibration signals generated when a person speaks.
  • the signal sensing module 310 may be a piezoelectric film vibration sensor, a piezoelectric ceramic vibration sensor, or other vibration sensors capable of detecting signals. For example, if a piezoelectric ceramic vibration sensor is installed on glasses, the glasses will vibrate when a person speaks. At this time, the vibration sensor can collect facial vibration signals generated when a person speaks.
  • the signal processing module 320 provided on the glasses can be used to receive the facial vibration signal, amplify the facial vibration signal and then connect it to an analog-to-digital (AD) converter, thereby converting the facial vibration signal into a digital signal .
  • the signal sensing module 310 and the signal processing module 320 may be arranged outside the glasses or embedded inside the glasses.
  • the vibration sensors, amplifiers, analog-to-digital converters, etc. described herein may use commercially available or customized devices, as long as their functions can achieve the purpose of the present invention.
  • FIG. 4 shows a schematic circuit diagram of an amplifier according to an embodiment of the present invention.
  • the amplifier is implemented by a commercially available LMV358, which is a two-stage amplifier with a maximum amplification factor of 225 and an amplification factor of 15 for each stage.
  • each stage of amplifying circuit has a band-pass filter, the frequency range is 15.9Hz to 12.9kHz.
  • the vibration signal is amplified by the amplifier, it is connected to an AD analog-to-digital converter (such as MCP3008); the next stage of the AD analog-to-digital converter is connected to the Raspberry Pi to control the collection and transmission of facial vibration signals.
  • Step S120 Send the facial vibration signal to the smart device.
  • the facial vibration signal after amplification and analog-to-digital conversion is sent to the smart watch via the wireless module.
  • the wireless module includes a Bluetooth transmission module, a WiFi transmission module or another wireless transmission module that can send the signal to the smart watch.
  • the Raspberry Pi is set to control the Bluetooth module, and the digital signal processed in step S110 is sent to the smart watch.
  • Step S130 the smart device detects the valid part of the signal.
  • the smart device intercepts a segment from the received facial vibration signal as an effective part, and by intercepting the effective part, the subsequent processing speed is further improved while retaining the signal characteristics.
  • the energy-based dual-threshold endpoint detection method to detect the effective part of the signal specifically includes:
  • Step S131 After the smart watch receives the facial vibration signal sent by the Bluetooth module, it is filtered by a Butterworth band pass filter.
  • the cut-off frequency of the band-pass filter may be 10 Hz and 1000 Hz, respectively.
  • Step S132 framing the signal, where the frame length is 7ms, the frame shift is 3.2ms, and the window function is Hamming window, and the short-term energy of the facial vibration signal is calculated.
  • the short-term energy calculation formula is expressed as $E_t = \sum_{i=1}^{L} S_t(i)^2$, where E is the short-term energy of the frame signal, L is the length of the frame signal, S(i) is the amplitude of the vibration signal within the frame, and t represents the time index of the frame signal.
  • Step S133 Set a high threshold and a low threshold when the effective part is intercepted based on the short-term energy of the facial vibration signal.
  • the energy standard deviation of the vibration signal can be further calculated, denoted as σ, and the average energy of the background noise, denoted as u.
  • Step S134 Set the maximum interval threshold and the minimum length threshold between signal peaks.
  • maxInter is generally 50 (frames)
  • minLen is generally 30 (frames).
  • Step S135 Find the frame with the largest energy in the signal; the energy of this frame needs to be higher than the set high threshold.
  • Step S136 Extend from this frame to the left and to the right respectively until the energy of the next frame is lower than the set low threshold, and record the frame positions at this time; the left frame position obtained is used as the starting point of the signal peak, and the right frame position as the end point of the signal peak.
  • the frame energy at the position of this signal peak then needs to be set to zero, so that other signal peaks can be processed in subsequent iterations.
  • step S137 steps S135 and S136 are repeated until all signal peaks in the entire signal are found.
  • step S138 if the interval between the two signal peaks is less than maxInter, the two signal peaks are combined, that is, the two signal peaks are regarded as one signal peak.
  • the interval between all signal peaks is greater than maxInter.
  • step S139 if the length of the signal peak is less than minLen, the signal peak is directly discarded.
  • the number of signal peaks finally obtained should be 1, and this signal peak is the effective part of the intercepted vibration signal; if the number of signal peaks obtained is greater than 1, the vibration signal is treated as an invalid signal and discarded directly.
  • FIG. 6 illustrates a segment of the vibration signal after the above processing.
  • the abscissa indicates the sample index, and the ordinate indicates the normalized amplitude. It can be seen that this segment of vibration signal includes 10 vibration signals, and each vibration signal corresponds to one signal peak.
  • the eighth vibration signal actually contains two small peaks, but since the interval between the two small peaks is less than maxInter, these two small peaks are treated as one peak, which corresponds to one vibration signal.
  • Step S140 extract the Mel frequency cepstrum coefficient of the signal.
  • the Mel frequency cepstrum coefficient is extracted from the cut effective part as the signal feature.
  • extracting the Mel frequency cepstrum coefficients includes:
  • the pre-emphasis coefficient can be set to 0.96, the frame length is 20ms, the frame shift is 6ms, and the window function is Hamming window;
  • the mel filter frequency range is 10 Hz to 1000 Hz, and the number of filter channels is 28;
  • the extracted Mel frequency cepstral coefficients are not limited to 14, and an appropriate number of Mel frequency cepstral coefficients can be extracted according to the accuracy and execution speed requirements of the training model.
  • this article does not specifically introduce existing technologies such as pre-emphasis, framing, windowing, and Fourier transform.
  • Step S150 using the Mel frequency cepstrum coefficient as the observation sequence to train the hidden Markov model.
  • the process of training the HMM model using the Baum-Welch algorithm includes: initializing the parameters of the HMM; calculating the forward and backward probability matrices; calculating the transition probability matrix; calculating the mean and variance of each Gaussian probability density function; calculating the weight of each Gaussian probability density function; and calculating the output probabilities of all observation sequences and accumulating them to obtain the total output probability.
  • the training process includes:
  • each observation sequence of the digit "0" (i.e. its MFCC parameters) is divided evenly into segments according to the number of states N
  • the MFCC parameters belonging to the same segment across all observation sequences are assembled into one large matrix
  • the k-means algorithm is used for clustering, and the mean, variance and weight coefficient of each Gaussian component are calculated
  • because the embodiment of the present invention is deployed on a smart watch and computing resources are limited, the training process may be iterated only once.
  • the embodiment of the present invention generates a corresponding HMM for each key type, each observation sequence is composed of the Mel frequency cepstrum coefficients of one facial vibration signal, and finally the HMM that is most likely to have produced the pronunciation represented by the observation sequence is evaluated.
  • Step S160 Classify and identify the test data.
  • step S150 the hidden Markov model generated in step S150 is used to classify and identify the test sample.
  • the classification and recognition includes: using the Viterbi algorithm to calculate the output probability of the test sample for each hidden Markov model, and give the best state path;
  • the category corresponding to the hidden Markov model with the largest output probability is the classification result of the test sample.
  • Step S170 Correct the classification result.
  • real-time correction and an adaptive mechanism may be used to correct the classification results to optimize the training sample set used in step S150.
  • step S160 in addition to outputting the final classification result, the two most likely candidate keys and the "Delete" key are also given according to the output probability of each hidden Markov model.
  • when the classification result is correct, the user does not need to perform any operation; when the classification result is wrong, if the correct classification result appears among the candidate keys, the user can tap the candidate key to make the correction; if the correct classification result does not appear among the candidate keys, the user needs to use the smart watch's built-in virtual keyboard to enter the correct digit; if the input itself is wrong, for example because of mispronunciation or the way the glasses are worn, the user can tap the "Delete" key to delete the entered digit.
  • correcting the classification result includes:
  • Step S171 If the user does not click any button or use the built-in virtual keyboard to input, it means that the classification result of this input is correct, and the facial vibration signal corresponding to this input is added to the training sample set once;
  • step S172 if the user taps a candidate key, it means that the classification result of this input is wrong and the correct classification result of this input appears among the candidate keys; the facial vibration signal corresponding to this input is then added to the training sample set n_i times.
  • n_i represents the number of consecutive errors of key i, with 1 ≤ n_i ≤ 3. For example, if the classification result of key 2 is wrong twice in a row, n_i equals 2. If the number of consecutive errors of key i exceeds 3, n_i is still set to 3. Once the classification result of key i is correct, n_i is reset to 1.
  • Step S173 If the user uses the smart watch's built-in virtual keyboard to enter the digit, it means that the classification result of this input is wrong and the correct classification result does not appear among the candidate keys; the facial vibration signal corresponding to this input is then added to the training sample set 3 times.
  • step S174 if the user clicks the "Delete” button, it means that the user made an error during the input, and the facial vibration signal corresponding to the input will be directly discarded.
  • Step S175 It is judged whether the hidden Markov model needs to be retrained.
  • when N (the total number of times samples have been added to the training sample set) is greater than or equal to 10, the hidden Markov models are retrained. Once the number of training samples corresponding to a certain key is greater than 35, the earliest training sample of that key added to the training sample set is discarded, so as to ensure that the maximum number of training samples per key is 35.
  • the present invention may be a system, a method and/or a computer program product.
  • the computer program product may include a computer-readable storage medium loaded with computer-readable program instructions for enabling a processor to implement various aspects of the present invention.
  • the computer-readable storage medium may be a tangible device that holds and stores instructions used by the instruction execution device.
  • the computer-readable storage medium may include, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing, for example.
  • Computer-readable storage media include: portable computer disks, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disk read-only memory (CD-ROM), digital versatile disks (DVD), memory sticks, floppy disks, and mechanical encoding devices such as punch cards or raised-in-groove structures with instructions stored thereon

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Optics & Photonics (AREA)
  • Ophthalmology & Optometry (AREA)
  • Otolaryngology (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

A smart device input method based on facial vibration, the method comprising: collecting the facial vibration signal generated when a user performs voice input; extracting Mel frequency cepstrum coefficients from the facial vibration signal; and, taking the Mel frequency cepstrum coefficients as an observation sequence, using a trained hidden Markov model to obtain the text input corresponding to the facial vibration signal. The input method solves the difficulty of typing on smart devices caused by a screen that is too small or by the user's hands being occupied, and avoids the influence of replay attacks and imitation attacks.

Description

Smart device input method based on facial vibration
Technical Field
The present invention relates to the field of text input, and in particular to a smart device input method based on facial vibration.
Background Art
The traditional smart device input method is typing on a keyboard or speech-recognition input, but with the development of wearable devices the limitations of this method have gradually become apparent. For example, a smart watch is typed on through a virtual keyboard on its touch screen, but because the screen of a smart watch is too small it is difficult for the user to type; likewise, when the user is wearing gloves, typing input is not possible either.
At present there is a handwriting-input approach based on finger tracking, in which the user only needs to draw the desired digit or letter in the air with a finger; this kind of input is too slow, however, and it is not usable when the user's hand is holding something. Another approach maps the knuckles of the hand wearing the watch onto a nine-key virtual keyboard and uses the thumb to tap out the input; yet when the hand wearing the watch is also holding something, this input method is likewise unusable. Traditional speech recognition, for its part, is easily affected by ambient noise and is also vulnerable to replay attacks and imitation attacks.
Therefore, the prior art needs to be improved in order to provide a more accurate and effective text input method.
Summary of the Invention
The purpose of the present invention is to overcome the above-mentioned defects of the prior art and to provide a smart device input method based on facial vibration.
According to a first aspect of the present invention, a smart device input method based on facial vibration is provided, comprising the following steps:
Step S1: collecting the facial vibration signal generated when the user performs voice input;
Step S2: extracting Mel frequency cepstrum coefficients from the facial vibration signal;
Step S3: taking the Mel frequency cepstrum coefficients as an observation sequence, and using a trained hidden Markov model to obtain the text input corresponding to the facial vibration signal.
In one embodiment, in step S1 the facial vibration signal is collected by a vibration sensor arranged on a pair of glasses.
In one embodiment, in step S2 the following processing is performed for a vibration signal: the collected facial vibration signal is amplified; the amplified facial vibration signal is sent to the smart device via a wireless module; and the smart device intercepts a segment of the received facial vibration signal as the effective part and extracts the Mel frequency cepstrum coefficients from the effective part.
In one embodiment, intercepting the effective part from the facial vibration signal comprises:
setting a first cut-off threshold and a second cut-off threshold based on the standard deviation σ of the short-term energy of the facial vibration signal, where the first cut-off threshold is TL = u + σ, the second cut-off threshold is TH = u + 3σ, and u is the average energy of the background noise;
finding, in the facial vibration signal, the frame with the largest short-term energy, the energy of which is higher than the second cut-off threshold;
among the frames preceding and following that frame, finding respectively the frame whose energy is lower than the first cut-off threshold and which is closest in time to that frame, taking the position of the preceding frame so obtained as the start point and the position of the following frame so obtained as the end point, and intercepting the part between the start point and the end point as the effective part of the facial vibration signal.
In one embodiment, intercepting the effective part from the facial vibration signal further comprises: for a vibration signal, setting a maximum interval threshold maxInter and a minimum length threshold minLen between signal peaks; if the interval between two signal peaks of the vibration signal is smaller than the maximum interval threshold maxInter, treating the two signal peaks as one signal peak of the vibration signal; and if the length of a signal peak of the vibration signal is smaller than the minimum length threshold minLen, discarding that signal peak.
In one embodiment, training the hidden Markov model comprises:
generating one corresponding hidden Markov model for each input key type of the smart device, to obtain a plurality of hidden Markov models;
constructing a corresponding training sample set for each hidden Markov model, where each observation sequence in the training sample set consists of the Mel frequency cepstrum coefficients of one facial vibration signal;
evaluating the hidden Markov model that is most likely to have produced the pronunciation represented by the observation sequence as the trained hidden Markov model.
In one embodiment, step S3 further comprises: using the Viterbi algorithm to calculate the output probabilities of a test sample for the plurality of hidden Markov models; and displaying, based on the output probabilities, the key type corresponding to the test sample and the candidate key types.
In one embodiment, step S3 further comprises: judging whether the classification result is correct according to the key selected by the user; adding test samples whose classification result is correct to the training sample set, with that classification result as the corresponding class label; and adding test samples whose classification result is wrong to the training sample set, with the class determined from the user's selection as the corresponding class label.
Compared with the prior art, the advantage of the present invention is that the facial vibration signal produced when a person speaks is used for text input on a smart device, which solves the difficulty of typing on smart devices caused by a screen that is too small or by the user's hands being occupied; at the same time, performing text input on the basis of facial vibration signals avoids the influence of ambient noise as well as of replay attacks and imitation attacks; in addition, the present invention also proposes a real-time correction and adaptation mechanism for correcting wrong recognition results and updating the training sample set, which improves the recognition accuracy and robustness of the input text.
Brief Description of the Drawings
The following drawings merely illustrate and explain the present invention schematically and are not intended to limit its scope, in which:
Fig. 1 shows a flowchart of a smart device input method based on facial vibration according to an embodiment of the present invention;
Fig. 2 shows a schematic diagram of the principle of a smart watch input method based on facial vibration according to an embodiment of the present invention;
Fig. 3 shows the signal sensing device of a smart watch input method based on facial vibration according to an embodiment of the present invention;
Fig. 4 shows a schematic circuit diagram of a signal amplifier according to an embodiment of the present invention;
Fig. 5 shows a schematic diagram of a segment of vibration signal according to an embodiment of the present invention.
Detailed Description of the Embodiments
In order to make the purpose, technical solution, design method and advantages of the present invention clearer, the present invention is described in further detail below through specific embodiments with reference to the accompanying drawings. It should be understood that the specific embodiments described here are only intended to explain the present invention and not to limit it.
In all the examples shown and discussed herein, any specific value should be interpreted as merely illustrative and not as a limitation; other examples of the exemplary embodiments may therefore have different values.
Techniques, methods and devices known to those of ordinary skill in the relevant art may not be discussed in detail, but where appropriate such techniques, methods and devices should be regarded as part of the specification.
To facilitate understanding by those skilled in the art, the present invention is further described below with reference to the drawings and examples.
According to an embodiment of the present invention, a smart device input method based on facial vibration is provided. In brief, the method comprises collecting the facial vibration signal produced when the user speaks; extracting, from the vibration signal, Mel frequency cepstrum coefficients (MFCC) that reflect the characteristics of the signal; and, taking the Mel frequency cepstrum coefficients as the observation sequence, using pre-generated hidden Markov models (HMM) to obtain the text input the user intends, where the pre-generated hidden Markov models are obtained by training with known Mel frequency cepstrum coefficients and the corresponding key types as the training sample set. The input method of the embodiments of the present invention can be applied to wearable devices or other types of smart devices; in the following, a smart watch is taken as an example.
Referring to Fig. 1, the smart device input method based on facial vibration of an embodiment of the present invention comprises the following steps:
Step S110: collect the facial vibration signal generated when the user speaks.
In this step, for the voice input mode, the facial vibration signal generated when the user speaks is collected.
Fig. 2 illustrates the principle of the smart watch input method: when the user speaks, a vibration signal is produced; the vibration signal reaches the smart watch via wireless transmission; the smart watch further processes the vibration signal, extracts the features of the vibration signal from it, and then recognizes the key category corresponding to each vibration signal.
In one embodiment, a signal sensing module mounted on a pair of glasses is used to collect the facial vibration signal produced when a person speaks; see the signal sensing module 310 illustrated in Fig. 3. The signal sensing module 310 may be a piezoelectric film vibration sensor, a piezoelectric ceramic vibration sensor, or another vibration sensor capable of detecting the signal. For example, with a piezoelectric ceramic vibration sensor mounted on the glasses, the glasses vibrate when a person speaks, and the vibration sensor can then collect the facial vibration signal produced while the person speaks.
Further, a signal processing module 320 arranged on the glasses may be used to receive the facial vibration signal, amplify it and then feed it into an analog-to-digital (AD) converter, thereby converting the facial vibration signal into a digital signal.
It should be understood that the signal sensing module 310 and the signal processing module 320 may be arranged on the outside of the glasses or embedded inside them. In addition, the vibration sensors, amplifiers, analog-to-digital converters and the like described herein may be commercially available or custom devices, as long as their functions can achieve the purpose of the present invention.
Fig. 4 shows a schematic circuit diagram of an amplifier according to an embodiment of the present invention. The amplifier is implemented with a commercially available LMV358; it is a two-stage amplifier with a maximum gain of 225 and a gain of 15 per stage. To filter out system noise, each amplification stage has a band-pass filter with a frequency range of 15.9 Hz to 12.9 kHz.
Specifically, after the vibration signal has been amplified by the amplifier, it is fed into an AD converter (for example an MCP3008); the stage after the AD converter is a Raspberry Pi, which controls the collection and transmission of the facial vibration signal.
It should be noted that, for brevity, the AD converter, the Raspberry Pi and the other system circuits are not shown, but it should be understood that these circuits or chips required by the embodiment of the present invention can all be arranged on the glasses as part of the signal processing module 320.
Step S120: send the facial vibration signal to the smart device.
In this step, the facial vibration signal that has undergone amplification, analog-to-digital conversion and other processing is sent to the smart watch via a wireless module; the wireless module may be a Bluetooth transmission module, a WiFi transmission module or another wireless transmission module able to send the signal to the smart watch.
For example, the Raspberry Pi is set up to control a Bluetooth module and sends the digital signal processed in step S110 to the smart watch.
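As an illustration of this acquisition path (MCP3008 read out over SPI by the Raspberry Pi, samples forwarded over Bluetooth), a minimal Python sketch is given below. It is not the implementation of the embodiment: the SPI wiring (bus 0, chip-select 0), the 4 kHz sampling rate, the RFCOMM channel and the watch's Bluetooth address are all assumed values, and the timing loop is simplified.

```python
import socket
import time

import spidev  # SPI access to the MCP3008 from the Raspberry Pi

SAMPLE_RATE_HZ = 4000                     # assumed sampling rate
WATCH_BT_ADDR = "00:11:22:33:44:55"       # hypothetical Bluetooth address of the watch
RFCOMM_CHANNEL = 1                        # assumed RFCOMM channel


def read_mcp3008(spi, channel):
    """Read one 10-bit sample from the given MCP3008 channel (0-7)."""
    reply = spi.xfer2([1, (8 + channel) << 4, 0])
    return ((reply[1] & 3) << 8) | reply[2]


def main():
    spi = spidev.SpiDev()
    spi.open(0, 0)                        # SPI bus 0, chip-select 0 (assumed wiring)
    spi.max_speed_hz = 1_350_000

    sock = socket.socket(socket.AF_BLUETOOTH, socket.SOCK_STREAM,
                         socket.BTPROTO_RFCOMM)
    sock.connect((WATCH_BT_ADDR, RFCOMM_CHANNEL))

    period = 1.0 / SAMPLE_RATE_HZ
    try:
        while True:
            sample = read_mcp3008(spi, channel=0)
            sock.send(sample.to_bytes(2, "big"))   # stream raw samples to the watch
            time.sleep(period)                     # crude pacing, not a real-time loop
    finally:
        sock.close()
        spi.close()


if __name__ == "__main__":
    main()
```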
Step S130: the smart device detects the effective part of the signal.
In this step, the smart device intercepts a segment of the received facial vibration signal as the effective part; by intercepting the effective part, the subsequent processing speed is further improved while the signal characteristics are preserved.
In one embodiment, an energy-based dual-threshold endpoint detection method is used to detect the effective part of the signal; it specifically comprises:
Step S131: after the smart watch receives the facial vibration signal sent by the Bluetooth module, the signal is filtered with a Butterworth band-pass filter.
The cut-off frequencies of the band-pass filter may, for example, be 10 Hz and 1000 Hz respectively.
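For example, this filtering step could be written with SciPy as in the following sketch; the filter order and the 4 kHz sampling rate are assumptions, and only the 10 Hz and 1000 Hz cut-offs come from the text.

```python
from scipy.signal import butter, filtfilt


def bandpass_filter(signal, fs=4000.0, low_hz=10.0, high_hz=1000.0, order=4):
    """Butterworth band-pass filter with the cut-off frequencies of step S131."""
    nyquist = 0.5 * fs
    b, a = butter(order, [low_hz / nyquist, high_hz / nyquist], btype="band")
    return filtfilt(b, a, signal)  # zero-phase filtering of the 1-D sample array


# usage: filtered = bandpass_filter(raw_vibration)  # raw_vibration: 1-D numpy array
```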
Step S132: frame the signal, with a frame length of 7 ms, a frame shift of 3.2 ms and a Hamming window as the window function, and calculate the short-term energy of the facial vibration signal.
For example, the short-term energy is calculated as:
$E_t = \sum_{i=1}^{L} S_t(i)^2$
where E is the short-term energy of a frame, L is the length of the frame, S(i) is the amplitude of the vibration signal within the frame, and t denotes the time index of the frame.
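Written out in code, the framing and per-frame energy of step S132 might look as follows; the 4 kHz sampling rate is assumed, while the 7 ms frame length, 3.2 ms frame shift and Hamming window come from the text.

```python
import numpy as np


def short_term_energy(signal, fs=4000.0, frame_ms=7.0, shift_ms=3.2):
    """Per-frame energy E_t of the Hamming-windowed frames of step S132."""
    frame_len = int(round(fs * frame_ms / 1000.0))
    frame_shift = int(round(fs * shift_ms / 1000.0))
    window = np.hamming(frame_len)

    energies = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frame = signal[start:start + frame_len] * window
        energies.append(float(np.sum(frame ** 2)))   # E_t = sum_i S_t(i)^2
    return np.asarray(energies)
```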
Step S133: set the high threshold and the low threshold for intercepting the effective part based on the short-term energy of the facial vibration signal.
After the short-term energy of the facial vibration signal has been obtained, the standard deviation of the energy of the vibration signal, denoted σ, and the average energy of the background noise, denoted u, can be further calculated.
In one embodiment, the low threshold for interception is set to TL = u + σ and the high threshold to TH = u + 3σ.
Step S134: set the maximum interval threshold and the minimum length threshold between signal peaks.
In this step, for one and the same vibration signal, a maximum interval threshold maxInter and a minimum length threshold minLen between signal peaks are set; these two parameters can be set empirically, for example maxInter is generally 50 (frames) and minLen is generally 30 (frames).
Step S135: find the frame with the largest energy in the signal; the energy of this frame must be higher than the high threshold that was set.
Step S136: extend from this frame to the left and to the right respectively until the energy of the next frame is lower than the low threshold that was set, and record the frame positions at that point; the left frame position obtained is taken as the start of the signal peak and the right frame position as the end of the signal peak.
After the start and end points have been obtained, the frame energies at the position of this signal peak also need to be set to zero in this step, so that other signal peaks can be processed in subsequent iterations.
It should be noted that 'left' and 'right' here reflect the time direction: 'extending to the left' means searching the frames preceding the frame in question, while 'extending to the right' means searching the frames following it.
Step S137: repeat steps S135 and S136 until all signal peaks in the whole signal have been found.
Step S138: if the interval between two signal peaks is smaller than maxInter, merge the two signal peaks, i.e. treat the two signal peaks as one signal peak.
In this step, after signal peaks have been merged, the intervals between all signal peaks are larger than maxInter.
Step S139: if the length of a signal peak is smaller than minLen, discard that signal peak directly.
After the above processing, for one vibration signal the number of signal peaks finally obtained should be 1, and that signal peak is the intercepted effective part of the vibration signal; if the number of signal peaks obtained is greater than 1, the vibration signal is treated as an invalid signal and discarded directly.
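One possible reading of steps S133 to S139 in code form is sketched below; it takes the per-frame energies and the average background-noise energy u as inputs and returns the single valid peak, or nothing when the signal has to be discarded. It is an illustrative reconstruction, not the patented implementation.

```python
import numpy as np


def detect_valid_segment(frame_energy, noise_energy, max_inter=50, min_len=30):
    """Dual-threshold peak search of steps S133-S139; returns (start, end) frames or None."""
    sigma = float(np.std(frame_energy))
    tl = noise_energy + sigma           # low threshold  TL = u + sigma   (step S133)
    th = noise_energy + 3.0 * sigma     # high threshold TH = u + 3*sigma

    energy = np.array(frame_energy, dtype=float)
    peaks = []
    while True:
        idx = int(np.argmax(energy))
        if energy[idx] <= th:           # no remaining frame exceeds the high threshold
            break
        left, right = idx, idx          # step S136: extend left and right down to TL
        while left > 0 and energy[left - 1] >= tl:
            left -= 1
        while right < len(energy) - 1 and energy[right + 1] >= tl:
            right += 1
        peaks.append([left, right])
        energy[left:right + 1] = 0.0    # zero the peak so later iterations skip it

    peaks.sort()
    merged = []                         # step S138: merge peaks closer than maxInter
    for peak in peaks:
        if merged and peak[0] - merged[-1][1] < max_inter:
            merged[-1][1] = peak[1]
        else:
            merged.append(peak)
    merged = [p for p in merged if p[1] - p[0] + 1 >= min_len]   # step S139

    return tuple(merged[0]) if len(merged) == 1 else None        # >1 peaks: invalid
```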
Fig. 6 illustrates a segment of vibration signal after the above processing; the abscissa indicates the sample index and the ordinate the normalized amplitude. It can be seen that this segment contains 10 vibration signals, each corresponding to one signal peak; the eighth vibration signal actually contains two small peaks, but since the interval between these two small peaks is smaller than maxInter they are treated as one peak, i.e. they correspond to one vibration signal.
Step S140: extract the Mel frequency cepstrum coefficients of the signal.
In this step, the Mel frequency cepstrum coefficients are extracted from the intercepted effective part as the signal features.
In one embodiment, extracting the Mel frequency cepstrum coefficients comprises:
pre-emphasizing, framing and windowing the effective part of the vibration signal; for example, the pre-emphasis coefficient may be set to 0.96, the frame length to 20 ms, the frame shift to 6 ms, and the window function to a Hamming window;
performing a fast Fourier transform (FFT) on each frame to obtain the corresponding spectrum;
passing the obtained spectrum through a mel filter bank to obtain the mel spectrum; for example, the mel filter frequency range is 10 Hz to 1000 Hz and the number of filter channels is 28;
taking the logarithm of the resulting mel spectrum, then performing a discrete cosine transform (DCT), and finally taking the first 14 coefficients as the Mel frequency cepstrum coefficients (MFCC).
It should be understood that the number of extracted Mel frequency cepstrum coefficients is not limited to 14; an appropriate number of coefficients can be extracted according to the accuracy and execution-speed requirements of the training model. Existing techniques such as pre-emphasis, framing, windowing and the Fourier transform are not described in detail herein.
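The pre-emphasis, framing, FFT, mel filter bank and DCT chain of step S140 can, for example, be assembled with the third-party python_speech_features package as in the sketch below. The parameter values follow the text; the 4 kHz sampling rate and the choice of library are assumptions.

```python
import numpy as np
from python_speech_features import mfcc  # assumed third-party dependency


def extract_mfcc(segment, fs=4000):
    """14 MFCCs per 20 ms frame, using the parameters quoted for step S140."""
    return mfcc(segment, samplerate=fs,
                winlen=0.020, winstep=0.006,       # 20 ms frames, 6 ms frame shift
                numcep=14, nfilt=28, nfft=512,     # 14 coefficients, 28 mel channels
                lowfreq=10, highfreq=1000,         # 10 Hz to 1000 Hz filter range
                preemph=0.96, appendEnergy=False,
                winfunc=np.hamming)                # Hamming window, as in the text
```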
Step S150: train hidden Markov models with the Mel frequency cepstrum coefficients as observation sequences.
In this step, the Mel frequency cepstrum coefficients (MFCC) extracted from the vibration signal are used as the signal features to train hidden Markov models (HMM).
Taking a T9 keyboard as an example, ten digits (corresponding to the digits 0, 1, 2, ..., 9 on the keyboard) have to be classified; one HMM is trained for each digit, giving 10 HMMs in total. Finally, the output probability of each HMM for a given test sample is computed, and the digit corresponding to the HMM with the highest output probability is the classification result of that test sample.
Typically, an HMM is denoted by λ = (A, B, π), where π is the initial state probability matrix, A is the hidden-state transition probability matrix, and B is the matrix generating the observed states from the hidden states. For example, the process of training the HMM with the Baum-Welch algorithm comprises: initializing the parameters of the HMM; computing the forward and backward probability matrices; computing the transition probability matrix; computing the mean and variance of each Gaussian probability density function; computing the weight of each Gaussian probability density function; and computing the output probabilities of all observation sequences and accumulating them to obtain the total output probability.
Specifically, taking the training of the HMM corresponding to the digit "0" as an example, with the number of states N equal to 3 and the number of Gaussian mixtures M per state equal to 2, the training process comprises:
collecting several (for example 10) vibration signals for the digit "0" and then computing the Mel frequency cepstrum coefficients of each of these 10 vibration signals as the signal features, i.e. the training sample set corresponding to the digit "0" comprises 10 samples;
initializing the initial state probability matrix π to [1, 0, 0], and initializing the hidden-state transition probability matrix A to:
Figure PCTCN2019081676-appb-000002
then, dividing each observation sequence (i.e. the MFCC parameters) of the digit "0" evenly into segments according to the number of states N, assembling the MFCC parameters belonging to the same segment across all observation sequences into one large matrix, clustering with the k-means algorithm, and computing the mean, variance and weight coefficient of each Gaussian component;
for each observation sequence (i.e. the MFCC parameters), computing its forward probabilities, backward probabilities, scaling coefficient array, transition probabilities and mixture output probabilities;
re-computing the transition probabilities of the HMM from the transition probabilities of these 10 observation sequences, and at the same time re-computing the means, variances, weight coefficients and so on of the associated Gaussian probability density functions from the mixture output probabilities;
computing the output probabilities of all observation sequences and accumulating them to obtain the total output probability.
Because the embodiment of the present invention is deployed on a smart watch and computing resources are limited, this training process may be iterated only once.
In summary, the problem solved by the present invention is: given the MFCC features of a signal (i.e. the observation sequence) and an HMM λ = (A, B, π), compute the output probability of the observation sequence for the HMM. The embodiment of the present invention generates one corresponding HMM for each key type, each observation sequence consists of the Mel frequency cepstrum coefficients of one facial vibration signal, and finally the HMM that is most likely to have produced the pronunciation represented by the observation sequence is determined.
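For orientation, a per-key GMM-HMM of this kind (3 states, 2 Gaussian mixtures per state, one Baum-Welch iteration) could be set up with the third-party hmmlearn package as sketched below. This is a stand-in, not the embodiment's own Baum-Welch code; in particular, the left-to-right transition matrix is only an assumption, since the initialization matrix A is given as a figure that is not reproduced here.

```python
import numpy as np
from hmmlearn.hmm import GMMHMM  # assumed third-party dependency


def train_key_model(mfcc_sequences):
    """Train one GMM-HMM (3 states, 2 mixtures per state) for a single key."""
    model = GMMHMM(n_components=3, n_mix=2, covariance_type="diag",
                   n_iter=1,                    # one Baum-Welch iteration, as in the text
                   init_params="mcw",           # initialize means, covariances and weights
                   params="stmcw")              # re-estimate all parameters during fitting
    model.startprob_ = np.array([1.0, 0.0, 0.0])             # pi = [1, 0, 0]
    model.transmat_ = np.array([[0.5, 0.5, 0.0],
                                [0.0, 0.5, 0.5],
                                [0.0, 0.0, 1.0]])             # assumed left-to-right A
    X = np.concatenate(mfcc_sequences)           # each sequence: (n_frames, 14) MFCC array
    lengths = [len(seq) for seq in mfcc_sequences]
    model.fit(X, lengths)
    return model


# one model per key, e.g. models = {k: train_key_model(samples[k]) for k in "0123456789"}
```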
Step S160: classify and recognize the test data.
In this step, the hidden Markov models generated in step S150 are used to classify and recognize the test samples.
In one embodiment, the classification and recognition comprises: using the Viterbi algorithm to compute the output probability of the test sample for each hidden Markov model and to give the best state path;
the class corresponding to the hidden Markov model with the largest output probability is the classification result of the test sample.
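Continuing the hmmlearn sketch above, the classification of step S160 amounts to scoring the test sample against every key's model with the Viterbi algorithm and ranking the keys; the ranked runners-up can serve as the candidate keys used in step S170. The function below is illustrative only.

```python
def classify(models, test_mfcc, n_candidates=2):
    """Rank keys by Viterbi log-likelihood; return the best key and candidate keys."""
    # decode() runs the Viterbi algorithm and returns (log-likelihood, best state path)
    scores = {key: model.decode(test_mfcc)[0] for key, model in models.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[0], ranked[1:1 + n_candidates]
```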
Step S170: correct the classification result.
To improve the recognition accuracy of the hidden Markov models, a real-time correction and adaptation mechanism can be used to correct the classification results, so as to optimize the training sample set used in step S150.
Specifically, in step S160, in addition to the final classification result, the two most likely candidate keys and a "Delete" key are also presented according to the output probabilities of the hidden Markov models. When the classification result is correct, the user does not need to perform any operation. When the classification result is wrong, if the correct classification result appears among the candidate keys, the user can tap that candidate key to make the correction; if the correct classification result does not appear among the candidate keys, the user needs to enter the correct digit with the smart watch's built-in virtual keyboard. If the input itself was wrong, for example because of a mispronunciation or the way the glasses are worn, the user can tap the "Delete" key to delete the entered digit.
In one embodiment, correcting the classification result comprises:
Step S171: if the user neither taps any key nor uses the built-in virtual keyboard, the classification result of this input is correct, and the facial vibration signal corresponding to this input is added to the training sample set once;
Step S172: if the user taps a candidate key, the classification result of this input is wrong and the correct classification result of this input appears among the candidate keys; the facial vibration signal corresponding to this input is then added to the training sample set n_i times.
Here n_i denotes the number of consecutive errors for key i, with 1 ≤ n_i ≤ 3. For example, if the classification result for key 2 has been wrong twice in a row, n_i equals 2. If the number of consecutive errors for key i exceeds 3, n_i is still set to 3. As soon as the classification result for key i is correct, n_i is reset to 1.
Step S173: if the user enters the digit with the smart watch's built-in virtual keyboard, the classification result of this input is wrong and the correct classification result of this input does not appear among the candidate keys; the facial vibration signal corresponding to this input is then added to the training sample set 3 times.
Step S174: if the user taps the "Delete" key, the input itself was erroneous, and the facial vibration signal corresponding to this input is discarded directly.
Step S175: judge whether the hidden Markov models need to be retrained.
Define the total number of times each key has been added to the training sample set as Q_i, and the total number of times all keys have been added to the training sample set as N; then:
$N = \sum_{i} Q_i$
When N is greater than or equal to 10, the hidden Markov models are retrained. Once the number of training samples corresponding to a certain key exceeds 35, the training sample of that key that was added to the training sample set earliest is discarded, thereby ensuring that the maximum number of training samples per key is 35.
It should be understood that, for the specific values involved in the embodiments of the present invention, such as the number of training samples and the number of times a key is added to the training sample set, those skilled in the art can choose appropriate values according to the model training accuracy, the required text-input execution speed, and so on.
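The bookkeeping implied by steps S171 to S175 could be organized, for instance, as in the following sketch; the class and method names are hypothetical, and only the counting rules (1 copy, n_i copies, 3 copies, retrain at N ≥ 10, at most 35 samples per key) come from the text.

```python
from collections import defaultdict, deque


class AdaptiveTrainingSet:
    """Bookkeeping sketch for the correction and adaptation rules of steps S171-S175."""

    MAX_PER_KEY = 35      # maximum number of training samples kept per key
    RETRAIN_AT = 10       # retrain the HMMs once N additions have accumulated

    def __init__(self):
        self.samples = defaultdict(lambda: deque(maxlen=self.MAX_PER_KEY))
        self.n_i = defaultdict(lambda: 1)    # consecutive-error counter, kept in [1, 3]
        self.additions = 0                   # N in the text

    def _add(self, key, mfcc, copies):
        for _ in range(copies):
            self.samples[key].append(mfcc)   # the deque drops the oldest sample beyond 35
            self.additions += 1

    def result_correct(self, key, mfcc):                 # step S171: no user action
        self.n_i[key] = 1
        self._add(key, mfcc, copies=1)

    def corrected_via_candidate(self, true_key, mfcc):   # step S172: candidate key tapped
        self._add(true_key, mfcc, copies=self.n_i[true_key])
        self.n_i[true_key] = min(self.n_i[true_key] + 1, 3)

    def corrected_via_keyboard(self, true_key, mfcc):    # step S173: virtual keyboard used
        self._add(true_key, mfcc, copies=3)

    def deleted(self):                                   # step S174: "Delete" tapped
        pass                                             # the sample is simply discarded

    def needs_retraining(self):                          # step S175
        if self.additions >= self.RETRAIN_AT:
            self.additions = 0
            return True
        return False
```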
It should be noted that, although the steps have been described above in a particular order, this does not mean that the steps must be executed in that particular order; in fact, some of these steps may be executed concurrently, or even in a different order, as long as the required functions can be achieved.
The present invention may be a system, a method and/or a computer program product. The computer program product may include a computer-readable storage medium carrying computer-readable program instructions for causing a processor to implement various aspects of the present invention.
The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may include, for example, but is not limited to, an electric storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of computer-readable storage media include: portable computer diskettes, hard disks, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), digital versatile discs (DVD), memory sticks, floppy disks, mechanical encoding devices such as punch cards or raised-in-groove structures with instructions stored thereon, and any suitable combination of the foregoing.
The embodiments of the present invention have been described above; the above description is exemplary, not exhaustive, and is not limited to the disclosed embodiments. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The choice of the terms used herein is intended to best explain the principles of the embodiments, their practical application or the technical improvement over the technology in the market, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (10)

  1. A smart device input method based on facial vibration, comprising the following steps:
    Step S1: collecting the facial vibration signal generated when the user performs voice input;
    Step S2: extracting Mel frequency cepstrum coefficients from the facial vibration signal;
    Step S3: taking the Mel frequency cepstrum coefficients as an observation sequence, and using a trained hidden Markov model to obtain the text input corresponding to the facial vibration signal.
  2. The method according to claim 1, wherein in step S1 the facial vibration signal is collected by a vibration sensor arranged on a pair of glasses.
  3. The method according to claim 1, wherein in step S2 the following processing is performed for a vibration signal:
    amplifying the collected facial vibration signal;
    sending the amplified facial vibration signal to the smart device via a wireless module;
    the smart device intercepting a segment of the received facial vibration signal as an effective part and extracting the Mel frequency cepstrum coefficients from the effective part.
  4. The method according to claim 3, wherein intercepting the effective part from the facial vibration signal comprises:
    setting a first cut-off threshold and a second cut-off threshold based on the standard deviation σ of the short-term energy of the facial vibration signal, wherein the first cut-off threshold is TL = u + σ, the second cut-off threshold is TH = u + 3σ, and u is the average energy of the background noise;
    finding, in the facial vibration signal, the frame with the largest short-term energy, the energy of which is higher than the second cut-off threshold;
    among the frames preceding and following that frame, finding respectively the frame whose energy is lower than the first cut-off threshold and which is closest in time to that frame, taking the position of the preceding frame so obtained as the start point and the position of the following frame so obtained as the end point, and intercepting the part between the start point and the end point as the effective part of the facial vibration signal.
  5. The method according to claim 4, wherein intercepting the effective part from the facial vibration signal further comprises:
    for a vibration signal, setting a maximum interval threshold maxInter and a minimum length threshold minLen between signal peaks;
    if the interval between two signal peaks of the vibration signal is smaller than the maximum interval threshold maxInter, treating the two signal peaks as one signal peak of the vibration signal;
    if the length of a signal peak of the vibration signal is smaller than the minimum length threshold minLen, discarding that signal peak.
  6. The method according to claim 1, wherein training the hidden Markov model comprises:
    generating one corresponding hidden Markov model for each input key type of the smart device, to obtain a plurality of hidden Markov models;
    constructing a corresponding training sample set for each hidden Markov model, wherein each observation sequence in the training sample set consists of the Mel frequency cepstrum coefficients of one facial vibration signal;
    evaluating the hidden Markov model that is most likely to have produced the pronunciation represented by the observation sequence as the trained hidden Markov model.
  7. The method according to claim 1, wherein step S3 further comprises:
    using the Viterbi algorithm to calculate the output probabilities of a test sample for the plurality of hidden Markov models;
    displaying, based on the output probabilities, the key type corresponding to the test sample and the candidate key types.
  8. The method according to claim 7, further comprising:
    judging whether the classification result is correct according to the key selected by the user;
    adding test samples whose classification result is correct to the training sample set, the corresponding class label being that classification result;
    adding test samples whose classification result is wrong to the training sample set, the corresponding class label being the class determined according to the user's selection.
  9. A computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 8.
  10. A computer device, comprising a memory and a processor, a computer program executable on the processor being stored in the memory, characterized in that the processor, when executing the program, implements the steps of the method according to any one of claims 1 to 8.
PCT/CN2019/081676 2019-04-08 2019-04-08 Smart device input method based on facial vibration WO2020206579A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/051,179 US11662610B2 (en) 2019-04-08 2019-04-08 Smart device input method based on facial vibration
PCT/CN2019/081676 WO2020206579A1 (zh) 2019-04-08 2019-04-08 Smart device input method based on facial vibration

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2019/081676 WO2020206579A1 (zh) 2019-04-08 2019-04-08 Smart device input method based on facial vibration

Publications (1)

Publication Number Publication Date
WO2020206579A1 true WO2020206579A1 (zh) 2020-10-15

Family

ID=72752176

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/081676 WO2020206579A1 (zh) 2019-04-08 2019-04-08 Smart device input method based on facial vibration

Country Status (2)

Country Link
US (1) US11662610B2 (zh)
WO (1) WO2020206579A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11699428B2 (en) * 2020-12-02 2023-07-11 National Applied Research Laboratories Method for converting vibration to voice frequency wirelessly

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101950249A (zh) * 2010-07-14 2011-01-19 北京理工大学 Silent phonetic-symbol coded character input method and device
US8082149B2 (en) * 2006-10-26 2011-12-20 Biosensic, Llc Methods and apparatuses for myoelectric-based speech processing
CN104123930A (zh) * 2013-04-27 2014-10-29 华为技术有限公司 Throat sound recognition method and device
CN104538041A (zh) * 2014-12-11 2015-04-22 深圳市智美达科技有限公司 Abnormal sound detection method and system
CN105988768A (zh) * 2015-02-06 2016-10-05 电信科学技术研究院 Smart device control method, signal acquisition method and related device
CN108735219A (zh) * 2018-05-09 2018-11-02 深圳市宇恒互动科技开发有限公司 Voice recognition control method and device

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030233233A1 (en) * 2002-06-13 2003-12-18 Industrial Technology Research Institute Speech recognition involving a neural network
US20050047664A1 (en) * 2003-08-27 2005-03-03 Nefian Ara Victor Identifying a speaker using markov models
US7914468B2 (en) * 2004-09-22 2011-03-29 Svip 4 Llc Systems and methods for monitoring and modifying behavior
US9905240B2 (en) * 2014-10-20 2018-02-27 Audimax, Llc Systems, methods, and devices for intelligent speech recognition and processing
US20170069306A1 (en) * 2015-09-04 2017-03-09 Foundation of the Idiap Research Institute (IDIAP) Signal processing method and apparatus based on structured sparsity of phonological features
CN109923512A (zh) * 2016-09-09 2019-06-21 上海海知智能科技有限公司 Human-machine interaction system and method
US11004461B2 (en) * 2017-09-01 2021-05-11 Newton Howard Real-time vocal features extraction for automated emotional or mental state assessment
US11127394B2 (en) * 2019-03-29 2021-09-21 Intel Corporation Method and system of high accuracy keyphrase detection for low resource devices

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8082149B2 (en) * 2006-10-26 2011-12-20 Biosensic, Llc Methods and apparatuses for myoelectric-based speech processing
CN101950249A (zh) * 2010-07-14 2011-01-19 北京理工大学 Silent phonetic-symbol coded character input method and device
CN104123930A (zh) * 2013-04-27 2014-10-29 华为技术有限公司 Throat sound recognition method and device
CN104538041A (zh) * 2014-12-11 2015-04-22 深圳市智美达科技有限公司 Abnormal sound detection method and system
CN105988768A (zh) * 2015-02-06 2016-10-05 电信科学技术研究院 Smart device control method, signal acquisition method and related device
CN108735219A (zh) * 2018-05-09 2018-11-02 深圳市宇恒互动科技开发有限公司 Voice recognition control method and device

Also Published As

Publication number Publication date
US11662610B2 (en) 2023-05-30
US20210233533A1 (en) 2021-07-29

Similar Documents

Publication Publication Date Title
EP3806089B1 (en) Mixed speech recognition method and apparatus, and computer readable storage medium
CN107301865B (zh) Method and device for determining interactive text in voice input
US9966077B2 (en) Speech recognition device and method
CN109036391B (zh) Speech recognition method, device and system
US9323985B2 (en) Automatic gesture recognition for a sensor system
CN110058689A (zh) Smart device input method based on facial vibration
JP6461308B2 (ja) Speech recognition device and rescoring device
US9530417B2 (en) Methods, systems, and circuits for text independent speaker recognition with automatic learning features
US9595261B2 (en) Pattern recognition device, pattern recognition method, and computer program product
US11100932B2 (en) Robust start-end point detection algorithm using neural network
JP2000214883A (ja) Speech recognition device
EP3624114B1 (en) Method and apparatus for speech recognition
CN109190521B (zh) Construction method and application of a face recognition model based on knowledge purification
Yin et al. Learning to recognize handwriting input with acoustic features
Yin et al. Ubiquitous writer: Robust text input for small mobile devices via acoustic sensing
CN109002803A (zh) Pen-holding posture detection and Chinese character stroke-order recognition method based on a smart watch
WO2020206579A1 (zh) Smart device input method based on facial vibration
Lakomkin et al. Subword regularization: An analysis of scalability and generalization for end-to-end automatic speech recognition
CN116027911B (zh) Contactless handwriting input recognition method based on audio signals
CN112017676B (zh) Audio processing method and apparatus, and computer-readable storage medium
KR101840363B1 (ko) Terminal and speech recognition device for detecting erroneous pronunciation, and acoustic model training method therefor
CN113823326B (zh) Method for efficient use of training samples for a speech keyword detector
Alshammri IoT‐Based Voice‐Controlled Smart Homes with Source Separation Based on Deep Learning
JP7347750B2 (ja) Verification device, learning device, method, and program
CN112131541A (zh) Identity verification method and system based on vibration signals

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19924588

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 21/01/2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19924588

Country of ref document: EP

Kind code of ref document: A1