WO2019085914A1 - Terminal, method for optimizing voice commands thereof, and storage device - Google Patents

Terminal, method for optimizing voice commands thereof, and storage device

Info

Publication number
WO2019085914A1
WO2019085914A1 PCT/CN2018/112804 CN2018112804W WO2019085914A1 WO 2019085914 A1 WO2019085914 A1 WO 2019085914A1 CN 2018112804 W CN2018112804 W CN 2018112804W WO 2019085914 A1 WO2019085914 A1 WO 2019085914A1
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
terminal
analog
audio
header information
Prior art date
Application number
PCT/CN2018/112804
Other languages
English (en)
French (fr)
Inventor
陈琼
Original Assignee
捷开通讯(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 捷开通讯(深圳)有限公司 filed Critical 捷开通讯(深圳)有限公司
Publication of WO2019085914A1 publication Critical patent/WO2019085914A1/zh

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038 Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
    • G10L21/0388 Details of processing therefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/038 Speech enhancement, e.g. noise reduction or echo cancellation using band spreading techniques
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Definitions

  • the present application relates to the field of electronic devices and audio technologies, and in particular, to a terminal and a method and a storage device thereof for optimizing voice commands.
  • the present application provides a terminal and a method and a storage device thereof for optimizing voice commands, which can reduce hardware requirements while ensuring a voice command recognition rate, and have low cost and high versatility.
  • a method for optimizing a voice command by a terminal includes:
  • the terminal receives or acquires an audio signal from the current environment
  • the terminal parses the audio signal and acquires file header information of the audio signal
  • the terminal selects an audio processing algorithm according to the file header information
  • the terminal expands the bandwidth of the audio signal by using the selected audio processing algorithm, and performs frequency band compensation on the frequency band of the expanded audio signal.
  • the file header information includes at least one of a sampling rate, a bit rate, a bandwidth, and a data byte number.
  • the method, wherein after the frequency band compensation is performed on the frequency band of the expanded audio signal, the method further includes: the terminal uploading the band-compensated audio signal to the cloud, or converting the band-compensated audio signal into a character command based on voice recognition technology.
  • the method, wherein the terminal collects an audio signal through a pickup, the pickup including one of an analog microphone and a digital microphone, the analog microphone collecting an analog audio signal from the current environment, and the terminal performing analog-to-digital conversion on the analog audio signal to obtain the audio signal.
  • the method wherein the terminal expands a bandwidth of the audio signal from 8 kHz to 16 kHz by a selected audio processing algorithm.
  • a terminal having an audio processing function includes a processor; a digital signal processor (DSP), a wireless communicator and a memory connected to the processor; and a pickup connected to the DSP, wherein:
  • the wireless communicator and the pickup are respectively used to receive an audio signal and to collect an audio signal from the current environment;
  • the processor is configured to parse the audio signal and obtain its file header information, and to select an audio processing algorithm from the memory according to the file header information;
  • the DSP is used to expand the bandwidth of the audio signal by the selected audio processing algorithm, and to perform frequency band compensation on the frequency band of the expanded audio signal.
  • a storage device stores program data, and the program data can be executed to perform the following method:
  • in the sounding direction of a target sound source, the pickup of the terminal moves along a grid-shaped route and collects the audio signal in the current environment
  • the bandwidth of the audio signal is expanded by the selected audio processing algorithm, and the frequency band of the expanded audio signal is band-compensated.
  • the storage device wherein the file header information includes at least one of a sampling rate, a bit rate, a bandwidth, and a data byte number.
  • the storage device, wherein after the frequency band compensation is performed on the frequency band of the expanded audio signal, the method further includes: the terminal uploading the band-compensated audio signal to the cloud, or converting the band-compensated audio signal into a character command based on voice recognition technology.
  • the storage device, wherein the terminal collects an audio signal through a pickup, the pickup including one of an analog microphone and a digital microphone, the analog microphone collecting an analog audio signal from the current environment, and the terminal performing analog-to-digital conversion on the analog audio signal to obtain the audio signal.
  • the storage device wherein the terminal expands a bandwidth of the audio signal from 8 kHz to 16 kHz by a selected audio processing algorithm.
  • the present application parses the audio signal to obtain its file header information, selects an appropriate audio processing algorithm accordingly, and then performs bandwidth expansion and frequency band compensation on the audio signal through the selected audio processing algorithm. This purely algorithmic processing places low demands on hardware, so it can reduce hardware requirements while ensuring the voice command recognition rate, at low cost and with high versatility.
  • FIG. 1 is a schematic flow chart of a first embodiment of a method for optimizing a voice command according to the present application
  • FIG. 2 is a schematic diagram of the route along which a pickup collects audio signals according to an embodiment of the present application
  • FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present application.
  • FIG. 4 is a schematic flow chart of a second embodiment of the method for optimizing voice commands according to the present application.
  • the terminal to which the present application is applied may be a mobile terminal such as a consumer electronic device, a smart phone, a portable communication device, a PDA (Personal Digital Assistant) or tablet computer, or a laptop computer; it may also be a wearable device that is worn on the body or embedded in clothing, jewelry or accessories, or any other electronic device with an audio processing function.
  • FIG. 1 is a schematic flowchart diagram of a method for optimizing a voice command according to a first embodiment of the present application.
  • the method for optimizing voice commands in this embodiment may include steps S11 to S14.
  • S11 The terminal receives or collects an audio signal from the current environment.
  • the terminal can acquire the audio signal in two ways:
  • in the first way, the terminal downloads the audio signal from the network and the cloud, or receives it from another device that has established a connection with the terminal.
  • for example, the terminal can access the network and the cloud through its own Bluetooth, Wi-Fi and network modules, or establish a connection with other devices, and thereby obtain an audio signal.
  • in this case, the audio signal acquired by the terminal is a digital audio signal.
  • in the second way, the terminal collects an audio signal from the current environment through a pickup such as a microphone.
  • the pickup may be an analog microphone, and the audio signal collected by the pickup is an analog audio signal, and the output thereof is also an analog audio signal.
  • to facilitate subsequent digital processing of the audio signal, the terminal may connect the pickup to an analog-to-digital converter (ADC).
  • the analog audio signal is converted into a digital audio signal by analog-to-digital conversion of the analog-to-digital converter, and continues to be transmitted to subsequent circuits of the terminal for various digital processing.
  • the pickup of the embodiment can also be a digital microphone.
  • the biggest advantage of the digital microphone is its strong anti-interference ability: it does not need the built-in high-frequency filter capacitor and filter circuit of a conventional microphone. Moreover, because the digital microphone outputs a digital audio signal, the terminal can connect the pickup directly to subsequent circuits and perform various digital processing.
  • the pickup of the embodiment includes, but is not limited to, the above.
  • for example, the terminal can also collect audio signals from the current environment through a vibration motor, based on the back electromotive force principle. Specifically, according to Faraday's law of electromagnetic induction, an AC (alternating current) signal in the vibration motor generates a varying magnetic field in the coil and hence an induced electromotive force. At the same time, the audio signal produced by a person speaking changes the air pressure and, by vibrating the surrounding air, causes the diaphragm of the vibration motor to vibrate.
  • according to Lenz's law, when the vibration caused by the audio signal and the vibration caused by electromagnetic induction strike the same diaphragm, the external forces on the diaphragm are in opposite directions, and the vibration motor generates an electromotive force opposite to the induced electromotive force, that is, the back electromotive force.
  • the digital audio signal is obtained by monitoring the current generated by the back electromotive force and applying electro-acoustic conversion.
  • compared with a microphone, the effective diaphragm area of the vibration motor (the area suitable for sound impact) is larger, so it can capture audio signals over a wider frequency band, which is more conducive to improving the voice command recognition rate.
  • a target sound source (for example, a human) located in the current environment can play a sine wave signal of 20 Hz to 20 kHz, and the pickup of the terminal can move along a grid-shaped route and collect the analog audio signal in the current environment.
  • specifically, as shown in FIG. 2, in the sounding direction of the target sound source, the pickup can move row by row or column by column and collect the audio signal.
  • S12 The terminal parses the audio signal and acquires file header information of the audio signal.
  • the parsed audio signal is a digital audio signal
  • the acquired file header information includes, but is not limited to, at least one of a sampling rate, a bit rate, a bandwidth, and a data byte number.
  • S13 The terminal selects an audio processing algorithm according to the file header information.
  • the terminal selects an audio processing algorithm that best matches the various data contained in the file header information.
  • the audio processing algorithm processes the audio signal with the best efficiency and quality, such as bandwidth expansion and band compensation efficiency and quality. Based on this, the embodiment does not limit the type of audio processing algorithm and its principle and process of performing bandwidth expansion and band compensation.
  • S14 The terminal expands the bandwidth of the audio signal by using the selected audio processing algorithm, and performs frequency band compensation on the frequency band of the expanded audio signal.
  • in one application scenario, the audio processing algorithm can modify frequency points of the audio signal (the human voice) within the 20 Hz to 20 kHz band to change its audio curve. For example, the audio processing algorithm first expands the collected audio signal from an 8 kHz bandwidth to 16 kHz to make up for the lost part of the human voice, and then performs band compensation on the low-sampling-rate frequency band, that is, repairs the expanded audio signal so that the restored part of the voice better matches actual vocal characteristics.
  • the embodiment essentially processes the audio signal by a pure algorithm, so its dependence on hardware is relatively low.
  • compared with the prior art, which adopts high-performance voice acquisition devices, the embodiment can ensure the recognition rate of the voice command while reducing hardware requirements.
  • the cost is low, there is no need to redesign the entire hardware system for compatibility, and the versatility is strong.
  • the terminal can convert the algorithm-processed audio signal into a character command based on Automatic Speech Recognition (ASR) technology.
  • speech recognition technology converts speech signals into characters such as text, and it relies mainly on an acoustic model, a pronunciation lexicon, and a language type library.
  • the acoustic model is a trained statistical model that recognizes the phonemes of the algorithm-processed audio signal to obtain the corresponding phoneme sequence. These phonemes are then compared against the pronunciation lexicon to list candidate words and their possible pronunciations; based on the matched phoneme sequence, the most probable words are selected from the candidates and, with the grammar included in the language model as a reference, a character command is obtained.
  • the terminal can also upload the audio signal processed by the algorithm to the cloud.
  • the above functions may be stored in an electronic device readable storage medium if implemented in the form of software functions and sold or used as a stand-alone product, that is, the present application also provides a storage device storing program data.
  • the program data can be executed to implement the method of the above embodiment, and the storage device can be, for example, a USB flash drive, an optical disk, a server, or the like. That is, the above-described embodiments may be embodied in the form of a software product that includes instructions for causing a terminal to perform all or part of the steps of the method.
  • the terminal 30 shown in FIG. 3 will be described below as an example.
  • the terminal 30 may include a pickup 31, an audio decoder 32, a DSP (digital signal processor) 33, a processor 34, a memory 35, and a wireless communicator 36.
  • the pickup 31 is connected to the DSP 33, and the DSP 33, the memory 35, and the wireless communicator 36 are connected to the processor 34.
  • the terminal 30 may further include a power management unit that is coupled to the pickup 31, the audio decoder 32, the DSP 33, the processor 34, and the wireless communicator 36, and is used to manage power supply to the various structural elements.
  • the processor 34 is configured to run the operating system of the terminal 30 and perform task management on each structural component, such as power-on of structural components, hardware initialization, and starting operations such as the playback thread, the decoding thread, audio track creation and mixing at appropriate times.
  • the audio decoder 32 is configured to provide at least one interface to support access of input/output devices and to ensure normal operation of the accessed input/output devices; for example, the interfaces of the audio decoder 32 include interfaces for a speaker amplifier and for digital/analog microphones.
  • the pickup 31 serves as an input/output device for collecting an audio signal from the current environment.
  • the pickup 31 can be an analog microphone, in which case the audio signal is an analog audio signal; the audio decoder 32 has a built-in analog-to-digital converter (ADC), which converts the analog audio signal into a digital audio signal that is then passed on to the DSP 33.
  • the pickup 31 can also be a digital microphone that directly outputs a digital audio signal.
  • after analog-to-digital conversion of the analog audio signal, the DSP 33 sends the digital audio signal to the processor 34; the processor 34 parses the digital audio signal to obtain its file header information, and selects the appropriate audio processing algorithm from the memory 35 based on the file header information.
  • the file header information includes, but is not limited to, at least one of a sampling rate, a bit rate, a bandwidth, and a data byte number.
  • the processor 34 writes (burns) the message of the selected audio processing algorithm into the DSP 33 over I2C (Inter-Integrated Circuit, a two-wire serial bus).
  • the DSP 33 expands the bandwidth of the audio signal through the audio processing algorithm, and performs band compensation on the frequency band of the expanded audio signal.
  • the DSP 33 has a memory buffer pool to avoid resource preemption problems in the audio processing algorithm processing audio signals.
  • the main function of this audio processing algorithm is to expand the collected audio signal from an 8 kHz bandwidth to 16 kHz to make up for the lost part of the human voice, and then to perform band compensation on the low-sampling-rate frequency band, that is, to repair the expanded audio signal so that the restored part of the voice better matches actual vocal characteristics.
  • in this embodiment, since the audio signal processed by the DSP 33 is data in PCM (Pulse Code Modulation) format, the processor 34 does not need to encode the algorithm-processed audio signal.
  • the memory 35 is used to store various types of audio processing algorithms and audio signals, and to temporarily store the data processed by the steps as a cache to facilitate the call of the processor 34.
  • for example, the processor 34 may call the processed audio signal and convert it into a character command, which is then uploaded to the cloud via the wireless communicator 36, or the processor 34 may call the processed audio signal and upload it directly to the cloud.
  • the wireless communicator 36 is configured to send and receive data exchanged between the local terminal and the cloud, or to receive audio data fed back from the cloud in response to locally sent commands.
  • for example, the wireless communicator 36 can access the network and the cloud through its own Bluetooth, Wi-Fi and network modules, or establish a connection with other devices, and thereby obtain an audio signal; the audio signal acquired in this case is a digital audio signal.
  • to ensure data integrity and efficient local processing, the wireless communicator 36 first buffers the received data into the memory 35.
  • FIG. 4 shows a specific application example of the method by which the terminal 30 optimizes voice commands. This embodiment implements the entire process of the foregoing embodiments of the present application, and details are not repeated here.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

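A method for a terminal to optimize voice commands, comprising: the terminal receives an audio signal or collects one from the current environment (S11); the terminal parses the audio signal and acquires file header information of the audio signal (S12); the terminal selects an audio processing algorithm according to the file header information (S13); the terminal expands the bandwidth of the audio signal by means of the selected audio processing algorithm and performs frequency band compensation on the frequency band of the expanded audio signal (S14). The method can reduce hardware requirements while ensuring the voice command recognition rate, at low cost and with high versatility.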

Description

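Terminal, method for optimizing voice commands thereof, and storage device
This application claims priority to Chinese patent application No. 201711038813.X, filed with the Chinese Patent Office on October 30, 2017 and entitled "Terminal, method for optimizing voice commands thereof, and storage device", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the fields of electronic devices and audio technology, and in particular to a terminal, a method for optimizing voice commands thereof, and a storage device.
Background
With the rapid spread of electronic products, users demand ever more intelligent and user-friendly terminals. How to make terminals more intelligent, specialized and diversified, and more efficient in daily life, has become one of the current research directions. Taking AI (Artificial Intelligence) functions based on speech recognition technology as an example, in order to improve the recognition rate of voice commands, many manufacturers currently confine themselves to using better voice acquisition devices in their terminal products. Such demanding hardware requirements not only increase cost but also require the entire hardware system to be redesigned for compatibility, so versatility is poor.
Technical Problem
The present application provides a terminal, a method for optimizing voice commands thereof, and a storage device, which can reduce hardware requirements while ensuring the voice command recognition rate, at low cost and with high versatility.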
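Technical Solution
In a first aspect, a method for a terminal to optimize voice commands according to an embodiment of the present application includes:
the terminal receives an audio signal or collects one from the current environment;
the terminal parses the audio signal and acquires file header information of the audio signal;
the terminal selects an audio processing algorithm according to the file header information;
the terminal expands the bandwidth of the audio signal by means of the selected audio processing algorithm, and performs frequency band compensation on the frequency band of the expanded audio signal.
In the method, the file header information includes at least one of a sampling rate, a bit rate, a bandwidth, and a data byte number.
In the method, after the frequency band compensation is performed on the frequency band of the expanded audio signal, the method further includes: the terminal uploads the band-compensated audio signal to the cloud, or converts the band-compensated audio signal into a character command based on speech recognition technology.
In the method, the terminal collects the audio signal through a pickup, the pickup includes one of an analog microphone and a digital microphone, the analog microphone collects an analog audio signal from the current environment, and the terminal performs analog-to-digital conversion on the analog audio signal to obtain the audio signal.
In the method, the terminal expands the bandwidth of the audio signal from 8 kHz to 16 kHz by means of the selected audio processing algorithm.
In a second aspect, a terminal with an audio processing function according to an embodiment of the present application includes a processor; a digital signal processor (DSP), a wireless communicator and a memory connected to the processor; and a pickup connected to the DSP, wherein
the wireless communicator and the pickup are respectively used to receive an audio signal and to collect an audio signal from the current environment;
the processor is used to parse the audio signal and acquire its file header information, and to select an audio processing algorithm from the memory according to the file header information;
the DSP is used to expand the bandwidth of the audio signal by means of the selected audio processing algorithm, and to perform frequency band compensation on the frequency band of the expanded audio signal.
In a third aspect, a storage device according to an embodiment of the present application stores program data, the program data being executable to perform the following method:
in the sounding direction of a target sound source, the pickup of the terminal moves along a grid-shaped route and collects the audio signal in the current environment;
parsing the audio signal and acquiring file header information of the audio signal;
selecting an audio processing algorithm according to the file header information;
expanding the bandwidth of the audio signal by means of the selected audio processing algorithm, and performing frequency band compensation on the frequency band of the expanded audio signal.
In the storage device, the file header information includes at least one of a sampling rate, a bit rate, a bandwidth, and a data byte number.
In the storage device, after the frequency band compensation is performed on the frequency band of the expanded audio signal, the method further includes: the terminal uploads the band-compensated audio signal to the cloud, or converts the band-compensated audio signal into a character command based on speech recognition technology.
In the storage device, the terminal collects the audio signal through a pickup, the pickup includes one of an analog microphone and a digital microphone, the analog microphone collects an analog audio signal from the current environment, and the terminal performs analog-to-digital conversion on the analog audio signal to obtain the audio signal.
In the storage device, the terminal expands the bandwidth of the audio signal from 8 kHz to 16 kHz by means of the selected audio processing algorithm.
Beneficial Effects
The present application parses the audio signal to obtain its file header information, selects an appropriate audio processing algorithm accordingly, and then performs bandwidth expansion and frequency band compensation on the audio signal by means of the selected audio processing algorithm. This purely algorithmic processing places low demands on hardware, so it can reduce hardware requirements while ensuring the voice command recognition rate, at low cost and with high versatility.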
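Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a first embodiment of the method for optimizing voice commands of the present application;
FIG. 2 is a schematic diagram of the route along which a pickup collects audio signals according to an embodiment of the present application;
FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of a second embodiment of the method for optimizing voice commands of the present application.
Embodiments of the Invention
The terminal to which the present application applies may be a mobile terminal such as a consumer electronic device, a smart phone, a portable communication device, a PDA (Personal Digital Assistant) or tablet computer, or a laptop computer; it may also be a wearable device worn on the body or embedded in clothing, jewelry or accessories, or any other electronic device with an audio processing function.
The technical solutions of the exemplary embodiments provided by the present application are described clearly and completely below with reference to the accompanying drawings. Where no conflict arises, the following embodiments and their technical features may be combined with one another.
FIG. 1 is a schematic flowchart of the method for optimizing voice commands according to the first embodiment of the present application. Referring to FIG. 1, the method of this embodiment may include steps S11 to S14.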
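S11: The terminal receives an audio signal or collects one from the current environment.
In this embodiment, the terminal can acquire the audio signal in two ways:
In the first way, the terminal downloads the audio signal from the network and the cloud, or receives it from another device that has established a connection with the terminal. For example, the terminal can access the network and the cloud through its own Bluetooth, Wi-Fi and network modules, or establish a connection with other devices, and thereby acquire the audio signal. In this case, the audio signal acquired by the terminal is a digital audio signal.
In the second way, the terminal collects the audio signal from the current environment through a pickup such as a microphone. In this embodiment, the pickup may be an analog microphone; the audio signal it collects is an analog audio signal, and its output is also an analog audio signal. To facilitate subsequent digital processing of the audio signal, the terminal may connect the pickup to an analog-to-digital converter (ADC); the analog audio signal becomes a digital audio signal after analog-to-digital conversion and is passed on to the subsequent circuits of the terminal for digital processing. The pickup of this embodiment may also be a digital microphone. The greatest advantage of a digital microphone is its strong immunity to interference: it does not need the built-in high-frequency filter capacitor and filter circuit of a conventional microphone. Moreover, because a digital microphone outputs a digital audio signal, the terminal can connect the pickup directly to the subsequent circuits for digital processing.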
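The analog-to-digital step described above can be illustrated with a minimal sketch. The application does not specify a sampling rate or resolution, so the 16-bit depth, the 8 kHz rate, the 1 kHz test tone and the function names below are assumptions chosen only for illustration.

```python
import math

def quantize_pcm16(sample: float) -> int:
    """Map an amplitude in [-1.0, 1.0] to a signed 16-bit PCM code."""
    clipped = max(-1.0, min(1.0, sample))   # clip out-of-range input
    return int(round(clipped * 32767))      # scale to the 16-bit range

def sample_and_quantize(signal, duration_s: float, sample_rate: int):
    """Sample a continuous-time function signal(t) and quantize each sample."""
    n = round(duration_s * sample_rate)
    return [quantize_pcm16(signal(i / sample_rate)) for i in range(n)]

# Hypothetical input: a 1 kHz tone at half of full scale, sampled at 8 kHz.
def tone(t: float) -> float:
    return 0.5 * math.sin(2 * math.pi * 1000 * t)

pcm = sample_and_quantize(tone, duration_s=0.01, sample_rate=8000)
```

A digital microphone would deliver values like `pcm` directly, which is why the description lets it bypass the converter.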
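It should be understood that the pickup of this embodiment includes, but is not limited to, the above. For example, the terminal may also collect audio signals from the current environment through a vibration motor, based on the back electromotive force principle. Specifically, according to Faraday's law of electromagnetic induction, an AC (alternating current) signal in the vibration motor produces a varying magnetic field in the coil and hence an induced electromotive force. Meanwhile, the audio signal produced by a person speaking changes the air pressure and, by vibrating the surrounding air, causes the diaphragm of the vibration motor to vibrate. According to Lenz's law, when the vibration caused by the audio signal and the vibration caused by electromagnetic induction strike the same diaphragm, the external forces on the diaphragm are in opposite directions, and the vibration motor generates an electromotive force opposite to the induced electromotive force, that is, a back electromotive force. A digital audio signal can be obtained by monitoring the current produced by the back electromotive force and applying electro-acoustic conversion. Compared with a microphone, the effective diaphragm area of the vibration motor (the area suitable for sound to strike) is larger, so it can capture audio signals over a wider frequency band, which helps improve the voice command recognition rate.
In this embodiment, a target sound source (for example, a human) located in the current environment may play a sine wave signal of 20 Hz to 20 kHz, and the pickup of the terminal may move along a grid-shaped route and collect the analog audio signal in the current environment. Specifically, as shown in FIG. 2, in the sounding direction of the target sound source, the pickup may move row by row or column by column and collect the audio signal.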
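The row-by-row or column-by-column movement of the pickup can be pictured as an ordering of capture positions on a grid. The sketch below is illustrative only: the grid dimensions and the function name `grid_route` are assumptions, and the application does not state how positions are spaced.

```python
def grid_route(rows: int, cols: int, by_row: bool = True):
    """Return the (row, col) capture positions in visiting order."""
    if by_row:
        # Row by row: finish one row before moving to the next.
        return [(r, c) for r in range(rows) for c in range(cols)]
    # Column by column: finish one column before moving to the next.
    return [(r, c) for c in range(cols) for r in range(rows)]

# Hypothetical 2 x 3 grid in front of the sound source.
row_first = grid_route(2, 3, by_row=True)
col_first = grid_route(2, 3, by_row=False)
```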
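S12: The terminal parses the audio signal and acquires file header information of the audio signal.
The parsed audio signal is a digital audio signal, and the acquired file header information includes, but is not limited to, at least one of a sampling rate, a bit rate, a bandwidth, and a data byte number.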
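Step S12 can be sketched for one common container. The application does not name a file format, so the use of WAV and of Python's standard `wave` module is an assumption; reading the "data byte number" as bytes per sample and taking the bandwidth as half the sampling rate are likewise interpretations, not statements from the source.

```python
import io
import wave

def read_header_info(wav_bytes: bytes) -> dict:
    """Extract header fields from WAV data without decoding the samples."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        sample_rate = wav.getframerate()
        bytes_per_sample = wav.getsampwidth()
        channels = wav.getnchannels()
    return {
        "sample_rate": sample_rate,                                   # Hz
        "bytes_per_sample": bytes_per_sample,                         # assumed meaning of "data byte number"
        "bit_rate": sample_rate * bytes_per_sample * 8 * channels,    # bits per second
        "bandwidth": sample_rate / 2,                                 # Nyquist limit in Hz (assumption)
    }

def make_test_wav(sample_rate: int = 8000, n_frames: int = 80) -> bytes:
    """Build a short silent mono 16-bit WAV in memory (hypothetical input)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * n_frames)
    return buf.getvalue()

info = read_header_info(make_test_wav())
```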
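S13: The terminal selects an audio processing algorithm according to the file header information.
The terminal selects the audio processing algorithm that best matches the data contained in the file header information. That algorithm processes the audio signal with the best efficiency and quality, for example the best efficiency and quality of bandwidth expansion and frequency band compensation. On this basis, this embodiment does not limit the type of audio processing algorithm or the principle and process by which it performs bandwidth expansion and frequency band compensation.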
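Since the application leaves the matching rule open, step S13 can only be sketched under assumptions. Below, each stored algorithm declares the header values it expects, and the entry with the most matching fields (and no mismatching one) is chosen; the registry contents, the algorithm names and this scoring rule are all hypothetical.

```python
def select_algorithm(header: dict, registry: list) -> str:
    """Return the name of the registered algorithm that best matches the header."""
    def score(entry):
        wanted = entry["expects"]
        # Any mismatching field disqualifies the entry.
        if any(header.get(key) != value for key, value in wanted.items()):
            return -1
        # Otherwise, more matched fields means a more specific match.
        return len(wanted)

    best = max(registry, key=score)
    if score(best) < 0:
        raise LookupError("no registered algorithm matches this header")
    return best["name"]

# Hypothetical contents of the algorithm store in the terminal's memory.
REGISTRY = [
    {"name": "bwe_8k_to_16k_pcm16", "expects": {"sample_rate": 8000, "bytes_per_sample": 2}},
    {"name": "bwe_8k_to_16k_pcm8", "expects": {"sample_rate": 8000, "bytes_per_sample": 1}},
    {"name": "passthrough_16k", "expects": {"sample_rate": 16000}},
]

chosen = select_algorithm({"sample_rate": 8000, "bytes_per_sample": 2}, REGISTRY)
```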
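S14: The terminal expands the bandwidth of the audio signal by means of the selected audio processing algorithm, and performs frequency band compensation on the frequency band of the expanded audio signal.
In one application scenario, the audio processing algorithm may modify frequency points of the audio signal (the human voice) within the 20 Hz to 20 kHz band to change its audio curve. For example, the audio processing algorithm first expands the collected audio signal from an 8 kHz bandwidth to 16 kHz to make up for the lost part of the human voice, and then performs frequency band compensation on the low-sampling-rate frequency band, that is, repairs the expanded audio signal so that the restored part of the voice better matches actual vocal characteristics.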
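The application deliberately does not fix the expansion algorithm, so the following is not the applicant's method but a sketch of one classical technique, spectral folding. It assumes the "8 kHz to 16 kHz" expansion is realized by doubling the sampling rate, and it models frequency band compensation as a single gain applied to the synthetic upper band; both choices, and the name `extend_bandwidth`, are assumptions.

```python
def extend_bandwidth(samples, high_band_gain=0.3):
    """Double the sample rate and fill the new upper band by spectral folding."""
    n = len(samples)
    # Low band: linear interpolation doubles the rate and keeps the
    # original content (a crude stand-in for a proper low-pass interpolator).
    low = []
    for i, s in enumerate(samples):
        nxt = samples[i + 1] if i + 1 < n else s
        low.append(float(s))
        low.append((s + nxt) / 2.0)
    # Multiplying by (-1)^k shifts the spectrum by half the new sample rate,
    # which mirrors the low band into the new upper band.
    high = [v if k % 2 == 0 else -v for k, v in enumerate(low)]
    # Band compensation (assumed form): scale the synthetic high band
    # before mixing it back with the low band.
    return [l + high_band_gain * h for l, h in zip(low, high)]

# Hypothetical narrowband frame of four samples.
wide = extend_bandwidth([0, 2, 4, 2], high_band_gain=0.5)
```

A production algorithm would shape the upper band from a speech model rather than a fixed gain; the sketch only shows where expansion and compensation sit relative to each other.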
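As can be seen from the above, this embodiment essentially processes the audio signal by a pure algorithm and depends little on hardware. Compared with the prior art, which adopts high-performance voice acquisition devices, this embodiment can reduce hardware requirements while ensuring the voice command recognition rate; the cost is low, the entire hardware system does not need to be redesigned for compatibility, and versatility is strong.
On this basis, the terminal may convert the algorithm-processed audio signal into a character command based on Automatic Speech Recognition (ASR) technology. Speech recognition is a technology that converts speech signals into characters such as text, and it relies mainly on an acoustic model, a pronunciation lexicon and a language type library. The acoustic model is a trained statistical model that recognizes the phonemes of the algorithm-processed audio signal to obtain the corresponding phoneme sequence. These phonemes are then compared against the pronunciation lexicon to list candidate words and their possible pronunciations; based on the matched phoneme sequence, the most probable words are selected from the candidates and, with the grammar contained in the language model as a reference, the character command is obtained.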
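The lexicon lookup and language-model step described above can be shown with a toy example. It assumes the acoustic model has already produced phonemes grouped per word, uses a greedy bigram choice rather than a full search, and every entry in `LEXICON` and `BIGRAM` is invented for illustration.

```python
# Hypothetical pronunciation lexicon: phoneme sequence -> candidate words.
LEXICON = {
    ("t", "er", "n"): ["tern", "turn"],
    ("r", "ay", "t"): ["write", "right"],
}

# Hypothetical bigram language model: P(word | previous word).
BIGRAM = {
    ("<s>", "turn"): 0.5,
    ("<s>", "tern"): 0.01,
    ("turn", "right"): 0.4,
    ("turn", "write"): 0.01,
}

def decode(phoneme_groups) -> str:
    """Pick, word by word, the candidate the language model prefers."""
    words = []
    prev = "<s>"  # sentence-start marker
    for group in phoneme_groups:
        candidates = LEXICON.get(tuple(group))
        if not candidates:
            raise KeyError(f"no lexicon entry for {group}")
        # Unseen word pairs get a small floor probability.
        best = max(candidates, key=lambda w: BIGRAM.get((prev, w), 1e-6))
        words.append(best)
        prev = best
    return " ".join(words)

command = decode([["t", "er", "n"], ["r", "ay", "t"]])
```

The homophones show why the language model matters: the lexicon alone cannot separate "right" from "write".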
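The terminal may also upload the algorithm-processed audio signal to the cloud.
It should be understood that, if the above functions are implemented in the form of software functions and sold or used as an independent product, they may be stored in a storage medium readable by an electronic device. That is, the present application also provides a storage device storing program data, the program data being executable to implement the method of the above embodiment; the storage device may be, for example, a USB flash drive, an optical disc, or a server. In other words, the above embodiments may be embodied in the form of a software product that includes a number of instructions for causing a terminal to perform all or part of the steps of the method.
In practical application scenarios, the structural components that perform the above steps differ with the structural design of the terminal. The terminal 30 shown in FIG. 3 is described below as an example.
Referring to FIG. 3, the terminal 30 may include a pickup 31, an audio decoder 32, a DSP (digital signal processor) 33, a processor 34, a memory 35 and a wireless communicator 36. The pickup 31 is connected to the DSP 33, and the DSP 33, the memory 35 and the wireless communicator 36 are connected to the processor 34. The terminal 30 may also include a power management unit, which is connected to the pickup 31, the audio decoder 32, the DSP 33, the processor 34 and the wireless communicator 36 and is used to manage the power supply to each structural element.
The processor 34 is used to run the operating system of the terminal 30 and to perform task management on each structural element, for example powering on the structural elements, hardware initialization, and starting operations such as the playback thread, the decoding thread, audio track creation and mixing at the appropriate time.
The audio decoder 32 is used to provide at least one interface to support the access of input/output devices and to ensure the normal operation of the accessed input/output devices; for example, the interfaces of the audio decoder 32 include interfaces for a speaker amplifier and for digital/analog microphones. The pickup 31, as an input/output device, is used to collect audio signals from the current environment. The pickup 31 may be an analog microphone, in which case the audio signal is an analog audio signal; the audio decoder 32 has a built-in analog-to-digital converter (ADC), which converts the analog audio signal into a digital audio signal that is then passed on to the DSP 33. The pickup 31 may also be a digital microphone, which outputs a digital audio signal directly.
After analog-to-digital conversion of the analog audio signal, the DSP 33 sends the digital audio signal to the processor 34. The processor 34 is used to parse the digital audio signal and acquire its file header information, and to select an appropriate audio processing algorithm from the memory 35 according to the file header information. The file header information includes, but is not limited to, at least one of a sampling rate, a bit rate, a bandwidth, and a data byte number. The processor 34 writes (burns) the message of the selected audio processing algorithm into the DSP 33 over I2C (Inter-Integrated Circuit, a two-wire serial bus).
The DSP 33 expands the bandwidth of the audio signal by means of the audio processing algorithm and performs frequency band compensation on the frequency band of the expanded audio signal. The DSP 33 has a memory buffer pool, which avoids resource preemption while the audio processing algorithm processes the audio signal. The main function of this audio processing algorithm is to expand the collected audio signal from an 8 kHz bandwidth to 16 kHz to make up for the lost part of the human voice, and then to perform frequency band compensation on the low-sampling-rate frequency band, that is, to repair the expanded audio signal so that the restored part of the voice better matches actual vocal characteristics. In this embodiment, since the audio signal processed by the DSP 33 is data in PCM (Pulse Code Modulation) format, the processor 34 does not need to encode the algorithm-processed audio signal.
The memory 35 is used to store various types of audio processing algorithms and audio signals, and serves as a cache that temporarily holds the data produced by each step so that the processor 34 can call it. For example, the processor 34 may call the processed audio signal and convert it into a character command, which is then uploaded to the cloud through the wireless communicator 36, or the processor 34 may call the processed audio signal and upload it directly to the cloud.
The wireless communicator 36 is used to send and receive data exchanged between the local terminal and the cloud, or to receive audio data fed back from the cloud in response to locally sent commands. For example, the wireless communicator 36 may access the network and the cloud through its own Bluetooth, Wi-Fi and network modules to download data, or establish a connection with other devices, and thereby acquire an audio signal; the audio signal acquired in this case is a digital audio signal. To ensure data integrity and efficient local processing, the wireless communicator 36 first buffers the received data into the memory 35.
Referring to FIG. 4, a specific application example of the method by which the terminal 30 optimizes voice commands is shown. This embodiment implements the entire process of the foregoing embodiments of the present application, and details are not repeated here.
The above are merely embodiments of the present application and do not thereby limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, such as the combination of technical features between embodiments, or their direct or indirect use in other related technical fields, is likewise included in the scope of patent protection of the present application.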

Claims (14)

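  1. A method for a terminal to optimize voice commands, wherein the method comprises:
    the terminal receiving an audio signal or collecting one from the current environment;
    the terminal parsing the audio signal and acquiring file header information of the audio signal;
    the terminal selecting an audio processing algorithm according to the file header information;
    the terminal expanding the bandwidth of the audio signal by means of the selected audio processing algorithm, and performing frequency band compensation on the frequency band of the expanded audio signal.
  2. The method according to claim 1, wherein the file header information includes at least one of a sampling rate, a bit rate, a bandwidth, and a data byte number.
  3. The method according to claim 1, wherein after the frequency band compensation is performed on the frequency band of the expanded audio signal, the method further comprises:
    the terminal uploading the band-compensated audio signal to the cloud, or converting the band-compensated audio signal into a character command based on speech recognition technology.
  4. The method according to claim 1, wherein the terminal collects the audio signal through a pickup, the pickup includes one of an analog microphone and a digital microphone, the analog microphone collects an analog audio signal from the current environment, and the terminal performs analog-to-digital conversion on the analog audio signal to obtain the audio signal.
  5. The method according to claim 1, wherein the terminal expands the bandwidth of the audio signal from 8 kHz to 16 kHz by means of the selected audio processing algorithm.
  6. A terminal with an audio processing function, wherein the terminal comprises a processor; a digital signal processor (DSP), a wireless communicator and a memory connected to the processor; and a pickup connected to the DSP, wherein
    the wireless communicator and the pickup are respectively used to receive an audio signal and to collect an audio signal from the current environment;
    the processor is used to parse the audio signal and acquire its file header information, and to select an audio processing algorithm from the memory according to the file header information;
    the DSP is used to expand the bandwidth of the audio signal by means of the selected audio processing algorithm, and to perform frequency band compensation on the frequency band of the expanded audio signal.
  7. The terminal according to claim 6, wherein the file header information includes at least one of a sampling rate, a bit rate, a bandwidth, and a data byte number.
  8. The terminal according to claim 6, wherein the processor is further used to upload the band-compensated audio signal to the cloud, or to convert the band-compensated audio signal into a character command based on speech recognition technology.
  9. The terminal according to claim 6, wherein the pickup includes one of an analog microphone and a digital microphone, the analog microphone is used to collect an analog audio signal from the current environment, and the terminal further comprises an analog-to-digital converter used to perform analog-to-digital conversion on the analog audio signal to obtain the audio signal.
  10. A storage device, wherein the storage device stores program data, the program data being executable to perform the following method:
    in the sounding direction of a target sound source, the pickup of a terminal moves along a grid-shaped route and collects the audio signal in the current environment;
    parsing the audio signal and acquiring file header information of the audio signal;
    selecting an audio processing algorithm according to the file header information;
    expanding the bandwidth of the audio signal by means of the selected audio processing algorithm, and performing frequency band compensation on the frequency band of the expanded audio signal.
  11. The storage device according to claim 10, wherein the file header information includes at least one of a sampling rate, a bit rate, a bandwidth, and a data byte number.
  12. The storage device according to claim 10, wherein after the frequency band compensation is performed on the frequency band of the expanded audio signal, the method further comprises:
    the terminal uploading the band-compensated audio signal to the cloud, or converting the band-compensated audio signal into a character command based on speech recognition technology.
  13. The storage device according to claim 10, wherein the terminal collects the audio signal through a pickup, the pickup includes one of an analog microphone and a digital microphone, the analog microphone collects an analog audio signal from the current environment, and the terminal performs analog-to-digital conversion on the analog audio signal to obtain the audio signal.
  14. The storage device according to claim 10, wherein the terminal expands the bandwidth of the audio signal from 8 kHz to 16 kHz by means of the selected audio processing algorithm.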
PCT/CN2018/112804 2017-10-30 2018-10-30 Terminal, method for optimizing voice commands thereof, and storage device WO2019085914A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711038813.XA CN107886966A (zh) 2017-10-30 2017-10-30 Terminal, method for optimizing voice commands thereof, and storage device
CN201711038813.X 2017-10-30

Publications (1)

Publication Number Publication Date
WO2019085914A1 (zh) 2019-05-09

Family

ID=61782987

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/112804 WO2019085914A1 (zh) 2017-10-30 2018-10-30 Terminal, method for optimizing voice commands thereof, and storage device

Country Status (2)

Country Link
CN (1) CN107886966A (zh)
WO (1) WO2019085914A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107886966A (zh) * 2017-10-30 2018-04-06 捷开通讯(深圳)有限公司 Terminal, method for optimizing voice commands thereof, and storage device

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2007059437A2 (en) * 2005-11-14 2007-05-24 Motorola Inc. Method and apparatus for improving listener differentiation of talkers during a conference call
US20110282675A1 (en) * 2009-04-09 2011-11-17 Frederik Nagel Apparatus and Method for Generating a Synthesis Audio Signal and for Encoding an Audio Signal
CN103915104A (zh) * 2012-12-31 2014-07-09 华为技术有限公司 信号带宽扩展方法和用户设备
CN103971694A (zh) * 2013-01-29 2014-08-06 华为技术有限公司 带宽扩展频带信号的预测方法、解码设备
CN104981871A (zh) * 2013-02-15 2015-10-14 高通股份有限公司 个人化带宽扩展
CN105847497A (zh) * 2016-03-28 2016-08-10 乐视控股(北京)有限公司 一种语音信号处理方法及装置
CN106960672A (zh) * 2017-03-30 2017-07-18 国家计算机网络与信息安全管理中心 一种立体声音频的带宽扩展方法与装置
CN107886966A (zh) * 2017-10-30 2018-04-06 捷开通讯(深圳)有限公司 终端及其优化语音命令的方法、存储装置

Family Cites Families (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7519530B2 (en) * 2003-01-09 2009-04-14 Nokia Corporation Audio signal processing
JP4679049B2 (ja) * 2003-09-30 2011-04-27 パナソニック株式会社 スケーラブル復号化装置
US20080300866A1 (en) * 2006-05-31 2008-12-04 Motorola, Inc. Method and system for creation and use of a wideband vocoder database for bandwidth extension of voice
CN101751925B (zh) * 2008-12-10 2011-12-21 华为技术有限公司 一种语音解码方法及装置
CN101763859A (zh) * 2009-12-16 2010-06-30 深圳华为通信技术有限公司 音频数据处理方法、装置和多点控制单元
WO2012033942A2 (en) * 2010-09-10 2012-03-15 Dts, Inc. Dynamic compensation of audio signals for improved perceived spectral imbalances
CN102610231B (zh) * 2011-01-24 2013-10-09 华为技术有限公司 一种带宽扩展方法及装置
US8909539B2 (en) * 2011-12-07 2014-12-09 Gwangju Institute Of Science And Technology Method and device for extending bandwidth of speech signal
FR3007563A1 (fr) * 2013-06-25 2014-12-26 France Telecom Extension amelioree de bande de frequence dans un decodeur de signaux audiofrequences
CN103413557B (zh) * 2013-07-08 2017-03-15 深圳Tcl新技术有限公司 语音信号带宽扩展的方法和装置
US9666202B2 (en) * 2013-09-10 2017-05-30 Huawei Technologies Co., Ltd. Adaptive bandwidth extension and apparatus for the same
CN105188075B (zh) * 2014-06-17 2018-10-12 ***通信集团公司 语音质量优化方法及装置、终端
JP6451136B2 (ja) * 2014-08-05 2019-01-16 沖電気工業株式会社 音声帯域拡張装置及びプログラム、並びに、音声特徴量抽出装置及びプログラム
CN105118514A (zh) * 2015-08-17 2015-12-02 惠州Tcl移动通信有限公司 一种播放无损音质声音的方法及耳机
CN107221334B (zh) * 2016-11-01 2020-12-29 武汉大学深圳研究院 一种音频带宽扩展的方法及扩展装置
CN107087069B (zh) * 2017-04-19 2020-02-28 维沃移动通信有限公司 一种语音通话方法及移动终端

Also Published As

Publication number Publication date
CN107886966A (zh) 2018-04-06

Similar Documents

Publication Publication Date Title
JP7354110B2 (ja) オーディオ処理システム及び方法
US20190355354A1 (en) Method, apparatus and system for speech interaction
JP2016526331A (ja) Vad検出マイク及びその動作方法
CN105338459A (zh) 一种mems麦克风及其信号处理方法
US20210241768A1 (en) Portable audio device with voice capabilities
WO2013182118A1 (zh) 一种语音数据的传输方法及装置
JP2015501450A (ja) オーディオ特徴データの抽出と分析
WO2019233228A1 (zh) 电子设备及设备控制方法
WO2020057624A1 (zh) 语音识别的方法和装置
CN109545216A (zh) 一种语音识别方法和语音识别***
CN109712623A (zh) 语音控制方法、装置及计算机可读存储介质
JP2009178783A (ja) コミュニケーションロボット及びその制御方法
JP6549009B2 (ja) 通信端末及び音声認識システム
CN111276150A (zh) 一种基于麦克风阵列的智能语音转文字及同声翻译***
WO2019085914A1 (zh) 终端及其优化语音命令的方法、存储装置
JP6448950B2 (ja) 音声対話装置及び電子機器
US11908464B2 (en) Electronic device and method for controlling same
CN107357174A (zh) 一种分布式智能音箱语音控制***
CN113053371A (zh) 语音控制***和方法、语音套件、骨传导及语音处理装置
CN111404998A (zh) 语音交互方法、第一电子设备及可读存储介质
CN105828252A (zh) 一种麦克风控制权的分配方法及电子设备
CN107544769B (zh) 基于振动电机采集语音命令的方法及音频组件、音频终端
CN110310635B (zh) 语音处理电路及电子设备
CN107450499A (zh) 一种智能家居控制***
WO2020102979A1 (zh) 语音信息的处理方法、装置、存储介质及电子设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18874383

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18874383

Country of ref document: EP

Kind code of ref document: A1