WO2022105392A1 - Method and apparatus for performing speech processing in electronic device, electronic device, and chip - Google Patents


Info

Publication number
WO2022105392A1
Authority
WO
WIPO (PCT)
Prior art keywords
microphone
electronic device
microphones
target
processing
Prior art date
Application number
PCT/CN2021/118033
Other languages
French (fr)
Chinese (zh)
Inventor
吴义孝
Original Assignee
Oppo广东移动通信有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Oppo广东移动通信有限公司
Publication of WO2022105392A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/08 Mouthpieces; Microphones; Attachments therefor
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10K SOUND-PRODUCING DEVICES; METHODS OR DEVICES FOR PROTECTING AGAINST, OR FOR DAMPING, NOISE OR OTHER ACOUSTIC WAVES IN GENERAL; ACOUSTICS NOT OTHERWISE PROVIDED FOR
    • G10K11/00 Methods or devices for transmitting, conducting or directing sound in general; Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • G10K11/16 Methods or devices for protecting against, or for damping, noise or other acoustic waves in general
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04S STEREOPHONIC SYSTEMS
    • H04S7/00 Indicating arrangements; Control arrangements, e.g. balance control

Definitions

  • the present invention relates to the technical field of human-computer voice interaction, and more particularly, to a method, an apparatus, an electronic device and a chip for performing voice processing in an electronic device.
  • as microphone array technology matures, it has become an important part of sound source localization for speech signals.
  • a certain number and size of microphones are installed on an electronic device (such as a mobile phone), and these microphones can form a microphone array.
  • installing too many microphones on an electronic device will inevitably increase the power consumption of the electronic device. How to balance the positioning effect and the power consumption is an urgent problem to be solved.
  • the environment in which electronic devices are located may change, and there may be different requirements for positioning and noise reduction. How to adapt the microphone array to changes in the surrounding environment is also an urgent problem to be solved.
  • the present application provides a method, an apparatus, an electronic device and a chip for performing voice processing in an electronic device. Users can select a suitable microphone mode from a variety of microphone modes, so that the positioning effect and power consumption can be balanced and changes in the surrounding environment can be accommodated, improving the user experience.
  • a method for performing voice processing in an electronic device including:
  • a target microphone mode is selected from a first microphone mode, a second microphone mode and a third microphone mode according to a first user instruction, wherein the first microphone mode includes a microphone array composed of M microphones of the electronic device and N microphones of an earphone paired and connected to the electronic device, the second microphone mode includes a microphone array composed of the M microphones of the electronic device, the third microphone mode includes a microphone array composed of the N microphones of the earphone, and M and N are positive integers;
  • the M microphones are part or all of the microphones of the electronic device.
  • the N microphones are part or all of the microphones of the headset.
  • in some possible implementations, 2 ≤ M ≤ 4 and 2 ≤ N ≤ 6.
  • the first user instruction is determined according to at least one of microphone power consumption, positioning effect, and noise reduction effect.
  • the positioning process includes at least voiceprint recognition.
  • the method further includes:
  • the cloud sound effect processing includes at least one of the following:
  • the method further includes:
  • Noise reduction processing is performed on the target speech signal.
  • the method further includes:
  • an apparatus for performing voice processing in an electronic device including:
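  • the selection unit is used to select a target microphone mode from the first microphone mode, the second microphone mode and the third microphone mode according to the first user instruction, wherein the first microphone mode includes a microphone array composed of M microphones of the electronic device and N microphones of an earphone paired and connected to the electronic device, the second microphone mode includes a microphone array composed of the M microphones of the electronic device, the third microphone mode includes a microphone array composed of the N microphones of the earphone, and M and N are positive integers;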
  • an activation unit for activating the microphone array in the target microphone mode
  • an acquisition unit configured to acquire the voice signal of the surrounding environment of the microphone array in the target microphone mode
  • the processing unit is used for positioning and processing the voice signal to obtain the target voice signal.
  • the M microphones are part or all of the microphones of the electronic device.
  • the N microphones are part or all of the microphones of the headset.
  • in some possible implementations, 2 ≤ M ≤ 4 and 2 ≤ N ≤ 6.
  • the first user instruction is determined according to at least one of microphone power consumption, positioning effect, and noise reduction effect.
  • the positioning process includes at least voiceprint recognition.
  • the selection unit is further configured to select whether to perform cloud sound effect processing on the target voice signal according to the second user instruction.
  • the processing unit is further configured to perform cloud sound effect processing on the target voice signal.
  • the cloud sound effect processing includes at least one of the following:
  • the processing unit is further configured to perform noise reduction processing on the target speech signal.
  • the processing unit is further configured to perform blind source separation processing on the target speech signal to determine the sound source of the target speech signal.
  • an electronic device comprising: a processor and a memory, where the memory is used for storing a computer program, and the processor is used for calling and running the computer program stored in the memory to execute the method in the above-mentioned first aspect or each implementation thereof.
  • a chip including: a processor for calling and running a computer program from a memory, so that the processor executes the method in the first aspect or each of its implementations.
  • a computer-readable storage medium for storing a computer program, and the computer program causes a computer to execute the method in the above-mentioned first aspect or each implementation manner thereof.
  • a sixth aspect provides an electronic device, characterized in that it includes:
  • an earphone paired with the electronic device comprising a second number of second microphones, wherein the second number is greater than or equal to the first number
  • the first microphone and the second microphone are configured as corresponding microphone arrays to acquire the voice signal of the surrounding environment, and perform localization processing on the voice signal to obtain the target voice signal.
  • the target microphone mode is selected from the first microphone mode, the second microphone mode and the third microphone mode according to the first user instruction, and the voice signal of the surrounding environment of the microphone array is acquired based on the microphone array in the target microphone mode. That is, the user can select a suitable microphone mode from a variety of microphone modes, so that the positioning effect and power consumption can be balanced and changes in the surrounding environment can be accommodated, improving the user experience.
  • FIG. 1 is a schematic flowchart of a method for performing voice processing in an electronic device according to an embodiment of the present application.
  • FIG. 2 is a flowchart of speech processing according to an embodiment of the present application.
  • FIG. 3 is a frame diagram of speech processing according to an embodiment of the present application.
  • FIG. 4 is a schematic block diagram of an apparatus for performing voice processing in an electronic device according to an embodiment of the present application.
  • FIG. 5 shows a schematic structural diagram of a computer system suitable for implementing the electronic device according to the embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a chip provided according to an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of an electronic device and an earphone provided according to an embodiment of the present application.
  • the electronic device may be a mobile phone (Mobile Phone), a tablet computer (Pad), a computer with a wireless transceiver function, a virtual reality (Virtual Reality, VR) terminal device, an augmented reality (Augmented Reality, AR) terminal device, wireless terminal equipment in industrial control, wireless terminal equipment in self driving, wireless terminal equipment in remote medical, wireless terminal equipment in smart grid, wireless terminal equipment in transportation safety, wireless terminal equipment in smart city, wireless terminal equipment in smart home, etc.
  • the electronic device may also be a wearable device.
  • Wearable devices can also be called wearable smart devices, which are the general term for the intelligent design of daily wear and the development of wearable devices using wearable technology, such as glasses, gloves, watches, clothing and shoes.
  • a wearable device is a portable device that is worn directly on the body or integrated into the user's clothing or accessories. Wearable device is not only a hardware device, but also realizes powerful functions through software support, data interaction, and cloud interaction.
  • in a broad sense, wearable smart devices include full-featured, large-size devices that can implement complete or partial functions without relying on a smart phone, such as smart watches or smart glasses, as well as devices that focus on only one type of application function and need to be used together with other devices such as smart phones.
  • the headset may be paired and connected to the electronic device in a wired or wireless manner.
  • a certain number and size of microphones are installed on both the electronic device and the earphone, and these microphones can form a microphone array.
  • FIG. 1 is a schematic flowchart of a method 100 for performing voice processing in an electronic device according to an embodiment of the present application. As shown in FIG. 1 , the method 100 may include but is not limited to the following contents:
  • the first microphone mode includes a microphone array composed of M microphones of the electronic device and N microphones of the headset paired and connected to the electronic device
  • the second microphone mode includes a microphone array composed of M microphones of the electronic device
  • the third microphone mode includes a microphone array composed of N microphones of the headset, where M and N are positive integers;
  • FIG. 1 shows steps or operations of the method, but these steps or operations are only examples, and the embodiments of the present application may also perform other operations or variations of the respective operations in FIG. 1 .
  • the method 100 may be executed by an electronic device; for example, the method 100 may be executed by a central processing unit (CPU) or a microprocessor in the electronic device.
  • the microphone array in the first microphone mode combines M microphones of the electronic device and N microphones of the earphone, and has excellent positioning and noise reduction performance. Therefore, the positioning and noise reduction performance of the microphone array in the first microphone mode is better than that of the microphone array in the second microphone mode, and also better than that of the microphone array in the third microphone mode.
  • the microphone array in the second microphone mode includes fewer microphones, which greatly reduces the computing power required by the microphone array in terms of algorithms and engineering, thereby reducing the power consumption of the microphone array.
  • the microphone array in the third microphone mode also includes fewer microphones, which greatly reduces the computing power required by the microphone array in terms of algorithms and engineering, thereby reducing the power consumption of the microphone array.
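The three modes described above can be sketched as a small selection routine. This is only an illustrative sketch, not the implementation of the application: the names `MicMode` and `select_array`, and the representation of microphones as string identifiers, are assumptions introduced here.

```python
from enum import Enum


class MicMode(Enum):
    """Hypothetical labels for the three microphone modes."""
    COMBINED = 1      # first mode: M device mics + N headset mics
    DEVICE_ONLY = 2   # second mode: M device mics only
    HEADSET_ONLY = 3  # third mode: N headset mics only


def select_array(mode, device_mics, headset_mics):
    """Return the microphones that form the array for the given mode."""
    if mode is MicMode.COMBINED:
        return list(device_mics) + list(headset_mics)
    if mode is MicMode.DEVICE_ONLY:
        return list(device_mics)
    if mode is MicMode.HEADSET_ONLY:
        return list(headset_mics)
    raise ValueError(f"unknown microphone mode: {mode!r}")


# Example with M = 2 device microphones and N = 2 headset microphones.
array = select_array(MicMode.COMBINED, ["dev0", "dev1"], ["ear0", "ear1"])
```

The combined mode yields the largest array (better positioning and noise reduction at higher cost), while the other two modes keep the array small to save power.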
  • sound effects work better with a microphone array than with a single microphone, because the voice obtained after microphone-array signal processing is the speaker's voice with the ambient noise removed, so unstable noise does not need to be considered when the sound effect algorithm is applied.
  • the electronic device may acquire user instructions through a user interface (User Interface, UI), or the electronic device may present a UI so that the user can input user instructions.
  • a single microphone acts as a sensor that converts sound waves into a current signal, whereas a microphone array can form a directional beam: the sound signal in the main-lobe direction of the beam is enhanced and the signal in the side-lobe directions is suppressed. Direction of Arrival (DOA) estimation can then be performed through algorithms such as delay estimation.
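As an illustration of DOA estimation through delay estimation, the sketch below finds the inter-microphone delay by brute-force cross-correlation and converts it to an angle under a far-field, two-microphone assumption. The function names, the speed-of-sound constant and the geometry are assumptions made here for illustration; the application does not specify a particular algorithm, and a practical implementation would likely use a numerical library.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at room temperature


def estimate_delay(x, y, max_lag):
    """Return the lag in samples at which y best matches x.

    A positive result means y lags x. Uses brute-force
    cross-correlation over lags in [-max_lag, max_lag].
    """
    n = min(len(x), len(y))
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                score += x[i] * y[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def doa_angle_deg(lag, fs, mic_distance):
    """Far-field arrival angle in degrees, measured from broadside."""
    tau = lag / fs
    # Clamp so that measurement noise cannot push asin out of its domain.
    s = max(-1.0, min(1.0, SPEED_OF_SOUND * tau / mic_distance))
    return math.degrees(math.asin(s))


# Two channels carrying the same pulse, the second delayed by 2 samples.
mic_a = [0, 0, 1, 2, 3, 2, 1, 0, 0, 0, 0, 0]
mic_b = [0, 0, 0, 0, 1, 2, 3, 2, 1, 0, 0, 0]
lag = estimate_delay(mic_a, mic_b, max_lag=4)
angle = doa_angle_deg(lag, fs=16000, mic_distance=0.1)
```

With more than two microphones, pairwise delays of this kind can be combined to resolve the direction more robustly, which is one reason a larger array tends to localize better.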
  • the earphone paired with the electronic device may be a single-ear earphone or a binaural earphone, which is not limited in the present application.
  • the positioning process in S130 includes at least voiceprint recognition. That is, at least the voiceprint recognition is performed on the voice signal to obtain the target voice signal.
  • the positioning process in S130 may also include, but is not limited to, at least one of the following:
  • acoustic echo cancellation (AEC)
  • dereverberation (DER)
  • voice activity detection (VAD)
  • beamforming (BF)
  • generalized sidelobe canceller (GSC)
  • DOA post filtering
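Of the processing steps listed above, beamforming is the easiest to illustrate. The following is a minimal delay-and-sum sketch with integer sample delays; it is a textbook technique chosen here for illustration, not necessarily the beamformer used in the application, and the function name and argument layout are assumptions.

```python
def delay_and_sum(channels, delays):
    """Align each channel by its integer sample delay and average them.

    channels: list of equal-length sample lists, one per microphone.
    delays:   how many samples each channel lags the reference
              (must be smaller in magnitude than the signal length).
    """
    num_samples = len(channels[0])
    out = [0.0] * num_samples
    for ch, d in zip(channels, delays):
        for n in range(num_samples):
            src = n + d  # advance a lagging channel by d samples
            if 0 <= src < num_samples:
                out[n] += ch[src]
    return [v / len(channels) for v in out]


# The second channel is the first one delayed by 2 samples.
ref = [float(v) for v in range(10)]
late = [0.0, 0.0] + ref[:-2]
beam = delay_and_sum([ref, late], [0, 2])
```

After alignment, the signal from the steered direction adds coherently while sound from other directions does not, which corresponds to the main-lobe enhancement and side-lobe suppression of a directional beam.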
  • the M microphones are part or all of the microphones of the electronic device.
  • M = 2.
  • M = 4.
  • the N microphones are part or all of the microphones of the headset.
  • N = 2.
  • N = 4.
  • the embodiment of the present application does not limit the specific installation position of the microphone in the earphone.
  • in some embodiments, 2 ≤ M ≤ 4 and 2 ≤ N ≤ 6.
  • the sizes and specifications of the microphones used in the embodiments of the present application can be kept the same, so there is no need for an arrangement in which main and auxiliary microphones pick up different sound sources respectively.
  • automatic speech recognition (Automatic speech recognition, ASR) may be performed on the target speech signal in the cloud, thereby improving the accuracy of speech recognition.
  • the cloud can perform some complex or computationally intensive processing, which can be implemented through deep learning models, Long Short Term Memory (LSTM) network models, and the like.
  • Cloud processing can be implemented based on cloud services, and cloud services can be combined with artificial intelligence (Artificial Intelligence, AI), that is, artificial intelligence cloud services, also generally referred to as AI as a Service (AIaaS).
  • This service model is similar to opening an AI theme mall: all developers can access one or more artificial intelligence services provided by the platform through an Application Programming Interface (API).
  • Some experienced developers can also use the AI framework and AI infrastructure provided by the platform to deploy and operate their own cloud AI services.
  • the first user instruction is determined according to at least one of microphone power consumption, positioning effect, and noise reduction effect. That is, the user may determine the first user instruction according to at least one of the power consumption of the microphone, the positioning effect, and the noise reduction effect.
  • the user can determine the first user instruction according to the power consumption of the microphone, and instruct the electronic device to select the second microphone mode or the third microphone mode as the target microphone mode through the first user instruction, so as to reduce the power consumption of the microphone array, thereby increasing the standby time of the electronic device and improving user experience.
  • the user can determine the first user instruction according to the power consumption of the microphone, and instruct the electronic device to select the first microphone mode as the target microphone mode through the first user instruction. The microphone array in this mode combines the M microphones of the electronic device and the N microphones of the earphone, and has excellent positioning and noise reduction performance, thereby improving the user experience.
  • the user may determine the first user instruction according to the positioning effect and/or the noise reduction effect, and instruct the electronic device to select the second microphone mode or the third microphone mode as the target microphone mode through the first user instruction, so as to reduce the power consumption of the microphone array and improve the user experience.
  • the user can determine the first user instruction according to the positioning effect and/or the noise reduction effect, and instruct the electronic device to select the first microphone mode as the target microphone mode through the first user instruction. The microphone array in this mode combines the M microphones of the electronic device and the N microphones of the earphone, and has excellent positioning and noise reduction performance, thereby improving the user experience.
  • the method 100 further includes:
  • the target voice signal is directly output.
  • the electronic device can select whether to perform cloud sound effect processing on the target voice signal according to the second user instruction. That is, cloud sound processing can be performed based on the user's needs.
  • the cloud sound effect processing includes at least one of the following:
  • the target person may be, for example, a singer, a comedian, a hero, or the like.
  • the target group may be, for example, men, women, the elderly, children, and the like.
  • the electronic device may further perform local sound effect processing on the target speech signal, wherein the local sound effect processing modifies the fundamental frequency and formants of the target speech signal, uses a filter to convolve the signal with a room impulse response, and so on. Specifically, it can include effects such as pitch shift, variable speed, room reverb, and echo.
  • the local sound effect processing may be performed synchronously with the above cloud sound effect processing, and the local sound effect processing may also be performed before the above cloud sound effect processing, which is not limited in this application.
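Convolving the speech signal with a room impulse response, as mentioned for local sound effect processing, can be sketched as follows. The synthetic single-echo impulse response and the function names are assumptions for illustration only; a measured room impulse response and an FFT-based convolution would normally be used for real audio.

```python
def convolve(signal, rir):
    """Direct-form convolution of a signal with an impulse response."""
    out = [0.0] * (len(signal) + len(rir) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(rir):
            out[i + j] += s * h
    return out


def echo_response(delay_samples, gain):
    """Hypothetical impulse response: direct path plus one echo."""
    if delay_samples < 1:
        raise ValueError("delay_samples must be at least 1")
    rir = [0.0] * (delay_samples + 1)
    rir[0] = 1.0              # direct sound
    rir[delay_samples] = gain  # single attenuated reflection
    return rir


# A unit impulse comes back as itself plus an echo 2 samples later.
wet = convolve([1.0, 0.0, 0.0, 0.0], echo_response(2, 0.5))
```

A longer, decaying impulse response in place of the single echo would give a room reverb effect through the same convolution.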
  • the method 100 further includes:
  • Noise reduction processing is performed on the target speech signal, thereby optimizing the noise reduction effect and improving the user experience.
  • the method 100 further includes:
  • Blind source separation (BSS) processing is performed on the target speech signal to determine the sound source of the target speech signal.
  • blind source separation processing can be performed on the target speech signal in the cloud to determine the sound source of the target speech signal.
  • FIG. 2 and FIG. 3 are for helping those skilled in the art to better understand the embodiments of the present application, but are not intended to limit the scope of the embodiments of the present application. Those skilled in the art can obviously make various equivalent modifications or changes according to the given FIGS. 2 and 3 , and such modifications or changes also fall within the scope of the embodiments of the present application.
  • FIG. 2 is a flowchart of speech processing according to an embodiment of the present application.
  • the user determines a first user instruction according to at least one of the power consumption of the microphone, the positioning effect, and the noise reduction effect.
  • the electronic device selects a target microphone mode from the first microphone mode, the second microphone mode and the third microphone mode according to the first user instruction.
  • the first microphone mode includes a microphone array composed of M microphones of the electronic device and N microphones of an earphone paired with the electronic device;
  • the second microphone mode includes a microphone array composed of M microphones of the electronic device;
  • the third microphone mode includes a microphone array composed of N microphones of the headset; M and N are positive integers.
  • the electronic device activates the microphone array in the target microphone mode, and acquires a voice signal of the environment around the microphone array in the target microphone mode.
  • the electronic device uses an acoustic front-end signal processing module to perform positioning processing on the acquired voice signal to obtain a target voice signal.
  • the positioning processing includes but is not limited to at least one of the following:
  • echo cancellation (AEC)
  • de-reverberation (DER)
  • voice activity detection (VAD)
  • beamforming (BF)
  • generalized sidelobe canceller (GSC)
  • direction of arrival (DOA)
  • the electronic device performs noise reduction processing on the target voice signal to obtain a voice signal after noise reduction.
  • the noise reduction process may be, for example, post-filtering (PF).
  • the electronic device performs local sound effect processing on the noise-reduced voice signal.
  • the local sound effect processing is to modify the fundamental frequency and formant of the target speech signal, and use a filter to convolve the room impulse response.
  • the electronic device selects whether to perform cloud sound effect processing on the noise-reduced voice signal according to the second user instruction;
  • the local sound effect processing (S206) may be performed synchronously with the cloud sound effect processing (S208).
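The order of stages in FIG. 2 can be summarized as a simple processing chain. The sketch below shows only one possible ordering (local sound effects before the optional cloud sound effects); the function name, the callable-based stage interface and the `use_cloud` flag standing in for the second user instruction are all assumptions introduced here.

```python
def run_pipeline(signal, locate, denoise, local_fx, cloud_fx=None, use_cloud=False):
    """Chain the stages: positioning, noise reduction, then sound effects.

    Each stage is a callable taking and returning a signal. The cloud
    stage runs only when requested and available.
    """
    target = locate(signal)   # positioning processing -> target voice signal
    clean = denoise(target)   # noise reduction, e.g. post-filtering
    out = local_fx(clean)     # local sound effect processing
    if use_cloud and cloud_fx is not None:
        out = cloud_fx(out)   # optional cloud sound effect processing
    return out


# Toy stages operating on a list of samples, for illustration only.
identity = lambda s: s
halve = lambda s: [v * 0.5 for v in s]
result = run_pipeline([2.0, 4.0], identity, identity, halve)
```

Passing the stages as callables keeps the chain independent of which microphone mode produced the input signal.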
  • FIG. 3 is a frame diagram of speech processing according to an embodiment of the present application. It is mainly divided into two parts: local processing and cloud processing.
  • the signal required by the local processing algorithm comes from the microphone on the electronic device and the microphone on the earphone, and the signal required by the cloud processing algorithm comes from the voice signal after the local processing.
  • the voice signal after local sound effect processing and cloud sound effect processing is played back through the headset.
  • the target microphone mode is selected from the first microphone mode, the second microphone mode, and the third microphone mode according to the first user instruction, and the voice signal of the surrounding environment of the microphone array is acquired based on the microphone array in the target microphone mode. That is, the user can select a suitable microphone mode from a variety of microphone modes, so that the positioning effect and power consumption can be balanced and changes in the surrounding environment can be accommodated, improving the user experience.
  • whether to perform cloud sound effect processing on the target voice signal may be selected based on the second user instruction, so as to improve user experience.
  • FIG. 4 shows a schematic block diagram of an apparatus 300 for performing speech processing in an electronic device according to an embodiment of the present application.
  • the apparatus 300 for performing voice processing in the electronic device includes:
  • the selection unit 310 is configured to select a target microphone mode from the first microphone mode, the second microphone mode and the third microphone mode according to the first user instruction, wherein the first microphone mode includes the M microphones of the electronic device and the a microphone array composed of N microphones of an earphone paired and connected by an electronic device, the second microphone mode includes a microphone array composed of M microphones of the electronic device, and the third microphone mode includes a microphone array composed of N microphones of the earphone, M and N are positive integers;
  • an activation unit 320 for activating the microphone array in the target microphone mode
  • an acquisition unit 330 configured to acquire the voice signal of the surrounding environment of the microphone array in the target microphone mode
  • the processing unit 340 is configured to perform positioning processing on the voice signal to obtain a target voice signal.
  • the M microphones are part or all of the microphones of the electronic device.
  • the N microphones are part or all of the microphones of the headset.
  • the first user instruction is determined according to at least one of microphone power consumption, positioning effect, and noise reduction effect.
  • the positioning process includes at least voiceprint recognition.
  • the selection unit 310 is further configured to select whether to perform cloud sound effect processing on the target voice signal according to the second user instruction;
  • the processing unit 340 is further configured to perform cloud sound effect processing on the target voice signal.
  • the cloud sound effect processing includes at least one of the following:
  • the processing unit 340 is further configured to perform noise reduction processing on the target speech signal.
  • the processing unit 340 is further configured to perform blind source separation processing on the target speech signal to determine the sound source of the target speech signal.
  • the apparatus 300 for performing voice processing in the electronic device may correspond to the electronic device in the method embodiment of the present application, and the above-mentioned and other operations and/or functions of each unit in the apparatus 300 are respectively intended to implement the corresponding flow of the electronic device in the method 100 shown in FIG. 1; for brevity, details are not described here.
  • FIG. 5 shows a schematic structural diagram of a computer system implementing an electronic device according to an embodiment of the present application. It should be noted that the computer system 400 of the electronic device shown in FIG. 5 is only an example, and should not impose any limitations on the functions and scope of use of the embodiments of the present application.
  • the computer system 400 includes a central processing unit (Central Processing Unit, CPU) 401, which can perform various appropriate actions and processes according to a program stored in a read-only memory (Read-Only Memory, ROM) 402 or a program loaded from a storage part 408 into a random access memory (Random Access Memory, RAM) 403. In the RAM 403, various programs and data required for system operation are also stored.
  • the CPU 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404.
  • An Input/Output (I/O) interface 405 is also connected to the bus 404 .
  • the following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, etc.; an output section 407 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker, etc.; a storage portion 408 including a hard disk, etc.; and a communication portion 409 including a network interface card such as a local area network (Local Area Network, LAN) card, a modem, and the like.
  • the communication section 409 performs communication processing via a network such as the Internet.
  • a drive 410 is also connected to the I/O interface 405 as needed.
  • a removable medium 411 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, etc., is mounted on the drive 410 as needed so that a computer program read therefrom is installed into the storage section 408 as needed.
  • embodiments of the present application include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the method shown in the above flow chart.
  • the computer program may be downloaded and installed from the network via the communication portion 409 and/or installed from the removable medium 411 .
  • when the computer program is executed by the central processing unit (CPU) 401, various functions defined in the apparatus of the present application are executed.
  • FIG. 6 is a schematic structural diagram of a chip according to an embodiment of the present application.
  • the chip 500 shown in FIG. 6 includes a processor 510, and the processor 510 can call and run a computer program from a memory, so as to implement the method in this embodiment of the present application.
  • the chip 500 may further include a memory 520.
  • the processor 510 may call and run a computer program from the memory 520 to implement the methods in the embodiments of the present application.
  • the memory 520 may be a separate device independent of the processor 510, or may be integrated in the processor 510.
  • the chip 500 may further include an input interface 530.
  • the processor 510 may control the input interface 530 to communicate with other devices or chips, and specifically, may acquire information or data sent by other devices or chips.
  • the chip 500 may further include an output interface 540.
  • the processor 510 may control the output interface 540 to communicate with other devices or chips, and specifically, may output information or data to other devices or chips.
  • the chip can be applied to the electronic device in the embodiment of the present application, and the chip can implement the corresponding processes implemented by the electronic device in each method of the embodiment of the present application, which is not repeated here for brevity.
  • the above-mentioned chip may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip (SoC).
  • an electronic device 600 comprising:
  • a first number of first microphones 610; and
  • a headset 700 paired with the electronic device 600, which includes a second number of second microphones 710, wherein the second number is greater than or equal to the first number;
  • wherein the first microphones 610 and the second microphones 710 are configured as corresponding microphone arrays to acquire the speech signal of the surrounding environment, and localization processing is performed on the speech signal to obtain the target speech signal, as shown in FIG. 7.
  • an electronic device including: a processor and a memory, where the memory is used for storing a computer program, and the processor is used for calling and running the computer program stored in the memory to execute the steps of the above method embodiments.
  • a computer-readable storage medium which stores a computer program, and when the computer program is executed by a processor, implements the steps in the foregoing method embodiments.
  • the processor in this embodiment of the present application may be an integrated circuit chip, which has a signal processing capability.
  • each step of the above method embodiments may be completed by a hardware integrated logic circuit in a processor or an instruction in the form of software.
  • the above-mentioned processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • a general purpose processor may be a microprocessor or the processor may be any conventional processor or the like.
  • the steps of the method disclosed in conjunction with the embodiments of the present application may be directly embodied as executed by a hardware decoding processor, or executed by a combination of hardware and software modules in the decoding processor.
  • the software modules may be located in a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, a register, or another storage medium well established in the art.
  • the storage medium is located in the memory, and the processor reads the information in the memory, and completes the steps of the above method in combination with its hardware.
  • the memory in this embodiment of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be a read-only memory (Read-Only Memory, ROM), a programmable read-only memory (Programmable ROM, PROM), an erasable programmable read-only memory (Erasable PROM, EPROM), an electrically erasable programmable read-only memory (Electrically EPROM, EEPROM), or a flash memory.
  • Volatile memory may be Random Access Memory (RAM), which acts as an external cache.
  • many forms of RAM are available, for example: static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).
  • the memory in the embodiments of the present application may also be a static random access memory (static RAM, SRAM), a dynamic random access memory (dynamic RAM, DRAM), a synchronous dynamic random access memory (synchronous DRAM, SDRAM), a double data rate synchronous dynamic random access memory (double data rate SDRAM, DDR SDRAM), an enhanced synchronous dynamic random access memory (enhanced SDRAM, ESDRAM), a synchlink dynamic random access memory (synch link DRAM, SLDRAM), a direct Rambus random access memory (Direct Rambus RAM, DR RAM), and so on. That is, the memory in the embodiments of the present application is intended to include, but is not limited to, these and any other suitable types of memory.
  • the disclosed system, apparatus and method may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative.
  • the division of the units is only a division by logical function. In actual implementation, there may be other ways of dividing them.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit.
  • the functions, if implemented in the form of software functional units and sold or used as independent products, may be stored in a computer-readable storage medium.
  • the technical solution of the present application in essence, or the part of it that contributes over the prior art, or a part of the technical solution, can be embodied in the form of a software product.
  • the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present application.
  • the aforementioned storage medium includes: a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, an optical disk, and other media that can store program code.

Abstract

The present application provides a method and apparatus for performing speech processing in an electronic device, an electronic device, and a chip, capable of balancing positioning effects and power consumption and adapting to changes of surrounding environments, thereby improving user experience. The method for performing speech processing in an electronic device comprises: selecting a target microphone mode from a first microphone mode, a second microphone mode, and a third microphone mode according to a first user instruction, wherein the first microphone mode comprises a microphone array consisting of M microphones of an electronic device and N microphones of an earphone which is in paired connection with the electronic device, the second microphone mode comprises a microphone array consisting of the M microphones of the electronic device, and the third microphone mode comprises a microphone array consisting of the N microphones of the earphone, M and N being positive integers; activating the microphone array in the target microphone mode, and acquiring a speech signal of a surrounding environment; and performing positioning processing on the speech signal to obtain a target speech signal.

Description

Method and Apparatus for Performing Speech Processing in an Electronic Device, Electronic Device, and Chip
This application claims priority to Chinese patent application No. 2020112881852, filed with the China Patent Office on November 17, 2020 and entitled "Method, Apparatus, Electronic Device and Chip for Performing Speech Processing in an Electronic Device", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of human-computer voice interaction, and more particularly, to a method and an apparatus for performing speech processing in an electronic device, an electronic device, and a chip.
Background
As microphone array technology matures, it has become an important part of sound source localization for speech signals. A certain number of microphones of given sizes and specifications are installed on an electronic device (such as a mobile phone), and these microphones can form a microphone array. However, installing too many microphones on an electronic device inevitably increases its power consumption, and how to balance the localization effect against power consumption is a problem that urgently needs to be solved. In addition, the environment in which the electronic device is located may change, and the requirements for localization and noise reduction may change with it. How the microphone array adapts to changes in the surrounding environment is likewise a problem that urgently needs to be solved.
Summary of the Invention
The present application provides a method and an apparatus for performing speech processing in an electronic device, an electronic device, and a chip. A user can select a suitable microphone mode from multiple microphone modes, so that the localization effect can be balanced against power consumption and the device can adapt to changes in the surrounding environment, improving the user experience.
In a first aspect, a method for performing speech processing in an electronic device is provided, including:
selecting a target microphone mode from a first microphone mode, a second microphone mode, and a third microphone mode according to a first user instruction, wherein the first microphone mode includes a microphone array composed of M microphones of the electronic device and N microphones of a headset paired with and connected to the electronic device, the second microphone mode includes a microphone array composed of the M microphones of the electronic device, the third microphone mode includes a microphone array composed of the N microphones of the headset, and M and N are positive integers;
activating the microphone array in the target microphone mode, and acquiring a speech signal of the surrounding environment; and
performing localization processing on the speech signal to obtain a target speech signal.
In some possible implementations, the M microphones are some or all of the microphones of the electronic device.
In some embodiments, the N microphones are some or all of the microphones of the headset.
In some possible implementations, 2≤M≤4 and 2≤N≤6.
In some possible implementations, the first user instruction is determined according to at least one of microphone power consumption, localization effect, and noise reduction effect.
In some possible implementations, the localization processing includes at least voiceprint recognition.
In some possible implementations, the method further includes:
selecting, according to a second user instruction, whether to perform cloud sound effect processing on the target speech signal; and
if so, performing cloud sound effect processing on the target speech signal.
In some possible implementations, the cloud sound effect processing includes at least one of the following:
pitch shifting, speed change, room reverberation, echo, conversion to the voice of a target person, and conversion to the voice of a target group of people.
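As an illustration of one of the listed effects, the following minimal sketch adds a single echo to a mono signal. It is not taken from the application: the function name `add_echo` and its parameters are hypothetical, the signal is assumed to be a plain list of float samples, and the processing runs locally here, whereas the application describes performing sound effects in the cloud.

```python
def add_echo(samples, delay, decay):
    """Mix one delayed, attenuated copy of the signal into itself.

    samples: list of float samples (mono); hypothetical input format.
    delay:   echo delay in samples (non-negative integer).
    decay:   gain applied to the delayed copy, e.g. 0.5.
    """
    # The output is extended by `delay` samples so the echo tail is kept.
    out = list(samples) + [0.0] * delay
    for i, x in enumerate(samples):
        out[i + delay] += decay * x
    return out


echoed = add_echo([1.0, 0.0, 0.0], delay=2, decay=0.5)
```

The other listed effects (pitch shifting, speed change, reverberation, voice conversion) generally need resampling, filtering, or learned models and are not sketched here.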
In some possible implementations, the method further includes:
performing noise reduction processing on the target speech signal.
In some possible implementations, the method further includes:
performing blind source separation processing on the target speech signal to determine the sound source of the target speech signal.
In a second aspect, an apparatus for performing speech processing in an electronic device is provided, including:
a selection unit, configured to select a target microphone mode from a first microphone mode, a second microphone mode, and a third microphone mode according to a first user instruction, wherein the first microphone mode includes a microphone array composed of M microphones of the electronic device and N microphones of a headset paired with and connected to the electronic device, the second microphone mode includes a microphone array composed of the M microphones of the electronic device, the third microphone mode includes a microphone array composed of the N microphones of the headset, and M and N are positive integers;
an activation unit, configured to activate the microphone array in the target microphone mode;
an acquisition unit, configured to acquire a speech signal of the environment surrounding the microphone array in the target microphone mode; and
a processing unit, configured to perform localization processing on the speech signal to obtain a target speech signal.
In some possible implementations, the M microphones are some or all of the microphones of the electronic device.
In some embodiments, the N microphones are some or all of the microphones of the headset.
In some possible implementations, 2≤M≤4 and 2≤N≤6.
In some possible implementations, the first user instruction is determined according to at least one of microphone power consumption, localization effect, and noise reduction effect.
In some possible implementations, the localization processing includes at least voiceprint recognition.
In some possible implementations, the selection unit is further configured to select, according to a second user instruction, whether to perform cloud sound effect processing on the target speech signal; and
if so, the processing unit is further configured to perform cloud sound effect processing on the target speech signal.
In some possible implementations, the cloud sound effect processing includes at least one of the following:
pitch shifting, speed change, room reverberation, echo, conversion to the voice of a target person, and conversion to the voice of a target group of people.
In some possible implementations, the processing unit is further configured to perform noise reduction processing on the target speech signal.
In some possible implementations, the processing unit is further configured to perform blind source separation processing on the target speech signal to determine the sound source of the target speech signal.
In a third aspect, an electronic device is provided, including a processor and a memory, where the memory is used for storing a computer program, and the processor is used for calling and running the computer program stored in the memory to execute the method in the first aspect or any of its implementations.
In a fourth aspect, a chip is provided, including a processor configured to call and run a computer program from a memory, so that the processor executes the method in the first aspect or any of its implementations.
In a fifth aspect, a computer-readable storage medium is provided for storing a computer program, where the computer program causes a computer to execute the method in the first aspect or any of its implementations.
In a sixth aspect, an electronic device is provided, characterized by including:
a first number of first microphones; and
a headset paired with and connected to the electronic device, which includes a second number of second microphones, wherein the second number is greater than or equal to the first number;
wherein, according to a first user instruction, the first microphones and the second microphones are configured as corresponding microphone arrays to acquire a speech signal of the surrounding environment, and localization processing is performed on the speech signal to obtain a target speech signal.
Through the above technical solution, a target microphone mode is selected from the first microphone mode, the second microphone mode, and the third microphone mode according to the first user instruction, and the speech signal of the environment surrounding the microphone array is acquired based on the microphone array in the target microphone mode. That is, the user can select a suitable microphone mode from multiple microphone modes, so that the localization effect can be balanced against power consumption and the device can adapt to changes in the surrounding environment, improving the user experience.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for performing speech processing in an electronic device according to an embodiment of the present application.
FIG. 2 is a flowchart of speech processing according to an embodiment of the present application.
FIG. 3 is a framework diagram of speech processing according to an embodiment of the present application.
FIG. 4 is a schematic block diagram of an apparatus for performing speech processing in an electronic device according to an embodiment of the present application.
FIG. 5 is a schematic structural diagram of a computer system suitable for implementing the electronic device of the embodiments of the present application.
FIG. 6 is a schematic block diagram of a chip provided according to an embodiment of the present application.
FIG. 7 is a schematic block diagram of an electronic device and a headset provided according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the protection scope of the present application.
In the embodiments of the present application, the electronic device may be a mobile phone, a tablet computer (Pad), a computer with a wireless transceiver function, a virtual reality (VR) terminal device, an augmented reality (AR) terminal device, a wireless terminal device in industrial control, a wireless terminal device in self-driving, a wireless terminal device in remote medical care, a wireless terminal device in a smart grid, a wireless terminal device in transportation safety, a wireless terminal device in a smart city, a wireless terminal device in a smart home, or the like.
By way of example and not limitation, in the embodiments of the present application the electronic device may also be a wearable device. A wearable device, also called a wearable smart device, is a general term for devices such as glasses, gloves, watches, clothing, and shoes that are designed and developed by applying wearable technology to everyday wear. A wearable device is a portable device that is worn directly on the body or integrated into the user's clothing or accessories. It is not only a hardware device; it also provides powerful functions through software support, data interaction, and cloud interaction. In a broad sense, wearable smart devices include full-featured, larger devices that can implement all or part of their functions without relying on a smartphone, such as smart watches or smart glasses, as well as devices that focus on a single type of application function and need to be used together with another device such as a smartphone, such as the various smart bracelets and smart jewelry used for monitoring physical signs.
In the embodiments of the present application, the headset may be paired with and connected to the electronic device in a wired or wireless manner.
In the embodiments of the present application, a certain number of microphones of given sizes and specifications are installed on both the electronic device and the headset, and these microphones can form a microphone array.
The method for performing speech processing in an electronic device according to the embodiments of the present application is described in detail below with reference to FIG. 1 to FIG. 3.
FIG. 1 is a schematic flowchart of a method 100 for performing speech processing in an electronic device according to an embodiment of the present application. As shown in FIG. 1, the method 100 may include, but is not limited to, the following:
S110: select a target microphone mode from a first microphone mode, a second microphone mode, and a third microphone mode according to a first user instruction, wherein the first microphone mode includes a microphone array composed of M microphones of the electronic device and N microphones of a headset paired with and connected to the electronic device, the second microphone mode includes a microphone array composed of the M microphones of the electronic device, the third microphone mode includes a microphone array composed of the N microphones of the headset, and M and N are positive integers;
S120: activate the microphone array in the target microphone mode, and acquire a speech signal of the surrounding environment; and
S130: perform localization processing on the speech signal to obtain a target speech signal.
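Steps S110 to S130 can be sketched as follows. This is only an illustrative outline under stated assumptions, not the applicant's implementation: the names `MicMode`, `Mic`, `build_array`, and `process` are hypothetical, "activation" is modeled simply as building the list of microphones in the selected array, and the capture and localization routines are passed in as placeholder callables.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, List


class MicMode(Enum):
    COMBINED = 1      # first mode: M device microphones + N headset microphones
    DEVICE_ONLY = 2   # second mode: the M device microphones
    HEADSET_ONLY = 3  # third mode: the N headset microphones


@dataclass(frozen=True)
class Mic:
    owner: str  # "device" or "headset"
    index: int


def build_array(mode: MicMode, m: int, n: int) -> List[Mic]:
    """Return the microphones forming the array of the selected mode."""
    if m < 1 or n < 1:
        raise ValueError("M and N must be positive integers")
    device = [Mic("device", i) for i in range(m)]
    headset = [Mic("headset", i) for i in range(n)]
    if mode is MicMode.COMBINED:
        return device + headset
    if mode is MicMode.DEVICE_ONLY:
        return device
    return headset


def process(mode: MicMode, m: int, n: int,
            capture: Callable, localize: Callable):
    """Run S110 to S130 for a mode already chosen by the user."""
    array = build_array(mode, m, n)            # S110/S120: array of the target mode
    frames = [capture(mic) for mic in array]   # S120: one capture per microphone
    return localize(frames)                    # S130: localization processing
```

For example, `build_array(MicMode.COMBINED, 2, 4)` yields a six-microphone array, while `MicMode.DEVICE_ONLY` and `MicMode.HEADSET_ONLY` yield two and four microphones respectively.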
It should be understood that FIG. 1 shows steps or operations of the method, but these steps or operations are only examples; the embodiments of the present application may also perform other operations or variations of the operations in FIG. 1. The method 100 may be executed by an electronic device, for example, by a central processing unit (CPU) or a microprocessor in the electronic device.
In the embodiments of the present application, the microphone array in the first microphone mode combines the M microphones of the electronic device with the N microphones of the headset and therefore offers excellent localization and noise reduction performance. Accordingly, the localization and noise reduction performance of the microphone array in the first microphone mode is better than that of the microphone array in the second microphone mode, and better than that of the microphone array in the third microphone mode.
In the embodiments of the present application, the microphone array in the second microphone mode includes fewer microphones, which greatly reduces, in terms of both algorithms and engineering, the computing power the array requires, and thus reduces the power consumption of the microphone array. The microphone array in the third microphone mode likewise includes fewer microphones, which greatly reduces the required computing power and thus the power consumption of the microphone array.
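The trade-off between the three modes can be made concrete with a rough count. The sketch below assumes, purely for illustration, that the cost of array processing grows with the number of distinct microphone pairs, as it does for pairwise delay estimation. The application itself does not state a cost model, and the function names are hypothetical.

```python
def pair_count(k: int) -> int:
    """Number of distinct microphone pairs in an array of k microphones."""
    return k * (k - 1) // 2


def mode_sizes(m: int, n: int) -> dict:
    """Array size and pair count for each of the three microphone modes."""
    sizes = {"first": m + n, "second": m, "third": n}
    return {mode: (k, pair_count(k)) for mode, k in sizes.items()}


# With M = 2 device microphones and N = 4 headset microphones:
# first mode: 6 microphones, 15 pairs; second: 2 microphones, 1 pair;
# third: 4 microphones, 6 pairs.
sizes = mode_sizes(2, 4)
```

Under this assumption the combined array has several times as many pairs to process as either single-device array, which is consistent with the lower computing power the text attributes to the second and third modes.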
In the embodiments of the present application, sound effects work better when applied to a microphone array than to a single microphone, because the speech obtained after microphone array signal processing is the speaker's speech with the ambient noise removed, so non-stationary noise need not be considered after the sound effect algorithm is applied.
In the embodiments of the present application, the electronic device may acquire user instructions through a user interface (UI); in other words, the electronic device may present a UI so that the user can input user instructions.
It should be noted that a single microphone acts as a sensor that converts sound waves into an electrical signal, whereas a microphone array can form a directional beam: the sound signal in the direction of the beam's main lobe is enhanced and signals in the side lobe directions are suppressed. In addition, direction-of-arrival (DOA) estimation can be performed through algorithms such as time delay estimation.
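The beamforming and delay-based DOA ideas mentioned above can be sketched in a few lines. The application does not specify which algorithms are used, so the following is an assumption-laden illustration: delay estimation by brute-force time-domain cross-correlation, a delay-and-sum beamformer with integer sample delays, and the far-field two-microphone relation sin θ = c·τ/d. All function names and parameters are hypothetical.

```python
import math


def estimate_delay(ref, sig, max_lag):
    """Lag (in samples) of `sig` relative to `ref` that maximizes their
    cross-correlation, searched over [-max_lag, max_lag]."""
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, r in enumerate(ref):
            j = i + lag
            if 0 <= j < len(sig):
                score += r * sig[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def delay_and_sum(channels, lags):
    """Advance each channel by its lag so the channels line up, then average."""
    n = len(channels[0])
    out = [0.0] * n
    for ch, lag in zip(channels, lags):
        for i in range(n):
            j = i + lag
            if 0 <= j < n:
                out[i] += ch[j]
    return [v / len(channels) for v in out]


def doa_degrees(lag, fs, spacing_m, c=343.0):
    """Far-field arrival angle (degrees from broadside) for one microphone
    pair, given the inter-microphone lag in samples, the sample rate fs in Hz,
    the spacing in meters, and the speed of sound c in m/s."""
    x = max(-1.0, min(1.0, lag * c / (fs * spacing_m)))
    return math.degrees(math.asin(x))
```

Averaging the aligned channels reinforces sound from the steered direction (the main lobe) while sound arriving with other delays adds incoherently, which is the enhancement and suppression behavior the paragraph describes.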
In the embodiments of the present application, the headset paired with and connected to the electronic device may be a single-ear headset or a binaural headset, which is not limited in the present application.
In some embodiments, the localization processing in S130 includes at least voiceprint recognition. That is, at least voiceprint recognition is performed on the speech signal to obtain the target speech signal.
Optionally, the localization processing in S130 may further include, but is not limited to, at least one of the following:
acoustic echo cancellation (AEC), dereverberation (DER), voice activity detection (VAD), beamforming (BF), generalized sidelobe canceller (GSC), DOA estimation, and post filtering (PF).
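Of the stages listed above, voice activity detection is the simplest to illustrate. The sketch below is a basic energy-threshold detector, offered as an assumption rather than as the detector used in the application (which is not specified). The function names, the frame length, and the threshold are hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame of samples."""
    return sum(x * x for x in frame) / len(frame)


def energy_vad(samples, frame_len, threshold):
    """Return one flag per complete frame: True where the frame's mean
    energy exceeds the threshold (treated here as 'speech present')."""
    flags = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        flags.append(frame_energy(frame) > threshold)
    return flags


# Silence, a burst of signal, then silence again.
flags = energy_vad([0.0] * 4 + [1.0] * 4 + [0.0] * 4, frame_len=4, threshold=0.1)
```

A fixed energy threshold is easily fooled by loud noise, which is one reason practical systems combine VAD with the other listed stages such as beamforming and post filtering.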
In some embodiments, the M microphones are some or all of the microphones of the electronic device.
Assume that two microphones are installed on the electronic device. In this case, for example, M=1 or M=2. Preferably, M=2.
Assume that four microphones are installed on the electronic device. In this case, for example, M=1, M=2, M=3, or M=4. Preferably, M=4.
It should be noted that the embodiments of the present application do not limit the specific installation positions of the microphones in the electronic device.
In some embodiments, the N microphones are some or all of the microphones of the earphone.
Assume that two microphones are installed on the earphone. In this case, for example, N=1 or N=2. Preferably, N=2.
Assume that four microphones are installed on the earphone. In this case, for example, N=1, N=2, N=3, or N=4. Preferably, N=4.
Assume that six microphones are installed on the earphone. In this case, for example, N=1, N=2, N=3, N=4, N=5, or N=6. Preferably, N=6.
It should be noted that the embodiments of the present application do not limit the specific installation positions of the microphones in the earphone.
In some embodiments, 2≤M≤4 and 2≤N≤6.
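The relationship between the number of installed microphones and the admissible values of M and N can be sketched as follows. This is only an illustration; the helper names `valid_counts` and `in_embodiment_range` are hypothetical and do not come from the application.

```python
def valid_counts(installed):
    """All values M (or N) that use some or all of the installed microphones."""
    if installed < 1:
        raise ValueError("at least one microphone must be installed")
    return list(range(1, installed + 1))


def in_embodiment_range(m, n):
    """Check the range stated for some embodiments: 2 <= M <= 4 and 2 <= N <= 6."""
    return 2 <= m <= 4 and 2 <= n <= 6


# With four device microphones, M may be 1 to 4; the preferred value is the largest.
device_options = valid_counts(4)
preferred_m = device_options[-1]
```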
Optionally, the microphones used in the embodiments of the present application may be of the same size and specification, so there is no arrangement in which a primary microphone and a secondary microphone pick up different sound sources respectively.
In some embodiments, after the target voice signal is obtained, automatic speech recognition (ASR) may be performed on the target voice signal in the cloud, thereby improving the accuracy of speech recognition.
It should be noted that the cloud can perform relatively complex or computationally intensive processing, which may be implemented by, for example, a deep learning model or a long short-term memory (LSTM) network model.
Cloud processing may be implemented based on cloud services, and cloud services may be combined with artificial intelligence (AI), that is, AI cloud services, generally also called AI as a Service (AIaaS). This is currently a mainstream service model for AI platforms. Specifically, an AIaaS platform splits several common types of AI services and provides them in the cloud either independently or as packages. This service model is similar to opening an AI-themed store: any developer can access one or more AI services provided by the platform through an application programming interface (API), and some experienced developers can also use the AI framework and AI infrastructure provided by the platform to deploy, operate, and maintain their own dedicated cloud AI services.
Optionally, in the embodiments of the present application, the first user instruction is determined according to at least one of microphone power consumption, positioning effect, and noise reduction effect. That is, the user may determine the first user instruction according to at least one of microphone power consumption, positioning effect, and noise reduction effect.
For example, when the remaining battery power of the electronic device is insufficient, the user may determine the first user instruction according to microphone power consumption, and instruct the electronic device through the first user instruction to select the second microphone mode or the third microphone mode as the target microphone mode, so as to reduce the power consumption of the microphone array, thereby increasing the standby time of the electronic device and improving user experience.
For another example, when the remaining battery power of the electronic device is sufficient, the user may determine the first user instruction according to microphone power consumption, and instruct the electronic device through the first user instruction to select the first microphone mode as the target microphone mode. Because the microphone array in the first microphone mode combines the M microphones of the electronic device with the N microphones of the earphone, it offers excellent positioning and noise reduction performance, which improves user experience.
For another example, in a relatively quiet environment, the user may determine the first user instruction according to the positioning effect and/or the noise reduction effect, and instruct the electronic device through the first user instruction to select the second microphone mode or the third microphone mode as the target microphone mode, so as to reduce the power consumption of the microphone array and improve user experience.
For another example, in a relatively noisy environment, the user may determine the first user instruction according to the positioning effect and/or the noise reduction effect, and instruct the electronic device through the first user instruction to select the first microphone mode as the target microphone mode. Because the microphone array in the first microphone mode combines the M microphones of the electronic device with the N microphones of the earphone, it offers excellent positioning and noise reduction performance, which improves user experience.
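The four examples above can be summarized as a small decision helper. In the application the choice is made by the user; the function `suggest_modes` below is a hypothetical aid, and its rule for combined conditions (power saving is suggested whenever the battery is low or the environment is quiet) is an assumption, since the application does not say how the conditions interact.

```python
from enum import Enum


class MicMode(Enum):
    FIRST = 1   # M device microphones plus N earphone microphones
    SECOND = 2  # M device microphones only
    THIRD = 3   # N earphone microphones only


def suggest_modes(battery_low, noisy):
    """Return the set of modes the examples above would point the user toward."""
    # Assumption: a low battery or a quiet environment favors the smaller arrays.
    if battery_low or not noisy:
        return {MicMode.SECOND, MicMode.THIRD}
    # Sufficient battery and a noisy environment favor the combined array.
    return {MicMode.FIRST}


candidates = suggest_modes(battery_low=False, noisy=True)
```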
Optionally, in some embodiments, the method 100 further includes:
selecting, according to a second user instruction, whether to perform cloud sound effect processing on the target voice signal; and
if so, performing cloud sound effect processing on the target voice signal.
Optionally, if it is selected according to the second user instruction not to perform cloud sound effect processing on the target voice signal, the target voice signal is output directly.
That is, the electronic device may select, according to the second user instruction, whether to perform cloud sound effect processing on the target voice signal. In other words, cloud sound effect processing may be performed based on the user's needs.
Optionally, the cloud sound effect processing includes at least one of the following:
pitch shifting, speed changing, room reverberation, echo, conversion to the voice of a target person, and conversion to the voice of a target group.
The target person may be, for example, a certain singer, a certain comedian, or a certain heroic figure.
The target group may be, for example, men, women, the elderly, or children.
In some embodiments, the electronic device may further perform local sound effect processing on the target voice signal, where the local sound effect processing modifies the fundamental frequency and formants of the target voice signal, convolves it with a room impulse response using a filter, and so on, and may specifically include effects such as pitch shifting, speed changing, room reverberation, and echo.
It should be noted that the local sound effect processing may be performed synchronously with the above cloud sound effect processing, or may be performed before the above cloud sound effect processing, which is not limited in the present application.
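To make the convolution-based effects concrete, the following is a minimal sketch of an echo produced by convolving a signal with an impulse response. The application does not give an implementation; the direct-form convolution and the two-tap impulse response built by the hypothetical helper `echo_impulse_response` are assumptions for illustration. A measured room impulse response could be passed to `convolve` in the same way to approximate room reverberation.

```python
def convolve(signal, impulse_response):
    """Direct-form linear convolution of two sample sequences."""
    if not signal or not impulse_response:
        return []
    out = [0.0] * (len(signal) + len(impulse_response) - 1)
    for i, s in enumerate(signal):
        for j, h in enumerate(impulse_response):
            out[i + j] += s * h
    return out


def echo_impulse_response(delay_samples, gain):
    """Impulse response with a direct path and one delayed, attenuated copy."""
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    h = [0.0] * (delay_samples + 1)
    h[0] = 1.0
    h[delay_samples] += gain
    return h


# A single click followed by silence gains a half-amplitude echo two samples later.
dry = [1.0, 0.0, 0.0]
wet = convolve(dry, echo_impulse_response(delay_samples=2, gain=0.5))
```

The direct-form loop is quadratic in the signal length, so a real-time implementation would more likely use a fast convolution method; that choice is outside what the application describes.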
Optionally, in some embodiments, the method 100 further includes:
performing noise reduction processing on the target voice signal, thereby optimizing the noise reduction effect and improving user experience.
Optionally, in some embodiments, the method 100 further includes:
performing blind source separation (BSS) processing on the target voice signal to determine the sound source of the target voice signal.
Further, the blind source separation processing may be performed on the target voice signal in the cloud to determine the sound source of the target voice signal.
Hereinafter, the speech processing flow of the embodiments of the present application is described in detail with reference to FIG. 2 and FIG. 3.
It should be understood that the examples shown in FIG. 2 and FIG. 3 are intended to help those skilled in the art better understand the embodiments of the present application, and are not intended to limit the scope of the embodiments of the present application. Those skilled in the art can obviously make various equivalent modifications or changes based on FIG. 2 and FIG. 3, and such modifications or changes also fall within the scope of the embodiments of the present application.
FIG. 2 is a flowchart of speech processing according to an embodiment of the present application.
S201: the user determines a first user instruction according to at least one of microphone power consumption, positioning effect, and noise reduction effect.
S202: the electronic device selects a target microphone mode from a first microphone mode, a second microphone mode, and a third microphone mode according to the first user instruction.
Here, the first microphone mode includes a microphone array composed of M microphones of the electronic device and N microphones of an earphone paired with and connected to the electronic device; the second microphone mode includes a microphone array composed of the M microphones of the electronic device; the third microphone mode includes a microphone array composed of the N microphones of the earphone; and M and N are positive integers.
S203: the electronic device activates the microphone array in the target microphone mode, and acquires a voice signal of the environment surrounding the microphone array in the target microphone mode.
S204: the electronic device uses an acoustic front-end signal processing module to perform positioning processing on the acquired voice signal to obtain a target voice signal.
Here, the positioning processing includes, but is not limited to, at least one of the following:
acoustic echo cancellation (AEC), dereverberation (DER), voice activity detection (VAD), beamforming (BF), generalized sidelobe canceller (GSC), and direction of arrival (DOA) estimation.
S205: the electronic device performs noise reduction processing on the target voice signal to obtain a noise-reduced voice signal.
Here, the noise reduction processing may be, for example, post filtering (PF).
S206: the electronic device performs local sound effect processing on the noise-reduced voice signal.
Here, the local sound effect processing modifies the fundamental frequency and formants of the target voice signal, convolves it with a room impulse response using a filter, and so on, and may specifically include effects such as pitch shifting, speed changing, room reverberation, and echo.
S207: the electronic device selects, according to a second user instruction, whether to perform cloud sound effect processing on the noise-reduced voice signal;
if so, cloud sound effect processing is performed on the noise-reduced voice signal, that is, S208 is performed;
if not, the noise-reduced voice signal is output directly.
Here, the cloud sound effect processing includes, but is not limited to, at least one of the following:
conversion to the voice of a target person, and conversion to the voice of a target group.
S208: cloud sound effect processing is performed on the noise-reduced voice signal.
Optionally, the local sound effect processing (S206) may be performed synchronously with the cloud sound effect processing (S208).
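The order of steps S204 to S208 can be sketched as a small orchestration function. Every stage is passed in as a callable because the application does not define concrete implementations; the name `run_flow`, the returned dictionary, and the decision to return the local and cloud results side by side are assumptions made for illustration.

```python
def run_flow(raw, locate, denoise, local_effect, cloud_effect, use_cloud):
    """Run the post-acquisition stages of the flow on an already acquired signal."""
    target = locate(raw)                 # S204: positioning processing
    denoised = denoise(target)           # S205: noise reduction, e.g. post filtering
    local_out = local_effect(denoised)   # S206: local sound effect processing
    # S207: the second user instruction decides whether S208 runs at all.
    cloud_out = cloud_effect(denoised) if use_cloud else None
    return {"denoised": denoised, "local": local_out, "cloud": cloud_out}


def tag(name):
    """Hypothetical stand-in stage that records its step label on the signal."""
    return lambda signal: signal + [name]


result = run_flow([], tag("S204"), tag("S205"), tag("S206"), tag("S208"), use_cloud=False)
```

With `use_cloud=False` the `cloud` entry is `None`, which corresponds to outputting the signal without cloud sound effect processing.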
FIG. 3 is a framework diagram of speech processing according to an embodiment of the present application. The processing is mainly divided into two parts, local processing and cloud processing. The signals required by the local processing algorithms come from the microphones on the electronic device and the microphones on the earphone, and the signals required by the cloud processing algorithms come from the voice signal obtained after local processing is completed. The voice signal after local sound effect processing and cloud sound effect processing is then played back through the earphone.
Therefore, in the embodiments of the present application, a target microphone mode is selected from the first microphone mode, the second microphone mode, and the third microphone mode according to the first user instruction, and a voice signal of the surrounding environment is acquired using the microphone array in the target microphone mode. That is, the user can select a suitable microphone mode from multiple microphone modes, so as to balance positioning effect against power consumption and adapt to changes in the surrounding environment, which improves user experience.
Further, in the embodiments of the present application, whether to perform cloud sound effect processing on the target voice signal may be selected based on the second user instruction, which improves user experience.
The method embodiments of the present application are described in detail above with reference to FIG. 1 to FIG. 3, and the apparatus embodiments of the present application are described in detail below with reference to FIG. 4 to FIG. 7. It should be understood that the apparatus embodiments correspond to the method embodiments, and for similar descriptions, reference may be made to the method embodiments.
FIG. 4 shows a schematic block diagram of an apparatus 300 for performing speech processing in an electronic device according to an embodiment of the present application. As shown in FIG. 4, the apparatus 300 for performing speech processing in the electronic device includes:
a selection unit 310, configured to select a target microphone mode from a first microphone mode, a second microphone mode, and a third microphone mode according to a first user instruction, where the first microphone mode includes a microphone array composed of M microphones of the electronic device and N microphones of an earphone paired with and connected to the electronic device, the second microphone mode includes a microphone array composed of the M microphones of the electronic device, the third microphone mode includes a microphone array composed of the N microphones of the earphone, and M and N are positive integers;
an activation unit 320, configured to activate the microphone array in the target microphone mode;
an acquisition unit 330, configured to acquire a voice signal of the environment surrounding the microphone array in the target microphone mode; and
a processing unit 340, configured to perform positioning processing on the voice signal to obtain a target voice signal.
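How the three microphone modes map to concrete microphone arrays can be sketched as follows. The mode labels, the microphone identifiers, and the function `select_array` are hypothetical names chosen for this example and are not defined in the application.

```python
def select_array(mode, device_mics, earphone_mics):
    """Return the microphones that form the array for the given mode."""
    if mode == "first":
        # First mode: the M device microphones combined with the N earphone microphones.
        return list(device_mics) + list(earphone_mics)
    if mode == "second":
        # Second mode: the M device microphones only.
        return list(device_mics)
    if mode == "third":
        # Third mode: the N earphone microphones only.
        return list(earphone_mics)
    raise ValueError(f"unknown microphone mode: {mode!r}")


# Hypothetical identifiers for M=2 device microphones and N=2 earphone microphones.
device = ["dev0", "dev1"]
earphone = ["ear0", "ear1"]
array = select_array("first", device, earphone)
```

The returned list stands for the set of microphones that the activation unit would then activate.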
Optionally, the M microphones are some or all of the microphones of the electronic device.
Optionally, the N microphones are some or all of the microphones of the earphone.
Optionally, 2≤M≤4 and 2≤N≤6.
Optionally, the first user instruction is determined according to at least one of microphone power consumption, positioning effect, and noise reduction effect.
Optionally, the positioning processing includes at least voiceprint recognition.
Optionally, the selection unit 310 is further configured to select, according to a second user instruction, whether to perform cloud sound effect processing on the target voice signal;
if so, the processing unit 340 is further configured to perform cloud sound effect processing on the target voice signal.
Optionally, the cloud sound effect processing includes at least one of the following:
pitch shifting, speed changing, room reverberation, echo, conversion to the voice of a target person, and conversion to the voice of a target group.
Optionally, the processing unit 340 is further configured to perform noise reduction processing on the target voice signal.
Optionally, the processing unit 340 is further configured to perform blind source separation processing on the target voice signal to determine the sound source of the target voice signal.
It should be understood that the apparatus 300 for performing speech processing in an electronic device according to the embodiments of the present application may correspond to the electronic device in the method embodiments of the present application, and that the above and other operations and/or functions of the units in the apparatus 300 are respectively intended to implement the corresponding procedures of the electronic device in the method 100 shown in FIG. 1. For brevity, details are not repeated here.
FIG. 5 shows a schematic structural diagram of a computer system of an electronic device implementing the embodiments of the present application. It should be noted that the computer system 400 of the electronic device shown in FIG. 5 is only an example, and should not impose any limitation on the functions and scope of use of the embodiments of the present application.
As shown in FIG. 5, the computer system 400 includes a central processing unit (CPU) 401, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 402 or a program loaded from a storage section 408 into a random access memory (RAM) 403. The RAM 403 also stores various programs and data required for system operation. The CPU 401, the ROM 402, and the RAM 403 are connected to each other through a bus 404. An input/output (I/O) interface 405 is also connected to the bus 404.
The following components are connected to the I/O interface 405: an input section 406 including a keyboard, a mouse, and the like; an output section 407 including a display such as a cathode ray tube (CRT) or a liquid crystal display (LCD), a speaker, and the like; a storage section 408 including a hard disk and the like; and a communication section 409 including a network interface card such as a local area network (LAN) card or a modem. The communication section 409 performs communication processing via a network such as the Internet. A drive 410 is also connected to the I/O interface 405 as needed. A removable medium 411, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 410 as needed, so that a computer program read from it can be installed into the storage section 408 as needed.
In particular, according to the embodiments of the present application, the processes described in the above flowcharts may be implemented as computer software programs. For example, the embodiments of the present application include a computer program product, which includes a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the above flowcharts. In such an embodiment, the computer program may be downloaded and installed from a network through the communication section 409, and/or installed from the removable medium 411. When the computer program is executed by the central processing unit (CPU) 401, the various functions defined in the apparatus of the present application are performed.
FIG. 6 is a schematic structural diagram of a chip according to an embodiment of the present application. The chip 500 shown in FIG. 6 includes a processor 510, and the processor 510 can call and run a computer program from a memory to implement the methods in the embodiments of the present application.
Optionally, as shown in FIG. 6, the chip 500 may further include a memory 520. The processor 510 can call and run a computer program from the memory 520 to implement the methods in the embodiments of the present application.
The memory 520 may be a separate device independent of the processor 510, or may be integrated in the processor 510.
Optionally, the chip 500 may further include an input interface 530. The processor 510 can control the input interface 530 to communicate with other devices or chips, and specifically, can acquire information or data sent by other devices or chips.
Optionally, the chip 500 may further include an output interface 540. The processor 510 can control the output interface 540 to communicate with other devices or chips, and specifically, can output information or data to other devices or chips.
Optionally, the chip may be applied to the electronic device in the embodiments of the present application, and the chip can implement the corresponding procedures implemented by the electronic device in the methods of the embodiments of the present application. For brevity, details are not repeated here.
The above chip may be, for example, a system-level chip, a system chip, a chip system, or a system-on-chip (SoC).
In one embodiment, an electronic device 600 is provided, including:
a first number of first microphones 610; and
an earphone 700 paired with and connected to the electronic device 600, which includes a second number of second microphones 710, where the second number is greater than or equal to the first number;
where, according to a first user instruction, the first microphones 610 and the second microphones 710 are configured as a corresponding microphone array to acquire a voice signal of the surrounding environment, and positioning processing is performed on the voice signal to obtain a target voice signal. Specifically, this may be as shown in FIG. 7.
In one embodiment, an electronic device is provided, including a processor and a memory, where the memory is configured to store a computer program, and the processor is configured to call and run the computer program stored in the memory to perform the steps in the above method embodiments.
In one embodiment, a computer-readable storage medium is provided, which stores a computer program, and the computer program, when executed by a processor, implements the steps in the above method embodiments.
It should be understood that the processor in the embodiments of the present application may be an integrated circuit chip with signal processing capability. During implementation, the steps of the above method embodiments may be completed by an integrated logic circuit of hardware in the processor or by instructions in the form of software. The above processor may be a general-purpose processor, a digital signal processor (DSP), an application specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and can implement or perform the methods, steps, and logical block diagrams disclosed in the embodiments of the present application. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed in the embodiments of the present application may be directly performed and completed by a hardware decoding processor, or performed and completed by a combination of hardware and software modules in a decoding processor. The software modules may be located in a storage medium that is mature in the art, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory, an electrically erasable programmable memory, or a register. The storage medium is located in the memory, and the processor reads the information in the memory and completes the steps of the above methods in combination with its hardware.
It can be understood that the memory in the embodiments of the present application may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example and not limitation, many forms of RAM are available, such as static random access memory (SRAM), dynamic random access memory (DRAM), synchronous dynamic random access memory (SDRAM), double data rate synchronous dynamic random access memory (DDR SDRAM), enhanced synchronous dynamic random access memory (ESDRAM), synchlink dynamic random access memory (SLDRAM), and direct Rambus random access memory (DR RAM). It should be noted that the memory of the systems and methods described herein is intended to include, but not be limited to, these and any other suitable types of memory.
It should be understood that the above memories are described by way of example and not limitation. For example, the memory in the embodiments of the present application may also be a static random access memory (SRAM), a dynamic random access memory (DRAM), a synchronous dynamic random access memory (SDRAM), a double data rate synchronous dynamic random access memory (DDR SDRAM), an enhanced synchronous dynamic random access memory (ESDRAM), a synchlink dynamic random access memory (SLDRAM), a direct Rambus random access memory (DR RAM), and so on. That is, the memory in the embodiments of the present application is intended to include, but not be limited to, these and any other suitable types of memory.
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。Those of ordinary skill in the art can realize that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Skilled artisans may implement the described functionality using different methods for each particular application, but such implementations should not be considered beyond the scope of this application.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the systems, apparatuses, and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed systems, apparatuses, and methods may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division of the units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit.
If the functions are implemented in the form of software functional units and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application in essence, or the part that contributes to the prior art, or a part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, read-only memory (ROM), random access memory (RAM), a magnetic disk, or an optical disc.
The above are only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement that a person skilled in the art could readily conceive within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (15)

  1. A method for performing speech processing in an electronic device, characterized by comprising:
    selecting a target microphone mode from a first microphone mode, a second microphone mode, and a third microphone mode according to a first user instruction, wherein the first microphone mode comprises a microphone array composed of M microphones of the electronic device and N microphones of a headset paired and connected with the electronic device, the second microphone mode comprises a microphone array composed of the M microphones of the electronic device, the third microphone mode comprises a microphone array composed of the N microphones of the headset, and M and N are positive integers;
    activating the microphone array in the target microphone mode, and acquiring a speech signal of the surrounding environment; and
    performing localization processing on the speech signal to obtain a target speech signal.
  2. The method according to claim 1, characterized in that the M microphones are some or all of the microphones of the electronic device.
  3. The method according to claim 1, characterized in that the N microphones are some or all of the microphones of the headset.
  4. The method according to claim 1, characterized in that 2≤M≤4 and 2≤N≤6.
  5. The method according to claim 1, characterized in that the first user instruction is determined according to at least one of microphone power consumption, localization effect, and noise reduction effect.
  6. The method according to claim 1, characterized in that the localization processing comprises at least voiceprint recognition.
  7. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
    selecting, according to a second user instruction, whether to perform cloud sound effect processing on the target speech signal; and
    if so, performing cloud sound effect processing on the target speech signal.
  8. The method according to claim 7, characterized in that the cloud sound effect processing comprises at least one of the following:
    pitch shifting, speed change, room reverberation, echo, conversion to the voice of a target person, and conversion to the voice of a target group of people.
  9. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
    performing noise reduction processing on the target speech signal.
  10. The method according to any one of claims 1 to 6, characterized in that the method further comprises:
    performing blind source separation processing on the target speech signal to determine the sound source of the target speech signal.
  11. An apparatus for performing speech processing in an electronic device, characterized by comprising:
    a selection unit, configured to select a target microphone mode from a first microphone mode, a second microphone mode, and a third microphone mode according to a first user instruction, wherein the first microphone mode comprises a microphone array composed of M microphones of the electronic device and N microphones of a headset paired and connected with the electronic device, the second microphone mode comprises a microphone array composed of the M microphones of the electronic device, the third microphone mode comprises a microphone array composed of the N microphones of the headset, and M and N are positive integers;
    an activation unit, configured to activate the microphone array in the target microphone mode;
    an acquisition unit, configured to acquire a speech signal of the environment surrounding the microphone array in the target microphone mode; and
    a processing unit, configured to perform localization processing on the speech signal to obtain a target speech signal.
  12. An electronic device, characterized by comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to call and run the computer program stored in the memory to execute the method according to any one of claims 1 to 10.
  13. A chip, characterized by comprising a processor configured to call and run a computer program from a memory, so that the processor executes the method according to any one of claims 1 to 10.
  14. A computer-readable storage medium, characterized by being configured to store a computer program, wherein the computer program causes a computer to execute the method according to any one of claims 1 to 10.
  15. An electronic device, characterized by comprising:
    a first number of first microphones; and
    a headset paired and connected with the electronic device, the headset comprising a second number of second microphones, wherein the second number is greater than or equal to the first number;
    wherein, according to a first user instruction, the first microphones and the second microphones are configured as a corresponding microphone array to acquire a speech signal of the surrounding environment, and localization processing is performed on the speech signal to obtain a target speech signal.
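
The mode selection recited in claim 1 can be illustrated with a minimal sketch. It is not the applicant's implementation: the names `MicMode`, `Mic`, `select_mode`, and `build_array` are hypothetical, and the mapping of a user instruction to the integers 1 to 3 is an assumption made only for illustration. The sketch shows which microphones would form the array in each of the three modes; activation, signal acquisition, and localization processing are not modeled.

```python
from dataclasses import dataclass
from enum import Enum


class MicMode(Enum):
    """The three microphone modes of claim 1 (names are hypothetical)."""
    COMBINED = 1      # first mode: M device mics plus N headset mics
    DEVICE_ONLY = 2   # second mode: the M device mics only
    HEADSET_ONLY = 3  # third mode: the N headset mics only


@dataclass(frozen=True)
class Mic:
    owner: str  # "device" or "headset"
    index: int  # position of the mic within its owner


def select_mode(user_instruction: int) -> MicMode:
    """Map a user instruction to a mode (assumed encoding: 1, 2, or 3)."""
    return MicMode(user_instruction)  # raises ValueError for other values


def build_array(mode: MicMode, m: int, n: int) -> list[Mic]:
    """Return the microphones that make up the array for the given mode."""
    if m < 1 or n < 1:
        raise ValueError("M and N must be positive integers")
    device = [Mic("device", i) for i in range(m)]
    headset = [Mic("headset", i) for i in range(n)]
    if mode is MicMode.COMBINED:
        return device + headset
    if mode is MicMode.DEVICE_ONLY:
        return device
    return headset


if __name__ == "__main__":
    # Example with M = 2 and N = 3, which lies within the ranges of claim 4.
    array = build_array(select_mode(1), m=2, n=3)
    print(len(array), [mic.owner for mic in array])
```

Keeping mode selection separate from array construction mirrors the split between the selection unit and the activation unit in claim 11, although the claims do not prescribe any particular software structure.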
PCT/CN2021/118033 2020-11-17 2021-09-13 Method and apparatus for performing speech processing in electronic device, electronic device, and chip WO2022105392A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011288185.2A CN114513715A (en) 2020-11-17 2020-11-17 Method and device for executing voice processing in electronic equipment, electronic equipment and chip
CN202011288185.2 2020-11-17

Publications (1)

Publication Number Publication Date
WO2022105392A1 true WO2022105392A1 (en) 2022-05-27

Family

ID=81546828

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/118033 WO2022105392A1 (en) 2020-11-17 2021-09-13 Method and apparatus for performing speech processing in electronic device, electronic device, and chip

Country Status (2)

Country Link
CN (1) CN114513715A (en)
WO (1) WO2022105392A1 (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103999488A (en) * 2011-12-19 2014-08-20 高通股份有限公司 Automated user/sensor location recognition to customize audio performance in a distributed multi-sensor environment
US20140314242A1 (en) * 2013-04-19 2014-10-23 Plantronics, Inc. Ambient Sound Enablement for Headsets
CN109155130A (en) * 2016-05-13 2019-01-04 伯斯有限公司 Handle the voice from distributed microphone
CN111479180A (en) * 2019-01-24 2020-07-31 Oppo广东移动通信有限公司 Pickup control method and related product

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102800323B (en) * 2012-06-25 2014-04-02 华为终端有限公司 Method and device for reducing noises of voice of mobile terminal
CN107205196A (en) * 2017-05-19 2017-09-26 歌尔科技有限公司 Method of adjustment and device that microphone array is pointed to
CN108012217A (en) * 2017-11-30 2018-05-08 出门问问信息科技有限公司 The method and device of joint noise reduction

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115474117A (en) * 2022-11-03 2022-12-13 深圳黄鹂智能科技有限公司 Sound reception method and sound reception device based on three microphones
CN115474117B (en) * 2022-11-03 2023-01-10 深圳黄鹂智能科技有限公司 Sound reception method and sound reception device based on three microphones

Also Published As

Publication number Publication date
CN114513715A (en) 2022-05-17

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21893543

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21893543

Country of ref document: EP

Kind code of ref document: A1