WO2022227655A1 - Sound playback method, apparatus, electronic device and readable storage medium - Google Patents

Sound playback method, apparatus, electronic device and readable storage medium

Info

Publication number
WO2022227655A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
real
time
playback
playing
Prior art date
Application number
PCT/CN2021/141559
Other languages
English (en)
French (fr)
Inventor
赵祥军
Original Assignee
歌尔股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 歌尔股份有限公司
Publication of WO2022227655A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R1/1083 Reduction of ambient noise
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/10 Earpieces; Attachments therefor; Earphones; Monophonic headphones
    • H04R1/1091 Details not provided for in groups H04R1/1008 - H04R1/1083

Definitions

  • the present invention relates to the technical field of audio processing, and in particular, to a sound playback method, device, electronic device and readable storage medium.
  • An electronic device with a noise reduction function can eliminate interference from ambient sound, but while noise reduction is active the user also cannot hear the ambient sound.
  • The purpose of the present invention is to provide a sound playback method, apparatus, electronic device and readable storage medium that collect real-time ambient sound during normal use of an electronic device with an audio playback function and, when the real-time ambient sound matches a playback flag, play the real-time ambient sound or turn off the audio playback function, so that the user does not miss important ambient sounds while using the electronic device.
  • the present invention provides the following technical solutions:
  • A sound playback method, applied to an electronic device with a playback function, includes:
  • collecting real-time ambient sound;
  • performing content detection on the real-time ambient sound to obtain sound content;
  • if the sound content matches the playback flag, playing the real-time ambient sound or turning off the audio playback function.
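  • The playback flag is a semantic flag or a scene flag, and performing content detection on the real-time ambient sound to obtain the sound content includes:
  • performing semantic recognition on the real-time ambient sound to obtain a semantic recognition result, or performing scene recognition on the real-time ambient sound to obtain a scene recognition result;
  • determining the semantic recognition result or the scene recognition result as the sound content.
  • If the sound content matches a danger signal in the playback flag, the real-time ambient sound is played with enhancement.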
  • The electronic device includes at least two collection terminals arranged at intervals, and collecting the real-time ambient sound includes collecting the corresponding real-time ambient sound with each of the at least two collection terminals.
  • If the sound content matches the playback flag, the audio levels of the sound content collected by the two collection terminals are compared, and after the comparison the real-time ambient sound collected by the collection terminal with the louder audio is played.
  • The electronic device includes two playback terminals corresponding to the two collection terminals. If the sound content matches the playback flag, the playback terminal corresponding to the collection terminal with the louder audio plays the real-time ambient sound collected by that collection terminal.
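  • After the real-time ambient sound is played, the method further includes: collecting motion data; using the motion data to identify the current motion state; and stopping playback of the real-time ambient sound when the change between the current motion state and a preset motion state satisfies a preset ambient sound stop condition.
  • After the real-time ambient sound is played, the method further includes: performing semantic recognition on the real-time ambient sound to obtain a semantic recognition result; and, if the semantic recognition result includes a preset conversation end marker, stopping playback of the real-time ambient sound.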
  • A sound playback apparatus, applied to an electronic device with a playback function, includes:
  • a sound collection module for collecting real-time ambient sound;
  • a speech recognition module for performing content detection on the real-time ambient sound to obtain the sound content;
  • a playback control module configured to play the real-time ambient sound or turn off the audio playback function if the sound content matches the playback flag.
  • An electronic device comprising:
  • the acquisition terminal is used to collect real-time ambient sound
  • a playback end for playing a sound, the sound including the real-time environmental sound
  • the processor is configured to implement the steps of the above-mentioned sound playing method when executing the computer program.
  • a readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, implements the steps of the above sound playback method.
  • An electronic device with a playback function collects real-time ambient sound, performs content detection on it to obtain the sound content and, if the sound content matches the playback flag, plays the real-time ambient sound or turns off the audio playback function.
  • A playback flag can be set in the electronic device in advance.
  • The electronic device captures real-time ambient sound and then performs content detection on it to obtain the corresponding sound content.
  • If the sound content matches the playback flag, the real-time ambient sound can be played or the audio playback function can be turned off. If the real-time ambient sound is played, the user can hear it and avoids missing important ambient sounds; if the audio playback function is turned off, the user can hear the ambient sound directly, without interference from audio playback.
  • In other words, the electronic device collects and recognizes ambient sound and, under certain conditions, plays it or directly turns off the audio playback function, so that the earphone wearer can use the earphone normally and still hear ambient sounds that correspond to playback flags.
  • the embodiments of the present invention also provide a sound playing device, an electronic device, and a readable storage medium corresponding to the above-mentioned sound playing method, which have the above-mentioned technical effects, and are not repeated here.
  • FIG. 1 is a flowchart of an implementation of a sound playback method in an embodiment of the present invention.
  • FIG. 2 is a schematic structural diagram of a sound playback device according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
  • the sound playback method provided by the present invention can be applied to an electronic device with a sound playback function, and the electronic device can be specifically a wearable device or a non-wearable device.
  • The sound playback method can be applied to headphones and to wearable devices such as VR devices, AR devices and Bluetooth glasses; it can also be applied to non-wearable devices that require active noise reduction, such as home theaters and car audio systems, and to other scenarios.
  • FIG. 1 is a flowchart of a method for playing a sound in an embodiment of the present invention.
  • The method can be applied to an earphone and includes the following steps:
  • S101: Collect real-time ambient sound.
  • When the earphone is worn, its structure and its sound insulation and noise reduction functions mean that the wearer cannot hear ambient sound, or cannot hear it clearly. To prevent the wearer from missing an important ambient sound, in this embodiment the earphone collects ambient sound and plays it at the appropriate moment.
  • the specific model and type of the earphone are not limited. That is to say, the earphones can be wired earphones, wireless earphones (such as ordinary Bluetooth earphones or TWS earphones), ordinary earphones without special noise reduction function, or earphones with special noise reduction function (such as active noise cancelling headphones).
  • The more strongly the earphone isolates ambient sound, the greater the benefit of applying the sound playback method provided by the embodiment of the present invention.
  • the headset can detect its own working state or the state of the sensors on the headset to determine whether it is in a wearing state. For example, if the headset detects that music or a call is currently being played, it is determined that it is currently in a wearing state; or it is determined that it is in a wearing state by detecting the signals collected by sensors related to wearing detection in the headset, such as a Hall sensor and a pressure sensor.
  • The device that collects the real-time ambient sound may be a device on the earphone, such as a microphone on the earphone, or a device not on the earphone, such as a microphone on a host or earphone that has a communication connection with the earphone.
  • When the microphone on the earphone is used to collect the real-time ambient sound, an existing microphone in the earphone can be reused; for example, the sound collected by the earphone's active noise reduction microphone (MIC) can be used directly.
  • only one microphone can be used for acquisition or two microphones can be used for acquisition.
  • S102: Perform content detection on the real-time ambient sound to obtain the sound content.
  • the sound content of the real-time ambient sound can be detected.
  • the sound content may be concrete semantic content in the sound, or a corresponding scene situation, or a special sound (such as a siren).
  • Content detection for real-time ambient sound can be performed according to actual application requirements. For example, if the semantic content needs to be detected, the content detection of the real-time environmental sound refers to the semantic recognition of the real-time environmental sound.
  • The following uses semantic recognition alone and scene recognition alone as two examples to describe how content detection is performed on the real-time ambient sound to obtain the sound content.
  • Specific embodiment 1: if the playback flag is a semantic flag, performing content detection on the real-time ambient sound in step S102 to obtain the sound content may specifically include:
  • Step 1: Perform semantic recognition on the real-time ambient sound to obtain a semantic recognition result;
  • Step 2: Determine the semantic recognition result as the sound content.
  • Specifically, the real-time ambient sound can be input into the semantic recognition module in the earphone to obtain the speech recognition result for the real-time ambient sound, for example the text corresponding to the speech of a person who is talking.
  • the real-time environmental sound can also be transmitted to the host computer or mobile phone connected to the headset, and then use the trained semantic recognition model or recognition algorithm to semantically recognize the real-time environmental sound on these devices, and then obtain the semantic recognition result.
  • the speech recognition result is used as the sound content of the real-time ambient sound.
  • For the specific model and technique used to perform semantic recognition on the real-time ambient sound, reference may be made to existing semantic recognition solutions, which are not repeated here.
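  • As a minimal sketch of how a semantic recognition result might be checked against semantic flags, the following assumes the recognizer has already produced plain text. The function name `matches_semantic_flag`, the case-insensitive substring comparison and the example flags are illustrative assumptions, not details given in the patent.

```python
def matches_semantic_flag(recognized_text, semantic_flags):
    """Return the first semantic flag contained in the recognized text, or None.

    Assumption: a flag "matches" when it appears as a case-insensitive
    substring of the text produced by the (unspecified) semantic recognizer.
    """
    text = recognized_text.casefold()
    for flag in semantic_flags:
        if flag.casefold() in text:
            return flag
    return None


if __name__ == "__main__":
    # Hypothetical flags: the wearer's name and a form of address.
    flags = ["Alex", "excuse me"]
    print(matches_semantic_flag("Hey Alex, do you have a minute?", flags))
```

  • Substring matching is the simplest possible choice; a real system would probably need to tolerate recognition errors and partial names.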
  • Specific embodiment 2: if the playback flag is a scene flag, performing content detection on the real-time ambient sound in step S102 to obtain the sound content may specifically include:
  • Step 1: Perform scene recognition on the real-time ambient sound to obtain a scene recognition result;
  • Step 2: Determine the scene recognition result as the sound content.
  • Specifically, the real-time ambient sound can be input into the scene recognition module in the earphone for recognition, for example identifying whether the current scene is an indoor scene, an outdoor scene or a special scene (such as a warning scene with whistles and alarm sounds), to obtain the scene recognition result for the real-time ambient sound.
  • The real-time ambient sound can also be transmitted to a device such as the host or mobile phone connected to the headset, where a trained scene recognition model or recognition algorithm performs scene recognition on it to obtain the scene recognition result.
  • The scene recognition result is used as the sound content of the real-time ambient sound.
  • For the specific model and technique used to perform scene recognition on the real-time ambient sound, reference may be made to existing scene recognition solutions, which are not repeated here.
  • Specific embodiment 1 and specific embodiment 2 can be used in combination. This addresses two problems: failing to notice danger signals promptly while wearing headphones, which endangers personal safety; and failing to hear and respond promptly when someone speaks to the wearer, which causes social misunderstandings.
  • Content detection can be performed separately on the left real-time ambient sound and the right real-time ambient sound, yielding the left sound content and the right sound content respectively.
  • If only the sound content of the left real-time ambient sound, or only that of the right real-time ambient sound, matches the playback flag, only that side's real-time ambient sound is played, so that the other earphone continues to play the original sound content, such as music or call audio.
  • If the sound content matches a danger signal in the playback flag, the real-time ambient sound is played with enhancement.
  • Enhanced playback means making the real-time ambient sound stand out, for example by playing it at increased volume, by playing it in a loop, or by muting the current audio (such as music being played) before playing it.
  • For example, if the collected real-time ambient sound includes danger signals such as a car roaring or wind whistling (corresponding to playback flags that require the real-time ambient sound to be played), the working state of one of the earphones is paused.
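  • Two of the enhancement options mentioned above, raising the volume and muting the current audio, can be sketched as a simple sample-wise mix. This is only an illustration: the function `enhanced_mix`, the gain value and the representation of audio as lists of floats in [-1.0, 1.0] are assumptions, and the patent does not specify how enhancement is implemented.

```python
def enhanced_mix(ambient, music, ambient_gain=2.0, mute_music=True):
    """Mix boosted ambient samples with the current audio.

    ambient, music: equal-length sequences of float samples in [-1.0, 1.0]
    (an assumed representation). With mute_music=True the current audio is
    shielded entirely; otherwise it is kept underneath the boosted ambient sound.
    """
    mixed = []
    for a, m in zip(ambient, music):
        sample = a * ambient_gain + (0.0 if mute_music else m)
        # Clip so the boosted signal stays inside the valid sample range.
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed


if __name__ == "__main__":
    print(enhanced_mix([0.1, 0.6], [0.5, 0.5]))
```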
  • one or more playback flags may be preset, and when there are multiple playback flags, the types of playback flags may be different.
  • the play flags that can be set include semantic flags and scene flags.
  • the semantic mark can be a preset wearer's name, or other titles.
  • the scene flag may specifically be a scene in which the wearer needs to hear the ambient sound, such as an outdoor scene, a traffic condition scene, or the like.
  • After the sound content is obtained, it can be compared with the playback flags. If the sound content includes any playback flag, the sound content is determined to match, and the real-time ambient sound can be played or the audio playback function can be turned off. Specifically, if the electronic device has no noise reduction function, so that the user can clearly hear ambient sound once audio playback is off, the audio playback function can be turned off directly when a match is determined. If the electronic device has some noise reduction function, the user has difficulty hearing external sound even with audio playback turned off; in that case the real-time ambient sound can be played so that the user does not miss important ambient sounds.
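  • The matching decision described above can be sketched as follows. The names `Action` and `choose_action`, and the representation of sound content as a set of detected labels, are hypothetical; the sketch only mirrors the rule that a device without noise reduction turns audio playback off, while a device with noise reduction plays the real-time ambient sound.

```python
from enum import Enum


class Action(Enum):
    NONE = "none"                  # no playback flag matched; keep playing as before
    PLAY_AMBIENT = "play_ambient"  # play the real-time ambient sound
    STOP_AUDIO = "stop_audio"      # turn off the audio playback function


def choose_action(sound_content, playback_flags, has_noise_reduction):
    """Pick the action for one detection result.

    sound_content: set of detected labels (assumed representation).
    playback_flags: iterable of preset flags; one contained flag is a match.
    """
    if not any(flag in sound_content for flag in playback_flags):
        return Action.NONE
    # With noise reduction the user cannot hear outside sound even when
    # playback stops, so the ambient sound has to be played through.
    return Action.PLAY_AMBIENT if has_noise_reduction else Action.STOP_AUDIO


if __name__ == "__main__":
    flags = {"siren", "outdoor traffic"}
    print(choose_action({"siren", "speech"}, flags, has_noise_reduction=True))
```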
  • The real-time ambient sound can be played in only one earphone, or in both earphones simultaneously.
  • When the real-time ambient sound is played in both earphones, the currently playing sound content can be interrupted and replaced by the real-time ambient sound; alternatively, a prompt tone can be added to the currently playing sound and the real-time ambient sound superimposed on it, to draw the wearer's attention.
  • The battery level of the earphones can also be checked, and the state of the earphones adjusted based on the power detection result, to keep the earphones working normally. Specifically, when the battery level of one earphone is much lower than that of the other, for example when the difference exceeds 10%, the headset state information stored on the host (mobile phone, tablet, computer) is switched, and the switched state is returned to the two earphones.
  • In this way the master and slave roles of the earphones can be switched. This prevents a low-battery master earphone from forcing the other earphone to stop working at the same time, and so avoids pausing music playback or interrupting HFP (Hands-Free Profile, the Bluetooth profile that lets a Bluetooth device control phone calls, such as answering, hanging up, rejecting and voice dialing; whether rejecting and voice dialing are available depends on whether both the Bluetooth headset and the phone support them). It avoids the inconvenience caused to users when either TWS earphone runs low on power and HFP transmission is interrupted or music playback is paused.
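  • A minimal sketch of the role switch, assuming the host keeps the roles in a small dictionary and receives battery levels in percent. The function `rebalance_roles` and its data layout are hypothetical, and the choice to swap only when the master is the weaker earphone is one reading of the passage above.

```python
def rebalance_roles(roles, battery, threshold=10):
    """Return the master/slave assignment after a battery check.

    roles: e.g. {"master": "left", "slave": "right"} (assumed layout).
    battery: battery level in percent per earphone, e.g. {"left": 35, "right": 80}.
    The roles are swapped when the master's level is lower than the slave's
    by more than `threshold` percentage points (10 in the text's example).
    """
    master, slave = roles["master"], roles["slave"]
    if battery[slave] - battery[master] > threshold:
        return {"master": slave, "slave": master}
    return dict(roles)


if __name__ == "__main__":
    current = {"master": "left", "slave": "right"}
    print(rebalance_roles(current, {"left": 35, "right": 80}))
```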
  • An electronic device with a playback function collects real-time ambient sound, performs content detection on it to obtain the sound content and, if the sound content matches the playback flag, plays the real-time ambient sound or turns off the audio playback function.
  • A playback flag can be set in the electronic device in advance.
  • The electronic device captures real-time ambient sound and then performs content detection on it to obtain the corresponding sound content.
  • If the sound content matches the playback flag, the real-time ambient sound can be played or the audio playback function can be turned off. If the real-time ambient sound is played, the user can hear it and avoids missing important ambient sounds; if the audio playback function is turned off, the user can hear the ambient sound directly, without interference from audio playback.
  • In other words, the electronic device collects and recognizes ambient sound and, under certain conditions, plays it or directly turns off the audio playback function, so that the earphone wearer can use the earphone normally and still hear ambient sounds that correspond to playback flags.
  • the embodiments of the present invention also provide corresponding improvement solutions.
  • the same steps or corresponding steps in the above-mentioned embodiments can be referred to each other, and corresponding beneficial effects can also be referred to each other, which will not be repeated in the preferred/improved embodiments herein.
  • To reproduce the real-time ambient sound more faithfully, that is, to present a stereo (binaural) effect, the electronic device may include at least two collection terminals arranged at intervals.
  • the real-time ambient sound can be captured on both the left and right sides of the wearer, so that when the real-time ambient sound is played, the corresponding real-time ambient sound can be played in both earphones.
  • The real-time ambient sound includes the left real-time ambient sound and the right real-time ambient sound.
  • Step S101 above may specifically include: collecting the corresponding real-time ambient sound with each of the at least two collection terminals. If the sound content matches the playback flag, the audio levels of the sound content collected by the two collection terminals are compared, and after the comparison the real-time ambient sound collected by the collection terminal with the louder audio is played.
  • A dedicated sound direction judgment terminal can also be provided, and the playback terminal for the corresponding direction is then turned on according to the detected direction.
  • Alternatively, the playback terminal on the other side plays the sound from the louder collection terminal while the near-side terminal is paused or switched to transparency mode, so that ambient sound from the far side is conveyed.
  • the left microphone is used to collect the left real-time ambient sound
  • the right microphone is used to collect the right real-time ambient sound.
  • the collected left real-time ambient sound and right real-time ambient sound are both real-time ambient sounds.
  • The left earphone plays the left real-time ambient sound and the right earphone plays the right real-time ambient sound. This approximates hearing ambient sound normally without earphones and helps the wearer quickly locate the direction of the sound source and respond.
  • The electronic device includes two playback terminals corresponding to the two collection terminals. If the sound content matches the playback flag, the playback terminal corresponding to the collection terminal with the louder audio plays the real-time ambient sound collected by that collection terminal.
  • That is, playing the real-time ambient sound in step S103 if the sound content matches the playback flag may specifically include:
  • Step 1: If the sound content matches the playback flag, compare the audio levels of the left real-time ambient sound and the right real-time ambient sound, and determine the louder side from the left and right sides;
  • Step 2: Play the corresponding real-time ambient sound using the earphone on the louder side.
  • If the audio level of the left real-time ambient sound is greater than that of the right, the left side is determined to be the louder side; if it is smaller, the right side is the louder side; if the two are equal, either the left or the right side can be determined to be the louder side, or both sides can.
  • Once the louder side is determined, the earphone on that side can be used directly to play the real-time ambient sound: if the left side is louder, the left earphone plays the left real-time ambient sound, and if the right side is louder, the right earphone plays the right real-time ambient sound. In this way the real-time ambient sound is played only on the louder side, which largely preserves the wearer's binaural perception while the other earphone continues to play the original sound.
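  • The comparison in Step 1 can be sketched as below. The patent does not say how the audio level is measured; root-mean-square (RMS) amplitude over a frame is used here purely as an assumption, and the names `rms` and `louder_side` are hypothetical. On a tie the sketch returns "both", one of the options the text allows.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of a frame of float samples (0.0 if empty)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def louder_side(left_samples, right_samples):
    """Return "left", "right" or "both" depending on which frame is louder."""
    left_level, right_level = rms(left_samples), rms(right_samples)
    if left_level > right_level:
        return "left"
    if right_level > left_level:
        return "right"
    return "both"


if __name__ == "__main__":
    print(louder_side([0.5, -0.5, 0.4], [0.1, -0.1, 0.05]))
```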
  • After the real-time ambient sound is played, an operation to exit ambient sound playback may also be performed.
  • The exit operation includes:
  • Step 1: Collect motion data;
  • Step 2: Use the motion data to identify the current motion state;
  • Step 3: Stop playing the real-time ambient sound when the change between the current motion state and the preset motion state satisfies the preset ambient sound stop condition.
  • motion data can be collected by using motion sensors inside or outside the electronic device.
  • the motion sensor may specifically be a motion sensor integrated in the earphone, or may be a motion sensor worn by the earphone wearer alone.
  • The motion sensor may be one that collects motion data for subtle movements of the wearer, such as head rotation, or one on a home appliance, such as a treadmill, that can capture the wearer's range of motion.
  • The specific model and type of the motion sensor are not limited.
  • After the motion data is acquired, it is recognized to obtain the current motion state. How the motion data is recognized depends on the working principle and usage of the specific motion sensor, which is not repeated here.
  • the preset motion state may specifically be the motion state collected last time, or may specifically be a state corresponding to a set standard static posture (such as a standing state). Of course, you can also customize the settings according to actual needs, which will not be listed here.
  • For example, the preset ambient sound stop condition may be that the user's head rotation angle changes by +75 degrees or -75 degrees and the head moves 2 meters away; in that case the conversation is deemed to be over and playback of the real-time ambient sound is stopped. Otherwise, the conversation is deemed to be continuing and the real-time ambient sound continues to be played.
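  • A minimal sketch of such a stop condition, using the example thresholds from the text (75 degrees, 2 meters). The function name, the parameter names and the reading that both conditions must hold together are assumptions; the motion state is assumed to have already been reduced to a head rotation change and a travelled distance.

```python
def should_stop_ambient(yaw_change_deg, distance_m,
                        yaw_threshold_deg=75.0, distance_threshold_m=2.0):
    """Decide whether the conversation is deemed over.

    yaw_change_deg: head rotation relative to the preset motion state,
                    positive or negative depending on direction.
    distance_m:     distance the head has moved away, in meters.
    Both thresholds must be reached (assumed reading of the example).
    """
    turned_away = abs(yaw_change_deg) >= yaw_threshold_deg
    moved_away = distance_m >= distance_threshold_m
    return turned_away and moved_away


if __name__ == "__main__":
    print(should_stop_ambient(-80.0, 2.5))
```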
  • After the real-time ambient sound is played, the operation to exit ambient sound playback can also be performed as follows:
  • Step 1: Perform semantic recognition on the real-time ambient sound to obtain a semantic recognition result;
  • Step 2: If the semantic recognition result includes a preset conversation end marker, stop playing the real-time ambient sound.
  • That is, after the real-time ambient sound starts playing, semantic recognition can be performed on it to obtain a semantic recognition result.
  • If the semantic recognition result includes a preset conversation end marker (such as a preset word like "goodbye" or "bye"), playback of the real-time ambient sound can be stopped immediately, or after a period of time (such as 5 seconds), and the earphone reverts to normal music or call audio playback.
  • the state of playing the ambient sound can be automatically exited based on the user's dialogue content, and the state of the earphone can be freely switched according to the actual application situation.
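  • The dialogue-based exit can be sketched as a word-level check on the recognized text plus an optional delay. The helper names `conversation_ended` and `stop_time`, the tokenization and the marker list are illustrative assumptions; only the example words and the 5-second delay come from the text.

```python
import re


def conversation_ended(recognized_text, end_markers=("goodbye", "bye")):
    """True if any preset conversation end marker occurs as a whole word."""
    words = re.findall(r"[a-z']+", recognized_text.casefold())
    return any(marker in words for marker in end_markers)


def stop_time(detected_at_s, delay_s=5.0):
    """Time (in seconds) at which ambient playback should stop.

    delay_s=0.0 stops immediately; 5.0 follows the example in the text.
    """
    return detected_at_s + delay_s


if __name__ == "__main__":
    if conversation_ended("OK, talk to you later. Goodbye!"):
        print("stop ambient playback at t =", stop_time(12.0))
```

  • Matching whole words rather than substrings avoids treating unrelated words that merely contain a marker as the end of the conversation.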
  • If the processing operation that was performed was to turn off the audio playback function, the audio playback function can be turned back on when exiting.
  • the embodiments of the present invention also provide a sound playback device applied to an electronic device with a playback function.
  • the sound playback device described below and the sound playback method described above may refer to each other correspondingly.
  • The device includes the following modules:
  • the sound collection module 101 is used to collect real-time ambient sound;
  • the speech recognition module 102 is used to perform content detection on the real-time ambient sound to obtain the sound content;
  • the playback control module 103 is configured to play the real-time ambient sound or disable the audio playback function if the sound content matches the playback flag.
  • The device provided by the embodiment of the present invention collects real-time ambient sound, performs content detection on it to obtain the sound content and, if the sound content matches the playback flag, plays the real-time ambient sound or turns off the audio playback function.
  • A playback flag can be set in the electronic device in advance.
  • The electronic device captures real-time ambient sound and then performs content detection on it to obtain the corresponding sound content.
  • If the sound content matches the playback flag, the real-time ambient sound can be played or the audio playback function can be turned off. If the real-time ambient sound is played, the user can hear it and avoids missing important ambient sounds; if the audio playback function is turned off, the user can hear the ambient sound directly, without interference from audio playback. In other words, the device lets the earphone wearer use the earphone normally and still hear ambient sounds that correspond to playback flags.
  • The playback flag is a semantic flag or a scene flag, and the speech recognition module 102 is specifically configured to perform semantic recognition on the real-time ambient sound to obtain a semantic recognition result, or to perform scene recognition on the real-time ambient sound to obtain a scene recognition result,
  • and to determine the semantic recognition result or the scene recognition result as the sound content.
  • the playback control module 103 is specifically configured to perform enhanced playback of the real-time ambient sound if the sound content matches the danger signal in the playback flag.
  • The electronic device includes at least two collection terminals arranged at intervals, and the sound collection module 101 is specifically configured to collect the corresponding real-time ambient sound with each of the at least two collection terminals;
  • the playback control module 103 is specifically configured to, if the sound content matches the playback flag, compare the audio levels of the sound content collected by the two collection terminals and, after the comparison, play the real-time ambient sound collected by the collection terminal with the louder audio.
  • The electronic device includes two playback terminals corresponding to the two collection terminals, and the playback control module 103 is specifically configured to, if the sound content matches the playback flag, use the playback terminal corresponding to the collection terminal with the louder audio to play the real-time ambient sound collected by that collection terminal.
  • The device further includes a state recovery module configured to, after the real-time ambient sound is played, collect motion data; use the motion data to identify the current motion state; and stop playing the real-time ambient sound when the change between the current motion state and the preset motion state satisfies the preset ambient sound stop condition.
  • The device further includes a state recovery module specifically configured to, after the real-time ambient sound is played, perform semantic recognition on the real-time ambient sound to obtain a semantic recognition result and, if the semantic recognition result includes a preset conversation end marker, stop playing the real-time ambient sound.
  • Corresponding to the above method embodiments, the embodiments of the present invention further provide an electronic device; the electronic device described below and the sound playback method described above may be cross-referenced.
  • The electronic device includes:
  • a collection end 301, configured to collect real-time ambient sound;
  • a playback end 302, configured to play sound, the sound including the real-time ambient sound;
  • a memory 303, configured to store a computer program;
  • a processor 304, configured to implement the steps of the sound playback method of the above method embodiments when executing the computer program.
  • The collection end may be a component capable of collecting sound, such as a microphone, and the playback end may be a component with a sound playback function, such as a speaker.
  • The steps of the sound playback method described above can be implemented by the structure of the electronic device.
  • Corresponding to the above method embodiments, the embodiments of the present invention further provide a readable storage medium; the readable storage medium described below and the sound playback method described above may be cross-referenced.
  • A readable storage medium stores a computer program which, when executed by a processor, implements the steps of the sound playback method of the above method embodiments.
  • The readable storage medium may specifically be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other readable storage medium capable of storing program code.
  • The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two.
  • The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

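Some embodiments of the present application disclose a sound playback method, an apparatus, an electronic device, and a readable storage medium. The method includes: an electronic device having an audio playback function collects real-time ambient sound; content detection is performed on the real-time ambient sound to obtain sound content; and if the sound content matches a playback flag, the real-time ambient sound is played or the audio playback function is turned off. In this method, by collecting and recognizing ambient sound, and by playing the ambient sound or turning off the audio playback function when specific conditions are met, the electronic device enables its user to hear the ambient sound corresponding to the playback flag even while using the device.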

Description

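Sound playback method, apparatus, electronic device, and readable storage medium
This application claims priority to Chinese patent application No. 202110475845.6, entitled "Sound playback method, apparatus, electronic device, and readable storage medium", filed with the China Patent Office on April 29, 2021, the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the technical field of audio processing, and in particular to a sound playback method, an apparatus, an electronic device, and a readable storage medium.
Background
An electronic device with a noise reduction function can eliminate interference from ambient sound, but while reducing noise it also prevents the user from hearing ambient sound.
For example, once a user puts on earphones, the structure of the earphones itself makes it hard for the wearer to hear ambient sound clearly. In particular, earphones with a noise reduction function (such as active noise-cancelling earphones) isolate the wearer from ambient sound even more strictly. This creates a problem: when someone greets or talks to the wearer, or when a horn sounds a warning as the wearer crosses a road, the wearer needs to hear the ambient sound but cannot because of the isolation provided by the earphones, and may therefore miss important conversation content or important warning information.
In summary, how to solve problems such as ambient sound isolation while preserving the normal and reasonable functions of an electronic device with a noise reduction function is a technical problem that those skilled in the art urgently need to solve.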
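Summary of the Invention
The object of the present invention is to provide a sound playback method, an apparatus, an electronic device, and a readable storage medium which, during normal use of an electronic device having an audio playback function, collect real-time ambient sound and, when the real-time ambient sound matches a playback flag, play the real-time ambient sound or turn off the audio playback function, so that the user does not miss important ambient sounds while using the electronic device.
To solve the above technical problem, the present invention provides the following technical solutions:
A sound playback method, applied to an electronic device having a playback function, including:
collecting real-time ambient sound;
performing content detection on the real-time ambient sound to obtain sound content;
if the sound content matches a playback flag, playing the real-time ambient sound or turning off the audio playback function.
Preferably, the playback flag is a semantic flag or a scene flag, and performing content detection on the real-time ambient sound to obtain sound content includes:
performing semantic recognition on the real-time ambient sound to obtain a semantic recognition result;
or, performing scene recognition on the real-time ambient sound to obtain a scene recognition result;
correspondingly, determining the semantic recognition result or the scene recognition result as the sound content.
Preferably, if the sound content matches a danger signal in the playback flag, enhanced playback of the real-time ambient sound is performed.
Preferably, the electronic device includes at least two collection ends arranged at an interval, and collecting real-time ambient sound includes:
using the at least two collection ends to collect the corresponding real-time ambient sounds respectively;
correspondingly, if the sound content matches the playback flag, comparing the audio levels of the sound content collected by the two collection ends and, after the comparison, playing the real-time ambient sound collected by the collection end with the louder audio.
Preferably, the electronic device includes two playback ends arranged in correspondence with the two collection ends, and if the sound content matches the playback flag, the playback end corresponding to the collection end with the louder audio is used to play the real-time ambient sound collected by that collection end.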
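Preferably, after playing the real-time ambient sound, the method further includes:
collecting motion data;
identifying the current motion state from the motion data;
stopping playback of the real-time ambient sound when the change between the current motion state and a preset motion state satisfies a preset condition for stopping external-sound playback.
Preferably, after playing the real-time ambient sound, the method further includes:
performing semantic recognition on the real-time ambient sound to obtain a semantic recognition result;
if the semantic recognition result includes a preset conversation-ending phrase, stopping playback of the real-time ambient sound.
A sound playback apparatus, applied to an electronic device having a playback function, including:
a sound collection module, configured to collect real-time ambient sound;
a speech recognition module, configured to perform content detection on the real-time ambient sound to obtain sound content;
a playback control module, configured to play the real-time ambient sound or turn off the audio playback function if the sound content matches a playback flag.
An electronic device, including:
a collection end, configured to collect real-time ambient sound;
a playback end, configured to play sound, the sound including the real-time ambient sound;
a memory, configured to store a computer program;
a processor, configured to implement the steps of the above sound playback method when executing the computer program.
A readable storage medium storing a computer program which, when executed by a processor, implements the steps of the above sound playback method.
With the method provided by the embodiments of the present invention, an electronic device having a playback function collects real-time ambient sound; content detection is performed on the real-time ambient sound to obtain sound content; and if the sound content matches a playback flag, the real-time ambient sound is played or the audio playback function is turned off.
In this method, a playback flag can be set in the electronic device in advance. The electronic device collects real-time ambient sound and then performs content detection on it to obtain the corresponding sound content. When the sound content is found to match the playback flag, the real-time ambient sound can be played or the audio playback function can be turned off. If the real-time ambient sound is played, the user of the electronic device can hear it and avoids missing important ambient sounds; if the audio playback function is turned off, the user can hear the ambient sound directly, without interference from audio playback. In other words, by collecting and recognizing ambient sound, and by playing it or directly turning off the audio playback function when specific conditions are met, the electronic device enables the earphone wearer to hear the ambient sound corresponding to the playback flag while using the earphones normally.
Correspondingly, the embodiments of the present invention also provide a sound playback apparatus, an electronic device, and a readable storage medium corresponding to the above sound playback method, which have the above technical effects and are not described again here.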
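Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some of the drawings of the present application, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of an implementation of a sound playback method in an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a sound playback apparatus in an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of an electronic device in an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by those of ordinary skill in the art based on the embodiments in the present application without creative effort fall within the scope of protection of the present application.
The sound playback method provided by the present invention can be applied to an electronic device having a sound playback function, which may be a wearable device or a non-wearable device. For example, the method can be applied to earphones and to wearable devices such as VR devices, AR devices, and Bluetooth glasses; it can also be applied to non-wearable devices that require active noise reduction, such as home theaters and car audio systems.
The method is described in detail below, taking its application in earphones as an example. Its application in other types of electronic devices can be understood by analogy and is not described separately.
Please refer to FIG. 1, which is a flowchart of a sound playback method in an embodiment of the present invention. The method can be applied to earphones and includes the following steps: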
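S101. Collect real-time ambient sound.
When earphones are worn, their structure and their sound-insulating, noise-reducing function prevent the wearer from hearing ambient sound, or from hearing it clearly. To keep the wearer from missing important ambient sounds, in this embodiment the earphones collect ambient sound and play it when appropriate.
Specifically, this embodiment does not limit the model or type of the earphones. That is, the earphones may be wired or wireless (such as ordinary Bluetooth earphones or TWS earphones), and may be ordinary earphones without a special noise reduction function or earphones with one (such as active noise-cancelling earphones). Since the better the isolation, the harder it is for the wearer to hear ambient sound, the sound playback method provided by the embodiments of the present invention is most effective on earphones that isolate ambient sound well.
The earphones can detect their own working state or the state of their sensors to determine whether they are being worn. For example, if the earphones detect that music is playing or a call is in progress, they determine that they are currently being worn; alternatively, they can determine this from the signals collected by wear-detection sensors in the earphones, such as Hall sensors or pressure sensors.
Note that in this embodiment the component that collects the real-time ambient sound may be on the earphones, such as an earphone microphone, or outside them, such as a host that has a communication connection with the earphones or a microphone associated with the earphones. In particular, when an earphone microphone is used, an existing microphone can be reused; for example, the sound collected by the earphones' active noise-cancelling microphone (MIC) can be reused directly.
When earphone microphones are used to collect the real-time ambient sound, either one microphone or two microphones may be used.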
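S102. Perform content detection on the real-time ambient sound to obtain sound content.
After the real-time ambient sound is collected, its sound content can be detected. Generally, the sound content may be the semantic content of the sound, the corresponding scene, or a special sound (such as a siren). The content detection performed depends on the needs of the application. For example, if semantic content needs to be detected, performing content detection on the real-time ambient sound means performing semantic recognition on it.
Two specific implementations, semantic recognition alone and scene recognition alone, are used below as examples of how to perform content detection on the real-time ambient sound to obtain the sound content.
Specific implementation 1: the playback flag is a semantic flag, and step S102, performing content detection on the real-time ambient sound to obtain sound content, may specifically include:
Step 1. Perform semantic recognition on the real-time ambient sound to obtain a semantic recognition result;
Step 2. Determine the semantic recognition result as the sound content.
For ease of description, the two steps above are described together below.
Specifically, the real-time ambient sound can be input into a semantic recognition module in the earphones to obtain the semantic recognition result, for example the text corresponding to a person's speech. Alternatively, the real-time ambient sound can be transmitted to a device connected to the earphones, such as a host or mobile phone, where a trained semantic recognition model or recognition algorithm performs semantic recognition and produces the semantic recognition result. This semantic recognition result is taken as the sound content of the real-time ambient sound. This embodiment does not limit the algorithm, model, or technique used for semantic recognition; existing semantic recognition solutions may be referred to and are not described here.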
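Specific implementation 2: the playback flag is a scene flag, and step S102, performing content detection on the real-time ambient sound to obtain sound content, may specifically include:
Step 1. Perform scene recognition on the real-time ambient sound to obtain a scene recognition result;
Step 2. Determine the scene recognition result as the sound content.
For ease of description, the two steps above are described together below.
Specifically, the real-time ambient sound can be input into a scene recognition module in the earphones, which identifies, for example, whether the current scene is indoor, outdoor, or a special scene (such as a warning scene with horns or alarms), and produces the scene recognition result. Alternatively, the real-time ambient sound can be transmitted to a device connected to the earphones, such as a host or mobile phone, where a trained scene recognition model or recognition algorithm performs scene recognition and produces the scene recognition result. This scene recognition result is taken as the sound content of the real-time ambient sound. This embodiment does not limit the algorithm, model, or technique used for scene recognition; existing scene recognition solutions may be referred to and are not described here.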
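In practice, specific implementations 1 and 2 can be used in combination. This addresses two problems of wearing earphones: failing to notice a danger signal immediately, which endangers personal safety, and failing to hear and answer immediately when someone speaks to the wearer, which causes social misunderstandings.
Note that if the real-time ambient sound consists of a left-side real-time ambient sound and a right-side real-time ambient sound, content detection can be performed on each separately to obtain left-side sound content and right-side sound content. Then, when only the left-side or only the right-side sound content matches the playback flag, only the real-time ambient sound of that side is played, so that the other earphone keeps playing the original audio, such as music or call audio.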
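The per-side handling can be sketched as follows. This is only an illustration under stated assumptions, not the implementation described in the application: the names `matches_flag` and `sides_to_switch`, the representation of sound content as recognized text, and case-insensitive substring matching are all hypothetical.

```python
def matches_flag(sound_content: str, playback_flags: set[str]) -> bool:
    """Return True if any playback flag occurs in the detected content.

    Assumption: sound content is plain text produced by a recognizer.
    """
    text = sound_content.lower()
    return any(flag.lower() in text for flag in playback_flags)


def sides_to_switch(left_content: str, right_content: str,
                    playback_flags: set[str]) -> set[str]:
    """Return the sides whose earphone should switch to ambient sound.

    A side that is not returned keeps playing the original audio.
    """
    detected = {"left": left_content, "right": right_content}
    return {side for side, content in detected.items()
            if matches_flag(content, playback_flags)}
```

With a flag set of `{"Alice"}`, a call heard only by the left microphone would switch only the left earphone, leaving music or call audio running on the right.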
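Preferably, if the sound content matches a danger signal in the playback flag, enhanced playback of the real-time ambient sound is performed. Enhanced playback means playing the real-time ambient sound with emphasis, for example simply playing it at increased volume, playing it in a loop, or muting the current audio (such as music being played) before playing it. For example, when the collected real-time ambient sound contains danger signals such as car horns, howling wind, or engine roar (corresponding to playback flags for which the real-time ambient sound needs to be played), the working state of one earphone is paused. When the danger sound at one earphone is clearly stronger than at the other, only that earphone may be paused, and the ambient sound is played with enhancement in that earphone, so that the wearer can instinctively and correctly judge where the danger signal comes from and dodge quickly. Once no threat signal is detected, the original state can be restored.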
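One of the enhancement options, playing the ambient sound at increased volume, can be sketched as a gain stage with clipping. The function name `enhance`, the default gain of 2.0, and the use of float samples in the range [-1.0, 1.0] are assumptions chosen for the example; the application does not specify how the volume is raised.

```python
def enhance(samples: list[float], gain: float = 2.0) -> list[float]:
    """Amplify ambient samples and clip them to the valid range [-1.0, 1.0].

    Assumption: samples are normalized floats; the gain value is illustrative.
    """
    return [max(-1.0, min(1.0, s * gain)) for s in samples]
```

Clipping keeps the amplified signal within the valid sample range; a real device would more likely use a limiter to avoid audible distortion.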
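S103. If the sound content matches the playback flag, play the real-time ambient sound or turn off the audio playback function.
In this embodiment, one or more playback flags can be set in advance; when there are several, they may be of different kinds. Specifically, the playback flags that can be set include semantic flags and scene flags. For example, a semantic flag may be the wearer's name or another form of address set in advance, and a scene flag may be an outdoor scene, a road traffic scene, or another scene in which the wearer needs to hear ambient sound.
After the sound content is detected, it is compared with the playback flags. If the sound content includes any one of the playback flags, it is determined to match, and the real-time ambient sound can be played or the audio playback function can be turned off. Specifically, if the electronic device itself has no noise reduction function, so that turning off audio playback is enough for the user to hear ambient sound clearly, the audio playback function can simply be turned off when a match is determined. If the electronic device has some noise reduction function, so that the user can hardly hear external sound even with audio playback turned off, the real-time ambient sound can be played to ensure that the user does not miss important ambient sounds.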
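The decision in S103 can be sketched as below. Everything named here (`Action`, `PlaybackFlags`, `decide_action`, the example flag values) is hypothetical, and the sketch assumes that the semantic result is text and the scene result is a single label; the application leaves these representations open.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    NONE = "none"
    PLAY_AMBIENT = "play_ambient"
    TURN_OFF_AUDIO = "turn_off_audio"


@dataclass(frozen=True)
class PlaybackFlags:
    semantic: frozenset = frozenset()  # e.g. the wearer's name
    scene: frozenset = frozenset()     # e.g. "outdoor", "traffic"


def decide_action(semantic_result: str, scene_result: str,
                  flags: PlaybackFlags, has_noise_reduction: bool) -> Action:
    """Match detected content against the flags and pick the response."""
    text = semantic_result.lower()
    semantic_hit = any(f.lower() in text for f in flags.semantic)
    scene_hit = scene_result.lower() in {f.lower() for f in flags.scene}
    if not (semantic_hit or scene_hit):
        return Action.NONE
    # With noise reduction, muting alone would not let the user hear the
    # surroundings, so the ambient sound is played instead.
    return Action.PLAY_AMBIENT if has_noise_reduction else Action.TURN_OFF_AUDIO
```

Returning an action rather than acting directly keeps the matching logic separate from the audio path, which makes it easy to exercise without hardware.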
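When the real-time ambient sound is played, it may be played in only one earphone or in both earphones simultaneously. When it is played in both, the audio currently being played can be interrupted and replaced by the real-time ambient sound; the real-time ambient sound can be overlaid on the audio currently being played; or a prompt tone can first be added to the audio currently being played and the real-time ambient sound then overlaid, to draw the wearer's attention.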
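The three playback options can be sketched as a small mixer. The mode names, the `render` and `overlay` functions, and the choice of sample-wise addition with clipping are assumptions made for illustration; samples are taken to be floats in [-1.0, 1.0].

```python
from itertools import zip_longest


def overlay(a: list[float], b: list[float]) -> list[float]:
    """Sum two sample streams and clip the result to [-1.0, 1.0]."""
    return [max(-1.0, min(1.0, x + y))
            for x, y in zip_longest(a, b, fillvalue=0.0)]


def render(current: list[float], ambient: list[float], mode: str,
           prompt: tuple[float, ...] = ()) -> list[float]:
    """Produce the output stream for one of the three playback options."""
    if mode == "interrupt":            # replace the current audio
        return list(ambient)
    if mode == "overlay":              # mix ambient onto the current audio
        return overlay(current, ambient)
    if mode == "prompt_then_overlay":  # prompt tone first, then ambient
        return overlay(current, list(prompt) + list(ambient))
    raise ValueError(f"unknown mode: {mode}")
```

In the third mode the prompt tone occupies the first samples of the overlay, so the wearer hears the cue on top of the current audio before the ambient sound starts.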
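In this embodiment, when the earphones are wireless, the battery level of the earphones can also be checked and the earphone state adjusted according to the battery check result, to keep the earphones working normally. Specifically, when the battery level of one earphone is much lower than that of the other, for example when the difference exceeds 10%, the host (mobile phone, tablet, or computer) sends a request asking each earphone to report its battery level and whether it is working as the master earphone. If the earphone with the lower battery level is the master, the host promptly switches the earphone state information stored locally and returns the switched state to both earphones. The master and slave roles are thus swapped, which prevents the situation in which the low-battery master earphone runs out of power and the other earphone stops working as well, and so avoids pausing music playback and HFP (Hands-Free Profile, a Bluetooth profile that lets a Bluetooth device control a phone, for example answering, hanging up, rejecting calls, and voice dialing; rejecting and voice dialing depend on whether the Bluetooth earphone and the phone support them; it is the mode Bluetooth earphones use for hands-free calls). This avoids the inconvenience caused to the user when either earphone of a TWS pair runs low on battery and HFP transmission is interrupted and music playback pauses.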
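The host-side role swap can be sketched as follows. `EarbudStatus` and `swap_master_if_needed` are hypothetical names, and the Bluetooth request/response exchange is reduced to plain objects; only the rule itself (swap when the master is the low-battery earphone and the gap exceeds 10%) comes from the text.

```python
from dataclasses import dataclass


@dataclass
class EarbudStatus:
    battery_percent: int  # as reported by the earphone in its response
    is_master: bool


def swap_master_if_needed(left: EarbudStatus, right: EarbudStatus,
                          max_gap_percent: int = 10) -> bool:
    """Swap master/slave roles if the master is the low-battery earphone.

    Returns True when a swap was performed.
    """
    if left.battery_percent <= right.battery_percent:
        low, high = left, right
    else:
        low, high = right, left
    gap = high.battery_percent - low.battery_percent
    if low.is_master and gap > max_gap_percent:
        low.is_master, high.is_master = False, True
        return True
    return False
```

After a swap the function is idempotent: calling it again with the same battery levels does nothing, because the low-battery earphone is no longer the master.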
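With the method provided by the embodiments of the present invention, an electronic device having a playback function collects real-time ambient sound; content detection is performed on the real-time ambient sound to obtain sound content; and if the sound content matches a playback flag, the real-time ambient sound is played or the audio playback function is turned off.
In this method, a playback flag can be set in the electronic device in advance. The electronic device collects real-time ambient sound and then performs content detection on it to obtain the corresponding sound content. When the sound content is found to match the playback flag, the real-time ambient sound can be played or the audio playback function can be turned off. If the real-time ambient sound is played, the user of the electronic device can hear it and avoids missing important ambient sounds; if the audio playback function is turned off, the user can hear the ambient sound directly, without interference from audio playback. In other words, by collecting and recognizing ambient sound, and by playing it or directly turning off the audio playback function when specific conditions are met, the electronic device enables the earphone wearer to hear the ambient sound corresponding to the playback flag while using the earphones normally.
Note that, based on the above embodiment, the embodiments of the present invention also provide corresponding improvements. Steps in the preferred/improved embodiments that are the same as or correspond to steps in the above embodiment can be cross-referenced, as can the corresponding beneficial effects, and they are not described again in the preferred/improved embodiments herein.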
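In a specific implementation of the present invention, to better imitate the real ambient sound, that is, to produce a stereo effect (binaural effect), the electronic device may include at least two collection ends arranged at an interval. For example, real-time ambient sound can be collected on both the left and right sides of the wearer, so that when it is played, each earphone can play the corresponding real-time ambient sound.
That is, the real-time ambient sound includes a left-side real-time ambient sound and a right-side real-time ambient sound, and step S101 above may specifically include: using the at least two collection ends to collect the corresponding real-time ambient sounds respectively. Correspondingly, if the sound content matches the playback flag, the audio levels of the sound content collected by the two collection ends are compared and, after the comparison, the real-time ambient sound collected by the collection end with the louder audio is played.
Of course, in practice a dedicated sound-direction determination end can also be provided, and the playback end for the determined direction is then turned on. Alternatively, when there are two collection ends, the playback end on the other side plays the sound from the collection end with the louder audio, while the near side is paused, switched to transparency mode, or keeps playing; in this way the far-side ambient sound is conveyed.
Taking earphones as an example: while the earphones are worn, the left microphone collects the left-side real-time ambient sound and the right microphone collects the right-side real-time ambient sound. Both are real-time ambient sound. When real-time ambient sound needs to be played, the left earphone plays the left-side real-time ambient sound and the right earphone plays the right-side real-time ambient sound, which approximates hearing ambient sound normally without earphones and helps the wearer quickly locate the direction of the sound source and react quickly.
In a specific implementation of the present invention, the electronic device includes two playback ends arranged in correspondence with the two collection ends, and if the sound content matches the playback flag, the playback end corresponding to the collection end with the louder audio plays the real-time ambient sound collected by that collection end.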
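Taking earphones as an example, step S103, playing the real-time ambient sound if the sound content matches the playback flag, may specifically include:
Step 1. If the sound content matches the playback flag, compare the audio levels of the left-side and right-side real-time ambient sounds and determine which of the left and right sides is the louder side;
Step 2. Use the earphone on the louder side to play the corresponding real-time ambient sound.
For ease of description, the two steps above are described together below.
When the sound content is determined to match the playback flag, the audio levels of the left-side and right-side real-time ambient sounds can be compared and the louder side selected. That is, if the left-side real-time ambient sound is louder than the right-side one, the left side is the louder side; if it is quieter, the right side is the louder side; if the two are equally loud, either the left or the right side can be taken as the louder side, or both sides can be.
Once the louder side is determined, the earphone on that side is used to play the real-time ambient sound. Specifically, if the left side is the louder side, the left earphone plays the left-side real-time ambient sound; if the right side is the louder side, the right earphone plays the right-side real-time ambient sound. In this way, the real-time ambient sound is played only on the side where the sound is louder, which largely preserves the wearer's binaural effect while the other earphone continues to play the original audio.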
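The comparison of the two sides can be sketched with a root-mean-square (RMS) level. RMS is an assumption made for this example; the application only speaks of comparing the "audio size" and does not name a measure. The function names are likewise hypothetical, and the tie case returns both sides, which is one of the options the text allows.

```python
import math


def rms(samples: list[float]) -> float:
    """Root-mean-square level of a block of samples (0.0 for an empty block)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def louder_sides(left: list[float], right: list[float]) -> tuple[str, ...]:
    """Return the side(s) on which the ambient sound should be played."""
    left_level, right_level = rms(left), rms(right)
    if left_level > right_level:
        return ("left",)
    if right_level > left_level:
        return ("right",)
    return ("left", "right")  # equal levels: here both sides are chosen
```

A practical version would probably compare levels with a tolerance rather than exact equality, since two microphones rarely report identical values.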
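In a specific implementation of the present invention, after step S103 (playing the real-time ambient sound if the sound content matches the playback flag), an operation for exiting ambient sound playback can also be performed. This operation specifically includes:
Step 1. Collect motion data;
Step 2. Identify the current motion state from the motion data;
Step 3. Stop playing the real-time ambient sound when the change between the current motion state and a preset motion state satisfies a preset condition for stopping external-sound playback.
For ease of description, the three steps above are described together below.
Specifically, a motion sensor inside or outside the electronic device can be used to collect the motion data. The motion sensor may be integrated in the earphones or worn separately by the wearer. It may be one that captures fine movements of the wearer, such as head rotation, or one that captures larger-amplitude movements, such as running on a treadmill. This embodiment does not limit the model or type of the motion sensor.
The acquired motion data is recognized to obtain the current motion state. How the motion data is recognized depends on the working principle of the particular motion sensor and the usage scheme, and is not described here.
The preset motion state may be the previously collected motion state, or a state corresponding to a configured standard static posture (such as standing). It can also be customized according to actual needs; the options are not listed one by one here.
When the change between the recognized current motion state and the preset motion state satisfies the preset condition for stopping external-sound playback, playback of the real-time ambient sound is stopped. For example, the preset condition may be that the user's head rotation angle has changed by +75 degrees or -75 degrees, with the head having moved forward more than 2 meters; the conversation is then considered over and playback of the real-time ambient sound is stopped. Otherwise, the conversation is considered to be continuing and the real-time ambient sound can continue to be played.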
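The example stop condition can be sketched as a comparison of two motion states. The `MotionState` fields and the function name are hypothetical, and the sketch assumes that both parts of the example condition (a head turn of at least 75 degrees in either direction and more than 2 meters of forward movement) must hold together; the text could also be read as treating either one as sufficient.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MotionState:
    yaw_deg: float    # head rotation angle, in degrees
    forward_m: float  # distance moved forward, in meters


def should_stop_ambient(current: MotionState, preset: MotionState,
                        min_turn_deg: float = 75.0,
                        min_distance_m: float = 2.0) -> bool:
    """Return True if the motion change suggests the conversation has ended."""
    turned = abs(current.yaw_deg - preset.yaw_deg) >= min_turn_deg
    moved = (current.forward_m - preset.forward_m) > min_distance_m
    return turned and moved  # assumption: both conditions are required
```

The thresholds are parameters so that a different preset condition can be tried without changing the comparison logic.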
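Thus, in this specific implementation, whether the conversation has ended can be inferred from changes in the user's motion state, and the ambient-sound playback state is exited automatically after the conversation ends, so the state of the electronic device can be switched freely according to actual use.
In a specific implementation of the present invention, after step S103 (playing the real-time ambient sound if the sound content matches the playback flag), an operation for exiting ambient sound playback can also be performed, specifically including:
Step 1. Perform semantic recognition on the real-time ambient sound to obtain a semantic recognition result;
Step 2. If the semantic recognition result includes a preset conversation-ending phrase, stop playing the real-time ambient sound.
For ease of description, the two steps above are described together below.
While the real-time ambient sound is being played, semantic recognition can be performed on it to obtain a semantic recognition result. When the result is found to include a preset conversation-ending phrase (such as a preconfigured "goodbye" or "Bye, bye"), playback of the real-time ambient sound can be stopped immediately, or after waiting for a period (such as 5 seconds), and normal music or call audio playback is resumed. For the semantic recognition process itself, refer to the description of semantic recognition above; it is not repeated here.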
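Thus, in this specific implementation, the ambient-sound playback state can be exited automatically based on the content of the user's conversation, so the state of the earphones can be switched freely according to actual use.
Note that the implementations that stop real-time ambient sound playback based on semantics and based on motion can be combined in practice.
In addition, when the action taken on a match between the sound content and the playback flag is to turn off the audio playback function, the audio playback function can be turned back on once the semantics-based or motion-based condition for stopping real-time ambient sound playback is satisfied.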
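Corresponding to the above method embodiments, the embodiments of the present invention also provide a sound playback apparatus applied to an electronic device having a playback function. The sound playback apparatus described below and the sound playback method described above may be cross-referenced.
Referring to FIG. 2, the apparatus includes the following modules:
a sound collection module 101, configured to collect real-time ambient sound;
a speech recognition module 102, configured to perform content detection on the real-time ambient sound to obtain sound content;
a playback control module 103, configured to play the real-time ambient sound or turn off the audio playback function if the sound content matches a playback flag.
With the apparatus provided by the embodiments of the present invention, real-time ambient sound is collected; content detection is performed on the real-time ambient sound to obtain sound content; and if the sound content matches a playback flag, the real-time ambient sound is played or the audio playback function is turned off.
In this apparatus, a playback flag can be set in the electronic device in advance. The electronic device collects real-time ambient sound and then performs content detection on it to obtain the corresponding sound content. When the sound content is found to match the playback flag, the real-time ambient sound can be played or the audio playback function can be turned off. If the real-time ambient sound is played, the user of the electronic device can hear it and avoids missing important ambient sounds; if the audio playback function is turned off, the user can hear the ambient sound directly, without interference from audio playback. In other words, by collecting and recognizing ambient sound, and by playing it or directly turning off the audio playback function when specific conditions are met, the apparatus enables the earphone wearer to hear the ambient sound corresponding to the playback flag while using the earphones normally.
In a specific implementation of the present invention, the playback flag is a semantic flag or a scene flag, and the speech recognition module 102 is specifically configured to perform semantic recognition on the real-time ambient sound to obtain a semantic recognition result, or to perform scene recognition on the real-time ambient sound to obtain a scene recognition result;
correspondingly, the semantic recognition result or the scene recognition result is determined as the sound content.
In a specific implementation of the present invention, the playback control module 103 is specifically configured to perform enhanced playback of the real-time ambient sound if the sound content matches a danger signal in the playback flag.
In a specific implementation of the present invention, the electronic device includes at least two collection ends arranged at an interval, and the sound collection module 101 is specifically configured to use the at least two collection ends to collect the corresponding real-time ambient sounds respectively;
correspondingly, the playback control module 103 is specifically configured to, if the sound content matches the playback flag, compare the audio levels of the sound content collected by the two collection ends and, after the comparison, play the real-time ambient sound collected by the collection end with the louder audio.
In a specific implementation of the present invention, the electronic device includes two playback ends arranged in correspondence with the two collection ends, and the playback control module 103 is specifically configured to, if the sound content matches the playback flag, use the playback end corresponding to the collection end with the louder audio to play the real-time ambient sound collected by that collection end. The apparatus further includes a state recovery module configured to, after the real-time ambient sound is played, collect motion data; identify the current motion state from the motion data; and stop playing the real-time ambient sound when the change between the current motion state and a preset motion state satisfies a preset condition for stopping external-sound playback.
In a specific implementation of the present invention, the apparatus further includes a state recovery module specifically configured to, after the real-time ambient sound is played, perform semantic recognition on the real-time ambient sound to obtain a semantic recognition result, and to stop playing the real-time ambient sound if the semantic recognition result includes a preset conversation-ending phrase.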
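Corresponding to the above method embodiments, the embodiments of the present invention also provide an electronic device. The electronic device described below and the sound playback method described above may be cross-referenced.
Referring to FIG. 3, the electronic device includes:
a collection end 301, configured to collect real-time ambient sound;
a playback end 302, configured to play sound, the sound including the real-time ambient sound;
a memory 303, configured to store a computer program;
a processor 304, configured to implement the steps of the sound playback method of the above method embodiments when executing the computer program.
The collection end may be a component capable of collecting sound, such as a microphone, and the playback end may be a component with a sound playback function, such as a speaker.
The steps of the sound playback method described above can be implemented by the structure of the electronic device.
Corresponding to the above method embodiments, the embodiments of the present invention also provide a readable storage medium. The readable storage medium described below and the sound playback method described above may be cross-referenced.
A readable storage medium stores a computer program which, when executed by a processor, implements the steps of the sound playback method of the above method embodiments.
The readable storage medium may specifically be a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, or any other readable storage medium capable of storing program code.
The embodiments in this specification are described in a parallel or progressive manner. Each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments can be cross-referenced. Because the apparatus disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for relevant details, refer to the description of the method.
Those of ordinary skill in the art will further appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To illustrate the interchangeability of hardware and software clearly, the composition and steps of each example have been described above in general terms of their functions. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled persons may implement the described functions in different ways for each specific application, but such implementations should not be considered beyond the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), internal memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the technical field.
It should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "include", "comprise", and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes it.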

Claims (10)

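  1. A sound playback method, characterized in that it is applied to an electronic device having a playback function and includes:
    collecting real-time ambient sound;
    performing content detection on the real-time ambient sound to obtain sound content;
    if the sound content matches a playback flag, playing the real-time ambient sound or turning off the audio playback function.
  2. The sound playback method according to claim 1, characterized in that the playback flag is a semantic flag or a scene flag, and performing content detection on the real-time ambient sound to obtain sound content includes:
    performing semantic recognition on the real-time ambient sound to obtain a semantic recognition result;
    or, performing scene recognition on the real-time ambient sound to obtain a scene recognition result;
    correspondingly, determining the semantic recognition result or the scene recognition result as the sound content.
  3. The sound playback method according to claim 2, characterized in that if the sound content matches a danger signal in the playback flag, enhanced playback of the real-time ambient sound is performed.
  4. The sound playback method according to claim 1, characterized in that the electronic device includes at least two collection ends arranged at an interval, and collecting real-time ambient sound includes:
    using the at least two collection ends to collect the corresponding real-time ambient sounds respectively;
    correspondingly, if the sound content matches the playback flag, comparing the audio levels of the sound content collected by the two collection ends and, after the comparison, playing the real-time ambient sound collected by the collection end with the louder audio.
  5. The sound playback method according to claim 4, characterized in that the electronic device includes two playback ends arranged in correspondence with the two collection ends, and if the sound content matches the playback flag, the playback end corresponding to the collection end with the louder audio is used to play the real-time ambient sound collected by that collection end.
  6. The sound playback method according to claim 1, characterized in that, after playing the real-time ambient sound, the method further includes:
    collecting motion data;
    identifying the current motion state from the motion data;
    stopping playback of the real-time ambient sound when the change between the current motion state and a preset motion state satisfies a preset condition for stopping external-sound playback.
  7. The sound playback method according to claim 1, characterized in that, after playing the real-time ambient sound, the method further includes:
    performing semantic recognition on the real-time ambient sound to obtain a semantic recognition result;
    if the semantic recognition result includes a preset conversation-ending phrase, stopping playback of the real-time ambient sound.
  8. A sound playback apparatus, characterized in that it is applied to an electronic device having a playback function and includes:
    a sound collection module, configured to collect real-time ambient sound;
    a speech recognition module, configured to perform content detection on the real-time ambient sound to obtain sound content;
    a playback control module, configured to play the real-time ambient sound or turn off the audio playback function if the sound content matches a playback flag.
  9. An electronic device, characterized in that it includes:
    a collection end, configured to collect real-time ambient sound;
    a playback end, configured to play sound, the sound including the real-time ambient sound;
    a memory, configured to store a computer program;
    a processor, configured to implement the steps of the sound playback method according to any one of claims 1 to 7 when executing the computer program.
  10. A readable storage medium, characterized in that the readable storage medium stores a computer program which, when executed by a processor, implements the steps of the sound playback method according to any one of claims 1 to 7.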
PCT/CN2021/141559 2021-04-29 2021-12-27 Sound playback method, apparatus, electronic device, and readable storage medium WO2022227655A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110475845.6A CN113194383A (zh) 2021-04-29 2021-04-29 Sound playback method, apparatus, electronic device, and readable storage medium
CN202110475845.6 2021-04-29

Publications (1)

Publication Number Publication Date
WO2022227655A1 true WO2022227655A1 (zh) 2022-11-03

Family

ID=76980813

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/141559 WO2022227655A1 (zh) 2021-04-29 2021-12-27 Sound playback method, apparatus, electronic device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN113194383A (zh)
WO (1) WO2022227655A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
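CN113194383A (zh) * 2021-04-29 2021-07-30 歌尔科技有限公司 Sound playback method, apparatus, electronic device, and readable storage medium
CN113766383B (zh) * 2021-09-08 2024-06-18 度小满科技(北京)有限公司 Method and apparatus for controlling earphone muting
CN116055932B (zh) * 2022-08-12 2023-09-15 荣耀终端有限公司 Method for switching the primary and secondary earbuds of earphones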

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090138507A1 (en) * 2007-11-27 2009-05-28 International Business Machines Corporation Automated playback control for audio devices using environmental cues as indicators for automatically pausing audio playback
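CN101840700A (zh) * 2010-04-28 2010-09-22 宇龙计算机通信科技(深圳)有限公司 Mobile-terminal-based sound recognition method and mobile terminal
CN107438209A (zh) * 2016-05-27 2017-12-05 易音特电子株式会社 Active noise-cancelling headset device with hearing-aid features
CN110689882A (zh) * 2018-07-04 2020-01-14 上海博泰悦臻网络技术服务有限公司 Vehicle, playback device thereof, and automatic multimedia playback control method
CN113194383A (zh) * 2021-04-29 2021-07-30 歌尔科技有限公司 Sound playback method, apparatus, electronic device, and readable storage medium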

Family Cites Families (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
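CN103796125A (zh) * 2013-11-21 2014-05-14 广州视源电子科技股份有限公司 Sound adjustment method based on earphone playback
CN106162413B (zh) * 2016-09-07 2019-11-19 合肥中感微电子有限公司 Earphone device with a specific ambient-sound reminder mode
CN106601272B (zh) * 2016-11-24 2019-09-17 歌尔股份有限公司 Earphone and speech recognition method thereof
CN107799117A (zh) * 2017-10-18 2018-03-13 倬韵科技(深圳)有限公司 Method and apparatus for identifying key information to control audio output, and audio device
CN108391206A (zh) * 2018-03-30 2018-08-10 广东欧珀移动通信有限公司 Signal processing method, apparatus, terminal, earphone, and readable storage medium
CN110691300B (zh) * 2019-09-12 2022-07-19 连尚(新昌)网络科技有限公司 Audio playback device and method for providing information
CN110719545B (zh) * 2019-09-12 2022-11-08 连尚(新昌)网络科技有限公司 Audio playback device and method for playing audio
CN111491236A (zh) * 2020-04-23 2020-08-04 歌尔科技有限公司 Active noise-cancelling earphone, wake-up method and apparatus thereof, and readable storage medium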

Also Published As

Publication number Publication date
CN113194383A (zh) 2021-07-30

Similar Documents

Publication Publication Date Title
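WO2022227655A1 (zh) Sound playback method, apparatus, electronic device, and readable storage medium
US11294619B2 (en) Earphone software and hardware
CN105814913B (zh) Name-sensitive listening device
CN106162413B (zh) Earphone device with a specific ambient-sound reminder mode
KR102513461B1 (ko) Headphone system
US7986791B2 (en) Method and system for automatically muting headphones
JP2017211640A (ja) Active noise-cancelling headset device with hearing-aid function
US11822367B2 (en) Method and system for adjusting sound playback to account for speech detection
CN108551604B (zh) Noise reduction method, noise reduction apparatus, and noise-reducing earphone
EP2961195A2 (en) Do-not-disturb system and apparatus
WO2020019820A1 (zh) Microphone hole blockage detection method and related products
CN101840700A (zh) Mobile-terminal-based sound recognition method and mobile terminal
CN106210960A (zh) Earphone device with a local call status confirmation mode
EP2430753A1 (en) A method and apparatus for providing information about the source of a sound via an audio device
CN106170108A (zh) Earphone device with a decibel reminder mode
WO2020019857A1 (zh) Microphone hole blockage detection method and related products
JP2023542968A (ja) Hearing augmentation and wearable system with localized feedback
CN103106060A (zh) Computer volume adjustment method
CN104469587A (zh) Earphone
KR101693483B1 (ko) Method and computer program for removing howling and echo in a headset
WO2015030642A1 (en) Volume reduction for an electronic device
KR101693482B1 (ko) Headset with howling and echo cancellation function
WO2020019822A1 (zh) Microphone hole blockage detection method and related products
TWI665662B (zh) Portable electronic device and audio playback method
CN105657605A (zh) Safe smart speaker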

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21939114

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21939114

Country of ref document: EP

Kind code of ref document: A1