CN111046218A

CN111046218A - Audio acquisition method, device and system based on screen locking state

Info

Publication number: CN111046218A
Application number: CN201911271899.XA
Authority: CN
Inventors: 乔会君; 钱萌
Original assignee: Hongtaizhizao Qingdao Information Technology Co Ltd
Current assignee: Hongtaizhizao Qingdao Information Technology Co Ltd
Priority date: 2019-12-12
Filing date: 2019-12-12
Publication date: 2020-04-21

Abstract

The embodiments of the invention disclose an audio acquisition method, device and system based on a screen-locked state. The method comprises the following steps:

- If the smart device is determined to be playing audio while in the screen-locked state, detect input information on the touch-sensitive display of the smart device.
- Generate an identifier from the input information, match it against audio identification information prestored in the smart device, and play the audio corresponding to the matching preset audio identification information.
- If the smart device is determined not to be playing audio while in the screen-locked state, record in real time the melody information of all currently received external audio.
- Acquire a first, HOA-based gain and a second, VBAP-based gain for the speaker producing each external audio.
- Mix the melody information of the multiple external audios to obtain a target external audio.

This solves the prior-art problem that a smart device in the screen-locked state cannot quickly and accurately acquire audio from inside or outside the device.

Description

Audio acquisition method, device and system based on screen locking state

Technical Field

Embodiments of the invention relate to the technical field of audio processing. In particular, they relate to an audio acquisition method, device and system based on a screen-locked state.

Background

With the rapid development of network technology, smart devices have become increasingly capable, and users can play video, audio and other media on them. However, suppose an application on the device is playing audio and the screen is locked. A user who wants to change the audio can usually only skip to the adjacent track or pause. Searching for and selecting a specific piece of audio from the lock screen is not possible.

In addition, a user in a subway or another public place may occasionally hear an appealing song playing nearby. The prior art can record the melody and search for the song title based on it. However, public places are noisy and full of interfering sound sources, so sound source localization performs poorly there.

Disclosure of Invention

The embodiments of the invention therefore provide an audio acquisition method, device and system based on a screen-locked state. They solve the prior-art problem that a smart device in the screen-locked state cannot quickly and accurately acquire audio from inside or outside the device.

In order to achieve the above object, the embodiments of the present invention provide the following technical solutions:

An audio acquisition method based on a screen-locked state, the method comprising:

if the smart device is determined to be playing audio and in the screen-locked state, detecting input information on a touch-sensitive display of the smart device;

generating an identifier from the input information, matching the identifier against audio identification information prestored in the smart device, and playing the audio corresponding to the preset audio identification information that matches the identifier;

if the smart device is determined not to be playing audio and is in the screen-locked state, recording in real time the melody information of all currently received external audio;

acquiring a first, HOA-based gain and a second, VBAP-based gain for the speaker producing each external audio; configuring a mixing weight for each speaker, and determining the weight coefficients of the first gain and the second gain from the mixing weight; determining the mixing gain of each speaker from the first gain, the second gain and their respective weight coefficients, and mixing the melody information of the multiple external audios to obtain a target external audio; and

generating corresponding audio identification information from the target external audio, retrieving the song containing the target external audio from a database of a background server system based on that audio identification information, and playing the song.
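The method has two branches, and which one applies depends on whether the device is playing audio while its screen is locked. The sketch below shows only that dispatch. It assumes the platform exposes two booleans; the names `is_playing` and `is_locked` and the `Mode` labels are hypothetical and do not come from the patent.

```python
from enum import Enum


class Mode(Enum):
    """Hypothetical labels for the branches of the lock-screen method."""
    TOUCH_INPUT = "detect touch input and match identifier"  # playing + locked
    RECORD_EXTERNAL = "record and mix external audio"        # not playing + locked
    IDLE = "not handled by the method"                       # screen unlocked


def select_mode(is_playing: bool, is_locked: bool) -> Mode:
    """Choose which branch of the method applies in the current device state."""
    if not is_locked:
        return Mode.IDLE
    return Mode.TOUCH_INPUT if is_playing else Mode.RECORD_EXTERNAL


if __name__ == "__main__":
    for playing in (True, False):
        print(playing, select_mode(is_playing=playing, is_locked=True).name)
```

Both branches require the screen-locked state. Playback status alone decides whether the device waits for touch input or starts recording external audio.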

Further, the method also includes:

if the identifier is successfully matched with the audio identification information prestored in the smart device, sending the audio corresponding to the matching preset audio identification information to a target list, wherein the target list comprises at least one of an audio playlist and an interface display list.

Further, the method also includes:

the mixing gain of each of the loudspeakers is determined according to the following formula:

$$g_{\mathrm{m},n}(t) = w_n(t)\,g_{\mathrm{HOA},n}(t) + \bigl(1 - w_n(t)\bigr)\,g_{\mathrm{VBAP},n}(t)$$

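where $g_{\mathrm{m},n}(t)$ is the mixing gain of the $n$-th speaker, $w_n(t)$ is its mixing weight, $g_{\mathrm{HOA},n}(t)$ is the first (HOA-based) gain of the $n$-th speaker, $g_{\mathrm{VBAP},n}(t)$ is the second (VBAP-based) gain of the $n$-th speaker, and $t$ denotes time. The weight coefficients of the first and second gains are therefore $w_n(t)$ and $1 - w_n(t)$, respectively.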
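As a worked illustration of the formula, the sketch below evaluates the mixing gain of each speaker at a single time instant t. The function name and the sample gain values are illustrative assumptions. The patent does not specify how the HOA and VBAP gains themselves are computed, so they are taken here as given inputs.

```python
def mixing_gains(weights, g_hoa, g_vbap):
    """Per-speaker mixing gain g_mn = w_n * g_HOAn + (1 - w_n) * g_VBAPn at one instant t.

    weights, g_hoa and g_vbap are equal-length sequences indexed by speaker n.
    """
    if not (len(weights) == len(g_hoa) == len(g_vbap)):
        raise ValueError("one weight and one gain pair are needed per speaker")
    gains = []
    for w, gh, gv in zip(weights, g_hoa, g_vbap):
        if not 0.0 <= w <= 1.0:
            # Assumption: w_n is a convex blending weight, so 1 - w_n also lies in [0, 1].
            raise ValueError("mixing weight must lie in [0, 1]")
        gains.append(w * gh + (1.0 - w) * gv)
    return gains


print(mixing_gains([1.0, 0.0, 0.5], [0.8, 0.8, 0.6], [0.2, 0.2, 0.4]))
```

With w_n = 1 a speaker uses only its HOA gain, and with w_n = 0 only its VBAP gain. Intermediate weights interpolate linearly between the two.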

Further, configuring the mixing weights for the respective speakers comprises:

acquiring audio training samples, and training on those samples a model composed of a multilayer convolutional neural network and a fully connected network;

acquiring the input audio of the current speaker, and extracting a multi-channel spectrogram of that input audio; and

inputting the multi-channel spectrogram into the trained model, and taking the model's output as the mixing weight of the current speaker.
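The weight-configuration steps can be sketched end to end under these assumptions:

- **Spectrogram:** the multi-channel spectrogram is computed with a naive Hann-windowed DFT in pure Python.
- **Model:** the trained convolutional and fully connected model is replaced by a hypothetical, untrained stand-in. It applies global average pooling per channel, then one dense unit with a sigmoid. The sigmoid is used only because it keeps the output in (0, 1), which suits a weight that blends two gains.
- **Parameters:** the frame length, hop size, weights and bias are illustrative values, not parameters from the patent.

```python
import cmath
import math


def magnitude_spectrogram(signal, frame_len=8, hop=4):
    """Magnitude STFT of one channel (Hann window + naive DFT); returns frames x bins."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = [signal[start + i] * window[i] for i in range(frame_len)]
        bins = []
        for k in range(frame_len // 2 + 1):  # non-negative frequencies only
            coeff = sum(segment[i] * cmath.exp(-2j * math.pi * k * i / frame_len)
                        for i in range(frame_len))
            bins.append(abs(coeff))
        frames.append(bins)
    return frames


def multichannel_spectrogram(channels, frame_len=8, hop=4):
    """One magnitude spectrogram per input channel: channels x frames x bins."""
    return [magnitude_spectrogram(ch, frame_len, hop) for ch in channels]


def predict_mixing_weight(spec, dense_weights, bias):
    """Hypothetical stand-in for the trained CNN + fully connected model.

    Global average pooling yields one feature per channel; a single dense unit
    with a sigmoid output maps the features to a mixing weight in (0, 1).
    """
    features = []
    for channel in spec:
        values = [v for frame in channel for v in frame]
        features.append(sum(values) / len(values) if values else 0.0)
    z = bias + sum(w * f for w, f in zip(dense_weights, features))
    if z >= 0:  # numerically stable sigmoid
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


if __name__ == "__main__":
    n = 32
    left = [math.sin(2 * math.pi * 2 * t / n) for t in range(n)]
    right = [0.5 * math.sin(2 * math.pi * 4 * t / n) for t in range(n)]
    spec = multichannel_spectrogram([left, right])
    print(len(spec), len(spec[0]), len(spec[0][0]))  # channels, frames, bins
    print(predict_mixing_weight(spec, dense_weights=[0.3, -0.2], bias=0.0))
```

A real implementation would replace `predict_mixing_weight` with the trained network. Only the interface carries over: a multi-channel spectrogram in, one weight per speaker out.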

Further, the multilayer convolutional neural network is configured to extract feature information from the multi-channel spectrogram. The convolutional layers and pooling layers of the network provide invariance to translations of that feature information.
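The translation-invariance property can be seen in a toy one-dimensional case. Convolution shifts its response along with the input (equivariance), and a global max-pooling layer then discards the position. The kernel and signals below are arbitrary illustrative values. A real network would learn its kernels and operate on two-dimensional spectrogram patches.

```python
def conv1d_valid(x, kernel):
    """'Valid' 1-D cross-correlation, as computed by a convolutional layer."""
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k)) for i in range(len(x) - k + 1)]


def global_max_pool(feature_map):
    """Collapse a feature map to its strongest response, discarding its position."""
    return max(feature_map)


kernel = [1, 2, 1]                          # illustrative detector for a small bump
original = [0, 0, 1, 2, 1, 0, 0, 0, 0, 0]
shifted = [0, 0, 0, 0, 0, 1, 2, 1, 0, 0]    # same bump, three samples later

resp_a = conv1d_valid(original, kernel)
resp_b = conv1d_valid(shifted, kernel)
print(resp_a)  # [1, 4, 6, 4, 1, 0, 0, 0]
print(resp_b)  # [0, 0, 0, 1, 4, 6, 4, 1]
print(global_max_pool(resp_a) == global_max_pool(resp_b))  # True
```

The convolution output moves by the same three samples as the input. After pooling, the two cases are indistinguishable, which is the behaviour the claim attributes to the convolutional and pooling layers.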

Further, the method also includes:

constructing an index combination comprising the preset audio identification information and the audio information, and storing the index combination in a database of a server system or in a storage system of the device, wherein the audio information includes at least one of audio title information, audio content information and audio tune information; and

searching for the audio information corresponding to the preset audio identification information. The search uses the correspondence, within the index combination, between the preset audio identification information and the audio identification information generated from the input information.
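One way to picture the index combination is as a mapping from each preset identification string to the stored audio information. The mapping is then queried with the identifier generated from the user's input. Below is a minimal in-memory sketch. The `AudioInfo` fields, the identifier strings and the song titles are hypothetical. A real system would store the index in the device's storage system or in a server database, as described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AudioInfo:
    """Audio information held in the index (hypothetical fields)."""
    title: str
    lyrics_fragment: str = ""
    tune: str = ""


@dataclass
class AudioIndex:
    """Index combination: preset audio identification string -> audio information."""
    entries: Dict[str, List[AudioInfo]] = field(default_factory=dict)

    def add(self, preset_id: str, info: AudioInfo) -> None:
        # Several audios may share one preset identifier (e.g. identical title initials).
        self.entries.setdefault(preset_id, []).append(info)

    def lookup(self, generated_id: str) -> Optional[List[AudioInfo]]:
        """Return the audio information whose preset identifier equals the generated one."""
        return self.entries.get(generated_id)


index = AudioIndex()
index.add("ADG", AudioInfo(title="A Day Gone"))
index.add("ADG", AudioInfo(title="All Day Glow"))
matches = index.lookup("ADG")
print([m.title for m in matches])
print(index.lookup("XYZ"))  # None: no preset identifier matches
```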

Further, the identifier includes at least one of audio initial information (the initials of the audio title), audio keyword information and audio tune information.

The invention also provides an audio acquisition device based on a screen-locked state, comprising:

a first detection unit, configured to detect input information on a touch-sensitive display of the smart device if the smart device is determined to be playing audio and in the screen-locked state;

a first control unit, configured to generate an identifier from the input information, match the identifier against audio identification information prestored in the smart device, and play the audio corresponding to the preset audio identification information that matches the identifier;

a second detection unit, configured to record in real time the melody information of all currently received external audio if the smart device is determined not to be playing audio and is in the screen-locked state;

a gain unit, configured to:
- acquire a first, HOA-based gain and a second, VBAP-based gain for the speaker producing each external audio;
- configure a mixing weight for each speaker, and determine the weight coefficients of the first and second gains from the mixing weight; and
- determine the mixing gain of each speaker from the first gain, the second gain and their respective weight coefficients, and mix the melody information of the multiple external audios to obtain a target external audio; and

a second control unit, configured to generate corresponding audio identification information from the target external audio, retrieve the song containing the target external audio from a database of the background server system based on that audio identification information, and play the song.

The invention also provides an audio acquisition system based on a screen-locked state, comprising a processor and a memory;

the memory is configured to store one or more program instructions;

the processor is configured to execute the one or more program instructions to perform the method described above.

The present invention also provides a computer storage medium containing one or more program instructions. When executed by the audio acquisition system based on a screen-locked state, these instructions perform the method described above.

The audio acquisition method, device and system based on a screen-locked state provide the following benefits:

- **Internal audio:** while the screen is locked, the device can quickly and accurately switch to the audio the user wants by using pre-established audio identification information. This avoids cumbersome operations when switching audio on the lock screen and improves the user experience.
- **External audio:** based on HOA and object-based audio (VBAP) techniques, the optimal processing mode is selected adaptively according to the content of the external audio. Multiple external audios are processed, and the sound source position is localized accurately even when many sounds are mixed.

Internal audio can therefore be acquired in the screen-locked state, and external audio can be acquired accurately. This solves the prior-art problem that a smart device in the screen-locked state cannot quickly and accurately acquire audio from inside or outside the device.

Drawings

To illustrate more clearly the embodiments of the present invention or the technical solutions of the prior art, the drawings used in describing them are briefly introduced below. The drawings described below are merely exemplary. A person of ordinary skill in the art can derive other embodiments from them without inventive effort.

The structures, proportions, sizes and the like shown in this specification serve only to accompany the disclosed content, so that those skilled in the art can understand and read it. They do not limit the conditions under which the invention can be implemented and therefore carry no limiting technical significance. Any structural modification, change of proportion or adjustment of size that does not affect the effects and objectives achievable by the invention still falls within the scope covered by the technical content disclosed herein.

Fig. 1 is a flowchart of an audio acquisition method based on a screen-locked state according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of an audio acquisition device based on a screen-locked state according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of an application program implementing the audio acquisition method based on a screen-locked state according to an embodiment of the present invention.

Description of reference numerals:

100-first detection unit
200-first control unit
300-second detection unit
400-gain unit
500-second control unit

Detailed Description

The present invention is described below by way of specific embodiments, and other advantages and features of the invention will become apparent to those skilled in the art from this disclosure. The described embodiments are merely exemplary and are not intended to limit the invention to the particular embodiments disclosed. All other embodiments that a person skilled in the art can derive from these embodiments without creative effort fall within the protection scope of the present invention.

In a specific embodiment, as shown in Fig. 1, the audio acquisition method based on a screen-locked state provided by the present invention includes the following steps:

S101: If the smart device is determined to be playing audio and in the screen-locked state, detect input information on the touch-sensitive display of the smart device.

In the embodiments of the invention, the smart device may be any device that plays audio, such as a smartphone, a tablet computer or an in-vehicle audio player. Such devices are generally equipped with a touch-sensitive display, which enables fast human-computer interaction.

Specifically, the touch-sensitive display may enter the screen-locked state while the device is playing audio. In that case, the user's input can be captured through a floating window with an input area, preset on the device's touch-sensitive display. The input information may be the initials of the audio title, keywords, audio tune information and the like. Note that the user can drag this floating window with a finger and reposition it anywhere on the touch-sensitive display to suit their writing habits.

S102: Generate an identifier from the input information, match the identifier against the audio identification information prestored in the smart device, and play the audio corresponding to the preset audio identification information that matches the identifier.

Step S101 detects the input information on the touch-sensitive display while the device is playing audio in the screen-locked state, which supplies the data needed for matching. In step S102, an identifier is generated from that input information and matched against the preset audio identification information.

In the embodiments of the present invention, identifiers corresponding to different audio items must be preset before the user's input is obtained. These preset identifiers are stored in advance in the device's storage system or in a database of the server system.

The method further comprises the following steps:

- constructing an index combination comprising the preset audio identification information and the audio information, and storing it in a database of a server system or in a storage system of the device, wherein the audio information includes at least one of audio title information, audio content information and audio tune information;
- searching for the audio information corresponding to the preset audio identification information by using the correspondence, within the index combination, between the preset audio identification information and the audio identification information generated from the input information.

The identifier includes at least one of audio initial information, audio keyword information and audio tune information.

Specifically, an index combination comprising the preset audio identification information and the audio information can be constructed and stored in advance in the device's storage system or in a database of the server system. Retrieval then proceeds as follows:

1. The audio identification information generated from the input information is matched against the preset audio identification information in the index combination.
2. Using the correspondence between the two, the audio information corresponding to the preset audio identification information is looked up in the device's storage system or in the database of the background server system.
3. The audio containing the complete audio data is retrieved through this audio information index.

The audio information includes at least one of audio title information, audio content information and audio tune information, for example the song title, a fragment of the lyrics, the author of the song, or a fragment of the melody. The audio is the complete audio data found through the audio information, for example a complete song or an album of songs. The audio identification information may be an identification string generated from the audio initials, audio keywords or audio tune information.

Note that the embodiments of the invention are not limited to the above. In many cases the user enters only some of the initials of the song title, for example "A, D, G", "A, D" or "A". The match may then return several items:

- If the input is "A", the matches are all audio whose title starts with A.
- If the input is "A, D", the matches narrow to audio whose first initial is A and second initial is D.

The user can then select the song to play directly from this list of matches. If the returned list is still too long, the user can enter more initials to obtain a more precise match.
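The narrowing behaviour described above amounts to prefix matching on title initials. Below is a sketch that assumes initials are taken from whitespace-separated words and compared case-insensitively. The song titles are invented placeholders. For non-Latin titles, the initials would have to come from a transliteration, which this sketch does not cover.

```python
def title_initials(title: str) -> str:
    """Upper-case initial of each whitespace-separated word, e.g. 'a day gone' -> 'ADG'."""
    return "".join(word[0].upper() for word in title.split())


def parse_input(entry: str) -> str:
    """Normalise user input such as 'A, D' or 'a d' to the string 'AD'."""
    return "".join(ch.upper() for ch in entry if ch.isalpha())


def match_by_initials(entry: str, titles):
    """Return titles whose initials start with the initials the user entered."""
    prefix = parse_input(entry)
    return [t for t in titles if title_initials(t).startswith(prefix)]


library = ["A Day Gone", "All Day Long", "Autumn Rain", "Blue Sky"]
print(match_by_initials("A", library))        # three titles start with A
print(match_by_initials("A, D", library))     # narrowed to initials A then D
print(match_by_initials("A, D, G", library))  # narrowed to a single title
```

Each extra initial filters the previous result further, which mirrors how the returned list shrinks as the user keeps typing.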

Switching playback with keywords as input works in the same way as switching with initials as input. Refer to the description of initials above for the details, which are not repeated here.

After the audio identification information has been matched against the preset audio identification information in step S102, the application on the smart device can be controlled according to whether the match succeeds. On success, it switches to and plays the audio corresponding to the matching preset audio identification information.

In the embodiments of the present invention, the audio identification information generated from the input information may match corresponding audio information in the database of the server system or in the device's storage system. In that case, the device's audio-playing application is controlled to switch automatically to the audio corresponding to the matching preset audio identification information and play it. In addition, that audio can be sent to a target list on the lock screen interface.

Note that the target list includes at least one of an audio playlist and an interface display list. The audio playlist is the list of audio queued for playback. The interface display list is the list of matching audio returned for the audio identification information.
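Sending matched audio to the two target lists can be sketched with plain Python lists. The function name, the best-match-first ordering of `matches`, and moving the best match to the front of the playlist are all assumptions made for illustration. The function returns new lists rather than mutating its inputs.

```python
def send_to_target_lists(matches, playlist, display_list):
    """Place matched audio into the audio playlist and the interface display list.

    matches: matched audio titles, assumed to be ordered best match first.
    The best match is moved to the first position of the playlist, and the
    interface display list is replaced by all matches for manual selection.
    """
    if not matches:
        return list(playlist), list(display_list)  # nothing matched: lists unchanged
    best = matches[0]
    new_playlist = [best] + [t for t in playlist if t != best]
    return new_playlist, list(matches)


playlist, display = send_to_target_lists(["Song B", "Song C"], ["Song A", "Song B"], [])
print(playlist)  # ['Song B', 'Song A']
print(display)   # ['Song B', 'Song C']
```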

In addition, a user may occasionally hear an appealing song playing nearby without knowing its name. In this case, the user can enter the tune of the song in real time through the floating window with an input area on the phone screen. The device then proceeds as follows:

1. A processor module in the device generates audio identification information from this tune information.
2. It matches that information against the audio identification information preset in the database of the background server system or in the device's storage system.
3. If the match succeeds, the matched song is moved to the first position of the current playlist.

Note that ambient noise can affect tune input, so switching playback this way may return several candidate audios. These are presented as a list on the lock screen interface so that the user can manually select the song to play.
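The patent does not say how a tune is turned into an identification string. The sketch makes two choices, both assumptions:

- **Tune identifier:** it uses the Parsons code, a well-known melodic contour notation. It records only whether each note goes up (U), goes down (D) or repeats (R) relative to the previous note, which makes it insensitive to key.
- **Noise tolerance:** noise can corrupt individual notes, so candidates are ranked by Levenshtein edit distance and returned as a list, in line with the multi-candidate list described above.

The MIDI pitch values, the distance threshold and the pre-stored contours are hypothetical.

```python
def parsons_code(pitches):
    """Contour string: '*' for the first note, then U/D/R for each later note."""
    if not pitches:
        return ""
    code = ["*"]
    for prev, cur in zip(pitches, pitches[1:]):
        code.append("U" if cur > prev else "D" if cur < prev else "R")
    return "".join(code)


def edit_distance(a, b):
    """Levenshtein distance, so a few misheard notes still allow a match."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def rank_candidates(hummed_pitches, library, max_distance=2):
    """Return titles whose stored contour is within max_distance, closest first."""
    query = parsons_code(hummed_pitches)
    scored = [(edit_distance(query, code), title) for title, code in library.items()]
    return [title for dist, title in sorted(scored) if dist <= max_distance]


library = {  # hypothetical pre-stored contours
    "Song A": "*UURDD",
    "Song B": "*UUDDU",
    "Song C": "*DDDUU",
}
print(parsons_code([60, 62, 64, 64, 62, 60]))  # *UURDD
print(rank_candidates([60, 62, 64, 64, 62, 60], library))
```

Because matching tolerates small errors, more than one candidate can pass the threshold. That is why the result is presented as a list for the user to choose from.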

Specifically, to receive external audio and localize its source accurately, the method further comprises the following steps:

S103: If the smart device is determined not to be playing audio and is in the screen-locked state, record in real time the melody information of all currently received external audio;

S104: Acquire a first, HOA-based gain and a second, VBAP-based gain for the speaker producing each external audio; configure a mixing weight for each speaker, and determine the weight coefficients of the first and second gains from the mixing weight; determine the mixing gain of each speaker from the first gain, the second gain and their respective weight coefficients, and mix the melody information of the multiple external audios to obtain a target external audio;

S105: Generate corresponding audio identification information from the target external audio, retrieve the song containing the target external audio from a database of the background server system based on that audio identification information, and play the song.
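Once each speaker's mixing gain is known, the mixing in step S104 can be read as a gain-weighted sum of the recorded signals. The sketch below rests on two assumptions, since the patent does not detail the signal layout:

- each external audio has already been captured as one time-aligned sample sequence per speaker;
- the weight and both gains are supplied per sample.

The function name is hypothetical.

```python
def mix_external_audio(signals, weights, g_hoa, g_vbap):
    """Mix N aligned signals into one target signal using time-varying mixing gains.

    signals[n][t]                 : sample t of the external audio from speaker n
    weights[n][t], g_hoa[n][t],
    g_vbap[n][t]                  : w_n(t), g_HOAn(t), g_VBAPn(t) at sample t
    Output: y(t) = sum_n g_mn(t) * x_n(t), where
            g_mn(t) = w_n(t) * g_HOAn(t) + (1 - w_n(t)) * g_VBAPn(t).
    """
    if not signals:
        return []
    if not (len(signals) == len(weights) == len(g_hoa) == len(g_vbap)):
        raise ValueError("one weight track and two gain tracks are needed per speaker")
    length = len(signals[0])
    if any(len(s) != length for s in signals):
        raise ValueError("signals must be time-aligned and equally long")
    mixed = [0.0] * length
    for x, w, gh, gv in zip(signals, weights, g_hoa, g_vbap):
        for t in range(length):
            g_m = w[t] * gh[t] + (1.0 - w[t]) * gv[t]  # mixing gain of this speaker at t
            mixed[t] += g_m * x[t]
    return mixed


print(mix_external_audio(
    signals=[[1.0, 1.0], [2.0, 0.0]],
    weights=[[1.0, 0.0], [0.5, 0.5]],
    g_hoa=[[0.5, 0.5], [1.0, 1.0]],
    g_vbap=[[0.0, 1.0], [0.0, 0.0]],
))
```

The resulting target signal would then be passed to the identification step S105.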

Specifically, the mixing gain of each of the speakers is determined according to the following formula:

$$g_{\mathrm{m},n}(t) = w_n(t)\,g_{\mathrm{HOA},n}(t) + \bigl(1 - w_n(t)\bigr)\,g_{\mathrm{VBAP},n}(t)$$

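Configuring the mixing weight for each speaker includes:

- acquiring audio training samples, and training on them a model composed of a multilayer convolutional neural network and a fully connected network;
- acquiring the input audio of the current speaker, and extracting its multi-channel spectrogram;
- inputting the multi-channel spectrogram into the trained model, and taking the model's output as the mixing weight of the current speaker.

The multilayer convolutional neural network extracts feature information from the multi-channel spectrogram. Its convolutional and pooling layers provide invariance to translations of that feature information.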

In the above specific embodiment, the audio acquisition method based on a screen-locked state provides the following benefits:

- **Internal audio:** while the screen is locked, the device can quickly and accurately switch to the audio the user wants through pre-established audio identification information. This avoids cumbersome operations when switching audio on the lock screen and improves the user experience.
- **External audio:** based on HOA and object-based audio (VBAP) techniques, the optimal processing mode is selected adaptively according to the content of the external audio. Multiple external audios are processed, and the sound source position is localized accurately even when many sounds are mixed.

Internal audio can therefore be acquired in the screen-locked state, and external audio can be acquired accurately. This solves the prior-art problem that a smart device in the screen-locked state cannot quickly and accurately acquire audio from inside or outside the device.

Corresponding to the above method, the present invention further provides an audio acquisition device based on a screen-locked state. As shown in Fig. 2, the device includes:

the first detection unit 100, configured to detect input information on the touch-sensitive display of the smart device if the smart device is determined to be playing audio and in the screen-locked state;

the first control unit 200, configured to generate an identifier from the input information, match the identifier against audio identification information prestored in the smart device, and play the audio corresponding to the preset audio identification information that matches the identifier;

the second detection unit 300, configured to record in real time the melody information of all currently received external audio if the smart device is determined not to be playing audio and is in the screen-locked state;

the gain unit 400, configured to:
- acquire a first, HOA-based gain and a second, VBAP-based gain for the speaker producing each external audio;
- configure a mixing weight for each speaker, and determine the weight coefficients of the first and second gains from the mixing weight; and
- determine the mixing gain of each speaker from the first gain, the second gain and their respective weight coefficients, and mix the melody information of the multiple external audios to obtain a target external audio; and

the second control unit 500, configured to generate corresponding audio identification information from the target external audio, retrieve the song containing the target external audio from a database of the background server system based on that audio identification information, and play the song.

In the above specific embodiment, the audio acquisition device based on a screen-locked state provides the following benefits:

- **Internal audio:** while the screen is locked, the device can quickly and accurately switch to the audio the user wants through pre-established audio identification information. This avoids cumbersome operations when switching audio on the lock screen and improves the user experience.
- **External audio:** based on HOA and object-based audio (VBAP) techniques, the optimal processing mode is selected adaptively according to the content of the external audio. Multiple external audios are processed, and the sound source position is localized accurately even when many sounds are mixed.

Internal audio can therefore be acquired in the screen-locked state, and external audio can be acquired accurately. This solves the prior-art problem that a smart device in the screen-locked state cannot quickly and accurately acquire audio from inside or outside the device.

According to a third aspect of the embodiments, the present invention further provides an audio acquisition system based on a screen-locked state. As shown in Fig. 3, the system includes a processor 201 and a memory 202;

the memory 202 is configured to store one or more program instructions;

the processor 201 is configured to execute the one or more program instructions to perform the method described above.

Corresponding to the above embodiments, the embodiments of the present invention also provide a computer storage medium containing one or more program instructions. When executed by the audio acquisition system based on a screen-locked state, these instructions perform the method described above.

Although the invention has been described in detail above through a general description and specific embodiments, it will be apparent to those skilled in the art that modifications or improvements can be made on this basis. Such modifications and improvements therefore fall within the scope of the invention as claimed.

Claims

1. An audio acquisition method based on a screen-locked state, characterized by comprising the following steps:

2. The audio acquisition method based on a screen-locked state according to claim 1, further comprising:

3. The audio acquisition method based on a screen-locked state according to claim 1, further comprising:

$$g_{\mathrm{m},n}(t) = w_n(t)\,g_{\mathrm{HOA},n}(t) + \bigl(1 - w_n(t)\bigr)\,g_{\mathrm{VBAP},n}(t)$$

4. The method of claim 3, wherein configuring the mixing weight for each speaker comprises:

5. The audio acquisition method based on a screen-locked state according to claim 4, wherein the multilayer convolutional neural network is used to extract feature information from the multi-channel spectrogram, and the convolutional layers and pooling layers of the convolutional neural network provide invariance to translations of the feature information.

6. The audio acquisition method based on a screen-locked state according to claim 1, further comprising:

7. The audio acquisition method based on a screen-locked state according to claim 6, wherein the identifier includes at least one of audio initial information, audio keyword information and audio tune information.

8. An audio acquisition device based on a screen-locked state, the device comprising:

9. An audio acquisition system based on a screen-locked state, the system comprising a processor and a memory;

the memory is configured to store one or more program instructions;

the processor is configured to execute the one or more program instructions to perform the method of any one of claims 1-7.

10. A computer storage medium containing one or more program instructions which, when executed by an audio acquisition system based on a screen-locked state, perform the method of any one of claims 1-7.