CN111681630A - Audio identification method, terminal and storage medium


Info

Publication number
CN111681630A
Authority
CN
China
Prior art keywords
audio data
audio
terminal
background
output unit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010357312.3A
Other languages
Chinese (zh)
Inventor
陈鹏 (Chen Peng)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nubia Technology Co Ltd
Original Assignee
Nubia Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nubia Technology Co Ltd filed Critical Nubia Technology Co Ltd
Priority to CN202010357312.3A
Publication of CN111681630A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10H ELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H 1/00 Details of electrophonic musical instruments
    • G10H 1/0008 Associated control or indicating means
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Telephone Function (AREA)

Abstract

The invention provides an audio identification method, a terminal, and a storage medium. To address the low accuracy of existing audio identification, the method extracts the audio data from a media file being played by the terminal, filters the audio data to obtain background audio data, identifies the background audio data, and prompts the identification result through an output unit of the terminal. Because the audio data are filtered after extraction, the resulting background audio data reflect the background music more accurately, which improves the accuracy of subsequent identification and the user experience.

Description

Audio identification method, terminal and storage medium
Technical Field
The present invention relates to the field of terminal technologies, and in particular, to an audio recognition method, a terminal, and a storage medium.
Background
When using a terminal to watch videos or listen to audio, a user often encounters pleasant background music and wants to determine its source or save it. Saving the background music directly is inconvenient because it is mixed with other sounds, and even if it is extracted by editing means, the result is only a piece of impure music, which leaves the user disappointed. To learn the name of the current background music, the user can only record it separately with another device and identify it through a song-recognition service, or record and identify it with an app on another phone; this requires additional devices or software working together and is still cumbersome. Moreover, because some audio content is mixed with dialogue and other scene sounds, the music is difficult to identify, the recognition rate is low, and the user experience suffers.
Disclosure of Invention
The technical problem to be solved by the present invention is the low recognition rate of existing audio identification on terminals and the resulting poor user experience. To solve this technical problem, an audio identification method is provided, the audio identification method comprising:
extracting audio data in a media file being played by a terminal;
filtering the audio data to obtain background audio data;
identifying the background audio data;
and prompting the identification result through an output unit of the terminal.
Optionally, the filtering the audio data to obtain background audio data includes:
filtering the audio data according to a noise reduction algorithm to obtain the background audio data.
Optionally, the filtering the audio data according to the noise reduction algorithm includes:
reducing the loudness of the scene sounds and/or the dialogue sounds in the audio data, and/or,
at least partially removing the scene sounds and/or the dialogue sounds.
Optionally, the filtering the audio data to obtain background audio data includes:
cutting the audio data according to the audio types contained in the audio data and retaining the background-music part of the audio data to obtain the background audio data.
Optionally, in the extracting of the audio data from the media file being played by the terminal, the media file includes a pure audio file or a video file with audio content.
Optionally, the prompting the recognition result through the output unit of the terminal includes:
prompting, through an output unit on the terminal, the source information corresponding to the background audio data.
Optionally, the output unit of the terminal includes at least one of a display unit and an audio playing unit;
the prompting the source information corresponding to the background audio data through an output unit on the terminal comprises:
when the output unit comprises a display unit, directly displaying the source information of the background audio data on the display unit;
and when the output unit comprises an audio playing unit, playing the source information of the background audio data through the audio playing unit.
Optionally, the identifying the background audio data includes:
playing the background audio data through an audio playing unit on the terminal;
and collecting the played background audio data and identifying the background audio data.
The embodiment of the invention also provides a terminal, which comprises a processor, a memory, a disk and a communication bus;
the communication bus is used for realizing the connection communication among the processor, the memory and the disk;
the processor is configured to execute one or more programs stored in the memory to implement the steps of the audio recognition method described above.
An embodiment of the present invention further provides a computer-readable storage medium, where one or more programs are stored in the computer-readable storage medium, and the one or more programs are executable by one or more processors to implement the steps of the audio recognition method described above.
Advantageous effects
The invention provides an audio identification method, a terminal, and a storage medium. To address the low accuracy of existing audio identification, the method extracts the audio data from a media file being played by the terminal, filters the audio data to obtain background audio data, identifies the background audio data, and prompts the identification result through an output unit of the terminal. Because the audio data are filtered after extraction, the resulting background audio data reflect the background music more accurately, which improves the accuracy of subsequent identification and the user experience.
Drawings
The invention will be further described with reference to the accompanying drawings and examples, in which:
FIG. 1 is a schematic diagram of a hardware structure of an optional mobile terminal for implementing various embodiments of the present invention;
FIG. 2 is a diagram of a wireless communication system for the mobile terminal shown in FIG. 1;
FIG. 3 is a basic flowchart of an audio identification method according to a first embodiment of the present invention;
FIG. 4 is a detailed flowchart of an audio identification method according to a second embodiment of the present invention;
FIG. 5 is a schematic composition diagram of a terminal according to a third embodiment of the present invention.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are adopted only to facilitate the description of the present invention and have no specific meaning in themselves. Thus, "module", "component", and "unit" may be used interchangeably.
The terminal may be implemented in various forms. For example, the terminal described in the present invention may include a mobile terminal such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a navigation device, a wearable device, a smart band, a pedometer, and the like, and a fixed terminal such as a Digital TV, a desktop computer, and the like.
The following description uses a mobile terminal as an example. Those skilled in the art will understand that, apart from elements used specifically for mobile purposes, the construction according to the embodiments of the present invention can also be applied to fixed terminals.
Referring to fig. 1, which is a schematic diagram of a hardware structure of a mobile terminal for implementing various embodiments of the present invention, the mobile terminal 100 may include: RF (Radio Frequency) unit 101, WiFi module 102, audio output unit 103, a/V (audio/video) input unit 104, sensor 105, display unit 106, user input unit 107, interface unit 108, memory 109, processor 110, and power supply 111. Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 1 is not intended to be limiting of mobile terminals, which may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile terminal in detail with reference to fig. 1:
The radio frequency unit 101 may be configured to receive and transmit signals during information transmission and reception or during a call; specifically, it receives downlink information from a base station and delivers it to the processor 110 for processing, and it transmits uplink data to the base station. Typically, the radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 can also communicate with networks and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA2000 (Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division Duplex-Long Term Evolution), and TDD-LTE (Time Division Duplex-Long Term Evolution).
WiFi is a short-range wireless transmission technology. Through the WiFi module 102, the mobile terminal can help the user send and receive e-mail, browse web pages, access streaming media, and so on, providing wireless broadband internet access. Although fig. 1 shows the WiFi module 102, it is not an essential part of the mobile terminal and may be omitted as needed without changing the essence of the invention.
The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the WiFi module 102 or stored in the memory 109 into an audio signal and output as sound when the mobile terminal 100 is in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like. Also, the audio output unit 103 may also provide audio output related to a specific function performed by the mobile terminal 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 103 may include a speaker, a buzzer, and the like.
The A/V input unit 104 is used to receive audio or video signals. The A/V input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processor 1041 processes image data of still pictures or video obtained by an image capture device (e.g., a camera) in a video capture mode or an image capture mode, and the processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or another storage medium) or transmitted via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 may receive sound (audio data) in a phone call mode, a recording mode, a voice recognition mode, or the like, and can process such sound into audio data. In the phone call mode, the processed audio (voice) data may be converted into a format transmittable to a mobile communication base station via the radio frequency unit 101 for output. The microphone 1042 may implement various noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated while receiving and transmitting audio signals.
The mobile terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor includes an ambient light sensor that can adjust the brightness of the display panel 1061 according to the brightness of ambient light, and a proximity sensor that can turn off the display panel 1061 and/or a backlight when the mobile terminal 100 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
The display unit 106 is used to display information input by a user or information provided to the user. The Display unit 106 may include a Display panel 1061, and the Display panel 1061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 107 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile terminal. Specifically, the user input unit 107 may include a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, may collect a touch operation performed by a user on or near the touch panel 1071 (e.g., an operation performed by the user on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory), and drive a corresponding connection device according to a predetermined program. The touch panel 1071 may include two parts of a touch detection device and a touch controller. The touch detection device detects the touch direction of a user, detects a signal brought by touch operation and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 110, and can receive and execute commands sent by the processor 110. In addition, the touch panel 1071 may be implemented in various types, such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may include other input devices 1072. In particular, other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like, and are not limited to these specific examples.
Further, the touch panel 1071 may cover the display panel 1061, and when the touch panel 1071 detects a touch operation thereon or nearby, the touch panel 1071 transmits the touch operation to the processor 110 to determine the type of the touch event, and then the processor 110 provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although the touch panel 1071 and the display panel 1061 are shown in fig. 1 as two separate components to implement the input and output functions of the mobile terminal, in some embodiments, the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions of the mobile terminal, and is not limited herein.
The interface unit 108 serves as an interface through which at least one external device is connected to the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the mobile terminal 100 or may be used to transmit data between the mobile terminal 100 and external devices.
The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required by at least one function (such as a sound playing function, an image playing function, etc.), and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 109 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 110 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by operating or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby performing overall monitoring of the mobile terminal. Processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor, which mainly handles operating systems, user interfaces, application programs, etc., and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The mobile terminal 100 may further include a power supply 111 (e.g., a battery) for supplying power to various components, and preferably, the power supply 111 may be logically connected to the processor 110 via a power management system, so as to manage charging, discharging, and power consumption management functions via the power management system.
Although not shown in fig. 1, the mobile terminal 100 may further include a bluetooth module or the like, which is not described in detail herein.
In order to facilitate understanding of the embodiments of the present invention, a communication network system on which the mobile terminal of the present invention is based is described below.
Referring to fig. 2, fig. 2 is an architecture diagram of a communication Network system according to an embodiment of the present invention, where the communication Network system is an LTE system of a universal mobile telecommunications technology, and the LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an IP service 204 of an operator, which are in communication connection in sequence.
Specifically, the UE201 may be the terminal 100 described above, and is not described herein again.
The E-UTRAN202 includes eNodeB2021 and other eNodeBs 2022, among others. Among them, the eNodeB2021 may be connected with other eNodeB2022 through backhaul (e.g., X2 interface), the eNodeB2021 is connected to the EPC203, and the eNodeB2021 may provide the UE201 access to the EPC 203.
The EPC203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving Gateway) 2034, a PGW (PDN Gateway) 2035, a PCRF (Policy and Charging Rules Function) 2036, and the like. The MME2031 is a control node that handles signaling between the UE201 and the EPC203 and provides bearer and connection management. The HSS2032 provides registers for managing functions such as the home location register (not shown) and holds subscriber-specific information about service characteristics, data rates, and so on. All user data may be sent through the SGW2034; the PGW2035 may provide IP address assignment for the UE201 among other functions; and the PCRF2036 is the policy and charging control decision point for service data flows and IP bearer resources, selecting and providing available policy and charging control decisions for a policy and charging enforcement function (not shown).
The IP services 204 may include the internet, intranets, IMS (IP Multimedia Subsystem), or other IP services, among others.
Although the LTE system is described as an example, it should be understood by those skilled in the art that the present invention is not limited to the LTE system, but may also be applied to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems.
Based on the above mobile terminal hardware structure and communication network system, the present invention provides various embodiments of the method.
First embodiment
Fig. 3 is a basic flowchart of an audio recognition method provided in this embodiment, where the audio recognition method includes:
S301, extracting the audio data from a media file being played by a terminal;
S302, filtering the audio data to obtain background audio data;
S303, identifying the background audio data;
S304, prompting the identification result through an output unit of the terminal.
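The four steps S301 to S304 can be read as a simple pipeline. The sketch below models them with interchangeable callbacks; the function names, callbacks, and toy data are illustrative assumptions, not part of the patent:

```python
def recognize_background_music(media_path, extract, filter_bgm, identify, prompt):
    """Illustrative pipeline for steps S301-S304 (all names are hypothetical)."""
    audio = extract(media_path)      # S301: extract audio data from the media file
    background = filter_bgm(audio)   # S302: filter to obtain background audio data
    result = identify(background)    # S303: identify the background audio data
    prompt(result)                   # S304: prompt the result via an output unit
    return result

# Toy usage with stand-in callbacks:
prompts = []
result = recognize_background_music(
    "movie.mp4",
    extract=lambda path: [10, 990, 12, 1000],                # pretend sample values
    filter_bgm=lambda audio: [s for s in audio if s < 500],  # drop "loud dialogue"
    identify=lambda bgm: "Song A" if bgm else None,
    prompt=prompts.append,
)
```

Each callback corresponds to one of the optional refinements discussed in the rest of this embodiment (noise reduction, cutting, recognition, prompting), so an implementation can swap any step without changing the overall flow.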
The audio identification method in this embodiment identifies the audio data in a media file to obtain an accurate source for its background music. In S301, the audio data in the media file being played by the terminal are extracted. Depending on its type, the media file may be a pure audio file or a video file with audio content. A pure audio file is a media file that is itself an audio file, usually in a format such as MP3 (MPEG Audio Layer III), WMA (Windows Media Audio), AMR (Adaptive Multi-Rate), or WAV (Waveform Audio File); the format reflects that the file is an audio file, and it can be played by a suitable player so that the user can listen to its audio content. A video file with audio content is a media file that is in essence a video file and is not silent, i.e., it contains audio content, usually in a format such as MPEG (Moving Picture Experts Group), AVI (Audio Video Interleave), WMV (Windows Media Video), RMVB (RealMedia Variable Bitrate), or FLV (Flash Video); the format reflects that the file is a video file. It should be noted that not all video files have audio content; the video files referred to in this application are those that do.
Whether the media file is a pure audio file or a video file with audio content, audio data can be extracted from it. Extraction means stripping from the media file any data other than audio data, such as the image data in a video file.
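Stripping non-audio data from a media file is normally done with a demuxer such as FFmpeg. As a minimal model of that step, assume the container has already been demuxed into typed streams; the stream representation here is a hypothetical stand-in for real container parsing:

```python
def extract_audio_data(streams):
    """Simplified model of S301: given demuxed (stream_type, payload) pairs from a
    media container, keep the audio payloads and strip everything else, e.g. video
    frames. A real terminal would use a demuxing library such as FFmpeg."""
    return [payload for kind, payload in streams if kind == "audio"]

# Toy demuxed container: interleaved video frames and audio chunks.
demuxed = [("video", b"frame0"), ("audio", b"pcm0"),
           ("video", b"frame1"), ("audio", b"pcm1")]
audio_only = extract_audio_data(demuxed)
```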
In S302, the audio data are filtered to obtain background audio data. For the recognition result to be as accurate as possible, the audio content used for recognition should be as close as possible to the background music (BGM) itself and contain as little other audio content as possible, because the more irrelevant audio content there is, the greater its impact on the accuracy of the recognition result. The audio data are therefore filtered to obtain background audio data; filtering here means reducing the influence that audio content other than the BGM may have on the recognition result.
In some embodiments, the filtering the audio data to obtain the background audio data may specifically include:
filtering the audio data according to a noise reduction algorithm to obtain the background audio data. Wanted sound and noise differ in parameters such as loudness and frequency, and these differences can be exploited as a means of filtering the audio data.
Specifically, according to the noise reduction algorithm, the filtering the audio data may include:
the loudness of the scene sounds and/or the dialogue sounds in the audio data is reduced, and/or,
at least partially removing the scene sounds and/or the dialogue sounds. Generally, the non-BGM part of the audio data may include scene sounds, character dialogue, and the like. Scene sounds are the environmental sounds recorded along with the dialogue, while the dialogue is the characters' speech; the dialogue often carries the essential content of the media file, but during audio recognition it becomes a burden and therefore needs to be removed as far as possible. Processing at least one of the scene sounds and the dialogue in this way already improves the accuracy of audio identification to some extent.
The filtering of the scene sounds and/or the dialogue in the audio data may specifically reduce their loudness, i.e., their volume, making them quieter, or directly remove at least part of them through a noise reduction algorithm. Removal means that the scene sounds and/or the dialogue within a certain interval are eliminated entirely.
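A minimal sketch of the loudness-reduction/removal idea, assuming the scene-sound and dialogue samples have already been labeled; a real implementation would derive this from a noise-reduction or source-separation algorithm rather than from ready-made labels:

```python
def filter_non_bgm(samples, labels, gain=0.2):
    """Toy version of the filter: attenuate samples labeled as scene sound or
    dialogue by `gain`; gain=0.0 removes them entirely. The per-sample labels
    are an illustrative assumption."""
    assert len(samples) == len(labels)
    return [s if lab == "bgm" else s * gain for s, lab in zip(samples, labels)]

samples = [100.0, 80.0, 60.0, 40.0]
labels  = ["bgm", "dialogue", "bgm", "scene"]
quieter = filter_non_bgm(samples, labels, gain=0.5)  # reduce non-BGM loudness
removed = filter_non_bgm(samples, labels, gain=0.0)  # remove non-BGM entirely
```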
In some embodiments, filtering the audio data to obtain the background audio data may specifically include:
cutting the audio data according to the audio types contained in the audio data and retaining the background-music part to obtain the background audio data. The BGM necessarily has a certain duration, and audio identification usually does not require the full piece: identifying part of the audio content is enough to identify the source of the whole. The audio data can therefore be cut directly according to the distribution of audio types, retaining only the part in which the BGM alone is present for identification; that part becomes the background audio data. The parts containing other audio content are removed outright, including their BGM portion. The resulting background audio data thus correspond to pure BGM, which improves the accuracy of audio identification.
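The cutting strategy can be sketched as follows, assuming the audio data have already been segmented and tagged by audio type; the segment structure is illustrative:

```python
def cut_to_pure_bgm(segments):
    """Sketch of the cutting strategy: keep only segments in which BGM is the sole
    audio type, and drop segments where BGM is mixed with dialogue or scene sound
    (their BGM portion is discarded too, as described above)."""
    return [seg for seg in segments if seg["types"] == {"bgm"}]

# Toy tagged timeline (times in seconds):
timeline = [
    {"start": 0.0,  "end": 4.0,  "types": {"bgm", "dialogue"}},  # mixed: dropped
    {"start": 4.0,  "end": 12.0, "types": {"bgm"}},              # pure BGM: kept
    {"start": 12.0, "end": 15.0, "types": {"bgm", "scene"}},     # mixed: dropped
]
background = cut_to_pure_bgm(timeline)
```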
In some embodiments, prompting the recognition result through an output unit of the terminal may include:
prompting, through an output unit on the terminal, the source information corresponding to the background audio data. The source information may be a single item or a combination of items such as a download link, the BGM title, singer information, and album information; in general, it can be any information that identifies the data source of the background audio data. Depending on its type, the output unit of the terminal may include a display unit, an audio playing unit, and so on; in this embodiment the output unit includes at least one of these, and the prompting method differs with its type. Specifically, prompting the source information corresponding to the background audio data through an output unit on the terminal may include:
when the output unit comprises a display unit, directly displaying the source information of the background audio data on the display unit; the display unit is the display screen on the terminal, and the source information of the background audio data can be shown directly on it to prompt the user.
When the output unit comprises an audio playing unit, the source information of the background audio data is played through the audio playing unit. The audio playing unit may be a speaker or receiver on the terminal, or an external audio peripheral connected through an audio interface on the terminal, such as a wired earphone, a wireless earphone, or a loudspeaker box, which prompts the user with the source information in audible form.
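The display/audio branching above can be sketched as a small dispatcher; the unit names and message formats are assumptions for illustration:

```python
def prompt_source_info(source_info, units):
    """Dispatch the identified source info to whichever output units the terminal
    has (S304): show it on the display and/or announce it through the audio
    playing unit (e.g. via text-to-speech)."""
    actions = []
    if "display" in units:   # display unit present: show the text on screen
        actions.append(("display", "Source: %s" % source_info))
    if "audio" in units:     # audio playing unit present: speak the result
        actions.append(("audio", "Playing announcement: %s" % source_info))
    return actions

actions = prompt_source_info("Song A - Artist B", units={"display", "audio"})
```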
In some embodiments, identifying the background audio data specifically includes:
playing the background audio data through an audio playing unit on the terminal;
collecting the played background audio data and identifying the background audio data. The background audio data can be identified by playing them out loud and then invoking audio recognition software on the terminal to recognize the played audio; the recognition matches the audio against the internet or a music library on a server, thereby obtaining the source information.
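The matching step can be modeled as a lookup of an acoustic fingerprint in a server-side music library. Real recognition services use far more robust spectral fingerprints; the hash below is only an illustrative stand-in, and the library layout is hypothetical:

```python
import hashlib

def fingerprint(samples):
    """Toy acoustic fingerprint: SHA-1 of coarsely quantized sample values."""
    quantized = bytes(int(s) // 64 % 256 for s in samples)
    return hashlib.sha1(quantized).hexdigest()

def identify_against_library(samples, library):
    """Match captured background audio against a music library keyed by
    fingerprint, returning source info or None when there is no match."""
    return library.get(fingerprint(samples))

# Toy library with one known track:
track = [0, 128, 255, 64, 192]
library = {fingerprint(track): {"title": "Song A", "artist": "Artist B"}}
match = identify_against_library(track, library)
miss = identify_against_library([255, 0, 0, 0, 0], library)
```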
This embodiment provides an audio identification method that addresses the low accuracy of existing audio identification: audio data in a media file being played by the terminal is extracted; the audio data is filtered to obtain background audio data; the background audio data is identified; and the identification result is prompted through an output unit of the terminal. Because the audio data is filtered after extraction, the resulting background audio data reflects the background music more accurately, which improves the accuracy of subsequent identification and the user experience.
Second embodiment
Fig. 4 is a detailed flowchart of an audio recognition method according to a second embodiment of the present invention, where the audio recognition method includes:
S401, the user starts a video playing application on the terminal and plays a video file.
The user can apply the audio identification method of the embodiments of the application to any video playing application; the method imposes no limitation on the software used. The video file should contain audio content.
S402, when the user needs to know the background music in the video file, a triggering instruction is input to the terminal.
The timing of the trigger instruction is decided by the user: when the user becomes interested in the background music played in the video, the user inputs a trigger instruction to the terminal. The trigger instruction may be defined in advance in the software interaction logic of the terminal, for example: it may be realized by a specific sliding operation; by a specific combined operation of keys or the touch screen on the terminal; by a voice control instruction; or a specific UI interface may be provided on the terminal for the user to trigger, thereby realizing the input of the trigger instruction.
S403, the terminal receives the trigger instruction and extracts the audio data in the video file being played.
When the terminal receives the trigger instruction, it understands that the user wants to know the background music corresponding to the currently played video file, and starts the audio identification process of the embodiments of the present application; that is, it extracts the audio data corresponding to the video file from the video file.
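Conceptually, step S403 is a demultiplexing operation. The sketch below is an illustrative assumption that treats the media file as an interleaved stream of (kind, payload) packets; a real terminal would use the platform's media framework (a demuxer) rather than this toy representation.

```python
def extract_audio(packets):
    """Extract the audio data from an interleaved media packet stream.

    `packets` is a sequence of (kind, payload) pairs, where kind is
    "video" or "audio" -- a toy stand-in for a container format.
    """
    return [payload for kind, payload in packets if kind == "audio"]
```

The extracted audio payloads are then handed to the filtering step (S404).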
S404, filtering the extracted audio data to obtain background audio data.
Filtering the audio data "purifies" the background music: the resulting background audio data is, to a certain extent, closer to the playing effect of the background music than the original audio data is. The filtering operation may reduce the loudness of the non-background music, remove the non-background music directly, or cut out the portions of the audio data that contain non-background music, keeping only the portion containing the background music as the background audio data.
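The two filtering strategies named above (attenuate versus cut) can be sketched as follows. This is an illustrative assumption, not the patent's algorithm: it presumes each audio frame already carries a coarse type label ("bgm", "dialogue", "scene"), and the 0.1 attenuation factor is arbitrary.

```python
def attenuate_non_bgm(frames, factor=0.1):
    """Reduce the loudness of scene/dialogue frames; keep BGM frames as-is."""
    return [(label, sample if label == "bgm" else sample * factor)
            for label, sample in frames]

def cut_non_bgm(frames):
    """Drop non-BGM frames entirely, keeping only the background music portion."""
    return [(label, sample) for label, sample in frames if label == "bgm"]
```

A real implementation would obtain the labels from source separation or voice-activity detection rather than assuming them.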
S405, identifying the background audio data to obtain the source information of the background audio data.
The background audio data may be identified by playing it through an audio output unit on the terminal, for example through a speaker, thereby obtaining the audio content, and then identifying that audio content to obtain the source information.
And S406, displaying the source information on a display screen of the terminal.
Through the display, the user can visually learn the source information of the background music, which may include the name, author, album, and download link of the background music.
Third embodiment
Fig. 5 is a schematic diagram of a terminal according to a third embodiment of the present invention, where the terminal includes a processor 51, a memory 52, a disk 53 and a communication bus 54;
the communication bus 54 is used for realizing connection communication among the processor 51, the memory 52 and the disk 53;
the processor 51 is configured to execute one or more programs stored in the memory 52 to implement the steps of the audio recognition method in the foregoing embodiments, which are not described herein again.
Fourth embodiment
The present embodiment also provides a computer-readable storage medium, where one or more computer programs are stored in the computer-readable storage medium, and the one or more computer programs may be executed by one or more processors to implement the steps of the audio identification method in the foregoing embodiments, which are not described herein again.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
While the present invention has been described with reference to the embodiments shown in the drawings, the present invention is not limited to the embodiments, which are illustrative and not restrictive, and it will be apparent to those skilled in the art that various changes and modifications can be made therein without departing from the spirit and scope of the invention as defined in the appended claims.

Claims (10)

1. An audio recognition method, comprising:
extracting audio data in a media file being played by a terminal;
filtering the audio data to obtain background audio data;
identifying the background audio data;
and prompting the identification result through an output unit of the terminal.
2. The audio recognition method of claim 1, wherein the filtering the audio data to obtain background audio data comprises:
and filtering the audio data according to a noise reduction algorithm to obtain the background audio data.
3. The audio recognition method of claim 2, wherein the filtering the audio data according to a noise reduction algorithm comprises:
reducing the loudness of the scene sounds and/or the dialogue sounds in the audio data, and/or,
the scene sound and/or the dialogue sound is at least partially removed.
4. The audio recognition method of claim 1, wherein the filtering the audio data to obtain background audio data comprises:
and cutting the audio data according to the audio type in the audio data, and reserving the background music part in the audio data to obtain the background audio data.
5. The audio identification method according to any one of claims 1-4, characterized in that the audio data is extracted from the media file being played by the terminal, and the media file comprises a pure audio file or a video file with audio content.
6. The audio recognition method of any one of claims 1-4, wherein the prompting the recognition result through an output unit of the terminal comprises:
and prompting the source information corresponding to the background audio data through an output unit on the terminal.
7. The audio recognition method of claim 6, wherein the output unit of the terminal comprises at least one of a display unit, an audio playing unit;
the prompting the source information corresponding to the background audio data through an output unit on the terminal comprises:
when the output unit comprises a display unit, directly displaying the source information of the background audio data on the display unit;
and when the output unit comprises an audio playing unit, playing the source information of the background audio data through the audio playing unit.
8. The audio recognition method of any of claims 1-4, wherein the recognizing the background audio data comprises:
playing the background audio data through an audio playing unit on the terminal;
and collecting the played background audio data and identifying the background audio data.
9. A terminal, comprising a processor, a memory, a disk, and a communication bus;
the communication bus is used for realizing the connection communication among the processor, the memory and the disk;
the processor is configured to execute one or more programs stored in the memory to implement the steps of the audio recognition method of any of claims 1 to 8.
10. A computer readable storage medium, characterized in that the computer readable storage medium stores one or more programs which are executable by one or more processors to implement the steps of the audio recognition method according to any one of claims 1 to 8.
CN202010357312.3A 2020-04-29 2020-04-29 Audio identification method, terminal and storage medium Pending CN111681630A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010357312.3A CN111681630A (en) 2020-04-29 2020-04-29 Audio identification method, terminal and storage medium


Publications (1)

Publication Number Publication Date
CN111681630A (en) 2020-09-18

Family

ID=72452315

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010357312.3A Pending CN111681630A (en) 2020-04-29 2020-04-29 Audio identification method, terminal and storage medium

Country Status (1)

Country Link
CN (1) CN111681630A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106940996A (en) * 2017-04-24 2017-07-11 维沃移动通信有限公司 The recognition methods of background music and mobile terminal in a kind of video
CN108391190A (en) * 2018-01-30 2018-08-10 努比亚技术有限公司 A kind of noise-reduction method, earphone and computer readable storage medium
US20200004778A1 (en) * 2015-09-04 2020-01-02 Samsung Electronics Co., Ltd. Display apparatus, background music providing method thereof and background music providing system
CN110751960A (en) * 2019-10-16 2020-02-04 北京网众共创科技有限公司 Method and device for determining noise data


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
李科成: "《短视频运营实操手册》", 31 December 2019, 中国商业出版社, pages: 113 *
田守瑞等: "《CakeWalk 7.0命令与实例》", 31 January 2000, 人民邮电出版社, pages: 57 *


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200918