CN106157970B - Audio identification method and terminal - Google Patents


Info

Publication number
CN106157970B
Authority
CN
China
Prior art keywords
audio
identification
unit
pulse code
code modulation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610455918.4A
Other languages
Chinese (zh)
Other versions
CN106157970A (en)
Inventor
李光宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
SHANGHAI MICROPHONE CULTURE MEDIA Co.,Ltd.
Original Assignee
Shanghai Microphone Culture Media Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Microphone Culture Media Co ltd
Priority to CN201610455918.4A
Publication of CN106157970A
Application granted
Publication of CN106157970B
Active legal status
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M1/00 Substation equipment, e.g. for use by subscribers
    • H04M1/72 Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724 User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403 User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality

Abstract

The invention provides an audio recognition method, comprising the following steps: receiving an audio recognition request; if it is detected that the current terminal is playing an audio clip, capturing pulse code modulation data through a hardware abstraction layer; and uploading the pulse code modulation data to an application program interface for identification. The invention also provides a terminal, comprising: a request receiving unit for receiving an audio recognition request; an audio detection unit for detecting whether the current terminal is playing an audio clip; a pulse code modulation data acquisition unit for capturing pulse code modulation data through a hardware abstraction layer when it is detected that the current terminal is playing an audio clip; and a data uploading unit for uploading the pulse code modulation data to an application program interface for identification. The invention enables switching between, and rapid identification of, the terminal's internal audio stream and external ambient sound, and the function can be invoked quickly from any interface of the system, thereby improving the user experience.

Description

Audio identification method and terminal
Technical Field
The invention relates to the field of mobile terminals, in particular to an audio identification method and a terminal.
Background
Some existing applications can conveniently recognize a piece of music or a hummed melody using techniques such as frequency feature comparison, formant frequency recognition, and hidden Markov statistical models. These solutions are mature and can return a result within a few seconds, but they are ineffective in some scenarios. For example, if a user wants to identify a sound clip or video playing locally in Weibo, WeChat, or another application, the user often has to find a second mobile phone, open an application with a song recognition function on it, and hold it up to the loudspeaker of the first phone. This is inconvenient and cumbersome, and is a real pain point for users.
Disclosure of Invention
The main object of the invention is to provide an audio recognition method and a terminal applying the method, so that a sound clip can be recognized directly regardless of whether it originates inside the terminal or in the surrounding environment.
To achieve the above object, the present invention provides an audio recognition method, comprising: receiving an audio recognition request; if it is detected that the current terminal is playing an audio clip, capturing Pulse Code Modulation (PCM) data through the hardware abstraction layer (HAL); and uploading the pulse code modulation data to an Application Program Interface (API) for identification.
Preferably, if it is not detected that the current terminal is playing an audio clip, ambient sound is collected to generate environment data, and the environment data is uploaded to the application program interface for identification.
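The method steps above, including the ambient-sound fallback, can be sketched as a small dispatcher. This is a minimal illustration under stated assumptions, not the patented implementation: the class `RecognitionFlow` and the four injected callables (`is_playing`, `capture_pcm`, `record_ambient`, `upload`) are hypothetical stand-ins for the audio focus check, HAL capture, microphone recording, and recognition API, none of which the text specifies concretely.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class RecognitionFlow:
    """Hypothetical wiring of the method steps; every callable is injected."""
    is_playing: Callable[[], bool]       # audio detection, e.g. an audio focus check
    capture_pcm: Callable[[], bytes]     # internal audio captured through the HAL
    record_ambient: Callable[[], bytes]  # ambient sound from the microphone
    upload: Callable[[bytes], str]       # recognition API, returns a result string

    def handle_request(self) -> tuple[str, str]:
        """Serve one recognition request; returns (audio source, result)."""
        if self.is_playing():
            source, data = "internal", self.capture_pcm()
        else:
            source, data = "ambient", self.record_ambient()
        return source, self.upload(data)


def fake_upload(data: bytes) -> str:
    # Placeholder for the recognition service: just reports the payload size.
    return f"{len(data)} bytes"


flow = RecognitionFlow(
    is_playing=lambda: True,
    capture_pcm=lambda: b"\x00\x01" * 4,
    record_ambient=lambda: b"",
    upload=fake_upload,
)
print(flow.handle_request())  # ('internal', '8 bytes')
```

Injecting the platform-specific steps as callables keeps the branching logic testable without a device.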
Preferably, the method further comprises: displaying the recognition result in a message prompt box.
Preferably, the method further comprises: saving the recognition result as text, or executing a preset operation, where the preset operation comprises sending the recognition result to a browser search box for searching, or sending it to a preset application (APP) for querying or downloading.
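One way to realize the optional post-processing step is sketched below. The search URL template, the request fields passed to the preset app, and all function names are hypothetical; the text only states that the result is saved as text, sent to a browser search box, or handed to a preset application for querying or downloading.

```python
import urllib.parse
from pathlib import Path

# Hypothetical search endpoint; the text names no search engine or URL.
SEARCH_URL_TEMPLATE = "https://search.example.com/?q={query}"


def save_as_text(result: str, path: Path) -> Path:
    """Store the recognition result as a UTF-8 text file."""
    path.write_text(result + "\n", encoding="utf-8")
    return path


def to_search_url(result: str) -> str:
    """Put the result into a browser search query (spaces become '+')."""
    return SEARCH_URL_TEMPLATE.format(query=urllib.parse.quote_plus(result))


def to_app_request(result: str, app_id: str, action: str = "query") -> dict[str, str]:
    """Build a hypothetical request asking a preset app to query or download."""
    if action not in ("query", "download"):
        raise ValueError(f"unsupported action: {action}")
    return {"app": app_id, "action": action, "title": result}


print(to_search_url("Yesterday Once More"))
# https://search.example.com/?q=Yesterday+Once+More
```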
Preferably, whether the current terminal is playing a sound clip is detected through a field of the audio focus (AudioFocus).
In addition, to achieve the above object, the present invention further provides a terminal, comprising: a request receiving unit for receiving an audio recognition request; an audio detection unit for detecting whether the current terminal is playing an audio clip; a pulse code modulation data acquisition unit for capturing pulse code modulation data through the hardware abstraction layer when it is detected that the current terminal is playing an audio clip; and a data uploading unit for uploading the pulse code modulation data to an application program interface for identification.
Preferably, the terminal further comprises an environment data generating unit configured to collect ambient sound and generate environment data if it is not detected that the current terminal is playing an audio clip; the data uploading unit is further configured to upload the environment data to the application program interface for identification.
Preferably, the terminal further comprises a display unit configured to display the recognition result in a message prompt box.
Preferably, the terminal further comprises a processing unit configured to save the recognition result as text, or to execute a preset operation, where the preset operation comprises sending the recognition result to a browser search box for searching, or sending it to a preset application for querying or downloading.
Preferably, the audio detection unit is configured to detect, through a field of the audio focus, whether the current terminal is playing a sound clip.
Based on a real-time determination made by the mobile terminal, the audio recognition method and terminal provided by the invention switch between, and rapidly recognize, the terminal's internal audio stream and external ambient sound. The method is not restricted to any particular application and can be invoked quickly from any interface of the system, which resolves this user pain point and improves the user experience.
Drawings
Fig. 1 is a schematic diagram of the hardware structure of a mobile terminal implementing various embodiments of the present invention;
Fig. 2 is a diagram of a wireless communication system for the mobile terminal shown in Fig. 1;
Fig. 3 is a schematic structural diagram of a terminal according to a first embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a terminal according to a second embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a terminal according to a third embodiment of the present invention;
Fig. 6 is an operation flowchart of a UI-based audio recognition method according to a fourth embodiment of the present invention;
Fig. 7 is an operation flowchart of a UI-based audio recognition method according to a fifth embodiment of the present invention;
Fig. 8 is an operation flowchart of a UI-based audio recognition method according to a sixth embodiment of the present invention.
The implementation, functional features, and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
A mobile terminal implementing various embodiments of the present invention will now be described with reference to the accompanying drawings. In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module" and "component" may be used interchangeably.
The mobile terminal may be implemented in various forms. For example, the terminal described in the present invention may include mobile terminals such as a mobile phone, a smart phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a navigation device, and the like, as well as fixed terminals such as a digital TV, a desktop computer, and the like. In the following, it is assumed that the terminal is a mobile terminal. However, it will be understood by those skilled in the art that the configuration according to the embodiments of the present invention can also be applied to fixed terminals, except for elements specifically intended for mobile use.
Fig. 1 is a schematic hardware configuration of a mobile terminal implementing various embodiments of the present invention.
The mobile terminal 100 may include a wireless communication unit 110, an A/V (audio/video) input unit 120, a user input unit 130, a sensing unit 140, an output unit 150, a memory 160, an interface unit 170, a controller 180, a power supply unit 190, and the like. Fig. 1 illustrates a mobile terminal having various components, but it is to be understood that not all illustrated components are required to be implemented. More or fewer components may alternatively be implemented. Elements of the mobile terminal will be described in detail below.
The wireless communication unit 110 typically includes one or more components that allow radio communication between the mobile terminal 100 and a wireless communication system or network. For example, the wireless communication unit may include at least one of a broadcast receiving module 111, a mobile communication module 112, a wireless internet module 113, a short-range communication module 114, and a location information module 115.
The broadcast receiving module 111 receives a broadcast signal and/or broadcast associated information from an external broadcast management server via a broadcast channel. The broadcast channel may include a satellite channel and/or a terrestrial channel. The broadcast management server may be a server that generates and transmits a broadcast signal and/or broadcast associated information, or a server that receives a previously generated broadcast signal and/or broadcast associated information and transmits it to a terminal. The broadcast signal may include a TV broadcast signal, a radio broadcast signal, a data broadcast signal, and the like, and may further include a broadcast signal combined with a TV or radio broadcast signal. The broadcast associated information may also be provided via a mobile communication network, in which case it may be received by the mobile communication module 112. The broadcast associated information may exist in various forms, for example, an Electronic Program Guide (EPG) of Digital Multimedia Broadcasting (DMB), an Electronic Service Guide (ESG) of Digital Video Broadcasting-Handheld (DVB-H), and the like. The broadcast receiving module 111 may receive signals broadcast by various types of broadcasting systems. In particular, the broadcast receiving module 111 may receive digital broadcasts by using digital broadcasting systems such as Digital Multimedia Broadcasting-Terrestrial (DMB-T), Digital Multimedia Broadcasting-Satellite (DMB-S), Digital Video Broadcasting-Handheld (DVB-H), the data broadcasting system known as Media Forward Link Only (MediaFLO®), Integrated Services Digital Broadcasting-Terrestrial (ISDB-T), and the like. The broadcast receiving module 111 may be constructed to suit various broadcasting systems that provide broadcast signals, in addition to the digital broadcasting systems mentioned above.
The broadcast signal and/or broadcast associated information received via the broadcast receiving module 111 may be stored in the memory 160 (or other type of storage medium).
The mobile communication module 112 transmits and/or receives radio signals to and/or from at least one of a base station (e.g., access point, node B, etc.), an external terminal, and a server. Such radio signals may include voice call signals, video call signals, or various types of data transmitted and/or received according to text and/or multimedia messages.
The wireless internet module 113 supports wireless internet access of the mobile terminal. The module may be internally or externally coupled to the terminal. The wireless internet access technologies involved may include WLAN (wireless LAN, Wi-Fi), WiBro (wireless broadband), WiMAX (worldwide interoperability for microwave access), HSDPA (high speed downlink packet access), and the like.
The short-range communication module 114 is a module for supporting short-range communication. Some examples of short-range communication technologies include Bluetooth™, Radio Frequency Identification (RFID), Infrared Data Association (IrDA), Ultra Wideband (UWB), ZigBee™, and the like.
The location information module 115 is a module for checking or acquiring location information of the mobile terminal. A typical example of the location information module is a GPS (global positioning system). According to the current technology, the GPS module 115 calculates distance information and accurate time information from three or more satellites and applies triangulation to the calculated information, thereby accurately calculating three-dimensional current location information according to longitude, latitude, and altitude. Currently, a method for calculating position and time information uses three satellites and corrects an error of the calculated position and time information by using another satellite. In addition, the GPS module 115 can calculate speed information by continuously calculating current position information in real time.
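The position calculation described above can be illustrated with a simplified multilateration sketch. It assumes exact satellite-to-receiver ranges in a Cartesian frame; a real GPS receiver works with pseudoranges, solves for its own clock bias, and converts to latitude, longitude, and altitude, none of which is modelled here. With four satellites, subtracting the first sphere equation from the other three yields a linear system, solved here with Cramer's rule; the fourth satellite therefore plays a linearizing role rather than the error-correcting role the text describes. The `speed` helper mirrors the point that speed follows from successive position fixes. All function names are illustrative.

```python
import math

Vec3 = tuple[float, float, float]


def _det3(m: list[list[float]]) -> float:
    """Determinant of a 3x3 matrix by cofactor expansion along the first row."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def trilaterate(sats: list[Vec3], ranges: list[float]) -> Vec3:
    """Receiver position from four satellite positions and exact ranges.

    Subtracting sphere 0 from spheres 1..3 linearizes the problem:
        2 (s_i - s_0) . x = r_0^2 - r_i^2 + |s_i|^2 - |s_0|^2
    """
    if len(sats) < 4 or len(ranges) < 4:
        raise ValueError("need four satellites")
    s0, r0 = sats[0], ranges[0]
    a, b = [], []
    for si, ri in zip(sats[1:4], ranges[1:4]):
        a.append([2.0 * (si[k] - s0[k]) for k in range(3)])
        b.append(r0**2 - ri**2 + sum(c * c for c in si) - sum(c * c for c in s0))
    d = _det3(a)
    if abs(d) < 1e-12:
        raise ValueError("satellite geometry is degenerate (coplanar)")
    # Cramer's rule: replace one column of A with b for each unknown.
    solution = []
    for col in range(3):
        m = [row[:] for row in a]
        for row in range(3):
            m[row][col] = b[row]
        solution.append(_det3(m) / d)
    return (solution[0], solution[1], solution[2])


def speed(p1: Vec3, p2: Vec3, dt_s: float) -> float:
    """Average speed between two successive position fixes."""
    return math.dist(p1, p2) / dt_s


sats = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0), (0.0, 10.0, 0.0), (0.0, 0.0, 10.0)]
truth = (1.0, 2.0, 3.0)
ranges = [math.dist(s, truth) for s in sats]
print(trilaterate(sats, ranges))  # approximately (1.0, 2.0, 3.0)
```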
The A/V input unit 120 is used to receive audio or video signals. The A/V input unit 120 may include a camera 121 and a microphone 122. The camera 121 processes image data of still pictures or video obtained by an image capturing apparatus in a video capturing mode or an image capturing mode, and the processed image frames may be displayed on the display unit 151. The image frames processed by the camera 121 may be stored in the memory 160 (or other storage medium) or transmitted via the wireless communication unit 110, and two or more cameras 121 may be provided depending on the construction of the mobile terminal. The microphone 122 may receive sound in a phone call mode, a recording mode, a voice recognition mode, or the like, and can process such sound into audio data. In the phone call mode, the processed audio (voice) data may be converted into a format that can be transmitted to a mobile communication base station via the mobile communication module 112. The microphone 122 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated while receiving and transmitting audio signals.
The user input unit 130 may generate key input data according to a command input by a user to control various operations of the mobile terminal. The user input unit 130 allows a user to input various types of information, and may include a keyboard, dome sheet, touch pad (e.g., a touch-sensitive member that detects changes in resistance, pressure, capacitance, and the like due to being touched), scroll wheel, joystick, and the like. In particular, when the touch pad is superimposed on the display unit 151 in the form of a layer, a touch screen may be formed.
The sensing unit 140 detects a current state of the mobile terminal 100 (e.g., an open or closed state of the mobile terminal 100), a position of the mobile terminal 100, presence or absence of contact (i.e., touch input) by a user with the mobile terminal 100, an orientation of the mobile terminal 100, acceleration or deceleration movement and direction of the mobile terminal 100, and the like, and generates a command or signal for controlling an operation of the mobile terminal 100. For example, when the mobile terminal 100 is implemented as a slide-type mobile phone, the sensing unit 140 may sense whether the slide-type phone is opened or closed. In addition, the sensing unit 140 can detect whether the power supply unit 190 supplies power or whether the interface unit 170 is coupled with an external device. The sensing unit 140 may include a proximity sensor 141 as will be described below in connection with a touch screen.
The interface unit 170 serves as an interface through which at least one external device is connected to the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The identification module may store various information for authenticating a user using the mobile terminal 100 and may include a User Identity Module (UIM), a Subscriber Identity Module (SIM), a Universal Subscriber Identity Module (USIM), and the like. In addition, a device having an identification module (hereinafter, referred to as an "identification device") may take the form of a smart card, and thus, the identification device may be connected with the mobile terminal 100 via a port or other connection means. The interface unit 170 may be used to receive input (e.g., data information, power, etc.) from an external device and transmit the received input to one or more elements within the mobile terminal 100 or may be used to transmit data between the mobile terminal and the external device.
In addition, when the mobile terminal 100 is connected with an external cradle, the interface unit 170 may serve as a path through which power is supplied from the cradle to the mobile terminal 100, or as a path through which various command signals input from the cradle are transmitted to the mobile terminal. The command signals or power input from the cradle may be used as signals for recognizing whether the mobile terminal is correctly mounted on the cradle.
The output unit 150 is configured to provide output signals (e.g., audio signals, video signals, alarm signals, vibration signals, etc.) in a visual, audio, and/or tactile manner. The output unit 150 may include a display unit 151, an audio output module 152, an alarm unit 153, and the like.
The display unit 151 may display information processed in the mobile terminal 100. For example, when the mobile terminal 100 is in a phone call mode, the display unit 151 may display a User Interface (UI) or a Graphical User Interface (GUI) related to a call or other communication (e.g., text messaging, multimedia file downloading, etc.). When the mobile terminal 100 is in a video call mode or an image capturing mode, the display unit 151 may display a captured image and/or a received image, a UI or GUI showing a video or an image and related functions, and the like.
Meanwhile, when the display unit 151 and the touch pad are overlapped with each other in the form of a layer to form a touch screen, the display unit 151 may serve as an input device and an output device. The display unit 151 may include at least one of a Liquid Crystal Display (LCD), a thin film transistor LCD (TFT-LCD), an Organic Light Emitting Diode (OLED) display, a flexible display, a three-dimensional (3D) display, and the like. Some of these displays may be configured to be transparent to allow a user to view from the outside, which may be referred to as transparent displays, and a typical transparent display may be, for example, a TOLED (transparent organic light emitting diode) display or the like. Depending on the particular desired implementation, the mobile terminal 100 may include two or more display units (or other display devices), for example, the mobile terminal may include an external display unit (not shown) and an internal display unit (not shown). The touch screen may be used to detect a touch input pressure as well as a touch input position and a touch input area.
The audio output module 152 may convert audio data received by the wireless communication unit 110 or stored in the memory 160 into an audio signal and output as sound when the mobile terminal is in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like. Also, the audio output module 152 may provide audio output related to a specific function performed by the mobile terminal 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output module 152 may include a speaker, a buzzer, and the like.
The alarm unit 153 may provide an output to notify the user of the occurrence of an event in the mobile terminal 100. Typical events may include call reception, message reception, key signal input, touch input, and the like. In addition to audio or video output, the alarm unit 153 may provide output in different ways to notify the occurrence of an event. For example, the alarm unit 153 may provide an output in the form of vibration: when a call, a message, or some other incoming communication is received, the alarm unit 153 may provide a tactile output (i.e., vibration) to inform the user. With such a tactile output, the user can recognize the occurrence of various events even when the mobile phone is in the user's pocket. The alarm unit 153 may also provide an output notifying the occurrence of an event via the display unit 151 or the audio output module 152.
The memory 160 may store software programs and the like for processing and controlling operations performed by the controller 180, or may temporarily store data (e.g., a phonebook, messages, still images, videos, and the like) that has been or will be output. Also, the memory 160 may store data regarding various ways of vibration and audio signals output when a touch is applied to the touch screen.
The memory 160 may include at least one type of storage medium including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory, etc.), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. Also, the mobile terminal 100 may cooperate with a network storage device that performs a storage function of the memory 160 through a network connection.
The controller 180 generally controls the overall operation of the mobile terminal. For example, the controller 180 performs control and processing related to voice calls, data communications, video calls, and the like. In addition, the controller 180 may include a multimedia module 1810 for reproducing (or playing back) multimedia data, and the multimedia module 1810 may be constructed within the controller 180 or may be constructed separately from the controller 180. The controller 180 may perform a pattern recognition process to recognize a handwriting input or a picture drawing input performed on the touch screen as a character or an image.
The power supply unit 190 receives external power or internal power and provides appropriate power required to operate various elements and components under the control of the controller 180.
The various embodiments described herein may be implemented in a computer-readable medium using, for example, computer software, hardware, or any combination thereof. For a hardware implementation, the embodiments described herein may be implemented using at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a processor, a controller, a microcontroller, a microprocessor, an electronic unit designed to perform the functions described herein, and in some cases, such embodiments may be implemented in the controller 180. For a software implementation, the implementation such as a process or a function may be implemented with a separate software module that allows performing at least one function or operation. The software codes may be implemented by software applications (or programs) written in any suitable programming language, which may be stored in the memory 160 and executed by the controller 180.
Up to this point, the mobile terminal has been described in terms of its functionality. Hereinafter, for the sake of brevity, a slide-type mobile terminal among the various types of mobile terminals (folder-type, bar-type, swing-type, slide-type, and the like) will be described as an example. Nevertheless, the present invention can be applied to any type of mobile terminal and is not limited to slide-type mobile terminals.
The mobile terminal 100 as shown in fig. 1 may be configured to operate with communication systems such as wired and wireless communication systems and satellite-based communication systems that transmit data via frames or packets.
A communication system in which a mobile terminal according to the present invention is operable will now be described with reference to fig. 2.
Such communication systems may use different air interfaces and/or physical layers. For example, air interfaces used by such communication systems include Frequency Division Multiple Access (FDMA), Time Division Multiple Access (TDMA), Code Division Multiple Access (CDMA), Universal Mobile Telecommunications System (UMTS) (in particular, Long Term Evolution (LTE)), Global System for Mobile Communications (GSM), and the like. By way of non-limiting example, the following description relates to a CDMA communication system, but such teachings apply equally to other types of systems.
Referring to Fig. 2, the CDMA wireless communication system may include a plurality of mobile terminals 100, a plurality of base stations (BSs) 270, base station controllers (BSCs) 275, and a mobile switching center (MSC) 280. The MSC 280 is configured to interface with a public switched telephone network (PSTN) 290. The MSC 280 is also configured to interface with the BSCs 275, which may be coupled to the base stations 270 via backhaul lines. The backhaul lines may be constructed according to any of several known interfaces including, for example, E1/T1, ATM, IP, PPP, Frame Relay, HDSL, ADSL, or xDSL. It will be understood that a system as shown in Fig. 2 may include multiple BSCs 275.
Each BS 270 may serve one or more sectors (or regions), each sector being covered by an omnidirectional antenna or by an antenna pointed in a particular direction radially away from the BS 270. Alternatively, each sector may be covered by two or more antennas for diversity reception. Each BS 270 may be configured to support multiple frequency allocations, each frequency allocation having a particular spectrum (e.g., 1.25 MHz, 5 MHz, etc.).
The intersection of a sector and a frequency allocation may be referred to as a CDMA channel. The BS 270 may also be referred to as a base transceiver subsystem (BTS) or by other equivalent terminology. In such a case, the term "base station" may be used to refer collectively to a single BSC 275 and at least one BS 270. Base stations may also be referred to as "cells". Alternatively, the individual sectors of a particular BS 270 may be referred to as cell sites.
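Under the definition above, a base station with S sectors and F frequency allocations offers S × F CDMA channels. The toy enumeration below makes that concrete; the function name, the sector names, and the labels FA1 and FA2 are hypothetical.

```python
from itertools import product


def cdma_channels(sectors: list[str], allocations: list[str]) -> list[tuple[str, str]]:
    """Each (sector, frequency allocation) intersection is one CDMA channel."""
    return list(product(sectors, allocations))


channels = cdma_channels(["alpha", "beta", "gamma"], ["FA1", "FA2"])
print(len(channels))  # 6
```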
As shown in Fig. 2, a broadcast transmitter (BT) 295 transmits broadcast signals to the mobile terminals 100 operating within the system. The broadcast receiving module 111 shown in Fig. 1 is provided in the mobile terminal 100 to receive the broadcast signals transmitted by the BT 295. Fig. 2 also shows several Global Positioning System (GPS) satellites 300, which help locate at least one of the plurality of mobile terminals 100.
In fig. 2, a plurality of satellites 300 are depicted, but it is understood that useful positioning information may be obtained with any number of satellites. The GPS module 115 as shown in fig. 1 is generally configured to cooperate with satellites 300 to obtain desired positioning information. Other techniques that can track the location of the mobile terminal may be used instead of or in addition to GPS tracking techniques. In addition, at least one GPS satellite 300 may selectively or additionally process satellite DMB transmission.
In typical operation of the wireless communication system, the BS 270 receives reverse link signals from various mobile terminals 100, which are typically engaged in calls, messaging, and other types of communication. Each reverse link signal received by a particular BS 270 is processed within that BS 270, and the resulting data is forwarded to the associated BSC 275. The BSC provides call resource allocation and mobility management functions, including coordination of soft handoff procedures between BSs 270. The BSCs 275 also route the received data to the MSC 280, which provides additional routing services for interfacing with the PSTN 290. Similarly, the PSTN 290 interfaces with the MSC 280, the MSC 280 interfaces with the BSCs 275, and the BSCs 275 in turn control the BSs 270 to transmit forward link signals to the mobile terminals 100.
Based on the above mobile terminal hardware structure and communication system, various embodiments of the method of the present invention are provided.
As shown in fig. 3, a terminal according to the first embodiment of the present invention includes: a request receiving unit 11, configured to receive an audio recognition request; an audio detecting unit 12, configured to detect whether the current terminal is playing an audio clip; a pulse code modulation data obtaining unit 13, configured to capture pulse code modulation data through a hardware abstraction layer if it is detected that the current terminal is playing an audio clip; and a data uploading unit 14, configured to upload the pulse code modulation data to an application program interface for identification.
The request receiving unit 11 may receive an audio recognition request, such as a "listen to song" request initiated by a gesture, a key, or another shortcut. The audio detecting unit 12 may detect whether the current terminal is playing a sound clip through a field in the audio focus (AudioFocus); specifically, it may check the mFocusStack field within the audio focus: when the mFocusStack field is empty (e.g. a field of "mFocusStack…"), the current terminal is not playing a sound clip; otherwise, the current terminal is playing sound.
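The detection rule above (an empty mFocusStack means nothing is playing) can be illustrated with a minimal model. In AOSP, mFocusStack is, as far as we are aware, a private field of the system-side focus controller rather than part of the public SDK, so reading it directly would require system-level access. The sketch therefore assumes a simplified stack in which an application is pushed when it gains audio focus and removed when it abandons focus; the class FocusStackModel and its methods are hypothetical names used only for illustration.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Hypothetical, simplified model of the audio focus stack (mFocusStack).
 * Assumption: a client ID is pushed when it gains audio focus and removed
 * when it abandons focus. This is NOT the Android framework class.
 */
class FocusStackModel {
    // Head of the deque = current focus owner.
    private final Deque<String> focusStack = new ArrayDeque<>();

    /** A client gains focus; re-requesting moves it to the top instead of duplicating it. */
    void requestFocus(String clientId) {
        focusStack.remove(clientId);
        focusStack.push(clientId);
    }

    /** A client abandons focus and is removed from the stack. */
    void abandonFocus(String clientId) {
        focusStack.remove(clientId);
    }

    /** Rule from the text: empty stack = no clip playing; otherwise a clip is playing. */
    boolean isPlayingAudio() {
        return !focusStack.isEmpty();
    }

    /** Current focus holder, or null if no client holds focus. */
    String topClient() {
        return focusStack.peek();
    }
}
```

For example, after requestFocus("com.example.music") the model reports isPlayingAudio() as true, and after the matching abandonFocus call it reports false. This mirrors the patent's heuristic only: an application may hold audio focus while paused, so a non-empty stack does not strictly guarantee audible output.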
If it is detected that the current terminal is playing a sound clip, the pulse code modulation data obtaining unit 13 initiates the identification request on the sound clip playing inside the current terminal and captures Pulse Code Modulation (PCM) data through the hardware abstraction layer (HAL). It should be noted that the PCM data is generally in WAV format; the pulse code modulation data obtaining unit 13 may directly perform whatever format conversion the application program interface requires, or the format conversion may be performed by the application program interface itself.
In another embodiment of the present invention, the terminal may further include a display unit 15, configured to display the recognition result in a message prompt box; the result may be displayed directly over any interface or globally.
In this embodiment, whether the current terminal is playing a sound clip is determined through a field in the audio focus, which is convenient and fast; when the current terminal is detected to be playing an audio clip, pulse code modulation data is captured through the hardware abstraction layer and uploaded to the application program interface for identification.
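Strictly speaking, PCM is raw sample data and WAV is a container around it, so the format conversion mentioned above may amount to no more than prepending a RIFF/WAVE header. The sketch below assumes little-endian linear PCM and takes the sample rate, channel count, and bit depth as parameters, since the text does not state what the recognition API expects; the class name PcmToWav is hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

/**
 * Minimal sketch: wrap raw little-endian linear PCM in a canonical 44-byte
 * RIFF/WAVE header. Format parameters are assumptions supplied by the caller.
 */
final class PcmToWav {
    private PcmToWav() {}

    static byte[] wrap(byte[] pcm, int sampleRate, int channels, int bitsPerSample) {
        int blockAlign = channels * bitsPerSample / 8;   // bytes per sample frame
        int byteRate = sampleRate * blockAlign;          // bytes per second

        ByteBuffer header = ByteBuffer.allocate(44).order(ByteOrder.LITTLE_ENDIAN);
        header.put(new byte[] {'R', 'I', 'F', 'F'});
        header.putInt(36 + pcm.length);                  // RIFF chunk size = file size - 8
        header.put(new byte[] {'W', 'A', 'V', 'E'});
        header.put(new byte[] {'f', 'm', 't', ' '});
        header.putInt(16);                               // fmt chunk size for plain PCM
        header.putShort((short) 1);                      // audio format 1 = linear PCM
        header.putShort((short) channels);
        header.putInt(sampleRate);
        header.putInt(byteRate);
        header.putShort((short) blockAlign);
        header.putShort((short) bitsPerSample);
        header.put(new byte[] {'d', 'a', 't', 'a'});
        header.putInt(pcm.length);                       // data chunk size in bytes

        // Header followed by the untouched PCM samples.
        ByteArrayOutputStream out = new ByteArrayOutputStream(44 + pcm.length);
        out.write(header.array(), 0, 44);
        out.write(pcm, 0, pcm.length);
        return out.toByteArray();
    }
}
```

For instance, wrapping 8 bytes of 44.1 kHz, 16-bit stereo PCM yields a 52-byte file whose header declares a byte rate of 176,400 bytes per second. If the recognition API instead expects a compressed format, a real encoder would be needed in place of this header-only step.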
Referring to fig. 4, a second embodiment of the present invention provides a terminal, which likewise includes: a request receiving unit 11, configured to receive an audio recognition request; an audio detecting unit 12, configured to detect whether the current terminal is playing an audio clip; a pulse code modulation data obtaining unit 13, configured to capture pulse code modulation data through a hardware abstraction layer if it is detected that the current terminal is playing an audio clip; a data uploading unit 14, configured to upload the pulse code modulation data to an application program interface for identification; and a display unit 15, configured to display the recognition result in a message prompt box, where the result may be displayed directly over any interface or globally.
The request receiving unit 11 may receive an audio recognition request, such as a "listen to song" request initiated by a gesture, a key, or another shortcut. The audio detecting unit 12 may determine whether the current terminal is playing a sound clip through a field in the audio focus; specifically, it may check the mFocusStack field within the audio focus: when the mFocusStack field is empty (e.g. a field of "mFocusStack…"), the current terminal is not playing a sound clip; otherwise, the current terminal is playing sound.
If the current terminal is playing a sound clip, the pulse code modulation data obtaining unit 13 initiates the identification request on the sound clip playing inside the current terminal and captures Pulse Code Modulation (PCM) data through the hardware abstraction layer (HAL). It should be noted that the PCM data is generally in WAV format; the pulse code modulation data obtaining unit 13 may directly perform whatever format conversion the application program interface requires, or the format conversion may be performed by the application program interface itself.
Unlike the previous embodiment, this embodiment further includes a processing unit 16, configured to save the recognition result as text, or to execute a preset operation; the preset operation includes sending the recognition result to a browser search box for searching, or sending it to a preset application program, such as a music-related application, for querying or downloading.
In this embodiment, whether the current terminal is playing a sound clip is determined through a field in the audio focus, which is convenient and fast; when the current terminal is detected to be playing an audio clip, pulse code modulation data is captured through the hardware abstraction layer and uploaded to the application program interface for identification; and the processing unit loads the recognition result into a browser search box for a direct search, or into a preset application program for querying or downloading. This removes the need for the user to act on the recognition result manually and improves the user experience.
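Processing unit 16 can be sketched as two small operations: appending the result to a text file, and building a request that fills a search box with the result. This is a minimal sketch assuming the recognition result arrives as a single title/artist string; ResultProcessor and the endpoint https://www.example.com/search are hypothetical placeholders. On Android, the hand-off to a browser would more likely be an Intent (for example Intent.ACTION_WEB_SEARCH with a SearchManager.QUERY extra) than a raw URL.

```java
import java.io.IOException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical sketch of processing unit 16: save the result as text, or build a search request. */
final class ResultProcessor {
    // Placeholder endpoint; a real terminal would use the user's configured browser or app.
    static final String SEARCH_ENDPOINT = "https://www.example.com/search?q=";

    private ResultProcessor() {}

    /** Appends the recognition result as one line of UTF-8 text, creating the file if needed. */
    static void saveAsText(Path file, String result) throws IOException {
        byte[] line = (result + System.lineSeparator()).getBytes(StandardCharsets.UTF_8);
        Files.write(file, line, StandardOpenOption.CREATE, StandardOpenOption.APPEND);
    }

    /** Builds the URL that would place the recognition result in a search box (Java 10+ overload). */
    static String buildSearchUrl(String result) {
        return SEARCH_ENDPOINT + URLEncoder.encode(result, StandardCharsets.UTF_8);
    }
}
```

URL-encoding the result matters because song titles routinely contain spaces, punctuation, and non-ASCII characters; for example, "Hello World - Artist" becomes "Hello+World+-+Artist" in the query string.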
Referring to fig. 5, a third embodiment of the present invention provides a terminal, which likewise includes: a request receiving unit 11, configured to receive an audio recognition request; an audio detecting unit 12, configured to detect whether the current terminal is playing an audio clip; a pulse code modulation data obtaining unit 13, configured to capture pulse code modulation data through a hardware abstraction layer if it is detected that the current terminal is playing an audio clip; a data uploading unit 14, configured to upload the pulse code modulation data to an application program interface for identification; and a display unit 15, configured to display the recognition result in a message prompt box, where the result may be displayed directly over any interface or globally.
The request receiving unit 11 may receive an audio recognition request, such as a "listen to song" request initiated by a gesture, a key, or another shortcut. The audio detecting unit 12 may determine whether the current terminal is playing a sound clip through a field in the audio focus; specifically, it may check the mFocusStack field within the audio focus: when the mFocusStack field is empty (e.g. a field of "mFocusStack…"), the current terminal is not playing a sound clip; otherwise, the current terminal is playing sound.
If the current terminal is playing a sound clip, the pulse code modulation data obtaining unit 13 initiates the identification request on the sound clip playing inside the current terminal and captures Pulse Code Modulation (PCM) data through the hardware abstraction layer (HAL). It should be noted that the PCM data is generally in WAV format; the pulse code modulation data obtaining unit 13 may directly perform whatever format conversion the application program interface requires, or the format conversion may be performed by the application program interface itself.
Unlike the two previous embodiments, this embodiment further includes an environment data generating unit 18, configured to collect ambient sound and generate environment data if the current terminal is not detected to be playing an audio clip; in this case, the data uploading unit 14 is further configured to upload the environment data to the application program interface for identification.
As in the previous embodiment, the terminal of this embodiment may further include a processing unit 16, configured to save the recognition result as text, or to execute a preset operation; the preset operation includes sending the recognition result to a browser search box for searching, or sending it to a preset application program, such as a music-related application, for querying or downloading.
In this embodiment, whether the current terminal is playing a sound clip is determined through a field in the audio focus, which is convenient and fast; a different audio data acquisition mode is selected according to the detection result, enabling switching between, and fast identification of, the internal audio stream and external ambient sound; and the processing unit loads the recognition result into a browser search box for a direct search, or into a preset application program for querying or downloading. This removes the need for the user to act on the recognition result manually and improves the user experience.
Referring to fig. 6, a fourth embodiment of the present invention provides an audio recognition method, including the following steps: S1, receiving an audio recognition request; S2, if it is detected that the current terminal is playing an audio clip, capturing pulse code modulation data through a hardware abstraction layer; and S3, sending the pulse code modulation data to an application program interface for identification.
In step S1, an audio recognition request may be received, such as a "listen to songs" request initiated by a gesture, a key, or another shortcut. In step S2, whether the current terminal is playing a sound clip may be determined through a field in the audio focus; specifically, the mFocusStack field within the audio focus may be checked: when the mFocusStack field is empty (e.g. a field of "mFocusStack…"), the current terminal is not playing a sound clip; otherwise, the current terminal is playing sound.
If the current terminal is playing a sound clip, in step S2 an identification request may be initiated on the sound clip playing inside the current terminal, and Pulse Code Modulation (PCM) data may be captured through the hardware abstraction layer (HAL). It should be noted that the PCM data is generally in WAV format; the format conversion required by the application program interface (API) may be performed directly in step S2, or it may be performed by the API in step S3.
In this embodiment, whether the current terminal is playing a sound clip is determined through a field in the audio focus, which is convenient and fast; when the current terminal is detected to be playing an audio clip, pulse code modulation data is captured through the hardware abstraction layer and uploaded to the application program interface for identification.
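For a rough sense of scale for the upload in step S3, the size of the raw PCM captured in step S2 follows directly from the sampling parameters: sample rate $f_s$ (Hz), channel count $N_{\mathrm{ch}}$, bit depth $b$, and duration $t$ (seconds). The figures below (44.1 kHz, stereo, 16-bit, a 10-second clip) are illustrative assumptions; the text does not specify them.

$$
\text{size} = f_s \cdot N_{\mathrm{ch}} \cdot \frac{b}{8} \cdot t = 44100 \cdot 2 \cdot \frac{16}{8} \cdot 10 = 1\,764\,000\ \text{bytes} \approx 1.68\ \text{MiB}
$$

A canonical WAV header adds only 44 bytes to this, so wrapping the PCM in a WAV container barely changes the upload size.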
Referring to fig. 7, a fifth embodiment of the present invention provides an audio recognition method, including the following steps: S1, receiving an audio recognition request; S2, if it is detected that the current terminal is playing an audio clip, capturing pulse code modulation data through a hardware abstraction layer; and S3, sending the pulse code modulation data to an application program interface for identification.
In step S1, an audio recognition request may be received, such as a "listen to songs" request initiated by a gesture, a key, or another shortcut. In step S2, whether the current terminal is playing a sound clip may be determined through a field in the audio focus; specifically, the mFocusStack field within the audio focus may be checked: when the mFocusStack field is empty (e.g. a field of "mFocusStack…"), the current terminal is not playing a sound clip; otherwise, the current terminal is playing sound.
If the current terminal is playing a sound clip, in step S2 an identification request may be initiated on the sound clip playing inside the current terminal, and Pulse Code Modulation (PCM) data may be captured through the hardware abstraction layer (HAL). It should be noted that the PCM data is generally in WAV format; the format conversion required by the application program interface (API) may be performed directly in step S2, or it may be performed by the API in step S3.
In this embodiment, after step S3, the method further includes: step S4, displaying the recognition result in a message prompt box, directly over any interface or globally; and step S5, saving the recognition result as text, or executing a preset operation, where the preset operation includes sending the recognition result to a browser search box for searching, or sending it to a preset application program, such as a music-related application, for querying or downloading.
In this embodiment, whether the current terminal is playing a sound clip is detected through a field in the audio focus, which is convenient and fast; when the current terminal is detected to be playing an audio clip, pulse code modulation data is captured through the hardware abstraction layer and uploaded to the application program interface for identification; and the processing unit loads the recognition result into a browser search box for a direct search, or into a preset application program for querying or downloading. This removes the need for the user to act on the recognition result manually and improves the user experience.
Referring to fig. 8, a sixth embodiment of the present invention provides an audio recognition method, including: S11, receiving an audio recognition request; S12, detecting whether the current terminal is playing an audio clip; if it is, proceeding to step S13, in which pulse code modulation data is captured through the hardware abstraction layer; if it is not, proceeding to step S14, in which ambient sound is collected and environment data is generated; and finally, in step S15, uploading the pulse code modulation data or the environment data to an application program interface for identification.
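The branching in steps S11 to S15 can be sketched as a small controller. Because HAL capture, microphone recording, and the recognition API are platform- and vendor-specific, this minimal sketch injects them as standard functional interfaces; AudioRecognitionFlow, Result, and Source are hypothetical names, and in a real terminal the injected functions would wrap the actual capture paths and recognition service.

```java
import java.util.Objects;
import java.util.function.BooleanSupplier;
import java.util.function.Function;
import java.util.function.Supplier;

/**
 * Hypothetical sketch of steps S11-S15. The focus check, HAL capture,
 * ambient recording and recognition API are injected because their real
 * implementations are platform- and vendor-specific.
 */
final class AudioRecognitionFlow {
    enum Source { INTERNAL_PCM, AMBIENT_MIC }

    /** Which source was used and what the recognition API returned. */
    static final class Result {
        final Source source;
        final String text;

        Result(Source source, String text) {
            this.source = source;
            this.text = text;
        }
    }

    private final BooleanSupplier isPlayingAudio;           // S12, e.g. mFocusStack non-empty
    private final Supplier<byte[]> capturePcmViaHal;        // S13, internal audio stream
    private final Supplier<byte[]> recordAmbientSound;      // S14, microphone input
    private final Function<byte[], String> recognitionApi;  // S15, upload and identify

    AudioRecognitionFlow(BooleanSupplier isPlayingAudio,
                         Supplier<byte[]> capturePcmViaHal,
                         Supplier<byte[]> recordAmbientSound,
                         Function<byte[], String> recognitionApi) {
        this.isPlayingAudio = Objects.requireNonNull(isPlayingAudio);
        this.capturePcmViaHal = Objects.requireNonNull(capturePcmViaHal);
        this.recordAmbientSound = Objects.requireNonNull(recordAmbientSound);
        this.recognitionApi = Objects.requireNonNull(recognitionApi);
    }

    /** S11: invoked when an audio recognition request (gesture, key, shortcut) arrives. */
    Result onRecognitionRequest() {
        // S12: choose the acquisition mode from the playback check.
        Source source = isPlayingAudio.getAsBoolean() ? Source.INTERNAL_PCM : Source.AMBIENT_MIC;
        // S13 or S14: acquire audio from the chosen source.
        byte[] data = (source == Source.INTERNAL_PCM)
                ? capturePcmViaHal.get()
                : recordAmbientSound.get();
        // S15: hand the data to the recognition API.
        return new Result(source, recognitionApi.apply(data));
    }
}
```

Injecting the collaborators keeps the S12 decision testable without audio hardware: a stub that always reports playback routes the request through the HAL capture path, and one that never does routes it through ambient recording.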
In step S11, an audio recognition request may be received, such as a "listen to songs" request initiated by a gesture, a key, or another shortcut. In step S12, whether the current terminal is playing a sound clip may be determined through a field in the audio focus; specifically, the mFocusStack field within the audio focus may be checked: when the mFocusStack field is empty (e.g. a field of "mFocusStack…"), the current terminal is not playing a sound clip; otherwise, the current terminal is playing sound.
If the current terminal is playing a sound clip, in step S13 an identification request may be initiated on the sound clip playing inside the current terminal, and Pulse Code Modulation (PCM) data may be captured through the hardware abstraction layer (HAL). It should be noted that the PCM data is generally in WAV format; the format conversion required by the application program interface (API) may be performed directly in step S13, or it may be performed by the API in step S15.
Similar to the previous embodiment, after step S15 the method further includes: step S16, displaying the recognition result in a message prompt box, directly over any interface or globally; and step S17, saving the recognition result as text, or executing a preset operation, where the preset operation includes sending the recognition result to a browser search box for searching, or sending it to a preset application program, such as a music-related application, for querying or downloading.
In this embodiment, whether the current terminal is playing a sound clip is determined through a field in the audio focus, which is convenient and fast; a different audio data acquisition mode is selected according to the detection result, enabling switching between, and fast identification of, the internal audio stream and external ambient sound; and the processing unit loads the recognition result into a browser search box for a direct search, or into a preset application program for querying or downloading. This removes the need for the user to act on the recognition result manually and improves the user experience.
The invention provides an audio recognition method and a terminal applying the method. Working at the UI application layer, after an audio recognition request initiated by a gesture or another shortcut is received, the method detects whether the mobile terminal is currently playing an audio stream and quickly invokes the relevant application from any interface to provide the "listen to and identify songs" function. The recognition result can be displayed over any interface and finally saved as text, or, according to the user's settings, filled into a browser search box or sent to an application's search box. The invention addresses the limited applicability of existing "listen to and identify songs" functions: the function is integrated into the UI layer and invoked quickly through a shortcut preset by the user. Based on real-time detection on the mobile terminal, it switches between, and quickly identifies, the internal audio stream and external ambient sound. Because it is not tied to any particular application program, it can be invoked quickly from any interface of the system, addressing a common user pain point and improving the user experience.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
From the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware alone, although in many cases the former is the preferred implementation. Based on this understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (4)

1. An audio recognition method, comprising the steps of:
receiving an audio recognition request, and detecting whether a current terminal is playing a sound clip through an mFocusStack field in an audio focus;
if the mFocusStack field is detected to be non-empty, indicating that the current terminal is playing an audio clip, capturing pulse code modulation data through a hardware abstraction layer and uploading the pulse code modulation data to an application program interface for identification;
if the mFocusStack field is detected to be empty, indicating that the current terminal is not playing an audio clip, collecting ambient sound to generate environment data, uploading the environment data to the application program interface for identification, and displaying the recognition result through a message prompt box.
2. The audio recognition method of claim 1, further comprising the steps of: saving the recognition result as text; or executing a preset operation, wherein the preset operation comprises sending the recognition result to a browser search box for searching, or sending the recognition result to a preset application program for querying or downloading.
3. A terminal, comprising:
a request receiving unit, configured to receive an audio recognition request;
an audio detection unit, configured to detect whether the current terminal is playing a sound clip through an mFocusStack field in an audio focus, wherein if mFocusStack is detected to be empty, the current terminal is not playing an audio clip, and otherwise the current terminal is playing an audio clip;
a pulse code modulation data acquisition unit, configured to capture pulse code modulation data through a hardware abstraction layer when it is detected that the current terminal is playing an audio clip;
a data uploading unit, configured to upload the pulse code modulation data to an application program interface for identification;
an environment data generating unit, configured to collect ambient sound and generate environment data if the current terminal is not detected to be playing an audio clip, wherein the data uploading unit is further configured to upload the environment data to the application program interface for identification;
and a display unit, configured to display the recognition result through a message prompt box.
4. The terminal according to claim 3, further comprising a processing unit configured to save the recognition result as text, or to execute a preset operation, wherein the preset operation comprises sending the recognition result to a browser search box for searching, or sending the recognition result to a preset application program for querying or downloading.
CN201610455918.4A 2016-06-22 2016-06-22 Audio identification method and terminal Active CN106157970B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610455918.4A CN106157970B (en) 2016-06-22 2016-06-22 Audio identification method and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610455918.4A CN106157970B (en) 2016-06-22 2016-06-22 Audio identification method and terminal

Publications (2)

Publication Number Publication Date
CN106157970A CN106157970A (en) 2016-11-23
CN106157970B true CN106157970B (en) 2020-12-04

Family

ID=57353059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610455918.4A Active CN106157970B (en) 2016-06-22 2016-06-22 Audio identification method and terminal

Country Status (1)

Country Link
CN (1) CN106157970B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108922537B (en) * 2018-05-28 2021-05-18 Oppo广东移动通信有限公司 Audio recognition method, device, terminal, earphone and readable storage medium
CN110719553B (en) * 2018-07-13 2021-08-06 国际商业机器公司 Smart speaker system with cognitive sound analysis and response
US10832672B2 (en) 2018-07-13 2020-11-10 International Business Machines Corporation Smart speaker system with cognitive sound analysis and response
CN111381798B (en) * 2018-12-28 2021-05-14 广州市百果园信息技术有限公司 Audio processing method, device, terminal and storage medium
CN110060685B (en) 2019-04-15 2021-05-28 百度在线网络技术(北京)有限公司 Voice wake-up method and device
CN111768782A (en) * 2020-06-30 2020-10-13 广州酷狗计算机科技有限公司 Audio recognition method, device, terminal and storage medium
CN112162783B (en) * 2020-09-27 2021-09-14 珠海格力电器股份有限公司 Music playing application keep-alive processing method and system, storage medium and electronic equipment
CN114005469A (en) * 2021-10-20 2022-02-01 广州市网星信息技术有限公司 Audio playing method and system capable of automatically skipping mute segment

Citations (5)

Publication number Priority date Publication date Assignee Title
CN101741975A (en) * 2009-12-18 2010-06-16 上海华勤通讯技术有限公司 Method for processing music fragment to obtain song information by using mobile phone and mobile phone thereof
CN104267924A (en) * 2014-09-19 2015-01-07 青岛海信移动通信技术股份有限公司 Mobile terminal and audio processing method thereof
US9076450B1 (en) * 2012-09-21 2015-07-07 Amazon Technologies, Inc. Directed audio for speech recognition
CN105183446A (en) * 2015-07-16 2015-12-23 贵阳语玩科技有限公司 Audio management system
CN105447199A (en) * 2015-12-29 2016-03-30 小米科技有限责任公司 Audio information acquisition method and device

Non-Patent Citations (1)

Title
Android Audio Focus (《Android音频焦点》); 笑傲江湖曲; CSDN; 2014-05-15; pp. 1-3 *

Also Published As

Publication number Publication date
CN106157970A (en) 2016-11-23

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20201103

Address after: Room 201-3, 1028 Panyu Road, Xuhui District, Shanghai 200030

Applicant after: SHANGHAI MICROPHONE CULTURE MEDIA Co.,Ltd.

Address before: 518000 Innovation Building, No. 9018 North Central Avenue, High-tech Zone, Nanshan District, Shenzhen, Guangdong Province (Block A, floors 6-8 and 10-11; Block B, floor 6; Block C, floors 6-10)

Applicant before: NUBIA TECHNOLOGY Co.,Ltd.

GR01 Patent grant