CN108198573A - Audio identification methods and device, storage medium and electronic equipment - Google Patents

Publication number
CN108198573A
Authority
CN
China
Prior art keywords
audio
frequency
identified
fingerprint
frequency fingerprint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201711486757.6A
Other languages
Chinese (zh)
Other versions
CN108198573B (en)
Inventor
黄瑛
胡明清
王涛
杨琛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing QIYI Century Science and Technology Co Ltd
Original Assignee
Beijing QIYI Century Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing QIYI Century Science and Technology Co Ltd
Priority to CN201711486757.6A
Publication of CN108198573A
Application granted
Publication of CN108198573B
Legal status: Active

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/60: Information retrieval; Database structures therefor; File system structures therefor of audio data
    • G06F16/68: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F16/683: Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Library & Information Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides an audio identification method, comprising: selecting a target audio sample from the audio to be identified according to a preset first selection rule; extracting multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density; matching the extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library; and, when the match fails, selecting a new target audio sample from the audio to be identified, until a match succeeds, thereby identifying the audio to be identified. In the identification method provided by the invention, a section of audio is first selected as the target audio sample, and audio fingerprints are extracted from it and matched against the fingerprints in the pre-established audio fingerprint library. When the match fails, another section of the audio to be identified is selected as a new target audio sample, until identification is complete, which improves the recognition rate for the audio to be identified.

Description

Audio identification method and device, storage medium, and electronic device
Technical field
The present invention relates to the field of audio recognition technology, and in particular to an audio identification method and device, a storage medium, and an electronic device.
Background
In recent years, audio and video equipment has become increasingly common in daily life. When people listen to music on a music player or watch a film in a video application, they often want to see the lyrics or subtitles. Audio recognition technology is therefore being applied ever more widely in many fields.
Audio recognition generally relies on audio fingerprinting. In studying existing audio recognition technology, the inventors found that in existing recognition processes, such as song recognition, the same song may exist in several different versions whose audio differs locally. A live version, for example, may contain audience cheering or people talking. If the segment from which fingerprints are extracted during recognition contains sounds other than the music, the recognition rate drops.
Summary of the invention
The technical problem to be solved by the present invention is to provide an audio identification method that acquires audio samples multiple times during recognition in order to raise the recognition rate.
The present invention also provides an audio identification device to ensure that the above method can be implemented and applied in practice.
An audio identification method, comprising:
selecting a target audio sample from the audio to be identified according to a preset first selection rule;
extracting multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density;
matching the extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library, wherein the fingerprints in the pre-established audio fingerprint library were extracted at a second audio fingerprint extraction density, and the first audio fingerprint extraction density is lower than the second audio fingerprint extraction density;
when the match fails, selecting a new target audio sample from the audio to be identified, until a match succeeds, thereby identifying the audio to be identified.
Optionally, in the above method, selecting a target audio sample from the audio to be identified according to the preset selection rule comprises:
determining a first time length;
randomly selecting, from the audio to be identified, a section of audio whose duration is the first time length as the target audio sample.
Optionally, in the above method, selecting a target audio sample from the audio to be identified according to the preset selection rule comprises:
performing audio energy detection on each audio segment of the audio to be identified;
selecting, from the audio segments whose audio energy exceeds a preset audio energy threshold, one audio segment as the target audio sample.
Optionally, in the above method, matching the extracted audio fingerprints against the fingerprints in the pre-established audio fingerprint library comprises:
downsampling the extracted audio fingerprints;
matching the downsampled audio fingerprints against the fingerprints in the pre-established audio fingerprint library.
Optionally, in the above method, selecting a new target audio sample from the audio to be identified comprises:
selecting the new target audio sample from the audio to be identified according to a preset second selection rule, the second selection rule being different from the first selection rule.
Optionally, the above method further comprises:
when the match succeeds, obtaining the audio information corresponding to the audio to be identified and feeding the audio information back to a client.
An audio identification device, comprising:
a first selection unit, configured to select a target audio sample from the audio to be identified according to a preset first selection rule;
an extraction unit, configured to extract multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density;
a matching unit, configured to match the extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library, wherein the fingerprints in the pre-established audio fingerprint library were extracted at a second audio fingerprint extraction density, and the first audio fingerprint extraction density is lower than the second audio fingerprint extraction density;
a second selection unit, configured to select, when the match fails, a new target audio sample from the audio to be identified, until a match succeeds, thereby identifying the audio to be identified.
Optionally, the above device further comprises:
a feedback unit, configured to obtain, when the match succeeds, the audio information corresponding to the audio to be identified and to feed the audio information back to a client.
A storage medium, comprising a stored program, wherein, when the program runs, it controls the device on which the storage medium resides to perform the above audio identification method.
An electronic device, comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors to perform the above audio identification method.
Compared with the prior art, the present invention has the following advantages:
The present invention provides an audio identification method that first selects a target audio sample from the audio to be identified according to a preset first selection rule, extracts multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density, and matches the extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library. When the match fails, a new target audio sample is selected from the audio to be identified, audio fingerprints are extracted from the new sample, and matching continues until the audio to be identified is matched successfully. Because the method extracts audio samples segment by segment, a new audio sample is extracted and recognition continues whenever the previous sample fails to be identified, which improves the recognition rate for the audio to be identified.
In addition, in the audio identification method provided by the present invention, audio fingerprints are extracted from the target audio sample at the first audio fingerprint extraction density and matched against the fingerprints in the pre-established audio fingerprint library. The library itself is built by extracting audio fingerprints at the second audio fingerprint extraction density, and the first density is lower than the second. With this differentiated extraction scheme, fingerprint comparison keeps an acceptable recognition speed while preserving recognition accuracy, striking a balance between the two.
Of course, a product implementing the present invention need not achieve all of the above advantages at the same time.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of an audio identification method provided by the present invention;
Fig. 2 is another flowchart of an audio identification method provided by the present invention;
Fig. 3 is another flowchart of an audio identification method provided by the present invention;
Fig. 4 is another flowchart of an audio identification method provided by the present invention;
Fig. 5 is a schematic architecture diagram of an audio identification method provided by the present invention;
Fig. 6 is a structural diagram of an audio identification device provided by the present invention;
Fig. 7 is another structural diagram of an audio identification device provided by the present invention;
Fig. 8 is a structural diagram of an electronic device provided by the present invention.
Detailed description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
The present invention can be used in numerous general-purpose or special-purpose computing environments or configurations, for example: personal computers, server computers, handheld or portable devices, laptop devices, multi-processor devices, and distributed computing environments that include any of the above devices.
The present invention provides an audio identification method. The method may be executed by a processor in a playback device. Its flowchart is shown in Fig. 1, and it comprises the following steps:
S101: Select a target audio sample from the audio to be identified according to a preset first selection rule.
In the present invention, the target audio sample is selected from the audio to be identified according to the preset first selection rule. The first selection rule may be a random rule, or a specific rule set for the audio to be identified.
S102: Extract multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density.
In the present invention, an audio fingerprint is a digital feature of the audio: a unique feature that characterizes the audio's acoustic properties. Extracting audio fingerprints, as referred to here, means extracting the digital features of the target audio sample in the form of identifiers. The extraction density of audio fingerprints refers to the number of digital features extracted from a section of audio: the more digital features extracted from the section, the higher the extraction density.
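One way to picture extraction density is as the number of fingerprints produced per section of audio. The following is a minimal sketch, assuming a frame-based extractor in which a hop size controls the density. The names `frame_fingerprint`, `extract_fingerprints`, `frame_len`, and `hop` are illustrative, and the toy bit pattern is a stand-in, not the feature used by the patent.

```python
def frame_fingerprint(frame, bands=4):
    """Toy fingerprint (an assumption, not the patent's feature): one bit per
    sub-block, set when that sub-block's energy exceeds the frame average."""
    size = len(frame) // bands
    energies = [sum(x * x for x in frame[i * size:(i + 1) * size])
                for i in range(bands)]
    mean = sum(energies) / bands
    bits = 0
    for energy in energies:
        bits = (bits << 1) | (1 if energy > mean else 0)
    return bits


def extract_fingerprints(samples, frame_len, hop):
    """Return (start_offset, fingerprint) pairs; a smaller hop gives a higher density."""
    return [(start, frame_fingerprint(samples[start:start + frame_len]))
            for start in range(0, len(samples) - frame_len + 1, hop)]


# Synthetic integers standing in for decoded audio samples.
samples = [(i * 37) % 101 - 50 for i in range(64)]
sparse = extract_fingerprints(samples, frame_len=16, hop=8)  # lower density
dense = extract_fingerprints(samples, frame_len=16, hop=2)   # higher density
```

For the same 64-sample section, the smaller hop yields more fingerprints, which is the sense in which its extraction density is higher.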
S103: Match the extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library, wherein the fingerprints in the pre-established audio fingerprint library were extracted at a second audio fingerprint extraction density, and the first audio fingerprint extraction density is lower than the second audio fingerprint extraction density.
In the present invention, the audio fingerprint library is established in advance. When the library is built, audio fingerprints are extracted at the second audio fingerprint extraction density to form the library. During recognition, multiple audio fingerprints are extracted from the target audio sample at the first audio fingerprint extraction density and matched against the fingerprints in the pre-established audio fingerprint library.
S104: When the match fails, select a new target audio sample from the audio to be identified, until a match succeeds, thereby identifying the audio to be identified.
In the present invention, the new target audio sample differs from the previously selected target audio sample. The selection rule used for the new sample may also differ from the rule used for the previous sample.
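Steps S101 to S104 can be sketched as a retry loop. This is a minimal illustration under several assumptions: fingerprints are raw sample tuples, the library is a plain dictionary from fingerprint to track identifier, and a `max_attempts` cap is added so the loop terminates (the source text only says selection repeats until a match succeeds). All function and variable names are hypothetical.

```python
def extract(segment, frame_len=4, hop=4):
    """Toy extractor (assumption): each fingerprint is a raw tuple of samples."""
    return [tuple(segment[i:i + frame_len])
            for i in range(0, len(segment) - frame_len + 1, hop)]


def recognize(audio, library, first_rule, second_rule, max_attempts=5):
    """Select a sample, extract, match; on failure select a new sample and retry."""
    rule = first_rule                                   # S101
    for attempt in range(max_attempts):
        segment = rule(audio, attempt)
        hits = [library[fp] for fp in extract(segment) if fp in library]  # S102, S103
        if hits:
            return max(set(hits), key=hits.count)       # most frequently matched track
        rule = second_rule                              # S104: choose a new sample
    return None


def first_rule(audio, attempt):
    return audio[:8]                                    # always the opening section


def second_rule(audio, attempt):
    return audio[attempt * 8:(attempt + 1) * 8]         # a later, different section


studio = list(range(100, 132))                          # stand-in for a studio track
library = {tuple(studio[i:i + 4]): "song-1" for i in range(len(studio) - 3)}
live = [-1] * 8 + studio[8:]                            # opening replaced by crowd noise
result = recognize(live, library, first_rule, second_rule)
```

In this toy case the first sample falls on the noisy opening and fails to match, while the second sample matches the library.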
Audio fingerprinting refers to extracting the unique digital features of a piece of audio, by a specific algorithm, in the form of identifiers, which are used to identify massive numbers of sound samples or to track and locate a sample's position in a database. After extensive research, the inventors found that during audio recognition, different versions of the audio to be identified can cause recognition to fail. Moreover, in existing fingerprint-based recognition methods, using long audio segments and a high fingerprint density improves the recognition rate, but because the song library is large this hurts computational efficiency and lengthens lookup time. Using short segments and a low fingerprint density is faster but lowers the recognition rate.
The audio identification method provided by the present invention therefore uses a segmented, graded recognition strategy. When the first target audio sample is not identified successfully, a new target audio sample is determined. Because audio samples are extracted segment by segment, a new sample is extracted and recognition continues whenever the previous sample fails to be identified, which improves the recognition rate for the audio to be identified.
In the audio identification method provided by the present invention, audio fingerprints are extracted from the target audio sample at the first audio fingerprint extraction density and matched against the fingerprints in the pre-established audio fingerprint library. The library itself is built by extracting audio fingerprints at the second audio fingerprint extraction density, and the first density is lower than the second. With this differentiated extraction scheme, fingerprint comparison keeps an acceptable recognition speed while preserving recognition accuracy, striking a balance between the two.
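The effect of the differentiated densities can be illustrated with a small sketch. It assumes exact-match fingerprints made of raw sample tuples, which real systems would replace with noise-tolerant features, and it models density through a hop size. The names `frames`, `dense_index`, and `sparse_index` are illustrative only.

```python
def frames(samples, frame_len, hop):
    """Toy fingerprints (assumption): every frame is used verbatim as its own key."""
    return [tuple(samples[i:i + frame_len])
            for i in range(0, len(samples) - frame_len + 1, hop)]


track = list(range(1000, 1040))                        # stand-in for a library track

dense_index = set(frames(track, frame_len=4, hop=1))   # library at the higher density
sparse_index = set(frames(track, frame_len=4, hop=4))  # library at the query's density

# The query starts at an arbitrary offset (5) and is fingerprinted sparsely.
query = track[5:21]
query_fps = frames(query, frame_len=4, hop=4)

dense_hits = sum(fp in dense_index for fp in query_fps)
sparse_hits = sum(fp in sparse_index for fp in query_fps)
```

Because the densely indexed library contains a fingerprint at every offset, the sparse query still finds all of its fingerprints even though it starts at an unaligned position. If the library were indexed as sparsely as the query, the same query would find none. The query side stays cheap, and the cost of density is paid once, when the library is built.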
In the present invention, a section of audio may be selected at random as the target audio sample. The selection may involve setting a selection duration, which may be the first time length: a section of audio of the first time length is randomly selected from the audio to be identified as the target audio sample, and fingerprints are extracted from that target audio sample.
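Random selection of a section of the first time length can be sketched as follows. The function name, the `sample_rate` parameter, and the fallback of returning the whole audio when it is shorter than the requested length are assumptions made for the illustration.

```python
import random


def choose_random_segment(audio, first_time_length, sample_rate, rng=random):
    """Return (start_index, segment) for a random section lasting first_time_length seconds."""
    n = int(first_time_length * sample_rate)
    if n >= len(audio):
        return 0, list(audio)            # audio shorter than requested: use all of it
    start = rng.randrange(len(audio) - n + 1)
    return start, audio[start:start + n]


audio = list(range(80))                  # 10 s of fake audio at 8 samples per second
start, segment = choose_random_segment(
    audio, first_time_length=2.5, sample_rate=8, rng=random.Random(0))
```

Passing a seeded `random.Random` instance makes the choice reproducible, which is convenient when testing a retry strategy.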
In the present invention, a section of audio may be selected at random as the target audio sample, or the target audio sample may be selected by the method shown in Fig. 2, which comprises the steps:
S201: Perform audio energy detection on each audio segment of the audio to be identified.
S202: From the audio segments whose audio energy exceeds a preset audio energy threshold, select one audio segment as the target audio sample.
In the present invention, when the target audio sample is selected, the audio energy of each audio segment may be detected first, and a segment with higher audio energy may be taken as the target audio sample. A sample chosen this way has distinct features, which improves recognition efficiency.
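Steps S201 and S202 can be sketched as below. The sketch assumes non-overlapping segments of fixed length, mean squared amplitude as the energy measure, and picking the most energetic qualifying segment; the source only requires choosing one segment above the threshold, so those details are illustrative choices.

```python
def choose_energetic_segment(audio, segment_len, energy_threshold):
    """Return (start_index, segment) for the most energetic segment above the threshold."""
    candidates = []
    for start in range(0, len(audio) - segment_len + 1, segment_len):
        segment = audio[start:start + segment_len]
        energy = sum(x * x for x in segment) / segment_len   # S201: energy detection
        if energy > energy_threshold:                        # S202: keep only loud segments
            candidates.append((energy, start))
    if not candidates:
        return None                                          # nothing exceeds the threshold
    _, start = max(candidates)
    return start, audio[start:start + segment_len]


audio = [0, 1, 0, -1] + [5, -5, 5, -5] + [2, 2, -2, -2]
chosen = choose_energetic_segment(audio, segment_len=4, energy_threshold=3.0)
```

Here the three segments have energies 0.5, 25, and 4, so the quiet opening is excluded and the middle segment is chosen.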
Referring to Fig. 3, the process of matching the extracted audio fingerprints against the fingerprints in the pre-established audio fingerprint library comprises the steps:
S301: Downsample the extracted audio fingerprints.
S302: Match the downsampled audio fingerprints against the fingerprints in the pre-established audio fingerprint library.
In the present invention, the extracted audio fingerprints may be further downsampled: fingerprints with distinct features are selected from the extracted fingerprints and matched against the fingerprints in the pre-established audio fingerprint library.
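A possible form of this downsampling is to keep only the most distinctive fingerprints. The sketch below assumes each fingerprint carries a salience score; that score, the tuple layout, and the `keep` parameter are hypothetical and are not specified by the source.

```python
def downsample_fingerprints(fingerprints, keep):
    """Keep the `keep` most salient fingerprints, returned in time order.

    Each fingerprint is an (offset, hash_value, salience) tuple (assumed layout).
    """
    strongest = sorted(fingerprints, key=lambda f: f[2], reverse=True)[:keep]
    return sorted(strongest, key=lambda f: f[0])


extracted = [(0, "a", 0.2), (1, "b", 0.9), (2, "c", 0.5), (3, "d", 0.7)]
selected = downsample_fingerprints(extracted, keep=2)
```

Only the selected fingerprints would then be sent for matching, which reduces the number of lookups per query.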
In the identification method provided by the present invention, the new target audio sample may be selected from the audio to be identified according to a preset second selection rule that differs from the first selection rule. Alternatively, the same selection rule may be used.
On the basis of Fig. 1, as shown in Fig. 4, the audio identification method further comprises: S105: When the match succeeds, obtain the audio information corresponding to the audio to be identified and feed the audio information back to the client.
Fig. 5 shows a schematic diagram of the audio identification method of the present invention. In Fig. 5, in the offline library-building stage, a fingerprint library is built for the song library at a certain fingerprint extraction density A, and information such as each song's singer, version, and album is stored. A fingerprint corresponds to some audio feature (such as spectral information or sub-band energy information) or to some transformation of an audio feature (such as a hash transformation).
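The offline library-building stage might look like the following sketch. It assumes an inverted index from fingerprint to (track, offset) postings plus a separate metadata table. The function `build_library`, the hop-based model of density A, the raw-tuple fingerprints, and the placeholder metadata values are all illustrative assumptions.

```python
from collections import defaultdict


def build_library(tracks, frame_len, hop_a):
    """Index every track at density A (hop_a) and keep its metadata alongside."""
    index = defaultdict(list)
    metadata = {}
    for track_id, (info, samples) in tracks.items():
        metadata[track_id] = info
        for start in range(0, len(samples) - frame_len + 1, hop_a):
            fingerprint = tuple(samples[start:start + frame_len])  # toy fingerprint
            index[fingerprint].append((track_id, start))
    return dict(index), metadata


# Placeholder metadata; the field names mirror the information listed in the text.
tracks = {
    "t1": ({"singer": "X", "version": "studio", "album": "Y"}, list(range(10))),
}
index, metadata = build_library(tracks, frame_len=4, hop_a=1)
```

Storing the offset with each posting lets a matcher check that hits from one query line up consistently in time, and the metadata table supplies the information returned to the client after a successful match.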
In the online recognition stage, audio fingerprints are extracted from a music segment L1 (for example, the first 30 s) at a fingerprint extraction density B (B<A). The fingerprint extraction density can be understood as the window shift used in the audio feature extraction stage: a large window shift corresponds to a low density, and a small window shift corresponds to a high density.
The fingerprints may also be downsampled to some extent in the fingerprint generation stage, so that only some of them are used for retrieval. The player sends the fingerprints of the segment to the recognition service interface and obtains the recognition result.
If the result shows that the song was recognized incorrectly or that recognition failed, the player extracts fingerprints again from a segment L2 of the music (L2 differs from L1; for example, the 30 s starting from the 2nd minute) and recognition is repeated.
For the segment-by-segment recognition itself, random sampling may be used, or some key-segment selection strategy, such as choosing segments with higher audio energy or segments that are relatively continuous.
Corresponding to the audio identification method shown in Fig. 1, an embodiment of the present invention further provides an audio identification device. Its structure is shown in Fig. 6, and it comprises:
a first selection unit 401, configured to select a target audio sample from the audio to be identified according to a preset first selection rule;
an extraction unit 402, configured to extract multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density;
a matching unit 403, configured to match the extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library, wherein the fingerprints in the pre-established audio fingerprint library were extracted at a second audio fingerprint extraction density, and the first audio fingerprint extraction density is lower than the second audio fingerprint extraction density;
a second selection unit 404, configured to select, when the match fails, a new target audio sample from the audio to be identified, until a match succeeds, thereby identifying the audio to be identified.
The audio identification device provided by the present invention extracts audio samples segment by segment. When the previous audio sample fails to be identified, a new audio sample is extracted and recognition continues, which improves the recognition rate for the audio to be identified.
On the basis of Fig. 6, as shown in Fig. 7, the first selection unit 401 comprises:
a detection subunit 405, configured to perform audio energy detection on each audio segment of the audio to be identified;
a selection subunit 406, configured to select, from the audio segments whose audio energy exceeds a preset audio energy threshold, one audio segment as the target audio sample.
The matching unit 403 comprises:
a processing subunit 407, configured to downsample the extracted audio fingerprints;
a matching subunit 408, configured to match the downsampled audio fingerprints against the fingerprints in the pre-established audio fingerprint library.
The identification device further comprises:
a feedback unit 409, configured to obtain, when the match succeeds, the audio information corresponding to the audio to be identified and to feed the audio information back to a client.
An embodiment of the present invention further provides a storage medium comprising a stored program, wherein, when the program runs, it controls the device on which the storage medium resides to perform the above audio identification method. The method specifically comprises:
selecting a target audio sample from the audio to be identified according to a preset first selection rule;
extracting multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density;
matching the extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library, wherein the fingerprints in the pre-established audio fingerprint library were extracted at a second audio fingerprint extraction density, and the first audio fingerprint extraction density is lower than the second audio fingerprint extraction density;
when the match fails, selecting a new target audio sample from the audio to be identified, until a match succeeds, thereby identifying the audio to be identified.
Optionally, in the above method, selecting a target audio sample from the audio to be identified according to the preset selection rule comprises:
determining a first time length;
randomly selecting, from the audio to be identified, a section of audio whose duration is the first time length as the target audio sample.
Optionally, in the above method, selecting a target audio sample from the audio to be identified according to the preset selection rule comprises:
performing audio energy detection on each audio segment of the audio to be identified;
selecting, from the audio segments whose audio energy exceeds a preset audio energy threshold, one audio segment as the target audio sample.
Optionally, in the above method, matching the extracted audio fingerprints against the fingerprints in the pre-established audio fingerprint library comprises:
downsampling the extracted audio fingerprints;
matching the downsampled audio fingerprints against the fingerprints in the pre-established audio fingerprint library.
Optionally, in the above method, selecting a new target audio sample from the audio to be identified comprises:
selecting the new target audio sample from the audio to be identified according to a preset second selection rule, the second selection rule being different from the first selection rule.
Optionally, the above method further comprises:
when the match succeeds, obtaining the audio information corresponding to the audio to be identified and feeding the audio information back to a client.
An embodiment of the present invention further provides an electronic device whose structure is shown in Fig. 8. It comprises a memory 501 and one or more programs 502, wherein the one or more programs 502 are stored in the memory 501 and are configured to be executed by one or more processors 503, the one or more programs 502 containing instructions for performing the following operations:
selecting a target audio sample from the audio to be identified according to a preset first selection rule;
extracting multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density;
matching the extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library, wherein the fingerprints in the pre-established audio fingerprint library were extracted at a second audio fingerprint extraction density, and the first audio fingerprint extraction density is lower than the second audio fingerprint extraction density;
when the match fails, selecting a new target audio sample from the audio to be identified, until a match succeeds, thereby identifying the audio to be identified.
It should be noted that the embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and identical or similar parts can be cross-referenced between embodiments. Because the device embodiments are basically similar to the method embodiments, they are described briefly; for relevant details, refer to the corresponding description of the method embodiments.
Finally, it should also be noted that, in this document, relational terms such as "first" and "second" are used only to distinguish one entity or operation from another, and do not necessarily require or imply any actual relationship or order between those entities or operations. Moreover, the terms "comprise", "include", and any other variants are intended to cover a non-exclusive inclusion, so that a process, method, article, or device that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, method, article, or device that includes that element.
For convenience of description, the above apparatus is described in terms of various units divided by function. Of course, when implementing the present invention, the functions of the units may be realized in one or more pieces of software and/or hardware.
From the above description of the embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions that cause a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention or in certain parts of the embodiments.
The audio identification method and device provided by the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the present invention, and the description of the above embodiments is intended only to help in understanding the method of the present invention and its core idea. Meanwhile, those of ordinary skill in the art may, following the idea of the present invention, make changes to the specific implementation and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. An audio identification method, characterized by comprising:
choosing a target audio sample from the audio to be identified according to a preset first selection rule;
extracting multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density;
matching the extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library, wherein the fingerprints in the pre-established audio fingerprint library are extracted at a second audio fingerprint extraction density, and the first audio fingerprint extraction density is lower than the second audio fingerprint extraction density;
when the match does not succeed, choosing a new target audio sample from the audio to be identified, until the match succeeds, so as to identify the audio to be identified.
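To make the two extraction densities concrete, here is a toy sketch in which density is controlled by the hop between consecutive frames: a hop of 1 gives a dense library, a larger hop gives a sparse query. The CRC-based frame fingerprint, the 8-bit sample values, and the parameter names `frame_len` and `hop` are all assumptions chosen for illustration; the specification does not prescribe a fingerprint function or say how density is realized.

```python
import zlib
from typing import List, Sequence


def extract_fingerprints(samples: Sequence[int], frame_len: int, hop: int) -> List[int]:
    """Return one toy fingerprint per frame.

    A smaller `hop` means more frames per unit of audio, i.e. a higher
    extraction density. The fingerprint is the CRC-32 of the frame bytes.
    """
    if frame_len < 1 or hop < 1:
        raise ValueError("frame_len and hop must be >= 1")
    fingerprints = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = bytes(s & 0xFF for s in samples[start:start + frame_len])
        fingerprints.append(zlib.crc32(frame))
    return fingerprints


# Dense library (hop=1) built from a reference recording,
# sparse query (hop=4) built from a clip cut out of it.
reference = [(i * 37) % 251 for i in range(200)]
library = set(extract_fingerprints(reference, frame_len=8, hop=1))
clip = reference[50:120]
query = extract_fingerprints(clip, frame_len=8, hop=4)
```

In this toy setting the dense library contains a fingerprint for every frame offset of the reference, so each sparse query fingerprint taken from the clip is found in it even though the query extracts far fewer fingerprints.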
2. The method according to claim 1, characterized in that choosing a target audio sample from the audio to be identified according to the preset selection rule comprises:
determining a first time length;
randomly choosing, from the audio to be identified, a segment of audio whose duration is the first time length as the target audio sample.
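A minimal sketch of such a random selection, assuming the audio is addressed by sample index and that the first time length is given in seconds. The function name and its parameters are hypothetical.

```python
import random
from typing import Optional, Tuple


def choose_random_sample(
    num_samples: int,
    sample_rate: int,
    duration_s: float,
    rng: Optional[random.Random] = None,
) -> Tuple[int, int]:
    """Return (start, end) sample indices of a random window of `duration_s` seconds."""
    rng = rng if rng is not None else random.Random()
    length = int(round(duration_s * sample_rate))
    if length <= 0 or length > num_samples:
        raise ValueError("duration does not fit inside the audio")
    # randint is inclusive on both ends, so the window always fits.
    start = rng.randint(0, num_samples - length)
    return start, start + length
```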
3. The method according to claim 1, characterized in that choosing a target audio sample from the audio to be identified according to the preset selection rule comprises:
performing audio energy detection on each audio segment in the audio to be identified;
choosing, from the audio segments whose audio energy exceeds a preset audio energy threshold, one audio segment as the target audio sample.
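One way to sketch this energy-based rule, assuming fixed-length, non-overlapping segments and mean squared amplitude as the energy measure. Neither choice is stated in the claim, and the names below are hypothetical.

```python
import random
from typing import List, Optional, Sequence


def segment_energy(segment: Sequence[float]) -> float:
    """Mean squared amplitude of a segment (an assumed energy measure)."""
    return sum(x * x for x in segment) / len(segment)


def choose_energetic_segment(
    audio: Sequence[float],
    segment_len: int,
    threshold: float,
    rng: Optional[random.Random] = None,
) -> Optional[Sequence[float]]:
    """Pick one segment whose energy exceeds `threshold`, or None if there is none."""
    if segment_len < 1:
        raise ValueError("segment_len must be >= 1")
    segments: List[Sequence[float]] = [
        audio[i:i + segment_len]
        for i in range(0, len(audio) - segment_len + 1, segment_len)
    ]
    candidates = [s for s in segments if segment_energy(s) > threshold]
    if not candidates:
        return None
    chooser = rng if rng is not None else random
    return chooser.choice(candidates)
```

Filtering out low-energy segments avoids spending a matching attempt on silence or near-silence, which would carry little identifying information.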
4. The method according to claim 1, characterized in that matching the extracted audio fingerprints against the fingerprints in the pre-established audio fingerprint library comprises:
performing downsampling processing on the extracted audio fingerprints;
matching the audio fingerprints that have undergone the downsampling processing against the fingerprints in the pre-established audio fingerprint library.
5. The method according to claim 1, characterized in that choosing a new target audio sample from the audio to be identified comprises:
choosing a new target audio sample from the audio to be identified according to a preset second selection rule, the second selection rule being different from the first selection rule.
6. The method according to claim 1, characterized by further comprising:
when the match succeeds, obtaining the audio information corresponding to the audio to be identified, and feeding the audio information back to the client.
7. An audio identification device, characterized by comprising:
a first selection unit, configured to choose a target audio sample from the audio to be identified according to a preset first selection rule;
an extraction unit, configured to extract multiple audio fingerprints from the target audio sample at a first audio fingerprint extraction density;
a matching unit, configured to match the extracted audio fingerprints against the fingerprints in a pre-established audio fingerprint library, wherein the fingerprints in the pre-established audio fingerprint library are extracted at a second audio fingerprint extraction density, and the first audio fingerprint extraction density is lower than the second audio fingerprint extraction density;
a second selection unit, configured to, when the match does not succeed, choose a new target audio sample from the audio to be identified, until the match succeeds, so as to identify the audio to be identified.
8. The device according to claim 7, characterized by further comprising:
a feedback unit, configured to, when the match succeeds, obtain the audio information corresponding to the audio to be identified and feed the audio information back to the client.
9. A storage medium, characterized in that the storage medium includes a stored program, wherein, when the program runs, the device on which the storage medium resides is controlled to perform the audio identification method according to any one of claims 1 to 6.
10. An electronic device, characterized by comprising a memory and one or more programs, wherein the one or more programs are stored in the memory and are configured to be executed by one or more processors to perform the audio identification method according to any one of claims 1 to 6.
CN201711486757.6A 2017-12-29 2017-12-29 Audio recognition method and device, storage medium and electronic equipment Active CN108198573B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711486757.6A CN108198573B (en) 2017-12-29 2017-12-29 Audio recognition method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN108198573A true CN108198573A (en) 2018-06-22
CN108198573B CN108198573B (en) 2021-04-30

Family

ID=62587013

Country Status (1)

Country Link
CN (1) CN108198573B (en)

Patent Citations (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1403783A2 (en) * 2002-09-24 2004-03-31 Matsushita Electric Industrial Co., Ltd. Audio signal feature extraction
US20090307273A1 (en) * 2008-06-06 2009-12-10 Tecsys Development, Inc. Using Metadata Analysis for Monitoring, Alerting, and Remediation
CN102308295A (en) * 2009-02-10 2012-01-04 索尼爱立信移动通讯有限公司 Music profiling
CN103403710A (en) * 2011-02-10 2013-11-20 雅虎公司 Extraction and matching of characteristic fingerprints from audio signals
US20150254342A1 (en) * 2011-05-30 2015-09-10 Lei Yu Video dna (vdna) method and system for multi-dimensional content matching
CN102314875A (en) * 2011-08-01 2012-01-11 北京百度网讯科技有限公司 Audio file identification method and device
US8805560B1 (en) * 2011-10-18 2014-08-12 Google Inc. Noise based interest point density pruning
CN103455514A (en) * 2012-06-01 2013-12-18 腾讯科技(深圳)有限公司 Updating method and updating device for audio file
CN103778174A (en) * 2012-10-19 2014-05-07 索尼公司 Apparatus and method for scene change detection-based trigger for audio fingerprinting analysis
CN103021440A (en) * 2012-11-22 2013-04-03 腾讯科技(深圳)有限公司 Method and system for tracking audio streaming media
CN104077336A (en) * 2013-05-09 2014-10-01 腾讯科技(深圳)有限公司 Method and device for dragging audio file to retrieve audio file information
CN104184697A (en) * 2013-05-20 2014-12-03 百度在线网络技术(北京)有限公司 Audio fingerprint extraction method and system thereof
CN105874732A (en) * 2014-01-07 2016-08-17 高通股份有限公司 Method and device for identifying a piece of music in audio stream
CN104317967A (en) * 2014-11-17 2015-01-28 北京航空航天大学 Two-layer advertisement audio retrieval method based on audio fingerprints
US20170024441A1 (en) * 2015-03-10 2017-01-26 Compact Disc Express, Inc. Systems and methods for continuously detecting and identifying songs in a continuous audio stream
CN104915403A (en) * 2015-06-01 2015-09-16 腾讯科技(北京)有限公司 Information processing method and server
CN105138541A (en) * 2015-07-08 2015-12-09 腾讯科技(深圳)有限公司 Audio fingerprint matching query method and device
KR20170027649A (en) * 2015-09-02 2017-03-10 레이 왕 Method and apparatus for synchronous putting of real-time mobile advertisement based on audio fingerprint
CN105975568A (en) * 2016-04-29 2016-09-28 腾讯科技(深圳)有限公司 Audio processing method and apparatus
CN106055615A (en) * 2016-05-25 2016-10-26 广州酷狗计算机科技有限公司 Method, device and system for obtaining music information
CN106802960A (en) * 2017-01-19 2017-06-06 湖南大学 A kind of burst audio search method based on audio-frequency fingerprint

Non-Patent Citations (5)

* Cited by examiner, † Cited by third party
Title
HAITSMA J, KALKER T.2: "A highly robust audio fingerprinting system", 《ISMIR》 *
KIM S, UNAL E, NARAYANAN S.: "Music fingerprint extraction for classical music cover song identification", 《2008 IEEE INTERNATIONAL CONFERENCE ON MULTIMEDIA AND EXPO》 *
聂华: "基于音频指纹的广告检测技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
郭永帅: "基于音频指纹和版本识别的音乐检索技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *
韩威: "压缩域音频指纹及其鲁棒性研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111723235A (en) * 2019-03-19 2020-09-29 百度在线网络技术(北京)有限公司 Music content identification method, device and equipment
CN111723235B (en) * 2019-03-19 2023-09-26 百度在线网络技术(北京)有限公司 Music content identification method, device and equipment
CN112380382A (en) * 2020-11-23 2021-02-19 北京达佳互联信息技术有限公司 Audio classification method and device and storage medium
CN112380382B (en) * 2020-11-23 2024-03-12 北京达佳互联信息技术有限公司 Audio classification method, device and storage medium

Also Published As

Publication number Publication date
CN108198573B (en) 2021-04-30

Similar Documents

Publication Publication Date Title
Lee et al. Music similarity-based approach to generating dance motion sequence
Yang et al. Toward multi-modal music emotion classification
US20120054238A1 (en) Music search apparatus and method using emotion model
CN105956053B (en) A kind of searching method and device based on the network information
JP2009123124A (en) Musical composition search system, method and program
CN104598502A (en) Method, device and system for obtaining background music information in played video
CN109493879B (en) Music rhythm analysis and extraction method and device
CN105335414A (en) Music recommendation method, device and terminal
CN109558500A (en) Multimedia sequence generation method, medium, device and calculating equipment
Wang et al. Towards time-varying music auto-tagging based on cal500 expansion
CN110047515A (en) A kind of audio identification methods, device, equipment and storage medium
CN109117622B (en) Identity authentication method based on audio fingerprints
CN105679324A (en) Voiceprint identification similarity scoring method and apparatus
CN107679196A (en) A kind of multimedia recognition methods, electronic equipment and storage medium
Rocha et al. Segmentation and timbre-and rhythm-similarity in Electronic Dance Music
CN108198573A (en) Audio identification methods and device, storage medium and electronic equipment
CN109756628A (en) Method and device for playing function key sound effect and electronic equipment
Grekow Audio features dedicated to the detection of four basic emotions
CN104978380B (en) A kind of audio-frequency processing method and device
CN109802987B (en) Content push method for display device, push device and display equipment
CN108777804B (en) Media playing method and device
CN103870589B (en) A kind of voice data switching method and electronic equipment
CN105225664A (en) The generation method and apparatus of Information Authentication method and apparatus and sample sound
CN112270929B (en) Song identification method and device
Bargaje Emotion recognition and emotion based classification of audio using genetic algorithm-an optimized approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant